1

How to detect a non-breaking space when iterating over a document via UNO

asked 2017-10-09 05:07:23 +0200

Alexander Malahov

updated 2017-10-09 05:07:49 +0200

I have the following paragraph (the non-breaking space is replaced with % here for clarity):

NORMAL11111111111111111111111111111111111111111111111111111111111111111111
non%breakable normal2

I iterate over the TextPortions and can read all the other character properties, such as weight, posture, and color.
The TextPortions are:

  • "NORMAL"
  • "1"
  • "1111111111111111111111111111111111111111111111111111111111111111111"
  • " non breakable normal2"

The line of code print(portion.CharKeepTogether) raises an error: "AttributeError: CharKeepTogether".
And indeed, when I inspect all the properties of the portion with unohelper.inspect(portion, sys.stdout) (which prints everything I can reach through the object), CharKeepTogether is not in the list.
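
The AttributeError can be avoided by asking the object whether it advertises a property before reading it. The following is a minimal sketch, not a confirmed fix: it relies on the standard UNO calls getPropertySetInfo(), hasPropertyByName() and getPropertyValue(), while the helper names has_property and get_property_or_default are hypothetical and chosen only for illustration.

```python
def has_property(obj, name):
    """Return True if the UNO object advertises a property called `name`."""
    # XPropertySet.getPropertySetInfo() returns an XPropertySetInfo.
    info = obj.getPropertySetInfo()
    return info.hasPropertyByName(name)


def get_property_or_default(obj, name, default=None):
    """Read a property if it exists; otherwise return `default`.

    Hypothetical helper: avoids the AttributeError seen when a
    property such as CharKeepTogether is missing from the portion.
    """
    if has_property(obj, name):
        return obj.getPropertyValue(name)
    return default
```

Inside the portion loop, get_property_or_default(portion, "CharKeepTogether") would then yield None instead of raising when the property is absent.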

I've tried this with LibreOffice 5.3.0 and 5.4.2.2

Any ideas?


Comments

2
Lupp ( 2017-12-09 21:43:14 +0200 )
1

@Alexander Malahov: Good question, although it took me a while to understand that this was cross-posted to another site. A simple statement like "I also asked this at ..." would help.

Jim K ( 2017-12-10 09:55:16 +0200 )
1

I was not the questioner there. I should have added the hint "Cross-posting:" instead of "See also:". Sorry!

Lupp ( 2017-12-10 11:02:27 +0200 )

1 Answer

2

answered 2017-12-10 09:50:59 +0200

Jim K

As explained at https://ask.libreoffice.org/en/questi..., apparently there is no longer a property named CharKeepTogether. Try the following code instead.

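def count_nonbreaking_spaces():
    """Count NO-BREAK SPACE (U+00A0) characters in the document's body text."""
    oParEnum = XSCRIPTCONTEXT.getDocument().getText().createEnumeration()
    count = 0
    while oParEnum.hasMoreElements():
        oPar = oParEnum.nextElement()
        # The enumeration also yields text tables, which have no text portions.
        if not oPar.supportsService("com.sun.star.text.Paragraph"):
            continue
        oPortionEnum = oPar.createEnumeration()
        while oPortionEnum.hasMoreElements():
            oTextPortion = oPortionEnum.nextElement()
            # Skip fields, footnote anchors and other non-text portions.
            if oTextPortion.TextPortionType == "Text":
                count += oTextPortion.getString().count("\xA0")
    print("Found %d nonbreaking spaces." % count)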

U+00A0 (NO-BREAK SPACE, written "\xA0" in the code) is described at http://www.fileformat.info/info/unico...
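
If the positions of the non-breaking spaces matter rather than just their number, the per-portion check can be factored out into a plain function that works on the string returned by getString(). This is only a sketch; the names NBSP and nbsp_positions are hypothetical and not part of the UNO API.

```python
NBSP = "\u00A0"  # NO-BREAK SPACE


def nbsp_positions(text):
    """Return the 0-based indices of every U+00A0 in `text`."""
    return [i for i, ch in enumerate(text) if ch == NBSP]


# The last portion from the question, with the real character in place of "%".
portion_text = " non\u00A0breakable normal2"
print(nbsp_positions(portion_text))  # prints [4]
```

An empty list means the portion contains no non-breaking space, so the result can also be used directly as a truth test.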


Comments

2

+1

The "CharKeepTogether" attribute seems to have had no associated code since day one of OpenOffice.org. ODF has no such property for characters. It is obsolete, and the proposal to remove it (by @Jim K in another topic) makes perfect sense IMO (does it need a bug filed?).

Line breaking rules are described in UAX #14 by Unicode, and need no additional character run properties. Specific Unicode codepoints are already assigned their relevant classes.

Mike Kaganski ( 2017-12-10 10:30:53 +0200 )
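
To illustrate the point about code points carrying their own line-breaking class, the sketch below reports a few characters that UAX #14 assigns to the classes GL (non-breaking "glue") and WJ (word joiner). Python's unicodedata module does not expose the line-break class, so the table is a small hand-picked subset, not the full data from the standard; the names NON_BREAKING and describe_non_breaking are hypothetical.

```python
import unicodedata

# Hand-picked subset of UAX #14 line-break classes (assumption: not exhaustive).
NON_BREAKING = {
    "\u00A0": "GL",  # NO-BREAK SPACE
    "\u2007": "GL",  # FIGURE SPACE
    "\u2011": "GL",  # NON-BREAKING HYPHEN
    "\u202F": "GL",  # NARROW NO-BREAK SPACE
    "\u2060": "WJ",  # WORD JOINER
    "\uFEFF": "WJ",  # ZERO WIDTH NO-BREAK SPACE
}


def describe_non_breaking(text):
    """List (index, code point, Unicode name, class) for each listed character."""
    return [
        (i, "U+%04X" % ord(ch), unicodedata.name(ch), NON_BREAKING[ch])
        for i, ch in enumerate(text)
        if ch in NON_BREAKING
    ]


for entry in describe_non_breaking("non\u00A0breakable well\u2011known"):
    print(entry)
```

Applied to the string of a text portion, this finds the other non-breaking characters mentioned in this thread as well as U+00A0.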

Using ZERO WIDTH NO-BREAK SPACE (U+FEFF) adds invisible but countable characters. CharKeepTogether, if implemented, would not. The missing "optional" character property is therefore not completely obsolete. The difference may matter in rare cases, e.g. in Calc.

Lupp ( 2017-12-10 11:14:21 +0200 )

It is. It would add yet another duplicate implementation of the same feature, with one distinctive property (non-countability), and a huge potential for confusion ("I have cleared the character properties, but it still doesn't break here!"). One can invent these duplicate entities ad nauseam, with virtually no potential benefit, bloating the interface and increasing support cost and regression probability exponentially.

Mike Kaganski ( 2017-12-10 12:03:51 +0200 )

There is a big difference, IMO, in usability and reliability versus "misusability" between an implemented CharKeepTogether and the "alternative" you joked about.

Lupp ( 2017-12-10 12:27:45 +0200 )

I'm not sure I understand you correctly (I suppose you are saying that you'd prefer CharKeepTogether to U+FEFF), but if I do: you cannot skip implementing the Unicode standard, so however "superior" another alternative might be, it is only an alternative.

Mike Kaganski ( 2017-12-10 13:00:25 +0200 )

I was saying that I assumed a proposed property like CharNotCountable to be a joke.
On the other hand, inserting (at least two) U+FEFF characters to keep a dash-connected term together (this was asked here a while ago) is unwieldy and not a true alternative to CharKeepTogether for a TextPiece. As a means of controlling line wrapping in specific cases, it is a workaround rather than a solution. I don't see multiple implementations of the same feature here.

Lupp ( 2017-12-10 13:40:40 +0200 )

And somehow kidding again: See also this page.

Lupp ( 2017-12-10 13:42:11 +0200 )

On the other hand, inserting (at least two) U+FEFF characters to keep a dash-connected term together ... is unwieldy and not a true alternative to CharKeepTogether for a TextPiece

NON-BREAKING HYPHEN (U+2011) is a single proper character for this. Using either a regular hyphen or a dash in this case is improper, so CharKeepTogether is not a true alternative but rather a workaround.

Mike Kaganski ( 2017-12-10 15:05:42 +0200 )
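
The U+2011 suggestion can be tried on a plain string before touching a document. A minimal sketch, assuming the term uses the ordinary HYPHEN-MINUS (U+002D); the function name glue_hyphens is hypothetical.

```python
NB_HYPHEN = "\u2011"  # NON-BREAKING HYPHEN


def glue_hyphens(term):
    """Replace each HYPHEN-MINUS in `term` with NON-BREAKING HYPHEN."""
    return term.replace("-", NB_HYPHEN)


glued = glue_hyphens("state-of-the-art")
# One-for-one replacement: the character count does not change,
# unlike inserting U+FEFF around each hyphen.
print(len("state-of-the-art") == len(glued))  # prints True
```

Because each hyphen is swapped for exactly one character, the "countable characters" concern raised about U+FEFF does not arise with this approach.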

See https://ask.libreoffice.org/en/questi... (my attempted answer there). Can there be an acceptable reason to want 12 or even 13 characters glued together?
I will no longer insist concerning this marginal issue.

Lupp ( 2017-12-10 16:49:42 +0200 )

After reading the first comment from @Mike Kaganski, I decided to file a bug at https://bugs.documentfoundation.org/s....

Jim K ( 2017-12-12 00:35:01 +0200 )
Stats

Asked: 2017-10-09 05:07:23 +0200

Seen: 211 times

Last updated: Dec 10 '17