
How to detect non-breakable space when iterating document via UNO [closed]

asked 2017-10-09 05:07:23 +0100

Alexander Malahov

updated 2020-10-22 19:58:34 +0100

Alex Kemp

I have the following paragraph (the non-breakable space is replaced with % here for clarity):

non%breakable normal2

I iterate TextPortions and can get all the other character properties like weight, posture, color etc.
TextPortions are:

  • "NORMAL"
  • "1"
  • "1111111111111111111111111111111111111111111111111111111111111111111"
  • " non breakable normal2"

The line of code print(portion.CharKeepTogether) gives me an error: "AttributeError: CharKeepTogether".
And indeed, when I inspect all the properties of the portion with unohelper.inspect(portion, sys.stdout) (it prints out all I can get through the object), there is no CharKeepTogether in the list.

I've tried this with LibreOffice 5.3.0 and

Any ideas?


Closed for the following reason: the question is answered, right answer was accepted by Alex Kemp
close date 2020-10-22 19:58:49.331454


Lupp ( 2017-12-09 21:43:14 +0100 )

@Alexander Malahov: Good question, although it took me a while to understand that this was cross-posted to another site. A simple statement like "I also asked this at ..." would help.

Jim K ( 2017-12-10 09:55:16 +0100 )

I was not the questioner there. Should have added the hint "Cross-posting:" instead of "See also:". Sorry!

Lupp ( 2017-12-10 11:02:27 +0100 )

1 Answer


answered 2017-12-10 09:50:59 +0100

Jim K

As explained at, apparently there is no longer a property named CharKeepTogether. Try the following code instead.

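def count_nonbreaking_spaces():
    """Count U+00A0 (NO-BREAK SPACE) characters in the document's text portions."""
    # XSCRIPTCONTEXT is provided by LibreOffice when this runs as a Python macro.
    oParEnum = XSCRIPTCONTEXT.getDocument().getText().createEnumeration()
    count = 0
    while oParEnum.hasMoreElements():
        oPar = oParEnum.nextElement()
        # The text enumeration also yields tables, which have no portion enumeration.
        if not oPar.supportsService("com.sun.star.text.Paragraph"):
            continue
        oPortionEnum = oPar.createEnumeration()
        while oPortionEnum.hasMoreElements():
            oTextPortion = oPortionEnum.nextElement()
            if oTextPortion.TextPortionType == "Text":
                count += oTextPortion.getString().count("\xA0")
    print("Found %d nonbreaking spaces." % count)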

0xA0 is described at
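
The same test can be tried outside LibreOffice on a plain string. The following is only a minimal sketch: the helper name nbsp_offsets is made up for illustration, and inside a macro its argument would be the result of oTextPortion.getString().

```python
NBSP = "\u00A0"  # NO-BREAK SPACE, the character written as "%" in the question


def nbsp_offsets(text):
    """Return the zero-based offsets of every NO-BREAK SPACE in text."""
    return [i for i, ch in enumerate(text) if ch == NBSP]


# The question's sample paragraph, with the real character instead of "%".
sample = "non" + NBSP + "breakable normal2"
print(nbsp_offsets(sample))  # prints [3]
```

Returning offsets rather than a count makes it possible to tell where inside a portion the character sits, which is closer to "detecting" it than counting alone.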





The "CharKeepTogether" attribute seems to have had no associated code from day 1. ODF has no such property for characters. It is obsolete, and the proposal to remove it (by @Jim K in another topic) makes perfect sense IMO (needs filing a bug?).

Line breaking rules are described in UAX #14 by Unicode, and need no additional character run properties. Specific Unicode codepoints are already assigned their relevant classes.
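
As a rough illustration of "the code point itself carries the information": Python's standard library does not expose the UAX #14 Line_Break property, so the sketch below uses the `<noBreak>` compatibility decomposition tag from the Unicode Character Database as a proxy. That is an assumption made for the example, not a UAX #14 classification, and the function name is hypothetical.

```python
import unicodedata


def is_no_break_variant(ch):
    """True if ch is a <noBreak> compatibility variant of another character."""
    return unicodedata.decomposition(ch).startswith("<noBreak>")


# SPACE, NO-BREAK SPACE, NON-BREAKING HYPHEN, NARROW NO-BREAK SPACE
for ch in ("\u0020", "\u00A0", "\u2011", "\u202F"):
    print("U+%04X %s: %s" % (ord(ch), unicodedata.name(ch), is_no_break_variant(ch)))
```

Characters such as U+FEFF and U+2060 have no such decomposition, so this proxy does not catch them. A complete check would need the Line_Break data itself.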

Mike Kaganski ( 2017-12-10 10:30:53 +0100 )

Using ZERO WIDTH NO-BREAK SPACE (U+FEFF) adds invisible but countable characters. CharKeepTogether, if implemented, would not. Therefore the missing "optional" CharacterProperty isn't completely obsolete. The difference may be an issue in rare cases, e.g. in Calc.

Lupp ( 2017-12-10 11:14:21 +0100 )

It is. It would add yet another duplicating implementation for the same feature, with one distinctive property (non-countability), and a huge potential for confusion ("I have cleared character properties, but it still doesn't break here!"). One can invent these duplicating entities ad nauseam, with ~no potential profit, bloating the interface and increasing support cost and regression probability exponentially.

Mike Kaganski ( 2017-12-10 12:03:51 +0100 )

Big difference, imo, in usability/reliability vs "misusability" between an implemented CharKeepTogether and the "alternative" you joked about.

Lupp ( 2017-12-10 12:27:45 +0100 )

I'm not sure I understand you correctly (I suppose you are saying that you'd prefer CharKeepTogether to U+FEFF), but if I do: you cannot skip implementing the Unicode standard, so however "superior" another alternative might be, it is only an alternative.

Mike Kaganski ( 2017-12-10 13:00:25 +0100 )

I was saying that I assumed a proposed property like CharNotCountable to be a joke.
On the other hand the insertion of (at least two) U+FEFF to keep a dash-connected term together, e.g. (this was asked here a while ago) is unhandy and not a true alternative to CharKeepTogether for a TextPiece. As a means to control the line wrapping in specific cases it is rather a workaround than a solution. I don't see multiple implementations of the same feature here.

Lupp ( 2017-12-10 13:40:40 +0100 )

And somehow kidding again: See also this page.

Lupp ( 2017-12-10 13:42:11 +0100 )

On the other hand the insertion of (at least two) U+FEFF to keep a dash-connected term together, e.g. ... is unhandy and not a true alternative to CharKeepTogether for a TextPiece

NON-BREAKING HYPHEN (U+2011) is a single proper character for this. And using either a usual hyphen or a dash in this case is improper, so CharKeepTogether is not a true alternative and rather a workaround.
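
The two approaches differ in how many characters they add, which can be checked on plain strings. This is a minimal sketch; the term "well-known" is a made-up example and the variable names are illustrative only.

```python
HYPHEN_MINUS = "\u002D"
NON_BREAKING_HYPHEN = "\u2011"
ZWNBSP = "\uFEFF"  # ZERO WIDTH NO-BREAK SPACE

term = "well-known"  # made-up example term

# One-for-one replacement: the character count does not change.
with_nb_hyphen = term.replace(HYPHEN_MINUS, NON_BREAKING_HYPHEN)

# Wrapping the hyphen in two U+FEFF adds two invisible but countable characters.
with_zwnbsp = term.replace(HYPHEN_MINUS, ZWNBSP + HYPHEN_MINUS + ZWNBSP)

print(len(term), len(with_nb_hyphen), len(with_zwnbsp))  # prints 10 10 12
```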

Mike Kaganski ( 2017-12-10 15:05:42 +0100 )

See (my attempted answer there). Can there be an acceptable reason to want 12 or even 13 characters glued together?
I will no longer insist concerning this marginal issue.

Lupp ( 2017-12-10 16:49:42 +0100 )

After reading the first comment from @Mike Kaganski, I decided to file a bug at

Jim K ( 2017-12-12 00:35:01 +0100 )

Last updated: Dec 10 '17