How to detect non-breakable space when iterating document via UNO

I have the following paragraph (the non-breakable space is replaced with % here for clarity):

NORMAL11111111111111111111111111111111111111111111111111111111111111111111
non%breakable normal2

I iterate TextPortions and can get all the other character properties like weight, posture, color etc.
TextPortions are:

  • “NORMAL”
  • “1”
  • “1111111111111111111111111111111111111111111111111111111111111111111”
  • " non breakable normal2"

The line of code print(portion.CharKeepTogether) gives me an error: “AttributeError: CharKeepTogether”.
And indeed, when I inspect all the properties of the portion with unohelper.inspect(portion, sys.stdout) (it prints out all I can get through the object), there is no CharKeepTogether in the list.

I’ve tried this with LibreOffice 5.3.0 and 5.4.2.2

Any ideas?

See also: [Solved] How to detect non-breakable space in TextPortion (View topic) • Apache OpenOffice Community Forum

@AlexanderMalahov: Good question, although it took me a while to understand that this was cross-posted to another site. A simple statement like “I also asked this at …” would help.

I was not the questioner there. I should have used the hint “Cross-posting:” instead of “See also:”. Sorry!

As explained at Optional character properties - #3 by jimk, apparently there is no longer a property named CharKeepTogether. Try the following code instead.

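def count_nonbreaking_spaces():
    """Count U+00A0 characters in the text portions of the current document."""
    oParEnum = XSCRIPTCONTEXT.getDocument().getText().createEnumeration()
    count = 0
    while oParEnum.hasMoreElements():
        oPar = oParEnum.nextElement()
        # The enumeration can also return text tables, which have no portions.
        if not oPar.supportsService("com.sun.star.text.Paragraph"):
            continue
        oPortionEnum = oPar.createEnumeration()
        while oPortionEnum.hasMoreElements():
            oTextPortion = oPortionEnum.nextElement()
            # Skip fields, footnotes, bookmarks and other non-text portions.
            if oTextPortion.TextPortionType == "Text":
                # "\xA0" is NO-BREAK SPACE (U+00A0).
                count += oTextPortion.getString().count("\xA0")
    print("Found %d nonbreaking spaces." % count)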

0xA0 is described at http://www.fileformat.info/info/unicode/char/A0/index.htm
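
The same check can be tried on plain strings without a running office instance. This is only a minimal sketch, not part of the answer above: the helper name `find_nonbreaking_spaces` and the sample string are made up for illustration. It reports where U+00A0 occurs instead of only counting it.

```python
NBSP = "\u00A0"  # NO-BREAK SPACE, the character the macro above looks for


def find_nonbreaking_spaces(text):
    """Return the zero-based indexes of every U+00A0 in text."""
    return [i for i, ch in enumerate(text) if ch == NBSP]


if __name__ == "__main__":
    # Hypothetical portion string, modelled on the last portion in the question.
    sample = " non\u00A0breakable normal2"
    print(find_nonbreaking_spaces(sample))
```

In a macro, the string returned by `oTextPortion.getString()` could be passed to such a helper in place of the sample.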

+1

The “CharKeepTogether” attribute seems to have had no associated code since day 1 of OpenOffice.org. ODF has no such property for characters. It is obsolete, and the proposal to remove it (by @jimk in another topic) makes perfect sense IMO (needs filing a bug?).

Line breaking rules are described in UAX #14 by Unicode, and need no additional character run properties. Specific Unicode codepoints are already assigned their relevant classes.
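
To illustrate the point about codepoints carrying their own classes, here is a rough sketch that looks for break-inhibiting characters in a string. The table is a small hand-copied subset of the UAX #14 classes (GL = glue, WJ = word joiner), not the full standard, and the function name `break_inhibitors` is invented for this example. Python's standard library does not expose the line breaking class, hence the manual table.

```python
# Assumption: a hand-copied, non-exhaustive subset of UAX #14 line breaking classes.
LINE_BREAK_CLASS = {
    "\u00A0": "GL",  # NO-BREAK SPACE
    "\u2007": "GL",  # FIGURE SPACE
    "\u2011": "GL",  # NON-BREAKING HYPHEN
    "\u202F": "GL",  # NARROW NO-BREAK SPACE
    "\u2060": "WJ",  # WORD JOINER
    "\uFEFF": "WJ",  # ZERO WIDTH NO-BREAK SPACE
}


def break_inhibitors(text):
    """List (index, codepoint, class) for each listed character found in text."""
    return [
        (i, "U+%04X" % ord(ch), LINE_BREAK_CLASS[ch])
        for i, ch in enumerate(text)
        if ch in LINE_BREAK_CLASS
    ]


if __name__ == "__main__":
    for hit in break_inhibitors("non\u00A0breakable well\u2011known"):
        print(hit)
```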

Using ZERO WIDTH NO-BREAK SPACE (U+FEFF) adds invisible but countable characters. CharKeepTogether, if implemented, would not. The missing “optional” character property therefore isn’t completely obsolete. The difference may be an issue in rare cases, e.g. in Calc.
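
A small sketch of what “invisible but countable” means in practice. The helper `glue_with_zwnbsp` and the sample term are hypothetical; the only point is that each inserted U+FEFF increases the string length even though nothing extra is displayed.

```python
ZWNBSP = "\uFEFF"  # ZERO WIDTH NO-BREAK SPACE


def glue_with_zwnbsp(term):
    """Surround every hyphen in term with U+FEFF on both sides."""
    return term.replace("-", ZWNBSP + "-" + ZWNBSP)


if __name__ == "__main__":
    plain = "well-known"
    glued = glue_with_zwnbsp(plain)
    # The glued string looks the same on screen but is two characters longer.
    print(len(plain), len(glued))
    # Removing the invisible characters restores the original text.
    print(glued.replace(ZWNBSP, "") == plain)
```

Anything that counts characters or compares lengths, such as a LEN() formula on a cell, would see the longer string.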

It is. It would add yet another duplicating implementation for the same feature, with one distinctive property (non-countability), and a huge potential for confusion (“I have cleared character properties, but it still doesn’t break here!”). One can invent these duplicating entities ad nauseam, with ~no potential profit, bloating the interface and increasing support cost and regression probability exponentially.

Big difference, imo, in usability/reliability vs “misusability” between an implemented CharKeepTogether and the “alternative” you joked about.

I’m not sure I understand you correctly (I suppose you are saying that you’d prefer CharKeepTogether to U+FEFF), but if I do: you cannot skip implementing the Unicode standard, so however “superior” another alternative might be, it is only an alternative.

I was saying that I assumed a proposed property like CharNotCountable to be a joke.
On the other hand the insertion of (at least two) U+FEFF to keep a dash-connected term together, e.g. (this was asked here a while ago) is unhandy and not a true alternative to CharKeepTogether for a TextPiece. As a means to control the line wrapping in specific cases it is rather a workaround than a solution. I don’t see multiple implementations of the same feature here.

And somehow kidding again: See also this page.

On the other hand the insertion of (at least two) U+FEFF to keep a dash-connected term together, e.g. … is unhandy and not a true alternative to CharKeepTogether for a TextPiece

NON-BREAKING HYPHEN (U+2011) is a single proper character for this. Using either the usual hyphen or a dash in this case is improper, so CharKeepTogether is not a true alternative but rather a workaround.
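
As a hedged illustration of the single-character approach: the sketch below swaps the ordinary hyphen for U+2011 in a term. The function name `use_nonbreaking_hyphen` and the sample term are made up here; `unicodedata.name` from the Python standard library is used only to confirm which character is meant.

```python
import unicodedata

NB_HYPHEN = "\u2011"  # NON-BREAKING HYPHEN


def use_nonbreaking_hyphen(term):
    """Replace each HYPHEN-MINUS (U+002D) in term with U+2011."""
    return term.replace("-", NB_HYPHEN)


if __name__ == "__main__":
    term = "well-known"
    fixed = use_nonbreaking_hyphen(term)
    # One character replaces another, so the length does not change.
    print(len(term), len(fixed))
    print(unicodedata.name(fixed[4]))
```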

See Non breaking en dash (my attempted answer there). Can there be an acceptable reason to want 12 or even 13 characters glued together?
I will no longer insist concerning this marginal issue.

After reading the first comment from @mikekaganski, I decided to file a bug at https://bugs.documentfoundation.org/show_bug.cgi?id=114416.