# How to detect non-breakable space when iterating document via UNO

I have the following paragraph (non-breakable spaces are shown as % here for clarity):

NORMAL11111111111111111111111111111111111111111111111111111111111111111111
non%breakable normal2


I iterate over the TextPortions and can read all the other character properties, such as weight, posture, and color.
The TextPortions are:

• "NORMAL"
• "1"
• "1111111111111111111111111111111111111111111111111111111111111111111"
• " non breakable normal2"

The line of code print(portion.CharKeepTogether) gives me an error: "AttributeError: CharKeepTogether".
And indeed, when I inspect all the properties of the portion with unohelper.inspect(portion, sys.stdout) (it prints out all I can get through the object), there is no CharKeepTogether in the list.

I've tried this with LibreOffice 5.3.0 and 5.4.2.2.

Any ideas?


@Alexander Malahov: Good question, although it took me a while to understand that this was cross-posted to another site. A simple statement like "I also asked this at ..." would help.

( 2017-12-10 09:55:16 +0200 )

I was not the questioner there. Should have added the hint "Cross-posting:" instead of "See also:". Sorry!

( 2017-12-10 11:02:27 +0200 )

As explained at https://ask.libreoffice.org/en/questi..., apparently there is no longer a property named CharKeepTogether. Try the following code instead.

    def count_nonbreaking_spaces():
        oParEnum = XSCRIPTCONTEXT.getDocument().getText().createEnumeration()
        count = 0
        while oParEnum.hasMoreElements():
            oPar = oParEnum.nextElement()
            oPortionEnum = oPar.createEnumeration()
            while oPortionEnum.hasMoreElements():
                oTextPortion = oPortionEnum.nextElement()
                if oTextPortion.TextPortionType == "Text":
                    s = oTextPortion.getString()
                    for c in s:
                        if c == "\xA0":
                            count += 1
        print("Found %d nonbreaking spaces." % (count))


The character 0xA0 (U+00A0, NO-BREAK SPACE) is described at http://www.fileformat.info/info/unico...
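
If you need to know where the non-breaking spaces are rather than just how many there are, the same loop can be adapted. The following is only a sketch: `nbsp_offsets` and `report_nbsp` are names made up for this example, and `doc` is assumed to be a Writer document such as the one returned by `XSCRIPTCONTEXT.getDocument()`. It also skips tables, which show up in the paragraph enumeration but have no text portions.

```python
NBSP = "\u00A0"  # NO-BREAK SPACE


def nbsp_offsets(text):
    """Return the 0-based offsets of every NO-BREAK SPACE in text."""
    return [i for i, ch in enumerate(text) if ch == NBSP]


def report_nbsp(doc):
    """Yield (paragraph_index, portion_index, offsets) for portions with NBSP.

    `doc` is assumed to be a Writer document (hypothetical parameter name).
    """
    par_enum = doc.getText().createEnumeration()
    par_index = 0
    while par_enum.hasMoreElements():
        par = par_enum.nextElement()
        # Tables are enumerated too; only real paragraphs have text portions.
        if par.supportsService("com.sun.star.text.Paragraph"):
            portion_enum = par.createEnumeration()
            portion_index = 0
            while portion_enum.hasMoreElements():
                portion = portion_enum.nextElement()
                if portion.TextPortionType == "Text":
                    offsets = nbsp_offsets(portion.getString())
                    if offsets:
                        yield par_index, portion_index, offsets
                portion_index += 1
        par_index += 1
```

For the last portion in the question (" non%breakable normal2", with % standing for the non-breaking space), `nbsp_offsets` should return offset 4.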


+1

The "CharKeepTogether" attribute seems to have had no associated code since day 1 of OpenOffice.org. ODF has no such property for characters. It is obsolete, and the proposal to remove it (by @Jim K in another topic) makes perfect sense IMO (needs filing a bug?).

Line breaking rules are described in UAX #14 by Unicode, and need no additional character run properties. Specific Unicode codepoints are already assigned their relevant classes.
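
To illustrate the point that the behaviour lives in the codepoints themselves, here is a rough sketch for spotting such characters in a portion's string. Python's standard `unicodedata` module does not expose the UAX #14 Line_Break property, so this approximates it: it treats characters whose compatibility decomposition is tagged `<noBreak>` (such as U+00A0 and U+2011) as no-break characters, plus U+FEFF and U+2060, which have no such decomposition. The function names are invented for this example, and the heuristic is an assumption, not the UAX #14 algorithm.

```python
import unicodedata

# Zero-width characters that forbid a break but have no <noBreak> decomposition.
ZERO_WIDTH_NO_BREAK = {"\uFEFF", "\u2060"}  # ZERO WIDTH NO-BREAK SPACE, WORD JOINER


def is_no_break_char(ch):
    """Heuristic: True if ch exists to prevent a line break next to it."""
    if ch in ZERO_WIDTH_NO_BREAK:
        return True
    return unicodedata.decomposition(ch).startswith("<noBreak>")


def describe_no_break_chars(text):
    """Return (offset, 'U+XXXX', name) for each no-break character in text."""
    return [
        (i, "U+%04X" % ord(ch), unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if is_no_break_char(ch)
    ]
```

Passing `oTextPortion.getString()` to `describe_no_break_chars` would list each such character with its offset and Unicode name.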

( 2017-12-10 10:30:53 +0200 )

Using ZERO WIDTH NO-BREAK SPACE (U+FEFF) adds invisible but countable characters. CharKeepTogether, if implemented, would not. The missing "optional" CharacterProperty is therefore not completely obsolete. The difference may be an issue in rare cases, e.g. in Calc.

( 2017-12-10 11:14:21 +0200 )

It is. It would add yet another duplicate implementation of the same feature, with one distinctive property (non-countability), and a huge potential for confusion ("I have cleared the character properties, but it still doesn't break here!"). One can invent these duplicate entities ad nauseam, with ~no potential profit, bloating the interface and increasing support cost and regression probability exponentially.

( 2017-12-10 12:03:51 +0200 )

Big difference, imo, in usability/reliability vs "misusability" between an implemented CharKeepTogether and the "alternative" you joked about.

( 2017-12-10 12:27:45 +0200 )

I'm not sure I understand you correctly (I suppose you are saying that you'd prefer CharKeepTogether to U+FEFF), but if I do: you cannot skip implementing the Unicode standard, so however "superior" another alternative might be, it is only an alternative.

( 2017-12-10 13:00:25 +0200 )

I was saying that I assumed a proposed property like CharNotCountable to be a joke.
On the other hand the insertion of (at least two) U+FEFF to keep a dash-connected term together, e.g. (this was asked here a while ago) is unhandy and not a true alternative to CharKeepTogether for a TextPiece. As a means to control the line wrapping in specific cases it is rather a workaround than a solution. I don't see multiple implementations of the same feature here.

( 2017-12-10 13:40:40 +0200 )

> On the other hand the insertion of (at least two) U+FEFF to keep a dash-connected term together, e.g. ... is unhandy and not a true alternative to CharKeepTogether for a TextPiece

NON-BREAKING HYPHEN (U+2011) is a single proper character for this. Using either an ordinary hyphen or a dash in this case is improper, so CharKeepTogether is not a true alternative but rather a workaround.

( 2017-12-10 15:05:42 +0200 )

See https://ask.libreoffice.org/en/questi... (my attempted answer there). Can there be an acceptable reason to want 12 or even 13 characters glued together?
I will no longer insist on this marginal issue.

( 2017-12-10 16:49:42 +0200 )

After reading the first comment from @Mike Kaganski, I decided to file a bug at https://bugs.documentfoundation.org/s....

( 2017-12-12 00:35:01 +0200 )

## Stats

Asked: 2017-10-09 05:07:23 +0200

Seen: 211 times

Last updated: Dec 10 '17