Writer: clarification needed about character attributes

ajlittoz · August 27, 2013, 12:30pm

When using character styles, it is not clear whether the attributes changed by the style override or augment those of the paragraph style.

It is even more unclear if several character styles can be simultaneously applied on the same run of characters.

Take, for instance, “Source Text” which forces a monospaced font. If you subsequently apply “Emphasis”, you end up with monospaced italic. On the contrary, if you apply a full user-defined style, then another user-defined style (with non-conflicting attributes), the second style replaces the first one.

The (minor) consequence of this is: you’d better apply “Default” style first to be sure to have only the effects defined in the second style.

In addition to this forcing/augmenting dilemma, there is no notion of a toggling attribute.

Say you want to emphasise a sequence with “Outline” (I know it is ugly, but this is for a simple example).
You can easily define a style forcing outline character shape.
Now, your paragraph uses outline by default and you want to emphasise a single word by negating “Outline” style.
In my LO experience, I can do that only by defining a new style “Removed Outline” where I force the absence of the effect.
I end up with 2 character styles and I must take care to explicitly use the right one.
Moreover, if I change my mind afterwards with style “Outline”, I must manually replace all “Removed Outline” style with “Outline”.

The situation is even more complex with basic stylistic variations like italic or bold.

These variations are in fact different typefaces in the same font family.
Consequently, “toggling” is meaningless.
Setting or unsetting italic is equivalent to choosing a different font.

Coming back to the previous example, how to toggle italics in an italic-default paragraph without defining 2 context-dependent character-styles?

To sum up my concern, attributes can be forcing (0 or 1), toggling (logical xor) or augmenting (logical or).
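To make the three models concrete, here is a minimal sketch in Python. The names `Mode` and `combine` are made up for illustration and say nothing about how LO is actually implemented; the sketch only shows what each model would give for a single on/off attribute such as outline.

```python
from enum import Enum


class Mode(Enum):
    FORCE = "force"      # the style dictates the value (0 or 1)
    TOGGLE = "toggle"    # logical xor with the inherited value
    AUGMENT = "augment"  # logical or with the inherited value


def combine(inherited: bool, style_value: bool, mode: Mode) -> bool:
    """Return the effective state of one boolean attribute."""
    if mode is Mode.FORCE:
        return style_value
    if mode is Mode.TOGGLE:
        return inherited != style_value  # xor
    return inherited or style_value      # or


# An "Outline" character style applied inside a paragraph
# that already uses outline by default.
for mode in Mode:
    print(mode.value, combine(inherited=True, style_value=True, mode=mode))
```

Only the toggling model switches outline off inside an outlined paragraph without needing a second “Removed Outline” style.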

What is the underlying model in LO?
Is it consistent?

CyanCG · August 28, 2013, 9:36pm

Your question is very important and I have been investigating such issues for some time. I think that ideally, what we’d like is a system that behaves consistently and intelligently the way a system like LaTeX does (I’m pretty partial to LaTeX, but I strongly believe in LibO’s potential too). For example, how can we get a conditional emphasis character style dependent on the underlying paragraph style?

CyanCG · August 28, 2013, 9:37pm

I’ll try to come up with a decent answer to your question in a few hours, stay tuned. This is a good opportunity to write a small summary of “my understanding so far”.

oweng · August 29, 2013, 2:12am

I am sure @CyanCG will provide a good answer here, but what I am commenting on is (IMO) one related aspect of why this character vs paragraph style override (OR / XOR) exists. The v3.5 series of LO did not abuse usage of the <text:span> element to the same extent as the v4.x series does. The ODF specification v1.2, with respect to this element, states:

6.1.7 <text:span>

The <text:span> element represents the application of a style to the character data of a portion of text. The content of this element is the text which uses that text style. The <text:span> element can be nested.

Here is a brief example of how use of this element has become problematic. Under GNU/Linux running TDF/LO v4.1.0.4 these steps produce the indicated underlying XML (double quotation marks are not entered, but are merely indicative of text to be entered):

Open Writer.
Enter “Here is some text.”
Save as a1.odt and exit Writer.
XML shows:

<text:p text:style-name="P1">Here is some text.</text:p>
Re-open a1.odt.
At the end of the previous text enter " It is now set to Text Body paragraph style."
With the cursor still at the end of the text double click on the Text Body paragraph style.
Save as a2.odt and exit Writer.
XML shows:

<text:p text:style-name="Text_20_body">Here is some text.
<text:span text:style-name="T1">It is now set to Text Body paragraph style.</text:span>
</text:p>
Re-open a2.odt.
At the end of the previous text enter " Same paragraph, but now I am going to go back and italicise the name of the style."
Highlight “Text Body” and set it to use the Emphasis character style.
Click at the end of the text and continue typing " I used the character style ‘Emphasis’ to do so and continued typing here afterwards."
Save as a3.odt and exit Writer.
XML shows:

<text:p text:style-name="Text_20_body">Here is some text.
	<text:span text:style-name="T1">It is now set to </text:span>
	<text:span text:style-name="Emphasis">
		<text:span text:style-name="T1">Text Body</text:span>
	</text:span>
	<text:span text:style-name="T1"> paragraph style. Same paragraph, but now I am going to go back and italicise the name of the style. I used the character style ‘Emphasis’ to do so and continued typing here afterwards.</text:span>
</text:p>

Now compare performing the same task under GNU/Linux running v3.5.7.2:

XML from first file reveals:

<text:p text:style-name="Standard">Here is some text.</text:p>
XML from second file reveals:

<text:p text:style-name="Text_20_body">Here is some text. It is now set to Text Body paragraph style.</text:p>
XML from third file reveals:

<text:p text:style-name="Text_20_body">Here is some text. It is now set to
<text:span text:style-name="Emphasis">Text Body</text:span>
paragraph style. Same paragraph, but now I am going to go back and italicise the name of the style. I used the character style ‘Emphasis’ to do so and continued typing here afterwards.</text:p>

IMO the v3.5.7.2 XML is significantly cleaner and easier to read. With the v4.x series, any basic test of inserting “aaa aaa” and then adding subsequent text “bbb” to existing words / paragraph triggers the unnecessary use of <text:span> elements. Within a few edits the underlying XML can become quite messy and difficult to read.
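To compare the two outputs without reading raw XML, a small Python sketch can list each text run together with the chain of span styles wrapping it. The helper name `runs` and the shortened sample paragraphs are mine; the namespace declaration is added by hand so that the fragments parse on their own.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
SPAN_TAG = f"{{{TEXT_NS}}}span"
STYLE_ATTR = f"{{{TEXT_NS}}}style-name"


def runs(element, chain=()):
    """Yield (text, span_style_chain) for every text run under element."""
    if element.text:
        yield element.text, chain
    for child in element:
        child_chain = chain
        if child.tag == SPAN_TAG:
            child_chain = chain + (child.get(STYLE_ATTR),)
        yield from runs(child, child_chain)
        if child.tail:
            # Tail text belongs to the parent, not to the child span.
            yield child.tail, chain


# Shortened versions of the third file in each series.
V41 = (
    f'<text:p xmlns:text="{TEXT_NS}" text:style-name="Text_20_body">'
    'Here is some text. '
    '<text:span text:style-name="T1">It is now set to </text:span>'
    '<text:span text:style-name="Emphasis">'
    '<text:span text:style-name="T1">Text Body</text:span>'
    '</text:span>'
    '<text:span text:style-name="T1"> paragraph style.</text:span>'
    '</text:p>'
)
V35 = (
    f'<text:p xmlns:text="{TEXT_NS}" text:style-name="Text_20_body">'
    'Here is some text. It is now set to '
    '<text:span text:style-name="Emphasis">Text Body</text:span>'
    ' paragraph style.</text:p>'
)

for label, xml in (("v4.1", V41), ("v3.5", V35)):
    root = ET.fromstring(xml)
    print(label, "spans:", len(list(root.iter(SPAN_TAG))))
    for text, chain in runs(root):
        print("   ", chain, repr(text))
```

For a real document, the same function can be fed the `content.xml` member of the .odt archive, read for instance with `zipfile.ZipFile(path).read("content.xml")`.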

The application of character styles to such a mess of underlying XML will always be challenging. For this reason, if no other, it is imperative that the simplicity in structure of the underlying XML be a priority.

NOTE: Apologies for the block code formatting in this answer. It appears the Askbot upgrade in July 2013 has b0rked this aspect for XML. When this gets fixed I will come back and amend the examples.

EDIT: The comments below indicate that this change in behaviour may have been done for reasons of compatibility with OOXML, but this does not appear to be correct. Specifically, it relates to a change made (late 2011) to improve document comparisons i.e., the Edit > Compare Document… facility, which makes use of the officeooo:paragraph-rsid property. While bug fdo#68183 provides a recent XML example, these new properties were noted back in fdo#45448 and fdo#52028 in particular. That last bug report provides a very comprehensive example and comment #17 clearly indicates the rationale behind the change. The ensuing comments in that bug indicate some initial problems this change introduced for kerning and the related fixes.

I still have reservations about the implications this change holds for ODF v1.3 (which will presumably include this new property that is unspecified in v1.2). I am however now less concerned overall, although I feel it would be good if the manner in which the span elements were handled was less intrusive to the underlying XML.

ajlittoz · August 29, 2013, 6:36am

I do agree that simplicity in XML structure eases things and facilitates later updates. How does such a regression (personal opinion, no offense intended) happen? Some feature addition?

I don’t understand the translation of a2.odt editing: a1 paragraph was “Standard”. Adding text at the end uses the styles active at that location (in my understanding). Setting “Text Body” should style the whole paragraph content unless ‘Revision tracking’ is enabled by default (or at least some hidden feature gives the possibility to regenerate the revision history).

I already noticed user-visible differences between 3.x and 4.x. This fundamental one makes me hesitate to switch to 4.x for production.

CyanCG · August 29, 2013, 2:01pm

@oweng, this sums up the XML aspect very well. I’ll try to account for what happens inside the application itself, i.e. in LibO’s internal data structures (at least, the part of them that I think I understand). The abuse of span elements with automatic styles in 4.0 and 4.1 troubles me very much, among other reasons because it makes conversion to other XML formats (XHTML) and TeX formats (LaTeX, ConTeXt) much less clean and less semantic.

CyanCG · August 29, 2013, 2:05pm

My memory is failing me, but I read somewhere (Bugzilla? new release feature page?) that this new use of span elements was meant to facilitate interoperability with OOXML: indeed, in Microsoft’s format, all text portions of a paragraph are part of something called a text run, even if no special formatting is applied to them (one of the many reasons why this format is awful). It might be related to change tracking. In any case, I don’t think this is a good reason to abuse span’s.

oweng · August 29, 2013, 11:40pm

I agree with all your comments and I too find this matter of <text:span> elements troubling. ISO/IEC 29500-1:2012(E) §17.3.2.25 on p.293 outlines the <w:r> (Text Run) element. I can understand that <text:span> is used as a way of mapping to this element to cater for interoperability, but the behaviour of adding to existing text should not necessitate this. It should be checked whether the element is required. I cannot find a related bug or information on this change from v3.x to v4.x.

oweng · August 29, 2013, 11:53pm

Update: I found a related LO User ML thread which points to a related bug: fdo#68183. Unfortunately the answer appears to be related to revision tracking (the officeooo:paragraph-rsid property, which is what @CyanCG suggested, i.e., it is an OOXML compatibility feature). In my examples above I did not have revision tracking turned on, so I would think this unnecessary.

CyanCG · August 30, 2013, 7:39pm

This is the bug I had in mind. The comment by Holger Schmithüsen nails it pretty well. Who do we need to convince to see this bug addressed? This officeooo:rsid attribute is a hack and is absolutely unnecessary for those who actually use the OpenDocument format because of its specific virtues. I think that’s how the issue should be presented. OOXML compatibility should never have negative side-effects for those who choose ODF.

oweng · August 31, 2013, 12:10am

Well, I did a bit more research and it appears I was wrong in my initial conjecture that this was an OOXML-related change. The change relates to the feature for comparing documents. I have updated my answer to be clearer about this. Bug fdo#52028 provides the details. This is still little comfort if you would like the underlying XML to be cleaner. All I can suggest is raising a bug to address this, but unfortunately you will need to be incredibly specific in your detail of the problem.

CyanCG · August 31, 2013, 1:22pm

Good, at least this appears to be a better reason for introducing those rsid’s. I might eventually raise a bug and describe the rationale, the use cases, the best practices etc. Maybe I’ll ask for advice on TeX.SE first by asking a question along the lines of “is it advisable to define \newcommands for marking up subsequent additions and revisions to a document?”. That would give us some food for thought!

CyanCG · August 31, 2013, 2:12pm

Applying more than one text (character) style to a portion

I am using LibreOffice 4.1 on OS X 10.8.

It is indeed possible to apply more than one character style to a given portion of text. Take the following example with the two styles you mention (Source Text and Emphasis):

<text:p text:style-name="P1">This is what
<text:span text:style-name="T1">emphasized </text:span>
<text:span text:style-name="Source_20_Text">
<text:span text:style-name="Emphasis">code</text:span></text:span>
looks like.</text:p>

In this first example, I have added the string emphasized afterwards, so that it is enclosed in its own span element with an automatic text style T1. Reminder: styles that apply to text strings are called character styles in LibO and text styles in the ODF spec. The T1 style is defined thus:

<style:style style:name="T1" style:family="text">
  <style:text-properties officeooo:rsid="000c8b60"/>
</style:style>

The only defined attribute is an officeooo:rsid, which shows that this style’s only purpose is document comparison (this makes me grumpy). Apart from that, we can see that it is quite possible to apply two character styles to the same portion. In fact, there are two ways to say it:

LibO speak: It is possible to apply multiple different character styles to one text portion;
Spec speak: It is possible to enclose a given text node in multiple nested span elements with different text:style-name attributes.
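Coming back to the T1 style for a moment: automatic text styles of this kind, which carry nothing but an officeooo:rsid, can be spotted mechanically. The following Python sketch is only an illustration. The function name `rsid_only_styles` is mine, the `<root>` wrapper stands in for the automatic-styles section of a real file, and the second style (T2) is a made-up counter-example.

```python
import xml.etree.ElementTree as ET

STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
FO_NS = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
OOO_NS = "http://openoffice.org/2009/office"

# T1 is the style shown above; T2 is hypothetical.
STYLES = f"""
<root xmlns:style="{STYLE_NS}" xmlns:fo="{FO_NS}" xmlns:officeooo="{OOO_NS}">
  <style:style style:name="T1" style:family="text">
    <style:text-properties officeooo:rsid="000c8b60"/>
  </style:style>
  <style:style style:name="T2" style:family="text">
    <style:text-properties fo:font-style="italic" officeooo:rsid="000d1f00"/>
  </style:style>
</root>
""".strip()


def rsid_only_styles(root):
    """Return names of text styles whose only property is officeooo:rsid."""
    rsid_key = f"{{{OOO_NS}}}rsid"
    names = []
    for style in root.iter(f"{{{STYLE_NS}}}style"):
        if style.get(f"{{{STYLE_NS}}}family") != "text":
            continue
        props = style.find(f"{{{STYLE_NS}}}text-properties")
        if props is not None and set(props.attrib) == {rsid_key}:
            names.append(style.get(f"{{{STYLE_NS}}}name"))
    return names


print(rsid_only_styles(ET.fromstring(STYLES)))
```

Spans that reference only such styles add nothing to the formatting, which is why they matter for anyone converting the document to another format.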

Remark on LibO’s behaviour: the effects are cumulative, so that the word code in this example is displayed in monospaced oblique type (for a monospaced font, the proper term is oblique, as there is no italic shape to speak of, but of course LibO does what is expected and chooses the oblique font).

Remark on implementation: according to the ODF 1.2 spec (part 1, section 19.770), a text:class-names attribute exists for the purpose of applying more than one text style to a node:

A text:class-names attribute specifies a white space separated list of style names. The referenced styles are applied in the order they are contained in the list.

If both text:style-name and text:class-names are present, the style referenced by the text:style-name attribute is applied before the styles referenced by text:class-names attribute. If a conditional style is specified together with a text:class-names attribute, but without a text:style-name attribute, the text:style-name attribute is assumed to have the value of the first style name in the list defined by the text:class-name attribute.

I find that very useful and desirable. One potential use case is to apply a style that defines the visual presentation of the element with a text:style-name attribute (say, "Emphasis" to denote a change in tone with italic type) and also apply other styles with a purely semantic meaning and without any particular visual distinction, with a text:class-names attribute (e.g. "Archaic Eponym"). Perhaps that sounds convoluted, but it’s the kind of thing I have to do in my work and I am sure I am not alone.
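Here is how I read the quoted rule, as a small Python sketch applied to the use case just described. Everything in it is an assumption made for illustration: the function names, the style definitions, and above all the idea that a style applied later overrides an earlier one where they conflict (the spec only says the styles are applied in list order). A style name containing a space is written with `_20_`, as in the XML above.

```python
def application_order(style_name, class_names):
    """Order of application: text:style-name first, then text:class-names."""
    order = []
    if style_name:
        order.append(style_name)
    if class_names:
        order.extend(class_names.split())  # white space separated list
    return order


def resolve(order, styles):
    """Merge properties in order; later styles win (an assumption)."""
    effective = {}
    for name in order:
        effective.update(styles.get(name, {}))
    return effective


# Hypothetical style definitions: one visual, one purely semantic.
STYLES = {
    "Emphasis": {"font-style": "italic"},
    "Archaic_20_Eponym": {},
}

order = application_order("Emphasis", "Archaic_20_Eponym")
print(order)
print(resolve(order, STYLES))
```

The semantic style changes nothing visually but stays attached to the text, which is the whole point of the use case.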

Unfortunately, in my experience, LibO does not write text:class-names attributes in the ODF it produces. Perhaps I just haven’t experienced enough, though.

Let us try with user-defined styles now. Here is the second paragraph in my document:

<text:p text:style-name="P2">I like
<text:span text:style-name="Plant">
<text:span text:style-name="Drink">tea</text:span></text:span>
very much.</text:p>

I have defined two character styles in LibO for this document: Plant and Drink. Plant makes the text green and Drink makes it italicized. I have applied them both to the word tea because tea is the name for both the plant and the drink made from the plant ;-). Again, the effects are cumulative: the word tea in my document is now both green and italicized, but again we have two span elements instead of one element with both a text:style-name and a text:class-names attribute. Note that I applied Plant first and then Drink and, accordingly, the span with Plant encloses the one with Drink (the span with the Plant attribute comes first in the document’s hierarchy).

Remark on the paragraph element in this second example: it has a different style (P2), even though the first and the second paragraph have the same level, the same appearance and the same purpose. Makes me grumpy. For people who want to produce structured and semantic documents, it is a pain. But this will change again in a future LibreOffice version, won’t it?

Now, let us try to understand how LibreOffice represents this document to itself.

How do we inquire about the character styles applied to a text portion?

A distinction that must be made clearly here is that, much like most applications that consume some kind of XML, LibreOffice does not use XML data structures “natively”. It is still true that ODF is LibO’s native format in a certain sense, but any application that supports many document formats needs some kind of internal representation. In our case, we could call it a StarWriter document object or something like that (but that might summon some old ghosts, so I’m not going to insist on that).

In order to give a full account, I would have to define concepts which are specific to OpenOffice/LibreOffice, but this is beyond my means. Suffice it to say that in the application, the text document contains objects that have properties and methods and that the objects that “wrap” others together are called services. The OpenOffice Developer’s Guide (a resource of very uneven quality) has a page that distinguishes further in Appendix A. To put it relatively simply, the structure of text paragraphs is so:

Paragraphs are services that include a TextContent service.
The actual text is stored in TextPortion services. The description of the TextPortion service in the API documentation is very unclear about the actual status of the service and its relation to the Paragraph service.
In any case, the TextPortion is not the thing that has a style: the TextPortion service includes a TextRange service. This service is the one that has the actual CharacterProperties service.
In the CharacterProperties service, there are two properties that are relevant to our inquiry: CharStyleName and CharStyleNames. Both are optional. The data type of the CharStyleName property is “string”, while the data type of the CharStyleNames property is a sequence of strings.

Logically, the CharStyleName property should correspond to the text:style-name ODF attribute and the CharStyleNames property should correspond to text:class-names. This is more or less the case, except that the description of the CharStyleNames property says:

It is not guaranteed that the order in the sequence reflects the order of the evaluation of the character style attributes.

If the ODF spec says that “[t]he referenced styles are applied in the order they are contained in the list”, then why is that? I have no answer…

However, now we know what properties we must inquire about. Here is a very dumb subroutine that only works in a very specific situation. When selecting a single styled word in the document, the subroutine prints the values of the style properties. Actually, it generates an error if the selection only has one style applied. The only purpose of this lousy code is to print the styles of a portion which we know has two styles:

sub print_char_styles_of_viewcursor()

    ' Prints the character style properties of the current selection.
    ' Only works when the selection has at least two character styles.
    dim CurrentDoc as object
    dim FirstStyleofSelection as string
    dim StyleListofSelection(2) as string

    ' Despite its name, CurrentDoc holds the document's controller.
    CurrentDoc = ThisComponent.CurrentController
    FirstStyleofSelection = CurrentDoc.getViewCursor().CharStyleName
    StyleListofSelection() = CurrentDoc.getViewCursor().CharStyleNames()

    print "The values of the selected portion’s character style properties are: “"; _
        FirstStyleofSelection; "” for CharStyleName; “"; _
        StyleListofSelection(0); ", "; StyleListofSelection(1); "” for CharStyleNames."

end sub

Remarks on the code:

Even though I said that character styles actually apply to text ranges, here I am using StarBasic to access properties, I am working on portions and I do get the character styles.
I mentioned that in the API (IDL reference), the CharStyleNames property is defined as a sequence of strings. Here, from the perspective of StarBasic, it is an array of strings. My assignment of StyleListofSelection to the value of CurrentDoc.getViewCursor().CharStyleNames() is basically the assignment of an array to another array. When you do this in real life, you need to be careful.

When I select the word tea and then run this subroutine, the following message is printed:

The values of the selected portion’s character style properties are: “Drink” for CharStyleName; “Plant, Drink” for CharStyleNames.

When I select the word code, this message is printed:

The values of the selected portion’s character style properties are: “Emphasis” for CharStyleName; “Source Text, Emphasis” for CharStyleNames.

In both cases, the value of the CharStyleName property is the style that was applied last, but the order of the styles in CharStyleNames corresponds to the order in which they were applied. This suggests that the order is indeed not respected when it comes to deciding which style has priority.

This is all very interesting, but the question remains: how do we accurately control the assignment of character properties? When writing the ODF, LibO does respect the order of assignment, but this is not reflected in the document’s data structures inside the application. I think a bug should be raised about the specific issue of the character styles’ priority. If the CharStyleNames property was mapped to the text:class-names attribute, that would be a big improvement.

ajlittoz · August 31, 2013, 4:08pm

@CyanCG: Congratulations on the depth of this answer

Character styles are cumulative (“or” in my wording): then how do we remove an attribute, like contour? (I don’t take bold since font family may come with several bold values, e.g. Univers). Re: your remark about conditional emphasis.
My usage of character style is close to semantic markup. I experiment afterwards with visual attributes in the style definition until the distinctions are “visible” (simultaneously trying to keep the traditional typographic usages).
Ergonomics: from a user point of view, there should be some highlighting in the style navigator to show which character styles are active in the selection, not the last (?) one only (the StarBasic hack is not a solution). Presently, to be sure, I always first reset everything to Default before applying new styles.
Ergonomics bis: I try to avoid double markup whenever possible; I think it is better to have a single style with a meaningful (and possibly mnemonic) name. Drawback: many styles are created.

CyanCG · August 31, 2013, 9:12pm

I agree, and my solution for now is also to apply “Default” and then re-apply the style I really need. If a single attribute is removed with direct formatting (an automatic style in the spec terminology) then it takes precedence over any applied style (be it user-defined or application-defined). That complicates things further.

oweng · September 1, 2013, 12:59am

Terrific analysis with which I agree. A few small things (all near the beginning): “the string emphasis afterwards” should read “the string ‘emphasized’ afterwards”; near the bullet points perhaps “multiple character styles” rather than “two character styles” (twice); rather than “slanted” I think the correct term is “oblique”. The commentary about reverting to Default formatting and direct format overrides I also agree with (in despair). I too cannot obtain text:class-names from testing.

CyanCG · September 3, 2013, 2:59pm

Answer amended. Slanted is also in common usage, but the article on Wikipedia suggests that oblique is indeed the preferred term :-).

LukeKendall · May 9, 2019, 12:24pm

From the user’s point of view this introduces some massive problems.
I like using LibreOffice for writing my books. In most respects it’s great.
However, one area which causes me big problems is producing the variant editions of a book.
To be specific, I have an A-format edition which uses 9pt text for the Chapter Body paragraph style; the same para style uses 10.5pt text for the B-format edition. Similarly for other para styles, like Chapter Heading.
However, when I copy the body of the MS from one file into the other (e.g. to create the A-format from the B-format) to create the other edition, it seems a random set of paragraphs fails to take the font size from the para style of the target document. In addition, a small but (to the user) random amount of text is copied but loses the italic style.
Coupled with bugs in finding italic text, and bugs in comparing documents, the underlying problem of unexpected changes to the copied text’s format (font size, italics), is very difficult and time-consuming to fix.
I just thought I’d make a note of the issue here while I now go and look to see if there’s a bug report.

I initially came here from https://bugs.documentfoundation.org/show_bug.cgi?id=122215 in the hope of discovering the recipe that would avoid at least the problem of the italic property being lost on text apparently randomly. I assume it’s a problem that sometimes appears when using direct formatting.
I’m unclear what direct formatting is, but I have picked up hints that it’s bad and causes troubles.
But I don’t know what the preferred method of formatting is that avoids the troubles.
Since I always apply emphasis the same way (via selecting the text to be emphasised and then using Ctrl-I), I don’t understand why 90%+ of the text keeps the italic attribute, but some text doesn’t.
I want to be a good user, using paragraph and page and character styles correctly to support my workflows, but I haven’t found a solution to this problem yet.

ajlittoz · May 9, 2019, 1:43pm

As pointed out in the bug report and elsewhere, direct formatting is the usual cause of the problem. Direct formatting is any action aimed at changing text attributes without styles, such as keyboard shortcuts (for italics, bold, …) or toolbar buttons (same + lists, …). These seem “natural” because M$ Word does it this way, having no character style.

Direct formatting is “sticky” and “invisible”: in the layered styles model, it sits on top and has no hint in the various style panels and menus. It survives copy from area to area or document to document. The only way to get rid of it is to select a wide range of text and Ctrl+M or Format>Clear Direct Formatting.
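A rough way to picture the layered model is a lookup that tries direct formatting first, then the character style, then the paragraph style. The Python sketch below is only an illustration with made-up property values (the sizes are borrowed from the question above); it does not claim to reproduce Writer's internals.

```python
from collections import ChainMap


def effective(direct, character, paragraph):
    """Resolve properties; the first layer that defines a key wins."""
    return dict(ChainMap(direct, character, paragraph))


# Target document: the paragraph style asks for 9pt.
paragraph_style = {"font-size": "9pt", "font-style": "normal"}
character_style = {"font-style": "italic"}

# Pasted text carrying a leftover direct font size from its source.
direct_formatting = {"font-size": "10.5pt"}

print(effective(direct_formatting, character_style, paragraph_style))

# Clear Direct Formatting empties the top layer only.
print(effective({}, character_style, paragraph_style))
```

With the direct layer present, the pasted text ignores the 9pt of the target paragraph style; once the layer is cleared, the size comes from the paragraph style again while the character style keeps its italics.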

I have a similar need to yours (though not exactly the same). My solution is to avoid direct formatting and exclusively use styles (para, character, page and list). This means a very strict discipline while writing and probably some discomfort because of character style application instead of keyboard shortcuts…

ajlittoz · May 9, 2019, 1:53pm

(continued 1) unless you reconfigure LO Writer in depth to transfer the usual Ctrl+I or B to your character styles.

However, you cannot fully forbid direct formatting because some actions have no style equivalent, e.g. resetting list numbering. These events are rare enough in my documents I accept the risk of living with it.

My workflow is to consider I put a (semantic) markup of the text with styles, i.e., I don’t request bold or italics but I mark a sequence as “important” or “outstanding” (Emphasis or Strong emphasis are two candidate built-in styles). Then, afterwards (in fact rather beforehand when I designed my template), I decide whether such marked sequences should display bold or red. My goal is to separate the contents and its semantics from the appearance or presentation. This imposes many constraints but eliminates problems when reviewing or preparing for another output medium.

In this respect, what I miss most is …