My memory is failing me, but I read somewhere (Bugzilla? new release feature page?) that this new use of span elements was meant to facilitate interoperability with OOXML: indeed, in Microsoft’s format, all text portions of a paragraph are part of something called a text run, even if no special formatting is applied to them (one of the many reasons why this format is awful). It might be related to change tracking. In any case, I don’t think this is a good reason to abuse span elements.
I agree with all your comments and I too find this matter of `<text:span>` elements troubling. ISO/IEC 29500-1:2012(E), p. 293, outlines the `<w:r>` (Text Run) element. I can understand that `<text:span>` is used as a way of mapping to this element to cater for interoperability, but simply adding to existing text should not necessitate this. It should be checked whether the element is actually required. I cannot find a related bug or any information on this change from v3.x to v4.x.
Update: I found a related LO User ML thread which points to a related bug: fdo#68183. Unfortunately the answer appears to be related to revision tracking (the `officeooo:paragraph-rsid` property, which is what @CyanCG suggested, i.e. it is an OOXML compatibility feature). In my examples above I did not have revision tracking turned on, so I would think this is unnecessary.
This is the bug I had in mind. The comment by Holger Schmithüsen nails it pretty well. Who do we need to convince to get this bug addressed? This `officeooo:rsid` attribute is a hack and is absolutely unnecessary for those who actually use the OpenDocument format because of its specific virtues. I think that’s how the issue should be presented. OOXML compatibility should never have negative side-effects for those who choose ODF.
Well, I did a bit more research and it appears I was wrong in my initial conjecture that this was an OOXML-related change. The change relates to the feature for comparing documents. I have updated my answer to be clearer about this. Bug fdo#52028 provides the details. This is still little comfort if you would like the underlying XML to be cleaner. All I can suggest is raising a bug to address this, but unfortunately you will need to be incredibly specific in describing the problem.
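If the rsids exist for document comparison, one can at least measure how much of this metadata a file carries. The Python sketch below (standard library only; `rsid_only_text_styles` is a hypothetical helper, not part of any LibreOffice tool) lists the automatic text styles in `content.xml` whose `style:text-properties` element holds nothing but an `officeooo:rsid` attribute. It assumes the `officeooo` prefix is bound to `http://openoffice.org/2009/office`, as in files written by LibreOffice, and it ignores `styles.xml`.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used in ODF content.xml; officeooo is LibreOffice's extension namespace.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "officeooo": "http://openoffice.org/2009/office",
}
RSID = "{%s}rsid" % NS["officeooo"]
FAMILY = "{%s}family" % NS["style"]
NAME = "{%s}name" % NS["style"]


def rsid_only_text_styles(content_xml):
    """Return the names of automatic text styles whose <style:text-properties>
    has no attribute other than officeooo:rsid and no child elements."""
    root = ET.fromstring(content_xml)
    auto = root.find("office:automatic-styles", NS)
    if auto is None:
        return []
    names = []
    for st in auto.findall("style:style", NS):
        if st.get(FAMILY) != "text":
            continue  # paragraph, table, ... styles are not considered here
        props = st.find("style:text-properties", NS)
        if props is not None and set(props.attrib) == {RSID} and len(props) == 0:
            names.append(st.get(NAME))
    return names
```

To try it on a real document, pass it `zipfile.ZipFile("example.odt").read("content.xml")` (the file name is a placeholder).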
Good, at least this appears to be a better reason for introducing those `rsid`s. I might eventually raise a bug and describe the rationale, the use cases, the best practices etc. Maybe I’ll ask for advice on TeX.SE first by asking a question along the lines of “is it advisable to define `\newcommand`s for marking up subsequent additions and revisions to a document?”. That would give us some food for thought!
I am using LibreOffice 4.1 on OS X 10.8.
It is indeed possible to apply more than one character style to a given portion of text. Take the following example with the two styles you mention (Source Text and Emphasis):
```xml
<text:p text:style-name="P1">This is what <text:span text:style-name="T1">emphasized </text:span> <text:span text:style-name="Source_20_Text"> <text:span text:style-name="Emphasis">code</text:span></text:span> looks like.</text:p>
```
In this first example, I have added the string `emphasized` afterwards, so that it is enclosed in its own span element with an automatic text style, `T1`. Reminder: styles that apply to text strings are called character styles in LibO and text styles in the ODF spec. The `T1` style is defined thus:
```xml
<style:style style:name="T1" style:family="text"> <style:text-properties officeooo:rsid="000c8b60"/> </style:style>
```
The only defined attribute is an `officeooo:rsid`, which shows that this style’s only purpose is document comparison (this makes me grumpy). Apart from that, we can see that it is quite possible to apply two character styles to the same portion. In fact, there are two ways to say it:
- LibO speak: It is possible to apply multiple different character styles to one text portion;
- Spec speak: It is possible to enclose a given text node in multiple nested `span` elements with different `text:style-name` attributes.
Remark on LibO’s behaviour: the effects are cumulative, so that the word `code` in this example is displayed in monospaced oblique type (for a monospaced font, the proper term is oblique, as there is no italic shape to speak of, but of course LibO does what is expected and chooses the oblique font).
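The nesting described in the bullet points can be made explicit with a small Python sketch (standard library only; `styled_runs` is a name made up here) that walks a `text:p` element and reports, for each piece of text, the `text:style-name` of every enclosing `text:span`, outermost first. It is deliberately simplified: whitespace-only text between tags is skipped, and `text:class-names` is ignored.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
SPAN = "{%s}span" % TEXT_NS
STYLE_NAME = "{%s}style-name" % TEXT_NS


def styled_runs(elem, styles=()):
    """Yield (text, styles) pairs in document order; `styles` holds the
    text:style-name of every enclosing <text:span>, outermost first."""
    if elem.tag == SPAN and elem.get(STYLE_NAME):
        styles = styles + (elem.get(STYLE_NAME),)
    if elem.text and elem.text.strip():
        yield elem.text, styles
    for child in elem:
        yield from styled_runs(child, styles)
        # A child's tail is text that follows it inside *this* element.
        if child.tail and child.tail.strip():
            yield child.tail, styles


# The first example paragraph (namespace declaration added so it parses alone).
PARAGRAPH = (
    '<text:p xmlns:text="%s" text:style-name="P1">This is what '
    '<text:span text:style-name="T1">emphasized </text:span> '
    '<text:span text:style-name="Source_20_Text"> '
    '<text:span text:style-name="Emphasis">code</text:span></text:span>'
    ' looks like.</text:p>' % TEXT_NS
)

for text, names in styled_runs(ET.fromstring(PARAGRAPH)):
    print(repr(text), names)
# 'This is what ' ()
# 'emphasized ' ('T1',)
# 'code' ('Source_20_Text', 'Emphasis')
# ' looks like.' ()
```

In this reading, "cumulative" simply means that the displayed formatting of `code` combines the properties of every style in its tuple.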
Remark on implementation: according to the ODF 1.2 spec (part 1, section 19.770), a `text:class-names` attribute exists for the purpose of applying more than one text style to a node:
> The `text:class-names` attribute specifies a white space separated list of style names. The referenced styles are applied in the order they are contained in the list. If both `text:style-name` and `text:class-names` are present, the style referenced by the `text:style-name` attribute is applied before the styles referenced by `text:class-names` attribute. If a conditional style is specified together with a `text:class-names` attribute, but without a `text:style-name` attribute, the `text:style-name` attribute is assumed to have the value of the first style name in the list defined by the `text:class-names` attribute.
I find that very useful and desirable. One potential use case is to apply a style that defines the visual presentation of the element with a `text:style-name` attribute (say, "Emphasis" to denote a change in tone with italic type) and also apply other styles with a purely semantic meaning and without any particular visual distinction, with a `text:class-names` attribute (e.g. "Archaic Eponym"). Perhaps that sounds convoluted, but it’s the kind of thing I have to do in my work and I am sure I am not alone.
Unfortunately, in my experience, LibO does not write `text:class-names` attributes in the ODF it produces. Perhaps I just haven’t experimented enough, though.
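Whether a given file uses the attribute at all can be checked outside LibreOffice. The Python sketch below (standard library only; all function names are made up for this example) reads `content.xml` from an ODF package and, for every element carrying `text:class-names`, returns the style names in the order the spec text quoted above says they are applied: `text:style-name` first, then the listed class names from left to right. It does not handle the conditional-style case and does not look at `styles.xml`. Since the list is white-space separated, a style such as "Archaic Eponym" would presumably have to appear in encoded form (`Archaic_20_Eponym`), by analogy with `Source_20_Text` in the first example; the sample markup is hand-written and hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
STYLE_NAME = "{%s}style-name" % TEXT_NS
CLASS_NAMES = "{%s}class-names" % TEXT_NS


def read_content_xml(odt_path):
    """Return the raw bytes of content.xml from an ODF package such as an .odt file."""
    with zipfile.ZipFile(odt_path) as package:
        return package.read("content.xml")


def application_order(elem):
    """Style names applied to `elem` per ODF 1.2 part 1, 19.770:
    text:style-name first, then text:class-names from left to right."""
    order = [elem.get(STYLE_NAME)] if elem.get(STYLE_NAME) else []
    return order + elem.get(CLASS_NAMES, "").split()


def class_names_report(content_xml):
    """Return (text, style order) for every element that has text:class-names."""
    root = ET.fromstring(content_xml)
    return [("".join(el.itertext()), application_order(el))
            for el in root.iter() if el.get(CLASS_NAMES) is not None]


# Hypothetical hand-written markup for the use case described above.
SAMPLE = ('<text:p xmlns:text="%s"><text:span text:style-name="Emphasis" '
          'text:class-names="Archaic_20_Eponym">Procrustean</text:span></text:p>'
          % TEXT_NS)
print(class_names_report(SAMPLE))  # [('Procrustean', ['Emphasis', 'Archaic_20_Eponym'])]
```

On a document saved by LibreOffice, `class_names_report(read_content_xml("test.odt"))` (the file name is a placeholder) returning an empty list would be consistent with the observation above.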
Let us try with user-defined styles now. Here is the second paragraph in my document:
```xml
<text:p text:style-name="P2">I like <text:span text:style-name="Plant"> <text:span text:style-name="Drink">tea</text:span></text:span> very much.</text:p>
```
I have defined two character styles in LibO for this document: Plant and Drink. Plant makes the text green and Drink makes it italicized. I have applied them both to the word `tea` because tea is the name for both the plant and the drink made from the plant ;-). Again, the effects are cumulative: the word `tea` in my document is now both green and italicized, but again we have two `span` elements instead of one element with both a `text:style-name` and a `text:class-names` attribute. Note that I applied Plant first and then Drink and, accordingly, the span with `Plant` encloses the one with `Drink` (i.e. the `span` with the `Plant` attribute comes first in the document’s hierarchy).
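For comparison, here is what the single-element form could look like. The Python sketch below is a hypothetical transformation, not something LibreOffice does: it merges a span that contains nothing but one inner span into a single span, with the outer style as `text:style-name` and the inner style as `text:class-names`, which, going by the spec wording quoted earlier, keeps Plant applied before Drink. It deliberately gives up (returns `None`) on anything more complicated, and it treats whitespace-only text between the two opening tags as insignificant, which is an assumption.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
SPAN = "{%s}span" % TEXT_NS
STYLE_NAME = "{%s}style-name" % TEXT_NS
CLASS_NAMES = "{%s}class-names" % TEXT_NS


def merge_nested_spans(outer):
    """Turn <span A><span B>x</span></span> into <span A class-names="B">x</span>.
    Return the new element, or None if `outer` does not have exactly that shape."""
    children = list(outer)
    if outer.tag != SPAN or len(children) != 1 or children[0].tag != SPAN:
        return None
    inner = children[0]
    if not (outer.get(STYLE_NAME) and inner.get(STYLE_NAME)):
        return None
    if outer.get(CLASS_NAMES) or inner.get(CLASS_NAMES):
        return None  # keep the sketch simple: no existing class lists
    # Bail out if the outer span holds real text of its own, outside the inner span.
    if (outer.text or "").strip() or (inner.tail or "").strip():
        return None
    merged = ET.Element(SPAN, {STYLE_NAME: outer.get(STYLE_NAME),
                               CLASS_NAMES: inner.get(STYLE_NAME)})
    merged.text = inner.text
    merged.extend(inner)  # keep any elements nested inside the inner span
    merged.tail = outer.tail
    return merged


# The second example paragraph (namespace declaration added so it parses alone).
PARAGRAPH = ('<text:p xmlns:text="%s" text:style-name="P2">I like '
             '<text:span text:style-name="Plant"> '
             '<text:span text:style-name="Drink">tea</text:span></text:span>'
             ' very much.</text:p>' % TEXT_NS)
paragraph = ET.fromstring(PARAGRAPH)
merged = merge_nested_spans(paragraph.find(SPAN))
print(merged.get(STYLE_NAME), merged.get(CLASS_NAMES), merged.text)  # Plant Drink tea
```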
Remark on the paragraph element in this second example: it has a different style (`P2`), even though the first and the second paragraph have the same level, the same appearance and the same purpose. Makes me grumpy. For people who want to produce structured and semantic documents, it is a pain. But this will change again in a future LibreOffice version, won’t it?
Now, let us try to understand how LibreOffice represents this document to itself.
A distinction that must be made clearly here is that, much like most applications that consume some kind of XML, LibreOffice does not use XML data structures “natively”. It is still true that ODF is LibO’s native format in a certain sense, but any application that supports many document formats needs some kind of internal representation. In our case, we could call it a StarWriter document object or something like that (but that might summon some old ghosts, so I’m not going to insist on that).
In order to give a full account, I would have to define concepts which are specific to OpenOffice/LibreOffice, but this is beyond my means. Suffice it to say that in the application, the text document contains objects that have properties and methods and that the objects that “wrap” others together are called services. The OpenOffice Developer’s Guide (a resource of very uneven quality) draws further distinctions on a page in Appendix A. To put it relatively simply, the structure of text paragraphs is as follows:
- Paragraphs are services that include a …
- The actual text is stored in `TextPortion` services. The description of the `TextPortion` service in the API documentation is very unclear about the actual status of the service and its relation to the …
- In any case, the `TextPortion` is not the thing that has a style: the `TextPortion` service includes a `TextRange` service. This service is the one that has the actual `CharacterProperties`.
- In the `CharacterProperties` service, there are two properties that are relevant to our inquiry: `CharStyleName` and `CharStyleNames`. Both are optional. The data type of the `CharStyleName` property is “string”, while the data type of the `CharStyleNames` property is a sequence of strings.
The `CharStyleName` property should correspond to the `text:style-name` ODF attribute and the `CharStyleNames` property should correspond to `text:class-names`. This is more or less the case, except that the description of the `CharStyleNames` property says:
> It is not guaranteed that the order in the sequence reflects the order of the evaluation of the character style attributes.
If the ODF spec says that “[t]he referenced styles are applied in the order they are contained in the list”, then why is that? I have no answer…
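If "applied in the order they are contained in the list" is read as "a later style overrides an earlier one wherever both set the same property" (that reading is an assumption; the spec sentence does not spell it out), the expected result can be sketched as an ordered merge of property dictionaries. The style definitions below are hypothetical stand-ins for Plant and Drink, plus an invented Red style to show a conflict.

```python
# Hypothetical character style definitions, using ODF-style property names.
STYLES = {
    "Plant": {"fo:color": "#008000"},
    "Drink": {"fo:font-style": "italic"},
    "Red": {"fo:color": "#ff0000"},
}


def effective_properties(applied_in_order):
    """Merge style properties in application order: the effects accumulate,
    and for a property set by several styles the last one applied wins."""
    result = {}
    for name in applied_in_order:
        result.update(STYLES[name])
    return result


print(effective_properties(["Plant", "Drink"]))
# {'fo:color': '#008000', 'fo:font-style': 'italic'}
print(effective_properties(["Plant", "Red"]))  # {'fo:color': '#ff0000'}
print(effective_properties(["Red", "Plant"]))  # {'fo:color': '#008000'}
```

With Plant and Drink the order makes no visible difference because they set different properties; it only matters when two styles touch the same property, which is exactly where a missing ordering guarantee for `CharStyleNames` would show.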
However, now we know what properties we must inquire about. Here is a very dumb subroutine that only works in a very specific situation. When selecting a single styled word in the document, the subroutine prints the values of the style properties. Actually, it generates an error if the selection only has one style applied. The only purpose of this lousy code is to print the styles of a portion which we know has two styles:
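```basic
sub print_char_styles_of_viewcursor()
    dim CurrentDoc as object
    dim FirstStyleofSelection as string
    dim StyleListofSelection(2) as string

    ' Despite its name, CurrentDoc holds the controller, which owns the view cursor.
    CurrentDoc = ThisComponent.CurrentController
    ' CharStyleName: a single string.
    FirstStyleofSelection = CurrentDoc.getViewCursor().CharStyleName
    ' CharStyleNames: a sequence of strings, seen from StarBasic as an array.
    StyleListofSelection() = CurrentDoc.getViewCursor().CharStyleNames()
    ' Only meaningful when two character styles are applied to the selection.
    print "The values of the selected portion’s character style properties are: “"; _
        FirstStyleofSelection; "” for CharStyleName; “"; _
        StyleListofSelection(0); ", "; StyleListofSelection(1); "” for CharStyleNames."
end sub
```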
Remarks on the code:
- Even though I said that character styles actually apply to text ranges, here I am using StarBasic to access the properties of portions (through the view cursor), and I do get the character styles.
- I mentioned that in the API (IDL reference), the `CharStyleNames` property is defined as a sequence of strings. Here, from the perspective of StarBasic, it is an array of strings. My assignment of `StyleListofSelection` to the value of `CurrentDoc.getViewCursor().CharStyleNames()` is basically the assignment of an array to another array. When you do this in real life, you need to be careful.
When I select the word `tea` and then run this subroutine, the following message is printed:
The values of the selected portion’s character style properties are: “Drink” for CharStyleName; “Plant, Drink” for CharStyleNames.
When I select the word `code`, this message is printed:
The values of the selected portion’s character style properties are: “Emphasis” for CharStyleName; “Source Text, Emphasis” for CharStyleNames.
In both cases, the value of the `CharStyleName` property is the style that was applied last, but the order of the styles in `CharStyleNames` corresponds to the order in which they were applied. This suggests that the order is indeed not respected when it comes to deciding which style has priority.
This is all very interesting, but the question remains: how do we accurately control the assignment of character properties? When writing the ODF, LibO does respect the order of assignment, but this is not reflected in the document’s data structures inside the application. I think a bug should be raised about the specific issue of the character styles’ priority. If the `CharStyleNames` property were mapped to the `text:class-names` attribute, that would be a big improvement.
@CyanCG: Congratulations on the depth of this answer.
- Character styles are cumulative (“or” in my wording): then how do we remove an attribute, like contour? (I don’t use bold as the example, since a font family may come with several bold weights, e.g. Univers.) This relates to your remark about conditional emphasis.
- My usage of character style is close to semantic markup. I experiment afterwards with visual attributes in the style definition until the distinctions are “visible” (simultaneously trying to keep the traditional typographic usages).
- Ergonomics: from a user’s point of view, there should be some highlighting in the style navigator to show which character styles are active in the selection, not only the last (?) one (the StarBasic hack is not a solution). Presently, to be sure, I always reset everything to Default before applying new styles.
- Ergonomics bis: I try to avoid double markup whenever possible; I think it is better to have a single style with a meaningful (and possibly mnemonic) name. Drawback: many styles are created.
I agree, and my solution for now is also to apply “Default” and then re-apply the style I really need. If a single attribute is removed with direct formatting (an automatic style in the spec terminology) then it takes precedence over any applied style (be it user-defined or application-defined). That complicates things further.
Terrific analysis, with which I agree. A few small things (all near the beginning): “the string emphasis afterwards” should read “the string ‘emphasized’ afterwards”; near the bullet points perhaps “multiple character styles” rather than “two character styles” (twice); rather than “slanted” I think the correct term is “oblique”. The commentary about reverting to Default formatting and direct format overrides I also agree with (in despair). I too cannot obtain `text:class-names` from testing.
Answer amended. Slanted is also in common usage, but the article on Wikipedia suggests that oblique is indeed the preferred term :-).
From the user’s point of view this introduces some massive problems.
I like using LibreOffice for writing my books. In most respects it’s great.
However, one area which causes me big problems is producing the variant editions of a book.
To be specific, I have an A-format edition which uses 9pt text for the Chapter Body paragraph style; the same para style uses 10.5pt text for the B-format edition. Similarly for other para styles, like Chapter Heading.
However, when I copy the body of the manuscript from one file into the other to create the other edition (e.g. to create the A-format from the B-format), it seems a random set of paragraphs fails to take the font size from the para style of the target document. In addition, a small but (to the user) random amount of text is copied but loses the italic style.
Coupled with bugs in finding italic text, and bugs in comparing documents, the underlying problem of unexpected changes to the copied text’s format (font size, italics) is very difficult and time-consuming to fix.
I just thought I’d make a note of the issue here while I now go and look to see if there’s a bug report.
I initially came here from https://bugs.documentfoundation.org/show_bug.cgi?id=122215 in the hope of discovering the recipe that would avoid at least the problem of the italic property being lost on text apparently randomly. I assume it’s a problem that sometimes appears when using direct formatting.
I’m unclear what direct formatting is, but I have picked up hints that it’s bad and causes troubles.
But I don’t know what the preferred method of formatting is that avoids the troubles.
Since I always apply emphasis the same way (via selecting the text to be emphasised and then using Ctrl-I), I don’t understand why 90%+ of the text keeps the italic attribute, but some text doesn’t.
I want to be a good user, using paragraph and page and character styles correctly to support my workflows, but I haven’t found a solution to this problem yet.
As pointed out in the bug report and elsewhere, direct formatting is the usual cause of the problem. Direct formatting is any action aimed at changing text attributes without styles, such as keyboard shortcuts (for italics, bold, …) or toolbar buttons (same + lists, …). These seem “natural” because M$ Word does it this way, having no character style.
Direct formatting is “sticky” and “invisible”: in the layered styles model, it sits on top and leaves no trace in the various style panels and menus. It survives copying from area to area or from document to document. The only way to get rid of it is to select a wide range of text and use Clear Direct Formatting.
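Because direct formatting is stored as automatic styles (as noted earlier in the thread, direct formatting is an automatic style in the spec terminology), it can at least be made visible outside Writer. The Python sketch below (standard library only; `directly_italicized_text` is a made-up name) lists the text of spans whose automatic text style sets `fo:font-style="italic"`, i.e. italics applied with a shortcut or toolbar button rather than with a character style such as Emphasis. It is a rough diagnostic under stated assumptions: it only reads `content.xml`, ignores the Asian/CTL variants of the property (`style:font-style-asian`, `style:font-style-complex`), and ignores italics set on a whole paragraph through an automatic paragraph style.

```python
import xml.etree.ElementTree as ET

NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "fo": "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0",
}


def clark(prefix, local):
    """Build an ElementTree '{uri}local' name from a prefix listed in NS."""
    return "{%s}%s" % (NS[prefix], local)


def directly_italicized_text(content_xml):
    """Return the text of every <text:span> whose automatic text style
    sets fo:font-style="italic" (a sign of direct formatting)."""
    root = ET.fromstring(content_xml)
    auto = root.find("office:automatic-styles", NS)
    italic_styles = set()
    for st in (auto.findall("style:style", NS) if auto is not None else []):
        props = st.find("style:text-properties", NS)
        if (st.get(clark("style", "family")) == "text" and props is not None
                and props.get(clark("fo", "font-style")) == "italic"):
            italic_styles.add(st.get(clark("style", "name")))
    # Spans using a named character style (e.g. Emphasis) are not reported.
    return ["".join(span.itertext())
            for span in root.iter(clark("text", "span"))
            if span.get(clark("text", "style-name")) in italic_styles]
```

Run against `zipfile.ZipFile("book.odt").read("content.xml")` (placeholder file name), the returned list would be a starting point for replacing direct italics with a character style.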
I have a similar need to yours (though not exactly the same). My solution is to avoid direct formatting and exclusively use styles (para, character, page and list). This requires very strict discipline while writing and probably some discomfort, because you apply character styles instead of using keyboard shortcuts…
(continued 1) unless you reconfigure LO Writer in depth to map the usual shortcuts (such as Ctrl+B) to your character styles.
However, you cannot fully forbid direct formatting because some actions have no style equivalent, e.g. resetting list numbering. These events are rare enough in my documents that I accept the risk of living with it.
My workflow is to treat styles as (semantic) markup of the text, i.e. I don’t request bold or italics but I mark a sequence as “important” or “outstanding” (Emphasis or Strong emphasis are two candidate built-in styles). Then, afterwards (in fact rather beforehand, when I designed my template), I decide whether such marked sequences should display in bold or red. My goal is to separate the content and its semantics from the appearance or presentation. This imposes many constraints but eliminates problems when reviewing or preparing for another output medium.
In this respect, what I miss most is …
(continued 2) … the possibility to mark a sequence with more than one character style (as can be done in Quark XPress®), e.g. a sequence may be marked up as “comment” and I want to put “emphasis” on a word without losing the “comment” markup. Presently I solve the issue with another character style merging both original ones (not satisfactory).
Similarly, I’d like to be able to negate an attribute: Emphasis is usually coded “bold”, but if the base paragraph style is already “bold”, typographic rules say this emphasis should revert to “Roman”. This can’t be done in Writer today apart from creating a “complementary” style.
Despite these shortcomings, I haven’t experienced the random non-updates, probably because I struggle to avoid direct formatting. As I wrote, it is not very user-friendly while typing but it is rewarding when editing.
That’s a very helpful answer, thank you.
It’s depressing however, as it means there’s a severe and ongoing usability problem, as well as a subtle and largely invisible trap for most users.
If the bug in finding text by attribute were fixed (searching with Format in Find & Replace currently makes F&R unreliable), then a workaround might be to Find All italics, then simply choose one’s Character Emphasis style and apply that.
Does that sound practical?
Or does Writer’s “layered attribute” (?) model of text mean that finding text by the visible attribute the user is able to detect, can never be reliable?
I use Find & Replace sparingly, probably due to my careful use of styles. I just looked at the F&R dialog to refresh my memory. The Format button opens a font selection dialog. When you choose Italic in there, you’re in fact telling LO Writer to look for a font variant. If you applied your italics with direct formatting, i.e. Ctrl+I, the toolbar button or the menu equivalent, I’m not sure what gets recorded in the XML or internal representation.
Styling with a style involving a real italic font should not create problems and should be relatively reliable. Direct formatting works even for fonts without the specific variant (when rendering, the font engine “manufactures” a synthetic italic or bold version of the font). The XML encoding is probably not the same, meaning the search strategy doesn’t consider the same “keys” as in the previous case.
The erratic behaviour you encounter may be due to a change of strategy in the middle of a search …
(continued) when F&R encounters direct formatting. Then it does not revert to the initial “pure” strategy and gets confused. That is pure speculation on my part.
Maybe the layered style architecture is also a factor. That’s why I keep away from direct formatting, to take one layer out of the game. Consider direct formatting OK for experimenting, but it should never be used for production-quality documents.
Thanks. When the italic font exists for the regular font you’re using, I think it’s reasonable to expect that searching for the italic font will find the occurrences of text that Writer shows (via the text display and the Font toolbar) to be in that italic font. I think that expectation is reasonable regardless of whether the italic text was produced via a style or via direct formatting.
Given other bugs, I believe Writer does not properly “understand” or operate on its own representation. Hopefully these issues can be addressed. I can do my part by providing one or two more bug reports to help.
I understand what you’re saying about the problems of direct formatting, caused IMHO by Writer’s model and UI and documentation. I doubt I could convince the devs to redesign the model, but hopefully they can fix the bugs in its implementation, and I can do my small part to help with the documentation where I see it lacking, too.
Thanks for your ongoing comments and explanations - it helps!