Attempt to Preserve .fodt Format Names

Hi all, I know this is a dumb question, but I’ve been editing a large .fodt file that I commit to git. Most of the time when I save, I find that LibreOffice has gone through and incremented the numbers in the format names, so each commit ends up with many unnecessarily changed lines. Is there any way to make LibreOffice attempt to preserve the format names? Otherwise we end up with something like this every time: [screenshot: style increment]

Not really competent in LO Writer internals, but from what I see, the base paragraph style is Standard. This sounds like you’re applying a lot of direct formatting. Writer is entitled to handle it as it sees fit, e.g. you edit something and all subsequent direct formatting is renumbered.

You should always use ad-hoc paragraph and character styles, not to mention page and numbering styles.

If this cures the problem, I’ll convert this comment into an answer.

A remark: I tried to edit a complex file in .fodt but this was not possible because some “lines” exceeded the capacity of the editor buffer (KWrite in my case). I understand your need to see what changed through git (.odt is a binary format as far as git is concerned and does not satisfy this need), but you can’t always succeed due to the unlimited line size in .fodt.

@ajlittoz: Enable pretty printing if you will work on an fodt file directly using an editor. Search for “pretty” in the Advanced > Expert Configuration. Double-click toggles between true and false.

@Regina: thanks for the tip

@ajlittoz, thanks for the help! While I couldn’t go through the document in LibreOffice and replace the direct formatting, I was able to do so with find-and-replace after opening the fodt in an XML editor.

@Regina: thanks for pointing that out, it is well hidden. Go to Tools > Options > Advanced; the “Open Expert Configuration” button is at the lower right (the rest of the page is about Java, which fooled me). In Expert Configuration, double-click the + before org.openoffice.Office.Common to open it, then do the same for Save and then Document; PrettyPrinting is about 2/3 of the way down that list.

@oliverb: The expert configuration has a search field at the top. Therefore I have written “search for ‘pretty’”.

I’ve had another look at the fodt file after eliminating “default style”. Now I seem to have lots of “Pn” styles, all derived from “text body”. It seems that Writer just insists on numbered styles. Is brute-forcing in a text editor going to be the only way to remove them?

Elsewhere it seems to do this for the most trivial reasons, I see anonymous styles derived from headings too, and there seems to be no merging of anonymous styles with a named style once they become redundant.

Okay, so I’ve got a solution, guys. Unfortunately, it would be too troublesome to go over the entire 75-page document with LibreOffice and change the formatting, but I have found a way nevertheless.

In the document, I mainly have a few styles: Two headings, a paragraph style, and a quote style. I have customized their respective styles to fit each one. However, I cannot go over the entire document manually, so I opened up the .fodt XML using a simple text editor and looked around.

Surprisingly, the .fodt styling was quite intuitive. The paragraph styling seems to be in the form “PX” (with the quotes), where X is a number, and the inline text styling is in the form “TX” (with the quotes), where X is a number. When going through the metadata for the inline text styling, I found that most of the styles were either dead or invisible styles. I could delete the dead styles without affecting any text. For the duplicates, I chose one of the styles, then replaced the content of all the other duplicates with a copy of that style. Then I opened the file in LibreOffice and let LibreOffice merge the styles for me. This allowed me to keep all my italic text italic.

Paragraph styling was slightly more difficult, but not a problem. The process was similar, except that this time, I did a find and replace of all the duplicate “PX” with their corresponding defined style (like “Text_20_Body” for text body). No more direct paragraph formatting!
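
For anyone wanting to repeat this without reading the XML by eye, here is a minimal Python sketch of the inspection step. It only reports candidates; it does not modify the file. The function names (fingerprint, analyse) and the sample document are made up for illustration. It assumes automatic styles live under office:automatic-styles and that paragraph and character styles are referenced only through text:style-name, so styles used by tables, frames or lists are deliberately left out.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
}
STYLE_NAME = "{urn:oasis:names:tc:opendocument:xmlns:style:1.0}name"
STYLE_FAMILY = "{urn:oasis:names:tc:opendocument:xmlns:style:1.0}family"
TEXT_STYLE_NAME = "{urn:oasis:names:tc:opendocument:xmlns:text:1.0}style-name"


def fingerprint(el, skip_name=True):
    """Canonical description of a style element, ignoring its own name."""
    attrs = tuple(sorted(
        (k, v) for k, v in el.attrib.items()
        if not (skip_name and k == STYLE_NAME)
    ))
    return (el.tag, attrs, tuple(fingerprint(c, False) for c in el))


def analyse(root):
    """Return (unused style names, groups of identically defined styles)."""
    auto = root.find("office:automatic-styles", NS)
    styles = auto.findall("style:style", NS) if auto is not None else []
    # Assumption: only paragraph/text families are checked here.
    styles = [s for s in styles if s.get(STYLE_FAMILY) in ("paragraph", "text")]
    used = {el.get(TEXT_STYLE_NAME) for el in root.iter() if el.get(TEXT_STYLE_NAME)}
    unused = sorted(s.get(STYLE_NAME) for s in styles if s.get(STYLE_NAME) not in used)
    groups = defaultdict(list)
    for s in styles:
        groups[fingerprint(s)].append(s.get(STYLE_NAME))
    duplicates = sorted(names for names in groups.values() if len(names) > 1)
    return unused, duplicates


# Tiny made-up document, just to show the shape of the result.
SAMPLE = """<office:document
  xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
  xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
  xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
  xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0">
 <office:automatic-styles>
  <style:style style:name="T1" style:family="text"><style:text-properties fo:font-style="italic"/></style:style>
  <style:style style:name="T2" style:family="text"><style:text-properties fo:font-style="italic"/></style:style>
  <style:style style:name="T3" style:family="text"><style:text-properties fo:font-weight="bold"/></style:style>
 </office:automatic-styles>
 <office:body><office:text>
  <text:p text:style-name="Text_20_body">a <text:span text:style-name="T1">b</text:span> <text:span text:style-name="T2">c</text:span></text:p>
 </office:text></office:body>
</office:document>"""

unused, duplicates = analyse(ET.fromstring(SAMPLE))
print(unused)      # ['T3']
print(duplicates)  # [['T1', 'T2']]
```

To run it on a real file, replace ET.fromstring(SAMPLE) with ET.parse("yourfile.fodt").getroot() (the file name is a placeholder). Work on a copy or a committed version, and review the lists before deleting or merging anything by hand.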

Now I’ve not only cut the size of my file in half, I’ve also made it much more efficient to commit to git. That’s killing two birds with one stone; thanks for your help!

EDIT: Another thing I found was that LibreOffice seemed to like creating new paragraph styles that changed only the field “officeooo:rsid”, greatly inflating the file size, especially when editing across two separate machines. I removed these by following these instructions: Where do the `text:style-name="Tnn"` span tags come from, and how do I get rid of them?
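
A rough way to see how many of these rsid-only styles a file contains is sketched below in Python. This is not taken from the linked instructions; it is only a detection helper. The function names and the sample are hypothetical, and it assumes the rsid attributes live in the officeooo namespace (http://openoffice.org/2009/office) with names ending in "rsid", such as officeooo:rsid and officeooo:paragraph-rsid.

```python
import xml.etree.ElementTree as ET

OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
STYLE = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
OFFICEOOO = "http://openoffice.org/2009/office"


def is_rsid_attr(name):
    """True for attributes such as officeooo:rsid or officeooo:paragraph-rsid."""
    return name.startswith("{%s}" % OFFICEOOO) and name.endswith("rsid")


def rsid_only_styles(root):
    """Names of automatic styles whose properties consist solely of rsid attributes."""
    auto = root.find("{%s}automatic-styles" % OFFICE)
    if auto is None:
        return []
    result = []
    for style in auto.findall("{%s}style" % STYLE):
        props = list(style)
        attrs = [a for child in props for a in child.attrib]
        has_nested = any(len(child) > 0 for child in props)
        if attrs and not has_nested and all(is_rsid_attr(a) for a in attrs):
            result.append(style.get("{%s}name" % STYLE))
    return result


# Made-up sample: P1 carries only rsid data, P2 also centres the paragraph.
SAMPLE = """<office:document
  xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
  xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
  xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
  xmlns:officeooo="http://openoffice.org/2009/office">
 <office:automatic-styles>
  <style:style style:name="P1" style:family="paragraph" style:parent-style-name="Text_20_body">
   <style:text-properties officeooo:rsid="0012ab34" officeooo:paragraph-rsid="0012ab34"/>
  </style:style>
  <style:style style:name="P2" style:family="paragraph" style:parent-style-name="Text_20_body">
   <style:paragraph-properties fo:text-align="center"/>
   <style:text-properties officeooo:rsid="0012ab34"/>
  </style:style>
 </office:automatic-styles>
</office:document>"""

print(rsid_only_styles(ET.fromstring(SAMPLE)))  # ['P1']
```

Styles reported here add no visible formatting of their own, so they are candidates for being replaced by their parent style in the find-and-replace step described above.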

From other questions/answers, officeooo:rsid seems to be related to the file comparison feature.

Using direct formatting internally creates “anonymous” styles. To make them unique, LO Writer gives them names of the form “Pnnn” where nnn is a running number. Similar names are given to character, page, numbering … direct formatting.

The consequence is that occurrences of identical direct formatting have different names, inflating file size and also making it impossible to change formatting from a single location (which is, after all, the goal of styles).

This is yet another illustration of the intended usage workflow with LO Writer: styles, styles, styles again and only named styles. Styles give efficiency and most importantly a real comfort in document maintenance (editing, layout, formatting, …).

Designing an ad hoc set of styles may appear a boring and daunting task (though built-in styles constitute a good starting point) but it largely pays back. It adds another level of structure to the document.

Golden rule: to avoid many problems, refrain from using direct formatting. Use it only in rare circumstances where it is tolerable (but sooner or later it will play tricks on you); always use your customised styles.

Would it be possible to change the way anonymous styles were numbered? Maybe use style names that describe the applied formatting or use a hash derived from the text? Something that would generate the same result if applied twice to the same document.
I’m assuming that there is no “handle” retained in memory between loading and saving the document so just “not changing the number” isn’t practical?
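
To make the hash idea concrete, here is a small Python sketch. It is purely hypothetical and does not describe how LibreOffice names anything today: the function stable_style_name, the "P_"/"T_" prefixes and the 8-character digest are all invented for illustration. The point is only that a name derived from the formatting itself comes out the same on every save, whatever order the styles are created in.

```python
import hashlib


def stable_style_name(family, parent, properties):
    """Derive a repeatable style name from the formatting it describes.

    family: e.g. "paragraph" or "text"; parent: parent style name;
    properties: dict mapping attribute names to values.
    """
    prefix = {"paragraph": "P", "text": "T"}.get(family, "S")
    # Sort the properties so that insertion order cannot change the result.
    canonical = "\n".join(
        [family, parent] + ["%s=%s" % kv for kv in sorted(properties.items())]
    )
    digest = hashlib.sha1(canonical.encode("utf-8")).hexdigest()[:8]
    return "%s_%s" % (prefix, digest)


a = stable_style_name(
    "paragraph", "Text_20_body",
    {"fo:text-align": "center", "fo:margin-top": "0.2cm"},
)
b = stable_style_name(
    "paragraph", "Text_20_body",
    {"fo:margin-top": "0.2cm", "fo:text-align": "center"},
)
print(a == b)  # True: same formatting, same name
```

With a scheme like this, identical direct formatting would also collapse onto a single name, and a diff would only show styles whose formatting actually changed. A short digest can in principle collide, so a real implementation would need a fallback for that case.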

I see elsewhere these are described as anonymous styles.
On examining a document I produced some time ago in Word and then resaved as fodt, it appears that each occurrence of “Default Style” requires one of these numbered styles.
It seems that every section of text in “Default Style” must be reformatted with a named style before the numbered styles are eradicated.

I don’t know about Word, but in LO Writer never use Default Style for any of your paragraphs. The purpose of Default Style is to set defaults (font, size, spacings, indents, etc.) for all other styles. I guess runs in Default Style are considered “direct formatting” and this causes the creation of anonymous styles. “Normal” text should be formatted as Text Body.

Also I found that most could be removed by using the LO replace tool, which can search and replace styles.
It made a mess of my line spacing though, as my “text body” style had a different spacing to “Default”.

That’s unavoidable because Default Style is the ancestor of all, plain text as well as headings, headers and indexes. Adjust Text Body and you’re done.

Not quite: the numbered styles are still there, they just derive from text body now. Maybe it works on totally clean “txt”, but it seems like anything with existing formatting is hard to clean?

The cause is “direct formatting” of some sort (even if you have “apparently” reset the properties to default manually). The only way is to select the paragraph (or the whole text, but then you won’t see the individual differences) and use Format > Clear Direct Formatting or Ctrl+M.