
Advice on replacing one document's content.xml with another?

asked 2020-05-18 15:12:18 +0200

Luke Kendall

updated 2020-07-27 15:23:09 +0200

Alex Kemp

I'd like to produce variant versions of my book from a single master version, where the main differences are just in the formatting. The simplest example is creating a 4"x7" edition by unzipping a 4x7 template .odt and the book's master 5"x8" edition, copying the 5"x8" edition's content.xml file into the unzipped template, editing the ISBN and then re-zipping. I use the same-named paragraph styles in all variants. To make the document amenable to diff tools like meld and to human editing, I can break up the XML into something semantically identical (I believe!) by folding lines only after a closing text XML tag

</text...>

(I can also safely fold lines at any right angle bracket that comes before the start of the XML for any <text> element.)
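A minimal Python sketch of the line folding described above, under one assumption worth checking against the ODF specification: whitespace between paragraph elements is not part of the text, but a newline added after an inline end tag such as </text:span> lands inside a paragraph and may be read back as a space. The sketch therefore folds only after </text:p> and </text:h>. The function names are illustrative, not part of any LibreOffice tool.

```python
import re
import zipfile

# Fold only after paragraph and heading end tags. Assumption: whitespace between
# paragraphs is not document text, whereas a newline after an inline end tag such
# as </text:span> would sit inside a paragraph and could be read as a space.
PARAGRAPH_END = re.compile(r"(</text:(?:p|h)>)(?!\n)")


def fold_paragraph_ends(xml: str) -> str:
    """Insert a newline after each </text:p> or </text:h> not already followed by one."""
    return PARAGRAPH_END.sub(r"\1\n", xml)


def read_content_xml(odt_path: str) -> str:
    """Return content.xml from an .odt file (a ZIP package) as a string."""
    with zipfile.ZipFile(odt_path) as package:
        return package.read("content.xml").decode("utf-8")
```

For example, writing fold_paragraph_ends(read_content_xml("book-5x8.odt")) to a text file for each edition gives meld one paragraph per line to compare (the file name is hypothetical). Running it twice gives the same result, so it is safe to apply to already-folded output.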

Any advice on what I should be alert to?

One thing I've noticed when comparing my manually constructed 4x7 version (made by cutting and pasting the whole master text into the 4x7 template and then fixing all the formatting errors) with the master is that the two documents have

<text:soft-page-break/>

sprinkled through the middle of paragraph text for no reason I can imagine. I certainly didn't intend to make any kind of page break in the middle of a paragraph. I also observe that the names used for paragraph and text styles in the XML differ (e.g. P20 in one is P15 in the other, T5 is T3, and so on), and there are about ten times as many as I expected.

I also see things I don't understand in the paragraph styles XML, like names that seem to reference page styles I have (like Body_N_HdrFtr for body pages with no header or footer), and entities like a paragraph-rsid.

<style:style style:name="P63" style:family="paragraph" style:parent-style-name="Text_20_body" style:master-page-name="Body_5f_N_5f_HdrFtr">
<style:paragraph-properties style:page-number="auto"/>
<style:text-properties fo:language="en" fo:country="US"/>
</style:style>
<style:style style:name="P64" style:family="paragraph" style:parent-style-name="Text_20_body" style:master-page-name="Body_5f_N_5f_HdrFtr">
<style:paragraph-properties fo:text-align="center" style:justify-single-word="false" style:page-number="auto"/>
<style:text-properties fo:language="en" fo:country="US"/>
</style:style>
<style:style style:name="P65" style:family="paragraph">
<style:text-properties fo:language="en" fo:country="US"/>
</style:style>
<style:style style:name="P66" style:family="paragraph" style:parent-style-name="Chapter_20_Title" style:list-style-name="">
<style:paragraph-properties fo:text-align="center" style:justify-single-word="false" fo:hyphenation-ladder-count="no-limit">
<style:tab-stops/>
</style:paragraph-properties>
<style:text-properties fo:language="en" fo:country="US" fo:hyphenate="false" fo:hyphenation-remain-char-count="2" fo:hyphenation-push-char-count="2"/>
</style:style>
<style:style style:name="P67" style:family="paragraph" style:parent-style-name="Chapter_20_Title">
<style:paragraph-properties fo:margin-top="36pt" fo:margin-bottom="10.01pt" loext:contextual-spacing="false" fo:break-before="page" style:writing-mode="page"/>
</style:style>
<style:style style:name="P68" style:family="paragraph" style:parent-style-name="Chapter_20_Title">
<style:paragraph-properties fo:margin-top="100.01pt" fo:margin-bottom="10.01pt" loext:contextual-spacing="false" style ...
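One concrete thing to be alert to: the automatic styles in content.xml (P63 and friends) are defined inside content.xml itself, but they point by name at named paragraph styles and master pages that live in styles.xml. The following Python sketch (hypothetical helper names; it checks only parent-style-name and master-page-name, not list styles, fonts or character styles) lists which of those names a template's styles.xml does not define.

```python
import xml.etree.ElementTree as ET
import zipfile

STYLE = "{urn:oasis:names:tc:opendocument:xmlns:style:1.0}"
# Attributes on automatic styles that name something defined in styles.xml.
REFERENCE_ATTRS = ("parent-style-name", "master-page-name")


def referenced_names(content_xml: str) -> dict:
    """Names that style:style elements in content.xml point at, per attribute."""
    root = ET.fromstring(content_xml)
    refs = {attr: set() for attr in REFERENCE_ATTRS}
    for st in root.iter(STYLE + "style"):
        for attr in REFERENCE_ATTRS:
            value = st.get(STYLE + attr)
            if value:  # skip missing or empty values
                refs[attr].add(value)
    return refs


def defined_names(styles_xml: str) -> dict:
    """Named styles and master pages defined in a styles.xml."""
    root = ET.fromstring(styles_xml)
    return {
        "parent-style-name": {st.get(STYLE + "name") for st in root.iter(STYLE + "style")},
        "master-page-name": {mp.get(STYLE + "name") for mp in root.iter(STYLE + "master-page")},
    }


def missing_in_template(content_xml: str, styles_xml: str) -> dict:
    """For each reference kind, the sorted names content.xml uses but styles.xml lacks."""
    refs, defs = referenced_names(content_xml), defined_names(styles_xml)
    return {attr: sorted(refs[attr] - defs[attr]) for attr in REFERENCE_ATTRS}


def check_swap(source_odt: str, template_odt: str) -> dict:
    """Compare one document's content.xml against another's styles.xml."""
    with zipfile.ZipFile(source_odt) as src, zipfile.ZipFile(template_odt) as tpl:
        return missing_in_template(src.read("content.xml").decode("utf-8"),
                                   tpl.read("styles.xml").decode("utf-8"))
```

An empty list for both keys does not make the swap safe; it only rules out one class of dangling reference.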

4 Answers


answered 2020-05-18 19:23:13 +0200

jaragon

I'd like to produce variant versions of my book from a single master version, where the main differences are just in the formatting. The simplest example is creating a 4"x7" edition by just unzipping a 4x7 template .odt and the book's master 5"x8" edition, copying the 5"x8" edition's content.xml file into the unzipped template, editing the ISBN and then re-zipping. I use the same name paragraph styles in all variants.

If the differences are simply different formatting, and if you do all your formatting with styles, and if you do in fact use the same named styles, then there is a much easier way to change from your master document to a differently formatted version without having to hack the content.xml file.

  1. Download and install the Template Changer extension.

  2. Create a template for each of the different formats you want, using exactly the same style names in each template.

Now you can use the Template Changer extension to change the template assigned to the document, if you want to change the formatting of the document; or you can make a copy of the document and use Template Changer on the copy if you want a separate document in a different format.


Comments

Thanks, I'll investigate that, it sounds very helpful.

Luke Kendall (2020-05-19 17:04:51 +0200)

answered 2020-05-18 16:11:46 +0200

ajlittoz

I strongly advise against trying to fool Writer with such a trick. You'll end up with a damaged document.

As you have seen, Writer assigns "indirect" style names when it uses your defined styles (like P99 for a paragraph style). This allows direct formatting to be handled in a unified manner. But the indirection depends on the document history: the same named style may receive a different indirect name in two documents with the same set of styles if the styles were referenced in a different order (due to insertions, deletions and editing). Consequently, don't try to paste the content.xml from one file into another.
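To illustrate why the numbers cannot be compared directly, here is a Python sketch that matches automatic styles across two content.xml files by what they contain rather than by name, so P20 in one document can be paired with P15 in the other. It assumes that the rsid and paragraph-rsid attributes LibreOffice writes only record editing sessions and can be ignored; the helper names are illustrative.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

OFFICE = "{urn:oasis:names:tc:opendocument:xmlns:office:1.0}"
STYLE = "{urn:oasis:names:tc:opendocument:xmlns:style:1.0}"


def _is_rsid(attr: str) -> bool:
    # Assumption: rsid / paragraph-rsid only record editing sessions.
    return attr.rsplit("}", 1)[-1].endswith("rsid")


def signature(st: ET.Element) -> tuple:
    """Everything an automatic style says, except its own P/T name."""
    own = sorted((k, v) for k, v in st.attrib.items()
                 if k != STYLE + "name" and not _is_rsid(k))
    props = sorted((el.tag, k, v) for el in st.iter() if el is not st
                   for k, v in el.attrib.items() if not _is_rsid(k))
    return tuple(own), tuple(props)


def automatic_styles(content_xml: str) -> dict:
    """Map each automatic style name (P63, T5, ...) to its signature."""
    auto = ET.fromstring(content_xml).find(OFFICE + "automatic-styles")
    if auto is None:
        return {}
    return {st.get(STYLE + "name"): signature(st) for st in auto.findall(STYLE + "style")}


def correspondence(content_a: str, content_b: str) -> dict:
    """For each automatic style in A, the styles in B with an identical signature."""
    by_signature = defaultdict(list)
    for name, sig in automatic_styles(content_b).items():
        by_signature[sig].append(name)
    return {name: sorted(by_signature.get(sig, []))
            for name, sig in automatic_styles(content_a).items()}
```

A name that maps to an empty list has no identical counterpart in the other document, which is usually where the two documents' formatting really differs.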

What is direct formatting?

In simple words, it is formatting applied outside styles. Whenever you push a toolbar button, use a Format menu command or a keyboard shortcut, you are direct-formatting. In XML, this translates into another of these strange intermediate styles, but one without reference to any of the named styles (an anonymous style). Taking the example of Ctrl+B for bolding characters, a new character style is created for every occurrence. You can easily see why you can't rely on the numeric id of the style to characterise "bold" across one document and another.

How to avoid direct formatting?

Use styles exclusively. Unlike M$ Word, LO Writer provides many more style categories to cover nearly all aspects of document authoring:

  • paragraph styles (the most known category because it is shared by all document suites),
  • character styles (to apply a formatting variant inside a paragraph),
  • frame styles (for positioning properties of inserts),
  • page styles (for general aspect of pages),
  • list styles (in fact the properties of a numbering sequence and how it affects a paragraph style).

I don't mention here the recent table "styles" because they are not styles like the others but templates operated by macros.

Your job, as an author and document maintainer, is to ensure that absolutely everything in your book is controlled by styles. The only exceptions are a few manual page breaks (to force a new page style after the break, when the break can't be included in a paragraph style) and restarts of list numbering. I repeat, everything else must be style-formatted.

Also, wherever possible, if some information depends on paragraph content, use fields to duplicate it. For example, use a field to copy the chapter title into the header; the same page style can then be used for several chapters.

Also, never position your text with empty paragraphs or non-semantic page breaks (in my wording, a "semantic" page break is associated with a break in the discourse, not a clumsy attempt to avoid orphan/widow lines). Tuning your styles should fix all formatting issues, possibly accepting a very small number of "approximations".

The set of styles so defined and tuned should go into a template file (.ott extension).

If you have not yet based your book on a template, don't worry. Install the DocumentTemplateChanger extension. You'll then be able to assign ...


Comments

Thanks for explaining all that, especially the clear explanation of direct formatting.

Is there an extension which will distil a document down to a minimal set of styles? For example, one that defines a single style for all text with a specific set of properties (font, size, style), e.g. an EmphasisTR12 style for all TimesRoman italic 12pt text?

Or I suppose I could analyse the XML to work this out myself.

I imagine the reason Writer creates a new style each time you make some text italic (which is great ergonomics: a single keystroke or button press) is in case the user wants each piece of text they directly set to italic to be a possibly different italic? (I'm trying to guess why Writer would create a new style rather than reuse an existing one.)

In any case, I'll definitely try out what you've described here.

I confess ...

Luke Kendall (2020-05-19 17:13:37 +0200)
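The XML analysis Luke mentions could start with something like this Python sketch, which groups the automatic character styles (family "text", the T-names) by identical text properties and counts how many spans use each group. It is illustrative only: it ignores rsid attributes on the assumption that they don't affect appearance, and, as ajlittoz points out, identical looks do not guarantee identical meaning, so the groups are candidates for character styles, not a decision.

```python
import xml.etree.ElementTree as ET
from collections import Counter

OFFICE = "{urn:oasis:names:tc:opendocument:xmlns:office:1.0}"
STYLE = "{urn:oasis:names:tc:opendocument:xmlns:style:1.0}"
TEXT = "{urn:oasis:names:tc:opendocument:xmlns:text:1.0}"


def _local(attr: str) -> str:
    return attr.rsplit("}", 1)[-1]


def formatting_groups(content_xml: str) -> dict:
    """Group automatic character styles (T1, T2, ...) that format text identically.

    Returns {("font-style=italic", ...): {"styles": [...], "spans": n}}.
    Assumption: rsid attributes are ignored because they do not change appearance.
    """
    root = ET.fromstring(content_xml)
    uses = Counter(span.get(TEXT + "style-name") for span in root.iter(TEXT + "span"))
    groups = {}
    auto = root.find(OFFICE + "automatic-styles")
    for st in ([] if auto is None else auto.findall(STYLE + "style")):
        if st.get(STYLE + "family") != "text":
            continue  # only character-level (span) formatting here
        name = st.get(STYLE + "name")
        props = st.find(STYLE + "text-properties")
        attrs = {} if props is None else props.attrib
        key = sorted(f"{_local(k)}={v}" for k, v in attrs.items()
                     if not _local(k).endswith("rsid"))
        parent = st.get(STYLE + "parent-style-name")
        if parent:  # derived from a named character style such as Emphasis
            key.append(f"parent={parent}")
        group = groups.setdefault(tuple(key), {"styles": [], "spans": 0})
        group["styles"].append(name)
        group["spans"] += uses[name]
    return groups
```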

The reason why Writer assigns a different style every time you press Ctrl+B is that it can't guess whether you consider that bold to have the same meaning as the previous one. This is where styles come in. Consider styles to be author annotations for the text: this sequence is "important", that sequence is "sarcastic". You chose that both "important" and "sarcastic" are displayed as bold. They look the same, but the meaning, the semantics, is different. Afterwards, you can turn "important" red so that there is no longer any ambiguity for the reader. You only change the style and all occurrences are simultaneously updated. This is great, but you must be disciplined from the start.

Minimal set of styles: only you can tell. Think of styles as semantic markup, not as typographical effects. Your example of TimesRoman Italic 12 pt is wrong. Why did you choose it? An emphasis ...

ajlittoz (2020-05-19 17:25:58 +0200)

And since visual effects are rather limited in number, you inevitably end up with several styles that look the same. This is not important if context allows the reader to tell them apart. But you, as the author, can change the look of a specific sequence of TR ital 12 to something else because you carefully marked up the document.

The big task is to correctly mark up the text. Don't rely on any automatic tool. Don't even try to scan the XML: it may exhibit non-significant style changes because of technical limitations. This is a manual task because you bring added value to it.

You'll know you have eliminated all direct formatting when Ctrl+M on any selection does not produce any formatting change.

Styles may be as handy as the shortcuts you already know because you can assign shortcuts to your favourite styles with Tools>Customize.

ajlittoz (2020-05-19 17:32:13 +0200)

That's exactly what I thought, thanks. It's a pity there isn't a setting that lets you choose between an 'every markup may be unique' mode and an 'every markup is the same' mode. I strongly believe the latter is far more common than the former. Because I know I only use a specific style change for a single purpose, I know that distilling my document to a minimal set of styles is what I want. There is no other semantic markup: the visible changes represent exactly the semantic differences, and if they look the same they are the same. Even if only a majority of my uses of italics were for a single semantic purpose, it would be far easier to manually find and split those apart into separate styles than with the current implementation, where searching for italic text isn't reliable. If you're saying ...

Luke Kendall (2020-05-20 06:58:27 +0200)

I've also installed DocumentTemplateChanger and have restarted Writer, but can't find any documentation on how to use it, nor any change to the Writer UI that would let me experiment with it. No, wait: by looking at the TemplateChanger extension I found the functionality is accessed via a File>Templates item.

Luke Kendall (2020-05-20 07:00:44 +0200)

I'm currently drawing a blank in trying to find what you mean by "make sure that the styles in the templates take precedence on those in the document."

I take your point about the danger of setting myself up for a trap by saving an .odt version. I'll think about that. The opposing risk is forgetting to change a (probably small) set of text items that must be changed, such as the ISBN. Maybe I could manage that risk by marking such items with a comment containing some marker text. OTOH, I'd need to save a .docx for the epub and the mobi versions too, and they have more substantive changes that would need to be reapplied each time, so I think on balance an .odt for each version would be less likely to lead to errors. Ideally I could have a front and back section for ...

Luke Kendall (2020-05-20 07:33:14 +0200)

Style conflicts: without going into details, make any style adjustment only in the template. The drawback is that you must close the document and reopen it for the changes in the template to take effect. When you edit your template, use the Open Template command, not the other usual ones; otherwise you get a new document based on the template and you must go through the full procedure to reinstall it.

Multiple versions: I forgot the ISBN issue. Since your books then differ in content, you need different documents. You end up with (at least) 4 files: the template .ott, the common content .odt and 2 master documents .odm. The master documents will contain the front and back covers plus the "directive(s)" to import the common content. In this case, the page styles will be in the master documents and you no longer need to change the template.

Caution! If you're ...

ajlittoz (2020-05-20 07:59:39 +0200)

ODF/docx: there are differences between the formats. If you keep a simple structure for your content (template/master/sub-document does not matter when exporting), there should be no problems. By simple structure, I mean a linear sequence of paragraphs, few inserted objects (tables are the most problematic), few or no nested frames/objects (avoid tables within tables), and a simple numbering scheme.

Be aware that exporting to PDF and saving as .docx are not the same and may show differences. I have no experience with ePub.

ajlittoz (2020-05-20 08:08:18 +0200)

Many thanks regarding the master documents: that sounds like the solution I need long term, in combination with rigorous use of styles.

Regarding ODF/docx: yes, they're just fiction books, so they're a simple structure. The most complex thing is a small table right at the end, in the ebook editions.

Thanks again, you've been a great help. I'll be doing all this over the next few days.

Luke Kendall (2020-05-20 15:26:12 +0200)

Thanks also for the note about File>Templates>Open Template. I was doing the wrong thing until you pointed that out, too. Ah, wait, maybe it's too late: now, when I open the template file it warns me

"The template 'Book-5x8' on which this document is based, has been modified. Do you want to update style based formatting according to the modified template?"

But even if I choose Update styles and then Save, next time I Open Template I still get the same warning.

Ah, maybe fixed it myself: I see there's also a File>Templates>Save as Template. By doing that, and then choosing to ignore the warning when I close the file that I need to save the document or my changes will be lost, it seems to be okay.

Luke Kendall (2020-05-20 15:35:45 +0200)

answered 2020-08-27 16:49:14 +0200

Luke Kendall

tl;dr - I don't think Writer can do what I need. I'm afraid I've come to the decision it's not really suitable for writing books, at least not if you want to produce multiple formats or layouts.

I've just spent a week learning about Master Documents.

Unfortunately, because the natural (most efficient) way to use Writer is to use direct formatting, I found from my experiment that the body of my document is rife with apparently random stretches of text that's been directly formatted:

[Screenshot: stretches of directly formatted text in the document body]

I wrote all this, yet I can't imagine why any of the text that appears as 10.5pt inside the 9pt text was directly formatted.

Sadly, the amount of work needed to correct that would be absurd. Far higher than I'm ever likely to save from having a single file in which to make corrections, rather than making the same corrections to four separate copies. I can't even see a semi-automatic way to remove the direct formatting. Just finding it is difficult - though I could now use a master document to see the DF text by manual inspection.

Now, given that I use only three character styles, Writer's design decision to create a new unnamed character style whenever DF is used is exactly the reverse of what I want.

I can't see a way to remove all direct formatting, since I need a function that would remove it without changing the visual appearance: in other words, a way to map the DF text to an existing 'base style' (regular or emphasis). Without that, I'd be looking at, I think, hundreds of hours of mind-numbing, error-prone manual work.

I think the design of the direct formatting feature of Writer ("assume the most complex interpretation is what the user desired, rather than the simplest"), is going to be the issue which finally pushes me to find some other word processor, after using Writer for approximately 15,000 hours. I find that sad.
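A semi-automatic first step for finding such text might look like this Python sketch: it lists every paragraph, heading or span whose automatic style overrides one property (fo:font-size by default, to match the 10.5pt stretches described above) together with its text, in document order. It rests on assumptions: it looks only at automatic styles in content.xml, so sizes set in named styles (which live in styles.xml) are left alone, and it does not resolve properties inherited through nested spans. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

OFFICE = "{urn:oasis:names:tc:opendocument:xmlns:office:1.0}"
STYLE = "{urn:oasis:names:tc:opendocument:xmlns:style:1.0}"
TEXT = "{urn:oasis:names:tc:opendocument:xmlns:text:1.0}"
FO = "{urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0}"


def find_direct_overrides(content_xml: str, prop: str = "font-size") -> list:
    """List (style name, value, text) for each paragraph, heading or span whose
    automatic style sets fo:<prop>, in document order."""
    root = ET.fromstring(content_xml)
    values = {}
    auto = root.find(OFFICE + "automatic-styles")
    for st in ([] if auto is None else auto.findall(STYLE + "style")):
        tp = st.find(STYLE + "text-properties")
        if tp is not None and tp.get(FO + prop) is not None:
            values[st.get(STYLE + "name")] = tp.get(FO + prop)
    found = []
    for el in root.iter():  # document order
        if el.tag in (TEXT + "p", TEXT + "h", TEXT + "span"):
            name = el.get(TEXT + "style-name")
            if name in values:
                found.append((name, values[name], "".join(el.itertext())))
    return found
```

Passing another property name, such as "font-style", gives the same kind of list for italics.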


Comments

because the natural (most efficient) way to use Writer is to use direct formatting

This is the initial flaw in your use of Writer. It is neither "natural" (only inspired by the ubiquitous M$ Word workflow) nor "efficient" (and this is an understatement: from the app's point of view, zillions of DF sequences are a waste of resources; from the author's point of view, you end up with a loss of information).

You can have the same "typing economy" with styles once they are defined (e.g. Ctrl+B can be reassigned to Strong Emphasis character style, etc.). Don't confuse UI (the gestures to activate features) with semantics (the significance of sequences). They live in different levels of the workflow.

Of course, once a large document is fully DF, it is next to impossible to restyle it in a reasonable amount of time. And as such, you are deprived ...

ajlittoz (2020-08-27 17:15:58 +0200)

Link to question 262553

Contrary to your statement, Writer is perfectly fit for writing books, even in multiple formats. For that, you can't just "TL;DR" (too long; didn't read) and start typing blindly. Every tool is specific and you must learn how to use it. Your goal (single source, multiple formats with different constraints) is a complex one and must be addressed from the start. Otherwise, you end up with what you are experiencing: without provision for the key points, you are compelled to re-lay out every version manually.

You said you've used Writer for ~15k hours. You should then be an expert, provided you explored the features. Did you spend some time, let's say a comfortable 50 hours, reading the manual before starting? What are 50 hours compared to 15k? Have you also thought about the semantic organisation of your book? How much of your author ...

ajlittoz (2020-08-27 17:26:05 +0200)

As you say, if I had known just how big the problem of DF was, and had assigned Ctrl-I to set Emphasis character style, I might be in good shape.

I say "might", not "would", because I can't believe that all the DF text is due to changes to or from italics or bold. I suspect there's some other common operation which makes text DF.

Yes, I've thought about the semantic organisation of my books. They're pretty good - each time I find an issue with paragraph or page styles I try to fix it through styles. What I didn't expect was the DF character style mess caused by using default editing features of Writer.

I'll do a few experiments to see if I can work out what causes DF text, since it's hard for me to believe it all stems from toggling ...

Luke Kendall (2020-08-27 17:56:38 +0200)

DF is not documented anywhere

I agree that there's not much mentioning of it in the help - but still there are; like here.

because I can't believe that all the DF text is due to changes to or from italics or bold

I strongly suspect it comes from, e.g., copying formatted text from other sources, such as other documents or web pages. Such text is better inserted without formatting, so that it doesn't bring its problems along with it. When you pasted text with size 15, color Black, background pale yellow, ... into your document and then copied something from that place to another, you started to proliferate the places that carry the unknown set of formatting that landed when you first pasted it from the Web...

Mike Kaganski (2020-08-27 18:21:19 +0200)

answered 2020-08-28 16:26:23 +0200

JohnHa

updated 2020-08-28 17:28:18 +0200

I don't think Writer can do what I need. I'm afraid I've come to the decision it's not really suitable for writing books, at least not if you want to produce multiple formats or layouts.

That is unbelievably easy to do and is precisely what LO is designed to do.

It is unfortunately your "incorrect use of LO for writing a book when you want multiple versions" which is causing the problem. It is not a problem with LO and all other word processors will do the same. Your method of Direct Formatting is excellent for "single-use, short documents" but terrible for "a book when you want multiple versions".

It is done using styles where you designate things to be paragraphs, chapter headings, quotes etc; and where you choose Page Styles for the various pages (Title Page, First Page of chapter, ToC, Front matter etc). Read Writer for Students which is very good and explains all you need to know. See also the Writer Guide chapters on using Styles.

All you now have to do is to change the Page Styles and change the other Styles and Bingo! You have your new book to a completely new format.

Unfortunately you have used Direct Formatting which overrides the underlying Style and defeats the whole object of using Styles.

In your situation I would first (working on a copy) remove all the Direct Formatting by Edit > Select All > Ctrl+M. You now have a file of Default plain text with no formatting - it is just like a Notepad file. You now apply the Styles to this vanilla text.

You now define the Page Styles, and Paragraph Style, Chapter Heading Style and Quotes Style, etc, you want for your first book. You now apply those Styles to the plain text. You now have your first book. It would take me about an hour for a 300-page book.

Starting with your plain text file, you now define the second book's Page Styles, and Paragraph Style, Chapter Heading Style and Quotes Style, etc, and apply them. You now have your second book. Again, it should take about an hour for a 300-page book.

Trust me. It will be far quicker and far easier to remove all the Direct Formatting you have used and apply the Styles to the plain text than to try to apply Styles to formatted text.

If you have completed your document as a single file there seems little point in converting it to a Master Document with sub-documents.

NB Don't even think of messing about with content.xml, as therein lies a recipe for complete disaster. Do it properly using Styles. A "<text:soft-page-break/>" is written into content.xml by LO to record where a page break fell in the layout produced by the Page Style in use when the file was saved; it does not force a break. If you change the Page size, LO will move these tags to different places to match the new page breaks.
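For comparison purposes only, these layout-dependent markers can be stripped from a copy of content.xml before diffing two editions, since they will legitimately differ between a 5x8 and a 4x7 layout. This Python sketch removes <text:soft-page-break/> elements and the officeooo:rsid / officeooo:paragraph-rsid attributes LibreOffice writes; it assumes those exact prefixes, and its output is meant for a diff tool, not for saving back into an .odt.

```python
import re

# Markers that depend on page layout or editing sessions, not on the text itself.
SOFT_PAGE_BREAK = re.compile(r"<text:soft-page-break\s*/>")
# Assumption: LibreOffice writes these attributes with the "officeooo" prefix.
RSID_ATTRIBUTE = re.compile(r'\s+officeooo:(?:paragraph-)?rsid="[^"]*"')


def normalise_for_diff(xml: str) -> str:
    """Return a copy of content.xml text with soft page breaks and rsids removed.

    Intended only as input to a diff tool; do not save the result into an .odt.
    """
    without_breaks = SOFT_PAGE_BREAK.sub("", xml)
    return RSID_ATTRIBUTE.sub("", without_breaks)
```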

Also ...


Comments

See my (long) answer to question 258664/in-book-publishing-how-best-to-change-the-page-size-for-a-new-edition/#258683 where I explain how to avoid the one-hour style change when you want a new edition (assuming the new edition is not a replacement but coexists with the first).

You need one master document per edition (optionally based on the same template to inherit common shared styles), which includes the novel text as a sub-document. Covers may be different, and they are formatted directly in the masters. The masters will also show a unique ISBN.

ajlittoz (2020-08-28 17:07:52 +0200)