Advice on replacing one document's content.xml with another?

Many thanks regarding the master documents: that sounds like the solution I need long term, in combination with rigorous use of styles.

Regarding ODF/docx: yes, they’re just fiction books, so they’re a simple structure.
The most complex thing is a small table right at the end, in the ebook editions.

Thanks again, you’ve been a great help. I’ll be doing all this over the next few days.

Thanks also for the note about File>Templates>Open Template. I was doing the wrong thing until you pointed that out, too.
Ah, wait, maybe it’s too late: now, when I open the template file it warns me

“The template ‘Book-5x8’ on which this document is based, has been modified. Do you want to update style based formatting according to the modified template?”

But even if I choose Update styles and then Save, next time I Open Template I still get the same warning.

Ah, maybe fixed it myself: I see there’s also a File>Templates>Save as Template. By doing that, and then choosing to ignore the warning when I close the file that I need to save the document or my changes will be lost, it seems to be okay.

Being rigorous about styles though will probably take a day or two per book, because Writer can’t reliably find italic text. So finding all cases of direct format italics will have to be manual, and error prone.

Then be kind enough to check my answer as satisfactory (click only once on the gray check mark. It may take a while before it turns green).

I’m getting close now to correcting all the problems for all four editions of my first book. (That is, 5"x8" and 4.24"x7" paperback editions, epub edition, and kindle edition).
The problem I’m facing now is that when I tidied up all the page styles in the .docx file (which was all I had for the epub edition), I now have random page breaks in the HTML produced by Calibre after importing the .docx file.
I’m still trying to work out why.
I notice in the XML (I got desperate, saved as .odt and then looked inside that) that there are lots of places where a <text:soft-page-break/> appears in the middle of a paragraph. I certainly would not have inserted them, and I can’t find any information about them. What are they for? Are they relevant to my random page-breaks problem? They don’t seem to match up with the page breaks Calibre generates in the HTML for the epub.
The old version of the .docx used to convert correctly using Calibre; the new version has these random page breaks inserted.
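
For anyone wanting to repeat this kind of inspection without reading the XML by eye, here is a minimal Python sketch. It assumes the .odt is an ordinary ODF zip package with a `content.xml` member; the function name `count_soft_page_breaks` is made up for illustration. It only reports how many `<text:soft-page-break/>` elements exist and how many sit inside a paragraph; it changes nothing.

```python
import zipfile
import xml.etree.ElementTree as ET

# ODF "text" namespace as declared at the top of content.xml
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
BREAK = f"{{{TEXT_NS}}}soft-page-break"
PARA = f"{{{TEXT_NS}}}p"


def count_soft_page_breaks(odt_path):
    """Return (total, inside_paragraph) counts of soft page breaks."""
    with zipfile.ZipFile(odt_path) as package:
        root = ET.fromstring(package.read("content.xml"))

    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}

    total = inside = 0
    for node in root.iter(BREAK):
        total += 1
        # Walk upwards until a <text:p> is found or the root is passed.
        ancestor = parents.get(node)
        while ancestor is not None and ancestor.tag != PARA:
            ancestor = parents.get(ancestor)
        if ancestor is not None:
            inside += 1
    return total, inside
```

Calling `count_soft_page_breaks("book.odt")` (a hypothetical file name) on the old and new versions would at least show whether the two files differ in this respect.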

Tomorrow I plan to convert the older version of the .docx file to .odt and look inside that, to see if I can see what’s different to the new (supposedly clean and greatly fixed) .docx version.

But this bug sounds highly relevant:
https://bugs.documentfoundation.org/show_bug.cgi?id=43692

If so, maybe the solution will be for me to edit the content.xml and remove all the soft-page-breaks.
I can try that tomorrow, too.
(1am here now.)

I try to avoid tweaking the XML: my strategy is to use the styles to express my formatting goal, extracting maximum juice out of them. Then I don’t try to figure out how the style applications will be translated into ODF and its incarnation as XML. I prefer to stay with a single paradigm and not be bothered with the interactions between two.

One point you should consider: apparently your original book is .docx. This is very important and might explain some anomalies. Writer is much richer style-wise than Word and exact conversion cannot be achieved. Notably, Word has no notion of page style. I suspect that part of your problems are probably a consequence of the conversion.

I had a look at one of my complex documents. <text:soft-page-break/> seem to be reminders set by Writer where page breaks occur. When met, the current page style footer and header are laid out on the page. They are “soft” in the sense that Writer …

… has full freedom to move them as opposed to your manual page break which must inflexibly remain where you put it to respect your formatting. My advice: don’t remove them or Writer will need to recompute the page breaks. More generally, don’t play with the XML; this is Writer’s playground. Yours is the document in plain English. Concentrate on the styles (all of them: paragraph, character, page). They are made to represent your expected formatting. Avoid conversions; always save as .odt. Convert only if an addressee requires it, and double-check the formatting to make sure.

I’m using the XML for diagnosis and understanding.
The original format was .odt; I write my books in LibreOffice. I can’t remember exactly the problem that made me lose the source .odt file for the epub edition of my book. But because Calibre works much better with .docx files than .odt (as the developer, Kovid Goyal, advised me back in 2019), I convert the .odt to .docx to make a Word file for Calibre to import.
I insert a manual page break before each new chapter.
If Writer only uses soft-pagebreak in docs converted from Word, why leave them in? Doesn’t Writer normally have to work out where the page breaks go anyway? The soft-pagebreaks are all in stupid places where the page shouldn’t break, so I can’t see how they help Writer.
I’ve now looked inside the .docx too: each spurious page break has a moderate-sized slab of XML including a <w:sectPr> section; these also occur just before genuine chapter starts.
The text:soft-page-breaks are more frequent in the .odt and more random.

I’ve also just now looked inside the .docx for the previous version, that doesn’t get the spurious page breaks added by Calibre during the conversion process.

In that .docx file, the <w:sectPr>…</w:sectPr> is always inside the paragraph before a section break (i.e. before each chapter). There are none sprinkled elsewhere. NB: there was always exactly one extra added, on the first page of Chapter 1.

So it seems during my edits or round trips between .docx and .odt, something introduced a whole bunch of these.
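
A comparison like this can be automated. The following is a minimal sketch, assuming a standard .docx package with a `word/document.xml` member; the function name `section_break_paragraphs` is invented for the example. It lists every paragraph whose properties contain a `<w:sectPr>` (the paragraph that ends a section), together with the start of its text, so the list can be checked against the real chapter boundaries. The final `<w:sectPr>` that sits directly under `<w:body>` is deliberately not reported.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix in document.xml)
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def section_break_paragraphs(docx_path):
    """Return [(paragraph_index, text_snippet)] for paragraphs ending a section."""
    with zipfile.ZipFile(docx_path) as package:
        root = ET.fromstring(package.read("word/document.xml"))

    hits = []
    for index, para in enumerate(root.iter(f"{W}p")):
        # A section break is stored as <w:p><w:pPr><w:sectPr>...</w:sectPr>
        if para.find(f"{W}pPr/{W}sectPr") is not None:
            text = "".join(t.text or "" for t in para.iter(f"{W}t"))
            hits.append((index, text[:60]))
    return hits
```

Running it on the old and the new .docx and comparing the two lists should show which section breaks were added along the way.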

I’m going to try deleting them from the problematic .docx file’s XML and see how I go.

Fascinating:

When I simply re-zipped the .docx from the unzipped elements and opened that new file in Writer, it showed me each of the spurious page breaks: they’re visible as page breaks in Writer!

When I re-zip to create a .docx from my edited word/document.xml and reconvert in Calibre, it’s perfect.

In Writer the conversion messed up many headers and footers, but they’re ignored in the epub anyway.

Removing the soft-page-breaks from the .odt file made no difference to the .docx file produced, as far as the spurious page breaks are concerned.
When I look at the source .odt’s XML, I can’t see any markup that would induce a page break at the spurious points. But if I Save As .docx from the .odt file and look at that .docx in Writer itself, it too shows the spurious page breaks.
And I see it’s Writer that breaks the page styling and messes up the headers and footers.
So I’m thinking I need to file a bug report for Writer’s .docx generation.

I’d like to produce variant versions of my book from a single master version, where the main differences are just in the formatting. The simplest example is creating a 4"x7" edition by just unzipping a 4x7 template .odt and the book’s master 5"x8" edition, copying the 5"x8" edition’s content.xml file into the unzipped template, editing the ISBN and then re-zipping. I use the same paragraph style names in all variants.
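
The unzip, copy, re-zip procedure described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: both files are plain ODF zip packages, the function name `build_variant` and the file names are hypothetical, and the ISBN edit is left out. Note that content.xml also carries the automatic styles created by direct formatting, and that embedded images live in the package's Pictures/ folder and are listed in META-INF/manifest.xml, so a book with images would need more than this.

```python
import zipfile


def build_variant(template_odt, master_odt, out_odt):
    """Write out_odt: the template package with the master's content.xml."""
    with zipfile.ZipFile(master_odt) as master:
        content = master.read("content.xml")

    with zipfile.ZipFile(template_odt) as template, \
            zipfile.ZipFile(out_odt, "w") as out:
        for info in template.infolist():
            if info.filename == "content.xml":
                data = content
            else:
                data = template.read(info.filename)
            # Reusing the template's ZipInfo keeps the entry order and the
            # per-entry compression, so "mimetype" stays first and stored.
            out.writestr(info, data)
```

Usage might look like `build_variant("Book-4x7-template.odt", "Book-5x8.odt", "Book-4x7.odt")`. Everything except content.xml (styles.xml, settings, manifest) comes from the template, which is why identical style names in both files matter.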

If the differences are simply different formatting, and if you do all your formatting with styles, and if you do in fact use the same named styles, then there is a much easier way to change from your master document to a differently formatted version without having to hack the content.xml file.

  1. Download and install the Template Changer extension.

  2. Create a template for each of the different formats you want, using exactly the same style names in each template.

Now you can use the Template Changer extension to change the template assigned to the document, if you want to change the formatting of the document; or you can make a copy of the document and use Template Changer on the copy if you want a separate document in a different format.

Thanks, I’ll investigate that, it sounds very helpful.

tl;dr - I don't think Writer can do what I need. I'm afraid I've come to the decision it's not really suitable for writing books, at least not if you want to produce multiple formats or layouts.

I've just spent a week learning about Master Documents.

Unfortunately, because the natural (most efficient) way to use Writer is to use direct formatting, I found from my experiment that the body of my document is rife with apparently random stretches of text that's been directly formatted:

![image description](upload://aMdTRL3wj6CKwTQXF5yLX98n75Z.png)

I wrote all this, yet I can't imagine why *any* of the text that appears as 10.5pt inside the 9pt text was directly formatted.

Sadly, the amount of work needed to correct that would be absurd. Far higher than I'm ever likely to save from having a single file in which to make corrections, rather than making the same corrections to four separate copies. I can't even see a semi-automatic way to remove the direct formatting. Just finding it is difficult - though I could now use a master document to see the DF text by manual inspection.

Now, given that I use only three character styles, Writer's design decision to create a new unnamed character style whenever DF is used, is exactly the reverse of what I want.

I can't see a way to remove all direct formatting, since I need a function which would remove it all without changing the visual appearance: in other words, a way to map the DF text to an existing 'base style' (regular or emphasis). Without that, I'd be looking at I think hundreds of hours of mind numbing, error prone manual work.
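
As a diagnostic aid only (it does not fix anything), here is a minimal Python sketch that lists where direct character formatting lives in an .odt. It relies on the fact that Writer stores direct character formatting as automatic styles with `style:family="text"` (typically named T1, T2, ...) in content.xml; the function name `direct_format_report` is made up for this example. For each such style it reports how many `<text:span>` elements use it and which text properties it sets, which may help to see how many distinct kinds of DF there really are.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

OFFICE = "{urn:oasis:names:tc:opendocument:xmlns:office:1.0}"
STYLE = "{urn:oasis:names:tc:opendocument:xmlns:style:1.0}"
TEXT = "{urn:oasis:names:tc:opendocument:xmlns:text:1.0}"


def direct_format_report(odt_path):
    """Return [(style_name, span_count, properties)] for automatic text styles."""
    with zipfile.ZipFile(odt_path) as package:
        root = ET.fromstring(package.read("content.xml"))

    # Collect the automatic character styles and the properties they set.
    props = {}
    auto = root.find(f"{OFFICE}automatic-styles")
    if auto is not None:
        for style in auto.findall(f"{STYLE}style"):
            if style.get(f"{STYLE}family") != "text":
                continue
            tp = style.find(f"{STYLE}text-properties")
            attrs = {} if tp is None else {
                key.split("}")[-1]: value for key, value in tp.attrib.items()
            }
            props[style.get(f"{STYLE}name")] = attrs

    # Count how many spans reference each style name.
    uses = Counter(
        span.get(f"{TEXT}style-name") for span in root.iter(f"{TEXT}span")
    )
    return [(name, uses.get(name, 0), props[name]) for name in sorted(props)]
```

If the report shows only a handful of property sets (for example italic only, or a stray font size), mapping each to an existing character style becomes a much smaller problem than inspecting the text page by page.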

I think the design of the direct formatting feature of Writer ("assume the most complex interpretation is what the user desired, rather than the simplest"), is going to be the issue which finally pushes me to find some other word processor, after using Writer for approximately 15,000 hours. I find that sad.

because the natural (most efficient) way to use Writer is to use direct formatting

This is the initial flaw in your use of Writer. It is neither “natural” (only inspired by the ubiquitous M$ Word workflow) nor “efficient” (and this is an understatement: from the app’s point of view, zillions of DF-formatted sequences are a waste of resources; from the author’s point of view, you end up with a loss of information).

You can have the same “typing economy” with styles once they are defined (e.g. Ctrl+B can be reassigned to Strong Emphasis character style, etc.). Don’t confuse UI (the gestures to activate features) with semantics (the significance of sequences). They live in different levels of the workflow.

Of course, once a large document is fully DF, it is next to impossible to restyle it in a reasonable amount of time. And as such, you are deprived of the automatic reformatting outlined in question 262553.

Contrary to your statement, Writer is perfectly fit to write books even in multiple formats. For that, you can’t just “TLDR” (too long, don’t read) and start to type blindly. Every tool is specific and you must learn how to use it. Your goal (single source, multiple formats with different constraints) is a complex one and must be addressed from the start. Otherwise, you end up with what you experience: without provision for the key points, you are compelled to re-layout manually every version.

You said you’ve used Writer for ~15k hours. You should then be an expert, provided you explored the features. Have you spent some time, let’s say a comfortable 50 hours, reading the manual before starting? What are 50 hours compared to 15k? Have you also thought about the semantic organisation of your book? How much of your creative work as an author is reflected in the document structure?

As you say, if I had known just how big the problem of DF was, and had assigned Ctrl-I to set Emphasis character style, I might be in good shape.

I say "might", not "would", because I can't believe that all the DF text is due to changes to or from italics or bold. I suspect there's some other common operation which makes text DF.

Yes, I've thought about the semantic organisation of my books. They're pretty good - each time I find an issue with paragraph or page styles I try to fix it through styles. What I didn't expect was the DF character style mess caused by using default editing features of Writer.

I'll do a few experiments to see if I can work out what causes DF text, since it's hard for me to believe it's all stemming from toggling between italics and regular.

Of the 15k hrs, most of it was *using* it, not explicitly learning it. I've probably spent 100-200 hrs studying Writer itself. DF is not documented anywhere: your explanation above is the best I've seen.

DF is not documented anywhere

I agree that there’s not much mention of it in the help, but there is some, like here.

because I can’t believe that all the DF text is due to changes to or from italics or bold

I strongly suspect it comes from, e.g., copying formatted text from other sources, like other documents or web pages. Such text is better inserted without formatting, so as not to bring its problems along with it. When you pasted a text with size 15, color Black, background pale-yellow, … into your document, then copied something from this place to another, you started to proliferate the places that carry that set of unknown formatting, a nightmare that landed when you first pasted it from the web…

I don’t think Writer can do what I need. I’m afraid I’ve come to the decision it’s not really suitable for writing books, at least not if you want to produce multiple formats or layouts.

That is unbelievably easy to do and is precisely what LO is designed to do.

It is unfortunately your “incorrect use of LO for writing a book when you want multiple versions” which is causing the problem. It is not a problem with LO and all other word processors will do the same. Your method of Direct Formatting is excellent for “single-use, short documents” but terrible for “a book when you want multiple versions”.

It is done using styles where you designate things to be paragraphs, chapter headings, quotes etc; and where you choose Page Styles for the various pages (Title Page, First Page of chapter, ToC, Front matter etc). Read Writer for Students which is very good and explains all you need to know. See also the Writer Guide chapters on using Styles.

All you now have to do is to change the Page Styles and change the other Styles and Bingo! You have your new book to a completely new format.

Unfortunately you have used Direct Formatting which overrides the underlying Style and defeats the whole object of using Styles.

In your situation I would first (working on a copy) remove all the Direct Formatting by Edit > Select All > Ctrl+M. You now have a file of Default plain text with no formatting - it is just like a Notepad file. You now apply the Styles to this vanilla text.

You now define the Page Styles, and Paragraph Style, Chapter Heading Style and Quotes Style, etc, you want for your first book. You now apply those Styles to the plain text. You now have your first book. It would take me about an hour for a 300 page book.

Starting with your plain text file, you now define the second book Page Styles, and Paragraph Style, Chapter Heading Style and Quotes Style, etc, and apply them. You now have your second book. Again, it should take about an hour for a 300 page book.

Trust me. It will be far quicker and far easier to remove all the Direct Formatting you have used and apply the Styles to the plain text than to try to apply Styles to formatted text.

If you have completed your document as a single file there seems little point in converting it to a Master Document with sub-documents.

NB Don’t even think of messing about with content.xml as therein lies a recipe for a complete disaster. Do it properly using Styles. A “text:soft-page-break/” is inserted into content.xml by LO to force a Page Break on your screen according to the Page Style currently in use. If you change the Page size, LO will move these tags to different places to give the new page breaks on screen.

Also remember how LO works - LO always uses Styles but users often override them.

When you create a new document and start typing what you type is in the Style of Default text. If you then want a Chapter Heading you have two choices:

Choice 1 - Direct Formatting. Highlight the text and apply Bold, font size, font etc until it looks how you want

Choice 2 - Use Styles. Highlight the text and select the Chapter Heading Style for it.

Note how in Choice 2 every Chapter Heading will be identical because you did not have to remember what font, font size etc you want to use. Choice 1 is very error prone.

If you use Choice 2 and you want to change the Chapter Heading format all you need to do is edit the Chapter Heading Style. Every Chapter Heading now takes the new Style. With Direct Formatting you have to edit each one separately which is time consuming and error prone.

See my (long) answer to question 258664/in-book-publishing-how-best-to-change-the-page-size-for-a-new-edition/#258683 where I explain how to avoid the one-hour style change when you want a new edition (assuming the new edition is not a replacement but coexists with the first).

You need one master document per edition (optionally based on the same template to inherit common shared styles) which include the novel text as a sub-document. Covers may be different and they are formatted directly in the masters. The masters will also show a unique ISBN.