How to avoid including sections from other documents when saving?

How can I save a document without embedding the content of its linked sections? We have a large Writer document, main.fodt, that includes sections from over 1000 different sub documents. We would like to be able to save the main document without copying all the sections from the sub documents into the saved file. We have observed that if the sections are included, the main document becomes too large and LibreOffice has trouble loading it. Also, we are hosting the documents on GitHub, so it is necessary to keep the size of the documents as small as possible.

To reproduce: To observe what happens when a document that includes a section from another document is saved, follow these steps:

  • Create two empty files: main.fodt, child.fodt
  • In child.fodt: Insert a header line: “The start of the child doc”, then insert a section “Section1” below that: Position the cursor where you want the section to appear, then from the menu: Insert->Section and Type in “Section1” under the “New Section” input field. Then press “Insert” button. Then move the cursor inside the inserted section, and type “This is Section1 from the child doc”. Then save child.fodt.
  • In main.fodt: Insert a header line: “The start of main”, then insert a section “Section1” below that: Position the cursor where you want the section to appear, then from the menu: Insert->Section and Type in “Section1” under the “New Section” input field. Then click the “Link” radio button in the right pane. Then in the “File name” text field, type in the absolute location of the child.fodt document, for example: “file:///home/hakon/child.fodt”. In the “Section” text field type in “Section1”. Then press the “Insert” button.
  • Save main.fodt
  • Check that main.fodt contains the section from child.fodt, e.g. run grep "This is Section1 from the child" main.fodt
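As an alternative to grep, the check in the last step could be sketched in Python. This is only a rough sketch: it assumes a linked section is saved as a text:section element with a text:section-source child, and the sample string is a hand-written stand-in for main.fodt, not real LibreOffice output. The function name linked_sections is made up for this example.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
XLINK_NS = "http://www.w3.org/1999/xlink"


def linked_sections(fodt_xml):
    """Return (name, link target, approximate cached character count) per linked section."""
    root = ET.fromstring(fodt_xml)
    found = []
    for section in root.iter(f"{{{TEXT_NS}}}section"):
        source = section.find(f"{{{TEXT_NS}}}section-source")
        if source is None:
            continue  # ordinary section, not linked to another file
        name = section.get(f"{{{TEXT_NS}}}name")
        href = source.get(f"{{{XLINK_NS}}}href")
        # All text found inside the section is the copy cached from the child.
        cached = len("".join(section.itertext()).strip())
        found.append((name, href, cached))
    return found


# Hand-written stand-in for main.fodt (assumption: a real file has many more elements).
SAMPLE = """<office:document
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <office:body><office:text>
    <text:p>The start of main</text:p>
    <text:section text:name="Section1">
      <text:section-source xlink:href="file:///home/hakon/child.fodt" text:section-name="Section1"/>
      <text:p>This is Section1 from the child doc</text:p>
    </text:section>
  </office:text></office:body>
</office:document>"""

for name, href, cached in linked_sections(SAMPLE):
    print(f"{name}: {href} ({cached} cached characters)")
```

For a file on disk, ET.parse("main.fodt").getroot() would replace ET.fromstring. A non-zero character count for a linked section means the child's text was written into the main document.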

Try File>Send>Create Master Document.

Your question is not clear at all.

You seem to be concerned by a storage issue which can be solved by adopting a master (.odm file extension) + ordinary sub-documents. Then only a link to the sub-document is retained plus some auxiliary data to cope with potentially conflicting configuration between the master and sub.

However, your example procedure (“To reproduce: …”) shows a more subtle situation where you reference only a nested section within the “sub-document” instead of including the whole of it. This partial inclusion can’t be handled with master + subs.

If the sections in your sub-documents are only there to offer selective access to parts of them, you are using the section feature the wrong way. Sections are provided to temporarily change the number of columns in a page or to create a dedicated subdivision of the document to partition note collection. Break your present sub-documents into “atomic” bits (“atomic” means etymologically “can’t be broken”) which you will reference/link/include as a whole. Then the master feature will do what you’re looking for: save only a link to the sub-document without its text.

We need to keep the sub documents under version control on GitHub. This way we can observe diffs for pull requests. As I understand the odm format is a binary format? That’s why we are using flat format (.fodt). How can we solve this issue?

You file a feature request. There is no option to not write cached content. Opening a document with external links shows a request to update the links; but in case the request is rejected, users usually expect to see the previous data.

So unless you ask for it, and - best - even implement it (we welcome any contribution!), the only thing you can do yourself is to use some post-processing of the documents prior to updating them in your version control…
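Such a post-processing step could look roughly like the Python sketch below. It is only an illustration under several assumptions: that a linked section is stored as a text:section with a text:section-source child, that LibreOffice will refill the section when the links are updated on the next load (not verified here), and that an empty placeholder paragraph is an acceptable stand-in for the removed content. The function name strip_linked_sections and the sample string are made up for this example.

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Keep the usual prefixes on output instead of ns0, ns1, ...
for prefix, uri in (("office", OFFICE_NS), ("text", TEXT_NS), ("xlink", XLINK_NS)):
    ET.register_namespace(prefix, uri)


def strip_linked_sections(fodt_xml):
    """Return the document with the cached content of linked sections removed."""
    root = ET.fromstring(fodt_xml)
    # Collect first, so the tree is not modified while it is being walked.
    for section in list(root.iter(f"{{{TEXT_NS}}}section")):
        source = section.find(f"{{{TEXT_NS}}}section-source")
        if source is None:
            continue  # ordinary section: leave its content alone
        for child in list(section):
            if child is not source:
                section.remove(child)  # drop the cached copy of the child text
        # Assumption: leave one empty paragraph so the section is not left bare.
        ET.SubElement(section, f"{{{TEXT_NS}}}p")
    return ET.tostring(root, encoding="unicode")


# Hand-written stand-in for main.fodt (assumption: a real file is far larger).
SAMPLE = """<office:document
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <office:body><office:text>
    <text:p>The start of main</text:p>
    <text:section text:name="Section1">
      <text:section-source xlink:href="file:///home/hakon/child.fodt" text:section-name="Section1"/>
      <text:p>This is Section1 from the child doc</text:p>
    </text:section>
  </office:text></office:body>
</office:document>"""

print(strip_linked_sections(SAMPLE))
```

One caveat: a real .fodt file declares many more namespaces than the three registered here, and ElementTree only writes out declarations for namespaces it sees in element or attribute names. Test on a copy first; a parser that preserves the original prefixes, such as lxml, may be the safer choice for real documents.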

To clarify why we ended up using sub sections: Originally we had only one large .fodt document; eventually it became so large that LibreOffice slowed down or even crashed when we tried to edit it. We then tried File>Send>Create Master Document, but that also crashed. Then we used a Python script to split it up into sections. This is working fine for now, but we cannot save the main document, since that would copy all the sub sections back into the main document, as I mentioned above.

ODF (the underlying XML document encoding) is not really “friendly” with version control like git. LO uses ODF features to identify changed segments so that the Track Changes features can give more relevant indications. This is done by interspersing the text with special XML elements, which can produce false (or rather meaningless) change events in git. To improve git accuracy, you should disable Store it when changing the document (“it” = the random number) under Tools>Options>LibreOffice Writer>Comparison.

Anyway, the ODF structure will not cooperate well to provide a clear, user-significant log of changes. And if your users rely extensively on direct formatting instead of exclusive style formatting, the situation will become even worse, because direct formatting ids are liable to change in areas not touched by the current user modification.

Give more details about your document evolution. If change history is rather linear, i.e. you have no fork from main line to “explore” editorial tracks, then you could have a try with File>Versions and keep the document in a shared directory outside git. But this is restricted to a linear history.

And, yes, .odm is a zipped format. However, git can be extended with add-ons. Perhaps you can find one which allows git to “see” inside zip archives. However, I don’t know if you can customise GitHub in the same way.
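One way to let git “see” inside the archive, without a third-party add-on, might be git's textconv mechanism, which runs a command on a file and diffs the text it prints. The Python sketch below shows the core of such a converter. It assumes the interesting text lives in content.xml inside the package and that it is UTF-8 encoded; the function name odf_content is made up, and the demo builds a tiny in-memory zip rather than a real .odm file.

```python
import io
import zipfile


def odf_content(path_or_file):
    """Return content.xml from a zipped ODF package (.odt, .odm, ...) as text."""
    with zipfile.ZipFile(path_or_file) as package:
        return package.read("content.xml").decode("utf-8")


# Demo on an in-memory stand-in for a package (assumption: not a real document).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("content.xml", "<office:document-content/>")
buffer.seek(0)
print(odf_content(buffer))
```

To wire it up, a small script that prints odf_content(sys.argv[1]) could be set as diff.odf.textconv in the git configuration, with a line such as `*.odm diff=odf` in .gitattributes (the driver name odf is an arbitrary choice). This only changes what a local git diff shows; it does not change how GitHub renders diffs.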

Can you give an order of magnitude? How many pages? How many pages on average in your present sub-docs?
What kind of formatting? Adopting a strict methodical styling discipline can drastically decrease the resulting file size.

Currently there are approximately 2900 pages in the document. The main.fodt document has been reduced in size to 94 MB after splitting it into 1400+ sub documents (representing chapters and sub sections). Each sub document (also .fodt format) is between 500 KB and 3 MB. Formatting includes tables, figures, screenshots.

Mmh … 2900 pages split into 1400 documents gives ~2 pages per document. I’d say this is too detailed and transfers the size burden onto the complexity of handling so many subs. And if you also structured your subs with sections inside, you are really putting Writer under stress. 2-page chapters look strange to me; they are usually larger.

Do you have specific “original” text in your main.fodt in addition to the “links” to the subs? I find 94 MB quite large if there is no original text (this may be an effect of the “local copy” of text bits).

I reiterate my question about the necessity (or usage) of sections inside the “sub-documents”. Avoiding unnecessary complex document architecture contributes to size reduction and improves Writer performance.

Yes, right. I forgot to mention that we embedded the fonts in main.fodt so that it would look the same on Linux, Windows, and macOS. We are currently working on a solution to remove the fonts from main.fodt. I guess removing the fonts should reduce the size to below 50 MB.