Speed up load time (large document, 2,000 pages)

I have a customer who has a desktop PC with an i5 processor and 16 GB of RAM (albeit DDR3), running Win10 64-bit. Literally the ONLY thing he uses this computer for is word processing - and he doesn’t even have Internet.

He has a book split into two separate 2,000-page Microsoft Word documents. I am trying to find a way to speed up load time (at present it takes over 5-6 minutes just to load the document into memory). I am looking for ways to increase dedicated application memory. In later 7.x versions of LibreOffice the memory settings were moved somewhere (not sure where). Anyway, I am open to dedicating 12 GB of RAM just to LibreOffice if that will help - I just don’t know where to go to do this. Earlier versions had memory settings for total memory and memory per object (?), though I am not sure what “object” referred to or where those settings have moved.

Any help would be greatly appreciated - I couldn’t find anyone offering support in the irc channel.

FYI - the document is almost all text (very few pictures)

If your customer wants to work exclusively with LibreOffice in the future, you should save the documents in ODT file format.
The formatting previously done in Word will not meet the standards of Writer in LibreOffice, and may even lead to larger files.
It would make sense to insert the entire text of each file into a new empty ODT file as unformatted text.
Since Writer has an excellent styles management (1), it is important to train yourself on this topic and to perform the formatting in the files exclusively with styles.
If there are oversized images in the documents, it is recommended to reduce them before reinserting them.
This way you should get documents that are much faster to load.
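
If copying 2,000 pages through the clipboard proves too slow, the unformatted text could also be pulled out of the file with a script. The following is only a minimal sketch, assuming the files are .docx (not the older binary .doc) and that the body text lives in the usual `word/document.xml` part. The function names `docx_paragraphs` and `write_plain_text` are made up for this illustration; footnotes, headers, tabs and manual line breaks are ignored, and text boxes nested inside paragraphs may be repeated.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by .docx body markup.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(path):
    """Return the plain text of each paragraph in a .docx, formatting dropped.

    `path` may be a file name or a binary file-like object.
    Assumption: the main document part is word/document.xml.
    """
    with zipfile.ZipFile(path) as docx:
        root = ET.fromstring(docx.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{{{W}}}p"):
        # Join all text runs of the paragraph; run properties (bold, fonts,
        # and so on) are simply not read, which is the point of the exercise.
        paragraphs.append("".join(t.text or "" for t in p.iter(f"{{{W}}}t")))
    return paragraphs


def write_plain_text(paragraphs, out_path):
    """Write one paragraph per line to a UTF-8 text file."""
    with open(out_path, "w", encoding="utf-8") as out:
        out.write("\n".join(paragraphs))


# Hypothetical usage (file names are placeholders):
# write_plain_text(docx_paragraphs("book-part1.docx"), "book-part1.txt")
```

The resulting text file can then be brought into a fresh Writer document and restyled there, exactly as with text pasted as unformatted text.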

(1) Professional text composition with Writer

English documentation

Memory settings are no longer required in newer LibreOffice versions.

Don’t add one-line comments just after closing your question. Edit it instead. This is more friendly for potential contributors.

Memory setting is no longer necessary. It is dynamically allocated and the OS virtual memory management does the job. Thus your problem is not memory but document contents.

  • Document is saved as .doc(x), which necessitates conversion at load and save times.
    This conversion is approximate because the format is alien, has no exact correspondence with internal format and many “constructs” must be translated into groups of primitives because they are not native in Writer (and vice versa at save time).
  • The way the document is formatted has huge importance on load time
    The less interpolation the input filter has to do, the better.

    If your customer has no external constraint, i.e. doesn’t work in a collaborative environment where other workers are locked in M$ Word, convert the document to native *.odt format.

    But this is not sufficient. Document structure is already damaged by the cumulative effect of previous conversions. In addition Word has very few “structuring” primitives and condemns authors to manual formatting (called direct formatting in Writer parlance). Word offers only paragraph styles and relatively few authors use them.

    Writer additionally has character styles, which replace the need for Ctrl-B, Ctrl-I and others by assigning a style to each kind of emphasis. The net effect on the file is that there is a reference to a single style instead of multiple occurrences of single-use anonymous styles (this contributes to decreasing file size).

    There are also page styles, which allow you to define the common geometry of a sequence of pages. This effect is very loosely achieved in Word with “sections” (beware, in Writer, a section is totally different: it is a part of a page with a different number of columns). Unfortunately, due to the huge difference in concept, conversion creates one page style per physical page. This is one point to eliminate to improve load time, but it is next to impossible to get rid of it due to the mess caused by previous conversions.

    The real cure is to review the full book under the styles philosophy. This starts by determining the set of styles needed for the work. Built-in ones are a good start. Heading n family is for chapter and sub-chapter headings. Text Body is for main discourse. Emphasis is for usually italicised words and Strong Emphasis for bolded words.

    As @Hrbrgr suggests, the easiest way to get rid of all M$ Word induced problems is to paste the book as unformatted text in a fresh document. You then restyle paragraphs and words.

    With a 2000-page book, this will take time but it is worth it.

    My personal experience goes up to ~500-page books (with very complex formatting) and I have no performance problem. Even if load time were quadratic in the page count, a 2000-page work (4 times as long) should take no more than 4² = 16 times my load time, i.e. a matter of tens of seconds only.
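
To get a rough idea of how much direct formatting and how many converted page styles a document carries, the anonymous ("automatic") styles and the page styles of an .odt file can be counted. This is a minimal sketch, not an official tool: it assumes the document has already been saved once as .odt, and it relies only on the fact that an .odt is a ZIP archive whose `content.xml` holds the automatic styles and whose `styles.xml` holds the master pages (Writer's page styles). The function name `count_odt_styles` is made up for this illustration.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# OpenDocument namespaces for the office: and style: prefixes.
OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
STYLE = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"


def count_odt_styles(path):
    """Return (automatic styles per family, number of master pages).

    `path` may be a file name or a binary file-like object.
    """
    with zipfile.ZipFile(path) as odt:
        content = ET.fromstring(odt.read("content.xml"))
        styles = ET.fromstring(odt.read("styles.xml"))

    # Automatic styles are what direct formatting turns into:
    # family "text" for character-level, "paragraph" for paragraph-level.
    families = Counter()
    auto = content.find(f"{{{OFFICE}}}automatic-styles")
    if auto is not None:
        for st in auto.findall(f"{{{STYLE}}}style"):
            families[st.get(f"{{{STYLE}}}family", "unknown")] += 1

    # Each master page corresponds to one page style.
    master = styles.find(f"{{{OFFICE}}}master-styles")
    pages = 0 if master is None else len(master.findall(f"{{{STYLE}}}master-page"))
    return families, pages


# Hypothetical usage (file name is a placeholder):
# families, pages = count_odt_styles("book-part1.odt")
# print(dict(families), pages)
```

A cleanly styled book would be expected to show a handful of entries; counts in the hundreds or thousands suggest the kind of conversion damage described above.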

So overall, do you think that if we’re able to convert it to ODT format, it will dramatically increase the load time, given that right now it’s taking 6 to 7 minutes (with 16 GB of RAM)?

You will have to issue the warranty card yourself after you have tested it. :wink::sunglasses:

I hope on the contrary it will decrease load time!

But there is a long road to the goal. Restyling such a book is not a pleasure party. IMHO, only the original author can mark up the text with styles such that all semantic nuances are preserved. And this is the high-level purpose of semantic styling. Only semantic styling can achieve minimal structure overhead with full respect of author’s intent. Above all, prohibit direct formatting.

Note however that restyling the book is worth the trouble only if the book is still maintained, edited and reviewed. If it is in a frozen state and only opened for display, consider saving it once for all in PDF format (but keeping the original book as .doc(x) for future possible revision).
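
For that one-off export, LibreOffice's headless command-line conversion can be used instead of opening the document in the GUI. Below is a minimal sketch of a Python wrapper around it. The options `--headless`, `--convert-to` and `--outdir` are LibreOffice command-line options; the function names and the default executable name `soffice` are assumptions for this illustration (on Windows the full path to `soffice.exe` inside the LibreOffice `program` folder will probably be needed).

```python
import subprocess


def build_convert_command(source, target_format, outdir, soffice="soffice"):
    """Build the argument list for a headless LibreOffice conversion.

    `soffice` is an assumed executable name; pass the full path if it is
    not on PATH.
    """
    return [
        soffice,
        "--headless",                   # run without showing the GUI
        "--convert-to", target_format,  # e.g. "pdf" or "odt"
        "--outdir", str(outdir),        # directory for the converted file
        str(source),
    ]


def convert(source, target_format, outdir, soffice="soffice"):
    """Run the conversion; raises CalledProcessError if soffice fails."""
    subprocess.run(
        build_convert_command(source, target_format, outdir, soffice),
        check=True,
    )


# Hypothetical usage (file and directory names are placeholders):
# convert("book-part1.docx", "pdf", "export")
```

Passing "odt" instead of "pdf" performs the plain format conversion, but as explained earlier that alone does not repair the document structure.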

This shows the reason why the settings were removed from the GUI (tdf#110448). There is a misconception that there is a limit on the memory that LibreOffice can use, and that this limit is configurable. In fact, there are several caches in LibreOffice graphic managers, and they control OLE objects and images, allowing those that are unused to be swapped out; in some corner cases, tweaking those expert settings could make a difference - but in general, the tweaks only make things worse. And these settings never ever affect textual content, be it frames, footnotes, headers, etc. LibreOffice can easily use all available memory if it needs to; there’s no limit for that.

The performance problems that you see need investigation (thus, bug reports with samples); we have people who like to crack bugs marked perf. Even cases where tweaking caches helps are, in fact, bugs - which, if filed properly (with samples), could likely enable us to find ways to improve cache management.


This will very likely be unimportant, but just in case: do you use tables or frames for layout or formatting? That’s a perfect recipe for slowing down LibreOffice, so I wanted to point it out just to make sure this isn’t the cause. What I mean is: if you have cultivated a habit of putting all your text into a table (which can flow over pages just like normal text), this will slow down the document, in my experience.