Converting only 2 pages still takes a long time in headless mode

satya_kompella · May 15, 2025, 8:02am

I have a .docx file with approximately 7500 pages. Converting it to PDF in headless command-line mode takes around 26 minutes.
I wanted to do it in chunks to give the user a better experience, i.e., convert at least the first few pages quickly and the remaining pages later.
So I tried converting only the first 2 pages using --convert-to 'pdf:writer_pdf_Export:{"PageRange":{"type":"string","value":"1-2"}}'
but converting just those 2 pages still takes around 15 minutes, so it looks like LibreOffice has to load and process the entire document even to convert a few pages.
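For reference, here is a minimal sketch of how such a page-limited export could be launched from Python. It assumes the LibreOffice binary is on PATH as `soffice`; the file name `big.docx` and the helper names are hypothetical. Building the filter options with `json.dumps` guarantees straight double quotes (typographic quotes like “ ” are not valid JSON), and passing an argument list to `subprocess.run` avoids shell quoting entirely. Note that, as observed above, each such run still loads the whole document.

```python
import json
import subprocess


def build_convert_cmd(src: str, page_range: str, outdir: str = ".") -> list[str]:
    """Build an argv list for a headless PDF export limited to page_range."""
    # json.dumps emits plain ASCII double quotes, which the JSON
    # filter-option syntax requires; compact separators keep it tidy.
    options = json.dumps(
        {"PageRange": {"type": "string", "value": page_range}},
        separators=(",", ":"),
    )
    return [
        "soffice",  # assumption: LibreOffice is on PATH under this name
        "--headless",
        "--convert-to",
        f"pdf:writer_pdf_Export:{options}",
        "--outdir",
        outdir,
        src,
    ]


def convert(src: str, page_range: str, outdir: str = ".") -> None:
    # A list argument (no shell=True) needs no extra shell quoting.
    subprocess.run(build_convert_cmd(src, page_range, outdir), check=True)


if __name__ == "__main__":
    # Only prints the command; call convert() to actually run it.
    print(build_convert_cmd("big.docx", "1-2"))
```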

Is there a way to make this faster, at least for the first few pages?

Regards,
Satya

Wanderer · May 15, 2025, 8:41pm

My usual answer for software: Yes and No.

Yes, a lot of things are possible, especially for developers, who can add their own code…

No, because LibreOffice was not intended as a file converter. What it does is import your .docx into its internal format, the same way as for interactive work, and only then handle your requested export.

Also consider: Writer does not work page by page, so don't assume it stores files as separate pages. You have a stream of text, which is fitted onto pages of the given size. So Writer cannot jump to page 5 of a .docx and decode only the next 3 pages.
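The "stream of text" point can be illustrated with a deliberately simplified toy model (not how Writer's layout engine is actually implemented): paragraphs are just line counts, text flows line by line onto fixed-size pages, and a paragraph may split across a page boundary. The hypothetical `page_starts` helper shows that finding where page k begins requires walking every line before it, and that changing one early paragraph moves every later page.

```python
def page_starts(paragraph_lines: list[int], lines_per_page: int) -> list[tuple[int, int]]:
    """Toy model: return (paragraph index, line within paragraph) where each page begins.

    Text is treated as one continuous stream of lines; paragraphs may split
    across pages. There is no way to locate page k without walking every
    line that precedes it.
    """
    starts: list[tuple[int, int]] = []
    used = lines_per_page  # treat the "previous" page as full, so line 0 opens page 1
    for para, n_lines in enumerate(paragraph_lines):
        for line in range(n_lines):
            if used == lines_per_page:
                starts.append((para, line))  # this line opens a new page
                used = 0
            used += 1
    return starts


if __name__ == "__main__":
    print(page_starts([3, 5, 2, 4], lines_per_page=4))
    # Make only the first paragraph one line longer: every later page start moves.
    print(page_starts([4, 5, 2, 4], lines_per_page=4))
```

The first call gives `[(0, 0), (1, 1), (2, 0), (3, 2)]`; after lengthening only paragraph 0, it gives `[(0, 0), (1, 0), (1, 4), (3, 1)]`, so pages 2 to 4 all start somewhere else.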
Last point: there will always be some per-run overhead, so converting 5 chunks of 20 pages will be slower than converting 100 pages in one go, even if you had software that could handle each segment quickly.
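A crude back-of-the-envelope model of that overhead, assuming (as the 2-page test in the question suggests) that every run pays the full document-load cost and only the export work is split between chunks. The split of the reported 26 minutes into 15 minutes of loading and 11 minutes of export is a rough assumption, not a measurement, and the function names are hypothetical.

```python
def total_minutes(chunks: int, load_min: float, export_min: float) -> float:
    # Each of the `chunks` runs reloads the whole document;
    # the export work for all pages is shared between the runs.
    return chunks * load_min + export_min


def first_chunk_minutes(chunks: int, load_min: float, export_min: float) -> float:
    # The first chunk still pays the full load, plus its share of the export work.
    return load_min + export_min / chunks


if __name__ == "__main__":
    LOAD, EXPORT = 15, 11  # assumed split of the reported 26 minutes
    for k in (1, 5):
        print(k, total_minutes(k, LOAD, EXPORT), first_chunk_minutes(k, LOAD, EXPORT))
```

With these assumed figures, a single run takes 26 minutes, while 5 chunks take 86 minutes in total, and even the first chunk only arrives after roughly 17 minutes.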

mikekaganski · May 23, 2025, 9:59am

Note also that the reason we load the document completely before doing the rest of the work is not that “it is not intended as a file converter” (it is), but that office file formats are really complex, and you can’t lay out their content without enough information. For example, you can’t put a correct “page X of Y” in the header of page 1 before you know the correct Y, which means you have to lay out everything up to the last page, 7500, to get that information. This is just the first thing that comes to mind. And no, there is no feasible way to create a list like “if there is no A, no B, and no C, then we can safely stop here after reading the data for page 1”: such a list would become a maintenance nightmare the same day we introduced it. Every new feature would have to be checked against that list in combination with every other feature in the software (we have thousands of them, and the number of pairwise combinations is simply too large to check “what if the new feature K is used together with setting L and in a section with property M …”).
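The “page X of Y” example can be sketched with another toy model (plain lines grouped into fixed-size pages; real fields and layout are far more involved). The hypothetical `with_page_x_of_y` helper shows the dependency: Y is only known after the last page has been produced, so the header for page 1 cannot be written until the whole document has been paginated.

```python
from typing import Iterable, Iterator


def paginate(lines: Iterable[str], lines_per_page: int) -> Iterator[list[str]]:
    """Toy layout: yield pages of up to lines_per_page lines."""
    page: list[str] = []
    for line in lines:
        page.append(line)
        if len(page) == lines_per_page:
            yield page
            page = []
    if page:
        yield page  # final, partially filled page


def with_page_x_of_y(lines: Iterable[str], lines_per_page: int) -> list[list[str]]:
    # Y is known only once the *last* page is laid out, so the whole
    # document must be paginated before page 1's header can be filled in.
    pages = list(paginate(lines, lines_per_page))
    total = len(pages)
    return [[f"Page {i} of {total}"] + body for i, body in enumerate(pages, start=1)]


if __name__ == "__main__":
    for page in with_page_x_of_y([f"line {n}" for n in range(10)], 4):
        print(page)
```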