Performance issue for Large File Conversion to PDF with LibreOffice and JodConverter

I am using LibreOffice to convert documents to PDF with the help of JodConverter. I am facing performance issues when converting large files, such as those between 30 and 40 MB, which take a long time to convert and sometimes fail. I also tried using the command line, but I encountered the same issue. How can I improve the performance of the conversion?

Excuse me, but could you tell me whether you are the actual author of the files, and which word processor you usually work with?

Awaiting your kind reply

Sincerely yours

nicholas

I have a requirement to convert all user attachments to PDF during export. The file types I need to convert include “.xls”, “.xlsx”, “.doc”, “.docx”, “.ppt”, “.pptx”, and “.rtf”.

Currently, I’m using LibreOffice with the JODConverter Java library, running as a microservice in AWS ECS. When a user initiates an export, the files are uploaded to S3, and I invoke a convert API that retrieves them from S3, converts them to PDF, and uploads the results back to S3.

However, I’ve noticed that LibreOffice is quite slow in converting these documents. Is there any way to optimize the performance? Alternatively, are there other open-source tools available that can convert these file types to PDF more efficiently?

Use a Windows PC with MS software and a virtual PDF printer. Write a script feeding the right MS application with the right document, the print switch and the name of your PDF printer.

JodConverter is a separate program and not part of LibreOffice; I suggest you will get more helpful and knowledgeable replies from a forum that specialises in JodConverter (if there is one?).

Have you tried LibreOffice's built-in export to PDF? You can also run it from the command line; see the help page "Starting LibreOffice Software With Parameters".
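To compare the built-in export against JodConverter, the command line call can be scripted directly. The following is only a minimal sketch: the helper names (`build_export_command`, `export_to_pdf`) are made up for illustration, and it assumes `soffice` is on the PATH. The filter names are LibreOffice's own PDF export filters for Writer, Calc and Impress.

```python
import subprocess
from pathlib import Path

# LibreOffice PDF export filters, keyed by the file types mentioned in this thread.
PDF_FILTERS = {
    ".doc": "writer_pdf_Export",
    ".docx": "writer_pdf_Export",
    ".rtf": "writer_pdf_Export",
    ".xls": "calc_pdf_Export",
    ".xlsx": "calc_pdf_Export",
    ".ppt": "impress_pdf_Export",
    ".pptx": "impress_pdf_Export",
}


def build_export_command(source, outdir, soffice="soffice"):
    """Return the argument list for one headless PDF export (hypothetical helper)."""
    suffix = Path(source).suffix.lower()
    pdf_filter = PDF_FILTERS[suffix]  # raises KeyError for unsupported file types
    return [
        soffice,
        "--headless",
        "--convert-to", f"pdf:{pdf_filter}",
        "--outdir", str(outdir),
        str(source),
    ]


def export_to_pdf(source, outdir):
    """Run the export and raise CalledProcessError if soffice exits with an error."""
    subprocess.run(build_export_command(source, outdir), check=True)
```

Calling `export_to_pdf("input/report.docx", "output")` would then run the same kind of conversion as the plain `soffice --headless --convert-to pdf` command, just with the filter named explicitly.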

Additionally, according to the JodConverter wiki, your file format (among other things) will have an effect on performance; see the "Performance" page of the jodconverter/jodconverter wiki on GitHub.

I’d say JodConverter is more of a shell that calls OpenOffice or LibreOffice, which does the actual conversion. So I guess better performance for a single job can only be achieved with better hardware.

There are hints to use more than one profile/connection in the docs, but this would only speed up conversion of large numbers of documents. There is no possibility of parallel rendering of individual pages etc…
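The idea of several profiles can be sketched without JodConverter, by starting independent `soffice` processes that each use their own user profile directory. This is only an illustration under assumptions: the function names and the worker count are invented here, `soffice` is assumed to be on the PATH, and it helps throughput for many files only, not the time for one large file.

```python
import subprocess
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_command(source, outdir, profile_dir, soffice="soffice"):
    """Build a headless conversion command bound to its own profile directory."""
    # -env:UserInstallation expects a file:// URI pointing at the profile folder.
    profile_uri = Path(profile_dir).resolve().as_uri()
    return [
        soffice,
        f"-env:UserInstallation={profile_uri}",
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(outdir),
        str(source),
    ]


def convert_batch(sources, outdir, profile_dir):
    """Convert files one after another, all with the same profile."""
    for source in sources:
        subprocess.run(build_command(source, outdir, profile_dir), check=True)


def convert_all(sources, outdir, workers=4):
    """Split the files across workers; each worker gets a separate profile."""
    sources = list(sources)
    chunks = [sources[i::workers] for i in range(workers)]
    with tempfile.TemporaryDirectory() as tmp, ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(convert_batch, chunk, outdir, Path(tmp) / f"profile{i}")
            for i, chunk in enumerate(chunks)
            if chunk
        ]
        for future in futures:
            future.result()  # re-raises any conversion error from the worker
```

Separate profiles matter because two `soffice` processes sharing one profile tend to hand work to the instance that is already running instead of converting in parallel.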


A 2MB RTF file is taking a long time to convert and eventually fails when running the following command through the command line. How can I speed up this conversion? Below is the script I’m using:

soffice --headless --convert-to pdf --outdir D:/convert_to_pdf/output D:/convert_to_pdf/input/2024-09-19-07-34-02-2mb.rtf

Any suggestions for improving performance would be appreciated.

What kind of hardware are you running this process on? Can you share the file that is taking too long?

Also, the --backtrace argument may give you more detail (I’ve never tried it myself)
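To narrow down which files are slow or hanging, it may help to time each conversion and give it a hard timeout. A rough sketch, assuming `soffice` is on the PATH; `timed_run` and `soffice_command` are names invented for this example, and the timeout value is arbitrary.

```python
import subprocess
import time


def timed_run(command, timeout_seconds):
    """Run a command and return (status, elapsed_seconds, stderr_text)."""
    start = time.monotonic()
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_seconds
        )
        status = "ok" if result.returncode == 0 else "failed"
        stderr = result.stderr
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        status, stderr = "timeout", ""
    return status, time.monotonic() - start, stderr


def soffice_command(source, outdir, soffice="soffice"):
    """Same conversion command as used earlier in this thread."""
    return [soffice, "--headless", "--convert-to", "pdf", "--outdir", str(outdir), str(source)]


if __name__ == "__main__":
    # Hypothetical file name; replace with the document that is slow to convert.
    status, elapsed, stderr = timed_run(soffice_command("input/sample.rtf", "output"), 300)
    print(f"{status} after {elapsed:.1f}s")
    if stderr:
        print(stderr)
```

Logging the status and elapsed time per file makes it easier to share a concrete slow example, as asked above.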