The conversion of docx files to PDF gets stuck after a certain number of conversions

Description

BaseImage:		linuxserver/libreoffice:7.6.3
Version:		LibreOffice 7.6.3.1 60(Build:1)
CPU:			3000Mi @ 2.50GHz
Memory:			4G 

I have a Java service that manages LibreOffice processes. After the Java program starts, it launches LibreOffice through the unoserver service, and then converts docx files to PDF through script calls.

The relevant Java fragment is as follows:

    private static void startUnoServer() {
        try {
            ProcessBuilder processBuilder = new ProcessBuilder("unoserver");
            // Note: the child's stdout/stderr streams are never read here.
            processBuilder.start();
        } catch (IOException e) {
            // ProcessBuilder.start() throws only IOException.
            e.printStackTrace();
        }
    }

​ The process within the server is as follows:

At this point you can start file conversion.

Question

From our testing: after roughly every 220 conversions, the conversion process gets stuck. After killing the thread, the next conversion is still stuck. Only by killing soffice.bin and restarting the LibreOffice service can files be converted again, but it gets stuck again at around the 220th conversion.

We considered a JVM memory issue, but unfortunately it turned out to be unrelated: whether the JVM memory is set to 1 GB or 16 GB, it still gets stuck after about 220 conversions.

We also tried docx files with different contents and different page counts. Whether the file size is 200 KB or 5 MB, it still gets stuck after about 220 conversions.

But we discovered something strange:

Case Ⅰ: gets stuck after about 220 conversions

  • The docx file path is /temp/<32-character UUID>/<32-character UUID file name>

Case Ⅱ: gets stuck after about 260 conversions

  • The docx file path is /temp/a.docx

After shortening the file name to a single character, the number of conversions before the hang increased by a few dozen, but it still got stuck eventually.

What can I do to convert docx files stably?

I think I have found the solution.
After many tests, I believe the cause of the freeze is that an output pipe buffer in the system fills up with unoserver logs. This also explains why shortening the file path increases the number of conversions before the hang (from about 220 to 260). The conversion log of unoserver is as follows:

127.0.0.1 - - [27/Feb/2024 06:45:12] "POST /RPC2 HTTP/1.1" 200 -
INFO:unoserver:Starting unoconverter.
INFO:unoserver:Opening <file-path-input> for input
INFO:unoserver:Exporting to <file-path-output>
INFO:unoserver:Using writer_pdf_Export export filter
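A rough back-of-the-envelope check supports this pipe-buffer theory. The sketch below assumes Linux's default pipe capacity of 64 KiB and reconstructs one conversion's log volume from lines like the ones above; the paths and exact line lengths are my illustrative approximations, not measured values:

```java
public class PipeBudget {
    // Approximate reconstruction of one conversion's log output;
    // only the byte counts matter, not the exact content.
    static int bytesPerConversion(String path) {
        String[] lines = {
            "127.0.0.1 - - [27/Feb/2024 06:45:12] \"POST /RPC2 HTTP/1.1\" 200 -",
            "INFO:unoserver:Starting unoconverter.",
            "INFO:unoserver:Opening " + path + " for input",
            "INFO:unoserver:Exporting to " + path.replace(".docx", ".pdf"),
            "INFO:unoserver:Using writer_pdf_Export export filter",
        };
        int total = 0;
        for (String line : lines) total += line.length() + 1; // +1 for '\n'
        return total;
    }

    public static void main(String[] args) {
        final int PIPE_CAPACITY = 64 * 1024; // default Linux pipe capacity
        String longPath =
            "/temp/0123456789abcdef0123456789abcdef/0123456789abcdef0123456789abcdef.docx";
        String shortPath = "/temp/a.docx";
        System.out.println("long-path estimate:  " + PIPE_CAPACITY / bytesPerConversion(longPath));
        System.out.println("short-path estimate: " + PIPE_CAPACITY / bytesPerConversion(shortPath));
    }
}
```

With these assumed line lengths the estimates come out near 177 and 270 conversions: the same order of magnitude as the observed 220 and 260, and with the short path lasting longer, exactly as seen.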

I am using the process API that comes with Java. The previous startup code was as follows; the InputStream it returns for the child's log output was never read. That output accumulates in an OS pipe buffer until the buffer is full and the child blocks on its next write.

ProcessBuilder processBuilder = new ProcessBuilder("unoserver");        
processBuilder.start();

Now I have changed the Java code to the following:

ProcessBuilder processBuilder = new ProcessBuilder("bash", "-c", "nohup unoserver > /dev/null 2>&1 &");
processBuilder.start();
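For completeness, the same effect can be had without the bash/nohup indirection. This is a sketch of my own, not code from the post: since Java 9, ProcessBuilder can discard a child's output at the OS level via Redirect.DISCARD, so there is no pipe left to fill up:

```java
import java.io.IOException;

public class StartUnoServer {
    // Pure-Java alternative to the bash/nohup workaround (Java 9+):
    // send both streams to the null device so no pipe can fill.
    static Process start(String... command) throws IOException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectOutput(ProcessBuilder.Redirect.DISCARD);
        pb.redirectError(ProcessBuilder.Redirect.DISCARD);
        return pb.start();
    }

    public static void main(String[] args) throws Exception {
        // Demo with a deliberately chatty child in place of unoserver:
        // without DISCARD, ~64 KiB of unread output would block it.
        Process p = start("sh", "-c", "seq 1 20000");
        System.out.println("exit code: " + p.waitFor()); // prints "exit code: 0"
    }
}
```

The advantage over the bash wrapper is that the returned Process handle still works, so the Java service can monitor or kill unoserver later.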

Other than that, all other conditions remained unchanged. In testing, the number of conversions has exceeded 50,000 and it still has not gotten stuck.
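If you would rather keep unoserver's logs than discard them, the other pipe-safe pattern is to drain the child's streams continuously. A minimal sketch under that assumption (the class and method names are mine, not from the original post):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class UnoServerLauncher {
    // Merge stderr into stdout so there is a single pipe, then drain it
    // from a daemon thread so the buffer can never fill and block the child.
    static Process startAndDrain(String... command) throws IOException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // one pipe to drain instead of two
        Process process = pb.start();
        Thread drainer = new Thread(() -> {
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(process.getInputStream()))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // Forward to a real logging framework here; printing is a placeholder.
                    System.out.println("[unoserver] " + line);
                }
            } catch (IOException ignored) {
                // The stream closes when the child exits.
            }
        }, "unoserver-log-drainer");
        drainer.setDaemon(true);
        drainer.start();
        return process;
    }

    public static void main(String[] args) throws Exception {
        // Demo with a chatty child in place of unoserver.
        Process p = startAndDrain("sh", "-c", "seq 1 3");
        p.waitFor();
    }
}
```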

But there is another question. Why does the instance I start by hand after attaching to the pod (entering the command myself, with the log going to the nohup.out file) get stuck when the number of conversions reaches about 22,000? Is there some system-level restriction?


In the end, there was no hang even after 100,000 conversions. This problem seems to be solved. :face_with_monocle: :face_with_monocle: :face_with_monocle:


Have you considered just converting via the command line?


Thank you for your suggestion, but unfortunately it did not solve the problem. Following your idea, though, we discovered another interesting situation:

LibreOffice managed by the Java program (with root authority) can only withstand about 220 conversions. When we attach to the container, kill the LibreOffice started by Java, and start it as a user (executing the command myself), it withstands about 19,000 conversions before getting stuck.
We compared the process limits of the LibreOffice processes under the different startup methods, and they were completely identical.
This is outrageous! :dizzy_face: :dizzy_face: :dizzy_face:

I decided to run a test on Windows, using LibreOffice 24.2.1.1: I started an instance of the program (which serves all the requests), and, in a shell, started a loop that prints a counter and then converts a single file.

Now it is at ~850; I expect it to reach ~19,000 in about two hours. Let's see. Of course, it might be system-specific, version-specific, or configuration-specific … who knows.


A data point: the Windows conversion successfully passed 20,000 iterations, after which I terminated it.

FTR, the exact command line was

    for ((i = 0; i < 1000000; i++)); do echo $i; "C:/Program Files/LibreOffice/program/soffice" --convert-to odt --outdir C:/Users/mikek/AppData/Local/Temp D:/Downloads/word.docx ; done

Now I am trying the same process on an Ubuntu instance, this time using a debug build of master. It is much slower; I don't think it will finish today.


Thank you very much for your help. Now I am thinking of another situation, which may be related to Java or to system limitations!
I have added the relevant content to the question description :melting_face: