My application runs on an Ubuntu 16.04 web server where uploaded files are automatically converted to PDF with doc2pdf, which is part of unoconv, which in turn uses LibreOffice in headless mode. When I try to convert a corrupted DOCX document, the conversion hangs with 100% of the CPU utilized, and eventually I have to reboot to recover.
I want one of these things to happen:
- The command exits with an error
- I can set a timeout and, if it is reached, the command exits with an error
- I am able to detect whether the DOCX document is broken before converting it
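For the third option, a minimal sketch of a pre-check, assuming Python 3 is available on the server. It relies only on the fact that a DOCX file is a ZIP archive whose main part is `word/document.xml`. The function name `docx_looks_valid` is hypothetical, and a file that passes this check could still make LibreOffice hang (for example if the XML inside is malformed), so it is a cheap filter rather than a guarantee.

```python
import zipfile
import zlib


def docx_looks_valid(path):
    """Return True if `path` is a readable ZIP archive containing word/document.xml.

    Hypothetical helper: it only catches container-level damage, not every
    kind of corruption that might upset LibreOffice.
    """
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() reads every member and returns the name of the first
            # one that fails its CRC or header check, or None if all are fine.
            if archive.testzip() is not None:
                return False
            return "word/document.xml" in archive.namelist()
    except (zipfile.BadZipFile, zlib.error, EOFError, OSError, RuntimeError):
        # Not a ZIP file, truncated or unreadable data, or a missing file.
        return False
```

The application could call `docx_looks_valid("broken.docx")` before handing the file to doc2pdf and reject the upload when it returns `False`.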
Setting the --timeout option on doc2pdf doesn’t help.
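Since the built-in option has no effect here, one alternative is to enforce the timeout from outside the command. The following is a sketch under assumptions, not a confirmed fix: it assumes Python 3 on a POSIX system, and that the soffice.bin listener launched by doc2pdf stays in the same process group as doc2pdf. The function name `run_with_timeout` and the 120-second value in the usage note are illustrative.

```python
import os
import signal
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run `cmd`; return its exit code, or None if it was killed on timeout."""
    # start_new_session=True makes the child the leader of a new process
    # group (POSIX only), so processes it spawns can be signalled together.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The child's PID is also its process group ID; kill the whole group
        # so a spinning helper process is not left behind.
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()  # reap the child to avoid leaving a zombie
        return None
```

A call such as `run_with_timeout(["doc2pdf", "broken.docx"], 120)` would then return `None` when the conversion hangs, which the application can treat as an error.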
When I try:
doc2pdf works.docx
I get this and then a prompt:
$ doc2pdf works.docx
W: Unknown node under /registry/extlang: deprecated
W: Unknown node under /registry/grandfathered: comments
W: Unknown node under /registry/grandfathered: comments
Fontconfig warning: ignoring UTF-8: not a valid region tag
$
When I try (-vvv means verbose debugging mode):
doc2pdf -vvv broken.docx
I get this and then it hangs forever:
$ doc2pdf -vvv broken.docx
Verbosity set to level 3
Using office base path: /usr/lib/libreoffice
Using office binary path: /usr/lib/libreoffice/program
DEBUG: Connection type: socket,host=127.0.0.1,port=2002,tcpNoDelay=1;urp;StarOffice.ComponentContext
DEBUG: Existing listener not found.
DEBUG: Launching our own listener using /usr/lib/libreoffice/program/soffice.bin.
LibreOffice listener successfully started. (pid=2940)
W: Unknown node under /registry/extlang: deprecated
W: Unknown node under /registry/grandfathered: comments
W: Unknown node under /registry/grandfathered: comments
Fontconfig warning: ignoring UTF-8: not a valid region tag
Input file: broken.docx
Pressing CTRL + C makes it exit with this error:
^Cunoconv: SystemError during import phase:
Couldn't instantiate python representation of structured UNO type com.sun.star.lang.DisposedException
Traceback (most recent call last):
File "/usr/bin/doc2pdf", line 1278, in <module>
die(exitcode)
File "/usr/bin/doc2pdf", line 1131, in die
if convertor.desktop.getCurrentFrame():
uno.DisposedException: Binary URP bridge already disposed
Converting a normal DOCX document directly with LibreOffice in headless mode works fine:
libreoffice --headless --convert-to pdf works.docx
Converting a corrupted DOCX document directly with LibreOffice in headless mode doesn’t work:
libreoffice --headless --convert-to pdf broken.docx
The output for both looks like this. The only difference is that the conversion of the corrupted DOCX document hangs after the last line:
javaldx: Could not find a Java Runtime Environment!
Warning: failed to read path from javaldx
W: Unknown node under /registry/extlang: deprecated
W: Unknown node under /registry/grandfathered: comments
W: Unknown node under /registry/grandfathered: comments
Fontconfig warning: ignoring UTF-8: not a valid region tag
convert /home/forge/broken.docx -> /home/forge/broken.pdf using filter : writer_pdf_Export