I’d like to convert LibreOffice files (Writer, Impress) to readable text so that I can meaningfully diff separate versions in Git. I am using OpenOffice 4.0.1 on Mac OS X 10.8.2.
I found the question “Is there a command line tool to convert documents to plain text files?”, which recommends using
soffice --headless --convert-to <TargetFileExtension>:<NameOfFilter> file_to_convert.xxx
but this didn’t quite work. Every time I invoke the command with the “Text” filter, it returns the following error:
/Applications/LibreOffice.app/Contents/MacOS/soffice --headless --convert-to txt:"Text" example.odp
convert example.odp -> example.txt using Text
Overwriting: example.txt
Error: Please reverify input parameters...
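If the same call needs to run over many files (for example from a shared Python helper), it can be wrapped with subprocess. The sketch below only reproduces the command above; it does not fix the filter error. The function names are hypothetical, the soffice path is the Mac install location from the error output, and treating an "Error:" line as failure is an assumption, since soffice may still exit with status 0 after printing it. `--outdir` is the standard soffice option for choosing the output directory.

```python
import subprocess

# Assumption: the Mac OS X install location seen in the error output above.
DEFAULT_SOFFICE = "/Applications/LibreOffice.app/Contents/MacOS/soffice"


def build_convert_command(source, target_ext, filter_name=None,
                          outdir=None, soffice=DEFAULT_SOFFICE):
    """Return the argv for: soffice --headless --convert-to ext[:filter] file."""
    # Without a shell the filter name needs no quoting: txt:Text, not txt:"Text".
    target = f"{target_ext}:{filter_name}" if filter_name else target_ext
    cmd = [soffice, "--headless", "--convert-to", target]
    if outdir is not None:
        # --outdir chooses where the converted file is written.
        cmd += ["--outdir", str(outdir)]
    cmd.append(str(source))
    return cmd


def convert(source, target_ext, filter_name=None, outdir=None,
            soffice=DEFAULT_SOFFICE):
    """Run the conversion and raise if soffice fails or reports an error."""
    cmd = build_convert_command(source, target_ext, filter_name, outdir, soffice)
    result = subprocess.run(cmd, capture_output=True, text=True)
    output = result.stdout + result.stderr
    # Assumption: a failed conversion may still exit with status 0, so also
    # look for an "Error:" line like the one shown above.
    if result.returncode != 0 or "Error:" in output:
        raise RuntimeError(f"conversion failed: {' '.join(cmd)}\n{output}")
    return output
```

For example, `convert("example.odt", "txt", "Text", outdir="text")` would run the equivalent of the shell command and write the result into a text/ directory. Passing an argument list rather than a shell string avoids quoting issues around the filter name.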
I looked in the LibreOffice.app folder for any of the filters listed in http://cgit.freedesktop.org/libreoffice/core/tree/filter/source/config/fragments/filters, but I couldn’t find any of them. I suspect their absence might be causing the error. Is there a way to install these filters?
Alternatively, is there another way to convert these files that doesn’t require compiling anything? I’ll be sharing these files with other people who also want to diff them, and distributing a Python script would be much easier than walking them through a build process. I found the odt2txt.py script, which is a step in the right direction, but it strips out all images in presentations. It’s better than nothing, but something that preserves some useful information about the images would be a great improvement.
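As a rough sketch of that kind of script (not odt2txt.py itself): an ODF file is a ZIP archive whose body lives in content.xml, so the Python standard library alone can walk it. The sketch prints each paragraph, a header for each Impress slide, and, for every image, a placeholder holding its path inside the archive plus a short SHA-1 of its bytes, so a replaced picture shows up in a diff. The output format, the function names, and the script name odf2diff.py are hypothetical; tables, lists, and styles get no special treatment, and runs of spaces encoded as text:s elements are collapsed.

```python
import hashlib
import sys
import xml.etree.ElementTree as ET
import zipfile

# Namespace URIs defined by the OpenDocument format.
TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
DRAW = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
XLINK = "http://www.w3.org/1999/xlink"

PARAGRAPHS = {f"{{{TEXT}}}p", f"{{{TEXT}}}h"}
PAGE = f"{{{DRAW}}}page"
FRAME = f"{{{DRAW}}}frame"
IMAGE = f"{{{DRAW}}}image"
NAME = f"{{{DRAW}}}name"
HREF = f"{{{XLINK}}}href"


def _inline_text(el):
    """Text of a paragraph, leaving out nested paragraphs and frames."""
    parts = [el.text or ""]
    for child in el:
        if child.tag not in PARAGRAPHS and child.tag != FRAME:
            parts.append(_inline_text(child))
        # The tail (text after the child) belongs to this paragraph either way.
        parts.append(child.tail or "")
    return "".join(parts)


def odf_to_text(source):
    """Dump an .odt/.odp file (path or binary file object) as diffable text."""
    lines = []
    with zipfile.ZipFile(source) as odf:
        members = set(odf.namelist())

        def walk(el):
            if el.tag == PAGE:
                # Impress slides are draw:page elements.
                lines.append(f"=== slide: {el.get(NAME, '?')} ===")
            elif el.tag in PARAGRAPHS:
                lines.append(_inline_text(el))
            elif el.tag == IMAGE:
                href = el.get(HREF, "")
                if href in members:
                    # Embedded picture: hash its bytes so a changed image is visible.
                    digest = hashlib.sha1(odf.read(href)).hexdigest()[:12]
                    lines.append(f"[image: {href} sha1={digest}]")
                else:
                    # Linked or missing picture: keep only the reference.
                    lines.append(f"[image: {href}]")
            # Recurse so nested frames, text boxes and notes are visited in order.
            for child in el:
                walk(child)

        walk(ET.fromstring(odf.read("content.xml")))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sys.stdout.write(odf_to_text(sys.argv[1]))
```

Because it takes the file path as its only argument and writes to stdout, a script like this could also be plugged into Git as a textconv filter: a line such as `*.odp diff=odf` in .gitattributes plus `git config diff.odf.textconv "python3 /path/to/odf2diff.py"` makes `git diff` compare the text dumps instead of the binary archives (the driver name odf is arbitrary).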