How do I install filters for the `soffice` command?

I’d like to convert LibreOffice files (Writer, Impress) to readable text so that I can meaningfully diff separate versions in Git. I am using LibreOffice 4.0.1 on Mac OS X 10.8.2.

I stumbled upon “Is there a command line tool to convert documents to plain text files?”, but its recommendation to use

soffice --headless --convert-to <TargetFileExtension>:<NameOfFilter> file_to_convert.xxx

didn’t quite work, because every time I invoke the command with the “Text” filter, the following error message is returned:

/Applications/LibreOffice.app/Contents/MacOS/soffice --headless --convert-to txt:"Text" example.odp 
convert example.odp -> example.txt using Text
Overwriting: example.txt
Error: Please reverify input parameters...

I poked around to see whether any of the filters listed at http://cgit.freedesktop.org/libreoffice/core/tree/filter/source/config/fragments/filters were in the LibreOffice.app folder, but I couldn’t find any of them. I suspect their absence might be the cause of the error message. Is there a way to install these filters?

Alternatively, is there another way to convert these files that doesn’t require me to compile anything? I’ll be sharing these files with other people who also want to diff them, and distributing a Python script would be much easier than walking them through a build process. I found the odt2txt.py script, which is a step in the right direction, but it strips out all images in presentations. It’s better than nothing, but something that preserves some useful information about the images would be a great improvement.

@qubit1: Thanks for the help! The flat XML format is exactly what I was looking for. I’d upvote you if I had enough karma. To follow up, is there any way to convert .od* files to .fod* files on the command line, in case I screw up and accidentally commit changes to .od* files multiple times?

@GeoffOxberry: qubit throws some karma at you

Hmm… I think you can convert them on the command-line like this:

$ ./soffice --headless --convert-to fodt:"OpenDocument Text Flat XML" embed-image-test.odt
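
If there are many files to catch up on, the same call could be wrapped in a small Python script. This is only a sketch under a few assumptions of mine: the names `FLAT_TARGETS`, `build_command` and `convert_tree` are made up for illustration, `soffice` is assumed to be on the PATH (pass the full path otherwise), and only the target extension is given to `--convert-to`, leaving the choice of filter to LibreOffice. I have not tested it against every format.

```python
import subprocess
from pathlib import Path

# Assumed mapping from packed ODF extensions to their flat XML counterparts.
FLAT_TARGETS = {".odt": "fodt", ".odp": "fodp", ".ods": "fods", ".odg": "fodg"}


def build_command(path, soffice="soffice"):
    """Return the soffice argument list for one file, or None if it is not an .od* file."""
    path = Path(path)
    target = FLAT_TARGETS.get(path.suffix.lower())
    if target is None:
        return None
    # --outdir keeps the converted file next to the original.
    return [soffice, "--headless", "--convert-to", target,
            "--outdir", str(path.parent), str(path)]


def convert_tree(root, soffice="soffice"):
    """Run soffice once for every .od* file found under root."""
    for path in sorted(Path(root).rglob("*.od?")):
        command = build_command(path, soffice)
        if command is not None:
            subprocess.run(command, check=True)
```

Calling `convert_tree(".")` from the top of the repository would then write a .fod* file beside each .od* file. If a particular filter is needed, the target could be written in the `fodt:"OpenDocument Text Flat XML"` form shown above instead.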

I don’t know if either of these two ideas will help, but one of them might help while waiting for the command-line option. (Note: I am on Windows 7 running LibreOffice 4.0.0.3, so there might be some differences.)

In LibreOffice Writer, with a document open, choose “File” | “Save as”. In the pull-down menu next to the “Save as type” label, one of the choices is “Text (.txt) (*.txt)”. This saves the file with each “paragraph” as one line of plain text, with no formatting.

There are a couple of other text options, so some testing might be called for to see if one of them meets your needs.

LibreOffice Writer also has a “Compare Documents…” feature (“Edit” | “Compare Documents…”), but I don’t think that will meet your needs, since it doesn’t write any diff files.

Hi @GeoffOxberry,

I’m able to convert an ODT (Writer) file to text…

qubit@loopbackoffice$ ./soffice --headless --convert-to txt:"Text" embed-image-test.odt 
convert /home/qubit/embed-image-test.odt -> /home/qubit/embed-image-test.txt using Text

I get the same error as you when I try to convert an ODP (Impress) file to text…

qubit@loopbackoffice$ ./soffice --headless --convert-to txt:"Text" img-test.odp
convert /home/qubit/img-test.odp -> /home/qubit/img-test.txt using Text
Error: Please reverify input parameters...

It’s possible that there’s just no text export filter for Impress. I’ll ping someone and ask…

…okay, I pinged people and it looks like Impress doesn’t have a text output filter (there’s a lot of non-text stuff in an Impress file, for one).
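
If a rough text dump of a presentation is still useful, one option is to read the file directly with Python’s standard library, since an .odp (like an .odt) is a zip archive with a content.xml inside. The following is a minimal sketch, not a substitute for a real export filter: the function name `odf_to_lines` is made up, it only looks at paragraphs (`text:p`) and images (`draw:image`), and it writes each image’s href as a placeholder line so that a swapped picture still shows up in a diff.

```python
import zipfile
import xml.etree.ElementTree as ET

# Standard ODF namespaces used in content.xml.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
DRAW_NS = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
XLINK_NS = "http://www.w3.org/1999/xlink"


def odf_to_lines(source):
    """Return paragraph texts and image placeholders in document order.

    source may be a file name or a binary file-like object.
    Limitation: paragraphs nested inside other paragraphs (for example,
    text frames anchored in a Writer paragraph) are reported twice.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    lines = []
    for element in root.iter():
        if element.tag == f"{{{TEXT_NS}}}p":
            text = "".join(element.itertext()).strip()
            if text:
                lines.append(text)
        elif element.tag == f"{{{DRAW_NS}}}image":
            href = element.get(f"{{{XLINK_NS}}}href", "")
            lines.append(f"[image: {href}]")
    return lines
```

Something like `print("\n".join(odf_to_lines("img-test.odp")))` would then give one line per paragraph or image. Slide order, tables and drawing shapes are not handled specially here.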

Have you considered using the flat XML file formats for storing content in version control?

The .fod* file formats give you the ability to diff your file changes, plus you can work directly with these files in LO and don’t have to do any extra conversion steps before checking them into your VCS!

Hallelujah!

This worked for me.