# How to convert ODT file to TXT keeping the text of the footnotes?

How to convert ODT file to TXT keeping the text of the footnotes?


Edit your question to tell us what you want to do with the resulting file. .txt might not be the best format.

( 2020-05-13 15:21:06 +0200 )

@ajlittoz, I need to convert specifically to TXT

( 2020-05-13 15:31:50 +0200 )

On Linux: libreoffice --convert-to pdf <name>.odt && pdftotext <name>.pdf (on my openSUSE 15.1 system pdftotext is part of package poppler-tools)

( 2020-05-13 15:57:17 +0200 )

@Opaque, ok, thanks, though, why not an "answer"?

( 2020-05-13 16:00:23 +0200 )

@vstepaniuk: just out of curiosity, once it is converted to txt, how do you use it (in which context)? Do you want a txt equivalent of Writer formatting? Like old-time README files or IANA RFC specification?

( 2020-05-13 16:05:01 +0200 )

@vstepaniuk - your wish is my command.

( 2020-05-13 16:11:28 +0200 )

@ajlittoz, no need for any formatting, I will use it for text processing later. But if you have a good solution for retaining the text formatting, feel free to add an answer!

( 2020-05-13 17:53:14 +0200 )

@vstepaniuk: the best way to keep all the information in your document is to save it as .odt ;-) This is a zipped file, so it uses minimal disk space (among formats able to preserve formatting).

If you want to process the document content in another application (awk, macro processor, …) as text rather than binary, you can save it as .fodt. You get an exact XML representation of the document, but to process it efficiently you need to know the details of the ODF specification. This file is not zipped, so it is a bit larger.

You can recover the content without formatting by blindly stripping all XML markup, leaving only the textual content. Notes are preserved in the process, but not their position on the page.

I also explored the idea of using a good old ASCII impact printer, but CUPS wouldn't let me create such an antique device. It insists on PostScript or PDF printers. Dead end. My goal was to "print" the document and retrieve the spool file.

( 2020-05-13 18:08:54 +0200 )


vstepaniuk,

It is not the most elegant solution, but… you can Export as PDF... (File > Export as...), open the PDF file in your PDF viewer, Select all, Copy and Paste in a new document.

Thanks! Very nice GUI solution!

( 2020-05-13 17:30:09 +0200 )

Hello,

On Linux you may use:

    libreoffice --convert-to pdf <name>.odt && pdftotext <name>.pdf

(on my openSUSE 15.1 system, pdftotext is part of the package poppler-tools)

Note(s):

• A short search on the net shows that the tool pdftotext is also available for Windows (but I have absolutely no experience with the tool on Windows)
• Using the BASIC function Shell(), it should be an easy hack to create a macro for that
• Drawback: the number(s) of the footnotes are shown first, and then the text(s) of the footnotes on separate lines

Hope that helps.
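
For anyone who prefers to script the two steps, here is a minimal Python sketch of the same pipeline. It is only an illustration, not a tested tool: the function names `build_commands` and `odt_to_txt` are made up for this example, and the `--headless` and `--outdir` options are my additions so that it runs without a GUI and writes the PDF next to the source file. It assumes both `libreoffice` and `pdftotext` are on the PATH.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(odt_path):
    """Return the two command lines: ODT -> PDF, then PDF -> TXT.

    Hypothetical helper for illustration; it only builds the argument lists.
    """
    odt = Path(odt_path)
    pdf = odt.with_suffix(".pdf")
    txt = odt.with_suffix(".txt")
    to_pdf = ["libreoffice", "--headless", "--convert-to", "pdf",
              "--outdir", str(odt.parent), str(odt)]
    to_txt = ["pdftotext", str(pdf), str(txt)]
    return to_pdf, to_txt


def odt_to_txt(odt_path):
    """Run both conversions and return the path of the resulting .txt file."""
    to_pdf, to_txt = build_commands(odt_path)
    for cmd in (to_pdf, to_txt):
        # Fail early with a clear message if a tool is missing.
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    subprocess.run(to_pdf, check=True)
    subprocess.run(to_txt, check=True)
    return Path(odt_path).with_suffix(".txt")
```

Calling `odt_to_txt("report.odt")` should leave `report.pdf` and `report.txt` beside the original. The footnote drawback mentioned in the notes applies here too, since the text still comes from the PDF layout.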

more

Thanks, nice command line solution! pdftotext is available either from poppler or from xpdf: https://en.wikipedia.org/wiki/Pdftotext

( 2020-05-13 17:39:54 +0200 )

One option would be to save the file in FODT format (Flat XML ODF Text Document) and use the following perl command:

    perl -wn0le 'print $1 if /<office:body>([\s\S]*?)<\/office:body>/' file.fodt | perl -p0e 's/<.*?>//g' | perl -p0e 's/^\s+//gm'

It extracts everything from the FODT document between the <office:body> and </office:body> tags, removes all tags from the result, and also removes all consecutive whitespace starting from the start of line, including newlines. The footnotes will be IN PLACE of footnote anchors!

A full command-line solution (including the conversion to FODT) for file.odt:

    in=file.odt; libreoffice --convert-to fodt:"OpenDocument Text Flat XML" "$in" && perl -wn0le 'print $1 if /<office:body>([\s\S]*?)<\/office:body>/' "${in/.odt/.fodt}" | perl -p0e 's/<.*?>//g' | perl -p0e 's/^\s+//gm'
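
If the regex approach turns out to be too blunt (for example, the footnote number runs straight into the footnote text), the same idea can be sketched with an XML parser. The following is a rough Python illustration, not a complete ODF reader: the function names and the `[^n: ...]` marker format are my own choices, and it only handles paragraphs, headings, footnotes and a few whitespace elements.

```python
import xml.etree.ElementTree as ET

# Standard ODF namespace URIs.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
T = "{%s}" % TEXT_NS


def _inline(elem):
    """Text of one element, with each footnote rendered in place as [^n: ...]."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == T + "note":
            number = child.findtext(T + "note-citation", default="")
            body = child.find(T + "note-body")
            note = ""
            if body is not None:
                note = " ".join(_inline(p) for p in body.iter(T + "p"))
            parts.append(f"[^{number}: {note}]")
        elif child.tag == T + "s":          # run of spaces
            parts.append(" " * int(child.get(T + "c", "1")))
        elif child.tag == T + "tab":
            parts.append("\t")
        elif child.tag == T + "line-break":
            parts.append("\n")
        else:                               # spans, links, ...
            parts.append(_inline(child))
        parts.append(child.tail or "")
    return "".join(parts)


def _blocks(elem):
    """Yield the text of each paragraph or heading, in document order."""
    for child in elem:
        if child.tag in (T + "p", T + "h"):
            yield _inline(child)
        else:                               # lists, sections, tables, ...
            yield from _blocks(child)


def fodt_to_text(xml_source):
    """Convert FODT markup (str or bytes) to plain text, footnotes in place."""
    root = ET.fromstring(xml_source)
    body = root.find(f"{{{OFFICE_NS}}}body/{{{OFFICE_NS}}}text")
    if body is None:
        raise ValueError("no <office:body>/<office:text> element found")
    return "\n".join(_blocks(body))
```

A possible use would be `print(fodt_to_text(open("file.fodt", "rb").read()))`. As with the perl version, the footnote text ends up where the anchor was, here wrapped in brackets so it is easy to find in later text processing.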



At @vstepaniuk's request, I am making my comment an answer.

The best way to keep all the information in your document is to save it as .odt ;-) This is a zipped file, so it uses minimal disk space (among formats able to preserve formatting).

If you want to process the document content in another application (awk, macro processor, …) as text rather than binary, you can save it as .fodt. You get an exact XML representation of the document, but to process it efficiently you need to know the details of the ODF specification. This file is not zipped, so it is a bit larger.

You can recover the content without formatting by blindly stripping all XML markup, leaving only the textual content. Notes are preserved in the process, but not their position on the page.
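
Since an .odt is a zip archive whose main text lives in `content.xml`, the blind stripping can even be tried without converting to .fodt first. Here is a rough Python sketch of that idea. The function names are invented for the example, and it assumes the `office:` and `text:` prefixes that LibreOffice normally writes; a file using other prefixes would need a real XML parser.

```python
import html
import re
import zipfile


def strip_markup(xml_text):
    """Blindly drop every tag, then decode entities and tidy whitespace."""
    # Put a line break at each paragraph/heading end so text does not run together.
    xml_text = re.sub(r"</text:(?:p|h)>", "\n", xml_text)
    text = re.sub(r"<[^>]+>", "", xml_text)
    text = html.unescape(text)              # &amp; -> &, &lt; -> <, ...
    lines = (line.strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)


def odt_body_text(odt_file):
    """Read content.xml from an .odt (path or file object) and strip its markup."""
    with zipfile.ZipFile(odt_file) as archive:
        xml_text = archive.read("content.xml").decode("utf-8")
    # Keep only the document body when it can be located; otherwise strip everything.
    match = re.search(r"<office:body>(.*?)</office:body>", xml_text, re.S)
    return strip_markup(match.group(1) if match else xml_text)
```

Something like `print(odt_body_text("file.odt"))` would then print the text. Because the stripping is blind, the footnote number and the footnote text are glued to the surrounding sentence at the anchor position, which may or may not matter for later processing.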

I also explored the idea of using a good old ASCII impact printer, but CUPS wouldn't let me create such an antique device. It insists on PostScript or PDF printers. Dead end. My goal was to "print" the document and retrieve the spool file.
