1

How to convert ODT file to TXT keeping the text of the footnotes?

asked 2020-05-13 13:20:40 +0200

vstepaniuk

updated 2020-05-13 13:21:44 +0200

How to convert ODT file to TXT keeping the text of the footnotes?


Comments

Edit your question to tell us what you want to do with the resulting file. .txt might not be the best format.

ajlittoz (2020-05-13 15:21:06 +0200)

@ajlittoz, I need to convert specifically to TXT

vstepaniuk (2020-05-13 15:31:50 +0200)

On Linux: libreoffice --convert-to pdf <name>.odt && pdftotext <name>.pdf (on my openSUSE 15.1 system pdftotext is part of package poppler-tools)

Opaque (2020-05-13 15:57:17 +0200)

@Opaque, ok, thanks, though, why not an "answer"?

vstepaniuk (2020-05-13 16:00:23 +0200)

@vstepaniuk: just out of curiosity, once it is converted to txt, how do you use it (in which context)? Do you want a txt equivalent of Writer formatting? Like old-time README files or IANA RFC specification?

ajlittoz (2020-05-13 16:05:01 +0200)

@vstepaniuk - your wish is my command.

Opaque (2020-05-13 16:11:28 +0200)

@ajlittoz, no need for any formatting, I will use it for text processing later. But if you have a good solution how to retain the text formatting, feel free to add an answer!

vstepaniuk (2020-05-13 17:53:14 +0200)

@vstepaniuk: the best way to keep all information of your document is to save it as .odt ;-) This is a zipped file, so it uses minimal disk space for a format that preserves formatting.

If you want to process document content in another application (awk, macro processor, …) as text rather than binary, you can save it as .fodt. You get an exact XML representation of the document. But to process it efficiently, you need to know the details of the ODF specification. This is not zipped, so it is a bit fatter.

You can recover the content without formatting by blindly stripping all XML markup, leaving only the textual content. Notes are preserved in the process, but not their position on the page.

I also explored the idea of using a good old ASCII impact printer but CUPS wouldn't let me create such an antique device. It insists on PostScript or PDF printers. Dead end. My goal was ...(more)

ajlittoz (2020-05-13 18:08:54 +0200)

4 Answers

1

answered 2020-05-13 15:06:41 +0200

LeroyG

vstepaniuk,

It is not the most elegant solution, but… you can use Export as PDF... (File > Export as...), open the PDF file in your PDF viewer, Select all, Copy, and Paste into a new document.


Comments

Thanks! Very nice GUI solution!

vstepaniuk (2020-05-13 17:30:09 +0200)
1

answered 2020-05-13 16:10:25 +0200

Opaque

updated 2020-05-13 16:16:39 +0200

Hello,

On Linux you may use: libreoffice --convert-to pdf <name>.odt && pdftotext <name>.pdf (on my openSUSE 15.1 system pdftotext is part of package poppler-tools)

Note(s):

  • A short search on the net shows that the tool pdftotext is also available for Windows (but I have absolutely no experience with the tool on Windows)
  • Using the BASIC function Shell(), it should be an easy hack to create a macro for that
  • Drawback: Number(s) of footnotes are shown first and then the text(s) of the footnotes on separate lines
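
For anyone who would rather script the two steps than type them, here is a minimal Python sketch of the same ODT -> PDF -> TXT route. It is only an illustration, not part of the original answer: the function names are hypothetical, and it assumes `libreoffice` and `pdftotext` are on the PATH. The `--headless` and `--outdir` options are additions not used in the command above; they keep the GUI from opening and write the PDF next to the source file.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(odt_path, office_bin="libreoffice"):
    """Return the two argv lists for the ODT -> PDF -> TXT route.

    The names here are illustrative; `office_bin` may need to be
    `soffice` on some systems (assumption).
    """
    odt = Path(odt_path)
    pdf = odt.with_suffix(".pdf")
    txt = odt.with_suffix(".txt")
    # Step 1: let LibreOffice render the document (footnotes included) to PDF.
    to_pdf = [office_bin, "--headless", "--convert-to", "pdf",
              "--outdir", str(odt.parent), str(odt)]
    # Step 2: extract the text layer of that PDF.
    to_txt = ["pdftotext", str(pdf), str(txt)]
    return [to_pdf, to_txt]


def odt_to_txt(odt_path):
    """Run both steps in order; stop at the first failure, like `&&` does."""
    for argv in build_commands(odt_path):
        if shutil.which(argv[0]) is None:
            raise FileNotFoundError(f"{argv[0]} not found on PATH")
        subprocess.run(argv, check=True)
    return Path(odt_path).with_suffix(".txt")
```

Calling `odt_to_txt("report.odt")` should leave `report.txt` beside the source file. The drawback mentioned above still applies, because the text comes from the PDF layout.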

Hope that helps.

If the answer helped to solve your problem, please click the check mark (✔) next to the answer.


Comments

Thanks, nice command line solution! pdftotext is available either from poppler or from xpdf: https://en.wikipedia.org/wiki/Pdftotext

vstepaniuk (2020-05-13 17:39:54 +0200)
0

answered 2020-05-13 19:58:36 +0200

vstepaniuk

updated 2020-05-13 20:46:14 +0200

One option would be to save the file in FODT format (Flat XML ODF Text Document) and use the following perl command:

perl -wn0le 'print $1 if /<office:body>([\s\S]*?)<\/office:body>/' file.fodt | perl -p0e 's/<.*?>//g' | perl -p0e 's/^\s+//gm'

It extracts everything from the FODT document between the <office:body> and </office:body> tags, removes all tags from the result, and strips leading whitespace from every line, which also removes blank lines.

The text of each footnote will appear IN PLACE of its footnote anchor!

A full command-line solution (including the conversion to FODT) for file.odt:

in=file.odt; libreoffice --convert-to fodt:"OpenDocument Text Flat XML" "$in" && perl -wn0le 'print $1 if /<office:body>([\s\S]*?)<\/office:body>/' "${in/.odt/.fodt}" | perl -p0e 's/<.*?>//g' | perl -p0e 's/^\s+//gm'
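
If perl is not at hand, roughly the same extraction can be sketched in Python with the standard XML parser. This is an approximation of the pipeline above, not a drop-in replacement: the function name is hypothetical, and line breaks in the result depend on the whitespace LibreOffice writes between elements, just as they do for the perl version. Unlike the regex approach, the parser also decodes entities such as `&amp;`.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the office: prefix in OpenDocument files.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"


def fodt_body_text(fodt_xml):
    """Return the text inside <office:body>, footnotes inline at their anchors.

    `fodt_xml` is the content of a .fodt file (str or bytes).
    """
    root = ET.fromstring(fodt_xml)
    body = root.find(f"{{{OFFICE_NS}}}body")
    if body is None:
        return ""
    # itertext() walks the tree in document order, so the citation number and
    # the note text come out exactly where the footnote anchor sits.
    text = "".join(body.itertext())
    # Cleanup similar to the last perl stage: trim each line, drop blank lines.
    lines = (line.strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)
```

Usage would be along the lines of `print(fodt_body_text(open("file.fodt", "rb").read()))`, where `file.fodt` is the file produced by the `--convert-to fodt` step.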

Thanks @ajlittoz for the comments!

0

answered 2020-05-13 20:33:42 +0200

ajlittoz

updated 2020-05-13 20:34:12 +0200

At @vstepaniuk's request, I am turning my comment into an answer.

The best way to keep all information of your document is to save it as .odt ;-) This is a zipped file, so it uses minimal disk space for a format that preserves formatting.

If you want to process document content in another application (awk, macro processor, …) as text rather than binary, you can save it as .fodt. You get an exact XML representation of the document. But to process it efficiently, you need to know the details of the ODF specification. This is not zipped, so it is a bit fatter.

You can recover the content without formatting by blindly stripping all XML markup, leaving only the textual content. Notes are preserved in the process, but not their position on the page.
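
As a rough illustration of the "strip all markup" idea, the sketch below reads `content.xml` straight out of the zipped .odt, so no conversion to .fodt is needed. It is a sketch under assumptions, not a robust ODF reader: the function names are hypothetical, and it assumes the usual `text:` prefix for paragraphs and headings. The only non-blind step is turning paragraph ends into newlines so that paragraphs do not run together.

```python
import re
import zipfile
from html import unescape


def strip_markup(xml_text):
    """Remove every tag from ODF XML and return the remaining text."""
    # Turn paragraph/heading ends into newlines (assumes the text: prefix).
    text = re.sub(r"</text:(?:p|h)>", "\n", xml_text)
    # Blindly drop everything that looks like a tag, XML declaration included.
    text = re.sub(r"<[^>]*>", "", text)
    # Decode entities only after the tags are gone, so an escaped "<" stays text.
    text = unescape(text)
    lines = (line.strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)


def odt_plain_text(odt_path):
    """Read content.xml from the .odt zip archive and strip its markup."""
    with zipfile.ZipFile(odt_path) as archive:
        xml_text = archive.read("content.xml").decode("utf-8")
    return strip_markup(xml_text)
```

With this approach the footnote number and footnote text land at the anchor, and the rest of the surrounding paragraph continues on the next line, because the footnote's own paragraph end also becomes a newline.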

I also explored the idea of using a good old ASCII impact printer but CUPS wouldn't let me create such an antique device. It insists on PostScript or PDF printers. Dead end. My goal was to "print" the document and retrieve the spool file.

To show the community your question has been answered, click the ✓ next to the correct answer, and "upvote" by clicking on the ^ arrow of any helpful answers. These are the mechanisms for communicating the quality of the Q&A on this site. Thanks!

In case you need clarification, edit your question (not an answer) or comment the relevant answer.
