How to convert ODT file to TXT keeping the text of the footnotes?
vstepaniuk,
It is not the most elegant solution, but you can export as PDF (File ‣ Export as...), open the PDF file in your PDF viewer, Select all, Copy, and Paste into a new document.
Hello,
On Linux you may use: libreoffice --convert-to pdf <name>.odt && pdftotext <name>.pdf
(on my openSUSE 15.1 system, pdftotext is part of the package poppler-tools)
Note(s):
pdftotext is also available for Windows (but I have absolutely no experience with the tool on Windows).
With Shell(), it should be an easy hack to create a macro for that.
Hope that helps.
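For repeated use, the two commands above could be wrapped in a small shell function. This is only a sketch: the name odt2txt_via_pdf is made up for illustration, and it assumes libreoffice and pdftotext are both on the PATH. The --headless and --outdir options are added so that no window opens and the PDF lands next to the source file.

```shell
# odt2txt_via_pdf FILE.odt
# Hypothetical helper (not part of LibreOffice or poppler):
# converts FILE.odt to FILE.pdf, then FILE.pdf to FILE.txt in the same directory.
odt2txt_via_pdf() {
  o2t_in=$1
  o2t_dir=$(dirname -- "$o2t_in")
  o2t_base=$(basename -- "$o2t_in" .odt)
  # Step 1: ODT -> PDF without starting the GUI.
  libreoffice --headless --convert-to pdf --outdir "$o2t_dir" "$o2t_in" || return 1
  # Step 2: PDF -> plain text; footnotes are part of the page, so they are kept.
  pdftotext "$o2t_dir/$o2t_base.pdf" "$o2t_dir/$o2t_base.txt"
}
```

Usage would be something like odt2txt_via_pdf ./report.odt, which should leave report.pdf and report.txt beside the original.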
Thanks, nice command-line solution! pdftotext is available either from poppler or from xpdf: https://en.wikipedia.org/wiki/Pdftotext
One option would be to save the file in FODT format (Flat XML ODF Text Document) and use the following perl command:
perl -wn0le 'print $1 if /<office:body>([\s\S]*?)<\/office:body>/' file.fodt | perl -p0e 's/<.*?>//g' | perl -p0e 's/^\s+//gm'
It extracts everything from the FODT document between the <office:body> and </office:body> tags, removes all tags from the result, and also removes any run of whitespace at the start of a line, including newlines (so blank lines disappear).
The footnote text ends up IN PLACE of the footnote anchors!
A full command-line solution (including the conversion to FODT) for file.odt:
in=file.odt; libreoffice --convert-to fodt:"OpenDocument Text Flat XML" "$in" && perl -wn0le 'print $1 if /<office:body>([\s\S]*?)<\/office:body>/' "${in/.odt/.fodt}" | perl -p0e 's/<.*?>//g' | perl -p0e 's/^\s+//gm'
Thanks @ajlittoz for the comments!
At @vstepaniuk's request, I am making my comment an answer.
The best way to keep all the information in your document is to save it as .odt ;-) This is a zipped file, so it uses minimal disk space while preserving all formatting.
If you want to process the document content in another application (awk, macro processor, …) as text rather than binary, you can save it as .fodt. You get an exact XML representation of the document. But to process it efficiently, you need to know the details of the ODF specification. This format is not zipped, so the file is a bit larger.
You can recover the content without formatting by blindly stripping all XML markup, leaving only the textual content. Notes are preserved in the process, but not their position on the page.
I also explored the idea of using a good old ASCII impact printer but CUPS wouldn't let me create such an antique device. It insists on PostScript or PDF printers. Dead end. My goal was to "print" the document and retrieve the spool file.
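Since an .odt is a zip archive, the blind stripping could also be applied directly to its content.xml member, skipping the FODT step. The following is a rough sketch, assuming unzip and perl are installed; odt_content_to_txt is a made-up name. Paragraph and heading ends are turned into newlines first, because content.xml usually has no line breaks of its own.

```shell
# odt_content_to_txt FILE.odt
# Hypothetical helper: prints the text of FILE.odt, footnotes included.
odt_content_to_txt() {
  # content.xml is the package member that holds the document body.
  unzip -p "$1" content.xml |
  # End of paragraph/heading -> newline, then remove all remaining tags.
  perl -0777 -pe 's{</text:(?:p|h)>}{\n}g; s/<[^>]*>//g'
}
```

A footnote body is itself a paragraph, so a line break will appear right after each footnote text, and XML entities stay encoded. For text processing this may be good enough; otherwise a real XML parser would be the safer route.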
Asked: 2020-05-13 13:20:40 +0200
Seen: 320 times
Last updated: May 13 '20
Edit your question to tell us what you want to do with the resulting file. .txt might not be the best format.
@ajlittoz, I need to convert specifically to TXT
On Linux:
libreoffice --convert-to pdf <name>.odt && pdftotext <name>.pdf
(on my openSUSE 15.1 system, pdftotext is part of the package poppler-tools)
@Opaque, ok, thanks, though, why not an "answer"?
@vstepaniuk: just out of curiosity, once it is converted to txt, how do you use it (in which context)? Do you want a txt equivalent of Writer formatting? Like old-time README files or IANA RFC specification?
@vstepaniuk - your wish is my command.
@ajlittoz, no need for any formatting, I will use it for text processing later. But if you have a good solution how to retain the text formatting, feel free to add an answer!