I have tried the command line libreoffice --headless --convert-to txt:Text file.pptx but this seems to do nothing at all. From within LibreOffice I can save as HTML, but that mostly produces images of the slides, whereas I want the text.
I’m not sure if it is exactly what you want, but there is a technique for getting text from Impress, although it is rightly described as “low-tech”.
Btw, I assume by “pptx” you mean a “presentation”, i.e., Impress document in LibO. If you really do mean MS’s Powerpoint .pptx … you’re at the wrong site. You can also have a look at this StackOverflow Q&A on ppt/pptx text extraction to see if there’s any help there.
Thank you. I did mean PowerPoint .pptx, which, as I am on Linux, I import into LibreOffice to view. I don’t have Microsoft Office at all.
Under Linux you could script a solution using unzip, grep, etc. For example, this will pipe the contents of slide 1 to the screen:
$ unzip -p /path/to/my_pres.pptx ppt/slides/slide1.xml
You should be able to loop through the slide numbers (slide1.xml, slide2.xml, and so on) and grep the paragraph text / list items out of each one.
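For what it's worth, here is a rough sketch of that loop. It is untested against real-world decks and rests on a few assumptions: that the visible text sits in <a:t>...</a:t> elements inside each slide's XML, that the slides are numbered consecutively from 1, and that your grep supports -o (GNU grep does). The function names slide_text and pptx_text are just names I made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read one slide's XML on stdin and print each text run
# on its own line. Assumes text runs look like <a:t>some text</a:t>.
slide_text() {
    grep -o '<a:t>[^<]*</a:t>' | sed -e 's/<a:t>//' -e 's#</a:t>##'
}

# Hypothetical helper: print the text of every slide in a .pptx file.
# Assumes slides are stored as ppt/slides/slide1.xml, slide2.xml, ... with no gaps.
pptx_text() {
    pptx=$1
    n=1
    # unzip -p returns a non-zero status when the member is missing,
    # which ends the loop after the last slide.
    while unzip -p "$pptx" "ppt/slides/slide$n.xml" >/dev/null 2>&1; do
        echo "--- slide $n ---"
        unzip -p "$pptx" "ppt/slides/slide$n.xml" | slide_text
        n=$((n + 1))
    done
}
```

After sourcing the file you would call it as pptx_text /path/to/my_pres.pptx and redirect the output to a .txt file. Two caveats: XML entities such as &amp; come out undecoded, and the file numbering may not match the order the slides appear in the presentation, so treat the result as a starting point rather than a faithful export.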