I have 100+ HTML docs downloaded with Firefox and saved as complete web pages.
I want to save them as PDF. (Using the Firefox add-on WE edit, the page can be saved as PDF, but it is blank.)
Saving as just an HTML file saves without the pictures…
Saving as complete web page, opening in Writer and exporting to PDF works well.
The HTML has three parts:
Beginning - starts at the first line of the doc and ends with a specific text line (this repeats on all the 100+ web pages)
Content - everything between the Beginning and the End
End - starts at a specific word and ends at the last line of the doc
Here’s the macro I’d like to have:
The HTML opens in Writer
The Content is extracted
The doc is exported as pdf with the name of the original html
The new html is saved as docx
I am not a programmer, so I do not know how complicated this is, but any advice, even to automate just part of the process, will be greatly appreciated.
-1- Use a SearchDescriptor for the loaded document to find the “specific text” ending the intro.
-2- Select upwards to the very beginning.
-3- Delete selection.
-4- Use the SearchDescriptor to find the keyword for the lead-out.
-5- Select to end of text.
-6- Delete selection.
-7- Export to .pdf by .storeToURL(). (FilterName = “writer_pdf_Export”)
-8- Export to .docx by .storeToURL(). (FilterName = “writer_OOXML” I suppose. I never use it.)
-9- Close the document.
-10- Delete it (the old file) if wanted - if not the old file has the same URL as the new one…
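The steps above could be sketched as a Python macro roughly like this. It is untested against your pages and rests on several assumptions: the function names `swap_extension` and `clip_and_export` are made up for illustration, `doc` is a document you have already loaded (for example with `loadComponentFromURL`), and the .docx filter name "MS Word 2007 XML" is my guess at what your installation registers. HTML files tend to open in Writer/Web mode, where the export filter names can differ, so treat both filter names as things to verify.

```python
import posixpath

PDF_FILTER = "writer_pdf_Export"   # name given in step 7
DOCX_FILTER = "MS Word 2007 XML"   # assumption, check it in your installation


def swap_extension(url, new_ext):
    """Return url with its file extension replaced, e.g. .html -> .pdf."""
    base, _old_ext = posixpath.splitext(url)
    return base + "." + new_ext


def clip_and_export(doc, source_url, intro_marker, leadout_marker):
    """Steps 1 to 9 for one already loaded Writer document (hypothetical helper)."""
    # Imported here because this module only exists inside LibreOffice's Python.
    from com.sun.star.beans import PropertyValue

    def find(text):
        descriptor = doc.createSearchDescriptor()
        descriptor.SearchString = text
        return doc.findFirst(descriptor)   # None if the text is not found

    # Steps 1-3: select from the end of the intro marker up to the start, delete.
    # Note: if the marker sits in a table cell or frame, gotoStart only reaches
    # the start of that cell or frame, not of the whole document.
    found = find(intro_marker)
    if found is not None:
        cursor = found.getText().createTextCursorByRange(found.getEnd())
        cursor.gotoStart(True)
        cursor.setString("")

    # Steps 4-6: select from the lead-out keyword down to the end, delete.
    found = find(leadout_marker)
    if found is not None:
        cursor = found.getText().createTextCursorByRange(found.getStart())
        cursor.gotoEnd(True)
        cursor.setString("")

    # Steps 7-8: export next to the original, keeping the original name.
    for ext, filter_name in (("pdf", PDF_FILTER), ("docx", DOCX_FILTER)):
        prop = PropertyValue()
        prop.Name = "FilterName"
        prop.Value = filter_name
        doc.storeToURL(swap_extension(source_url, ext), (prop,))

    # Step 9: close without saving the clipped HTML itself.
    doc.close(True)
```

Step 10 is left out on purpose: since the exports get new extensions, the original .html is never overwritten, and deleting it can stay a manual decision.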
(Generally I dissuade from using hostile file-formats.)
You may try to record a macro for the sequence of actions. You will need to rework it manually, I’m afraid.
It depends. Some may try and never succeed. Somebody may study (say) the texts by Andrew Pitonyak, and solve the thing the next day. (Kidding a bit.)
If there isn’t a certain amount of experience in programming and in using any API not to speak of the very special one available for LibreOffice, I would dissuade from trying. Might end up with a lot of wasted time.
You wouldn’t only need to do the specific kind of internal “document automation”, but also to organize the selection of the files to work on, and this should be the part where the macro-recorder won’t help you much.
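To give an idea of the file-selection part, here is a minimal Python sketch that only collects the HTML files of one folder and works out the .pdf and .docx names for each. The function name `plan_outputs` is hypothetical, and I am assuming all pages sit directly in one folder with a .html extension. It does not recurse, so pages inside the companion folders that Firefox creates for "complete web page" saves are not picked up.

```python
from pathlib import Path


def plan_outputs(folder, pattern="*.html"):
    """List (source, pdf, docx) paths for every HTML file directly in folder."""
    plan = []
    for source in sorted(Path(folder).glob(pattern)):
        if source.is_file():
            plan.append((source,
                         source.with_suffix(".pdf"),
                         source.with_suffix(".docx")))
    return plan
```

Looping over the result and printing it before doing any conversion is a cheap way to check that the right files were selected.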
Everything isn’t exactly difficult, but …
And specifically the “FilterData” for PDF export may be problematic. I also lack experience in that respect.
If you can hire a schoolboy feeling somehow bored … but being steady enough …
If you think to actually need a “number of days” it might pay to find somebody who completes and “certifies” the code contained in the attached file for you if you don’t feel capable of doing it yourself.
Parts of that code are tested, parts are slightly reworked versions of recorded macros.
A relevant function is missing because a solution workable in an unknown environment seemed too complicated to do it “just so”. clipAndExportTwice.odt
Basically your problem is highly specialized. What I could provide as an “answer” you find in my first comment. Beyond that it’s “development”.
Check carefully which directory it will replace in! Regex Batch Replacer is a very quick program and it does not forgive a bad choice of directory: it can rewrite all files in the directory in a few seconds without mercy.
Then open the command line: press Win+R and type cmd. This opens the command prompt.
Switch to the directory with the files you want to convert. For example, if the directory is d:\myfiles, type the following (type only the commands; "Enter" means press the Enter key, do not type the word :-))
d: press Enter
cd myfiles Enter
Then type the following lines at the command line:
for %f in (*.odt) do ( Enter
start /wait "" "C:\Program Files\LibreOffice\program\soffice.exe" --headless --convert-to pdf "%f" Enter
) Enter
It will convert your ODT files to PDF. It is slow, but it seems to work.
I believe you will be able to work out the conversion from HTML (or to DOCX) this way on your own. I hope it is possible.
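If you would rather not type the loop by hand, the same soffice call can be driven from a small Python script. This is only a sketch under assumptions: the helper names `convert_command` and `convert_folder` are made up, the soffice path is the one from this post, and for HTML input the plain `pdf` target may need an explicit filter (something like `pdf:writer_web_pdf_Export`), which I have not verified on your files.

```python
import subprocess
from pathlib import Path

# Path taken from the command above; adjust it to your installation.
SOFFICE = r"C:\Program Files\LibreOffice\program\soffice.exe"


def convert_command(source, target_format, soffice=SOFFICE):
    """Build the soffice command line that converts one file."""
    source = Path(source)
    return [soffice, "--headless",
            "--convert-to", target_format,
            "--outdir", str(source.parent),   # write next to the original
            str(source)]


def convert_folder(folder, pattern="*.html", target_format="pdf"):
    """Convert every matching file in folder, one soffice run per file."""
    for source in sorted(Path(folder).glob(pattern)):
        # check=True stops the loop at the first failed conversion
        subprocess.run(convert_command(source, target_format), check=True)


# Example call (not run here): convert_folder(r"d:\myfiles")
```

Passing the command as a list avoids the quoting problems with spaces in file names that the cmd loop has to deal with. Note that this converts the whole page; it does not clip the beginning and end parts.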