Can I convert HTML to a document with a table of contents?

asked 2018-07-03 16:05:33 +0100

GuavaMacarena

I have some HTML like this:

<h1>Chapter One</h1>

<p>Our story begins...</p>
<p>And then he said...</p>
<p>More stuff happened...</p>

<h1>Chapter Two</h1>

<p>You get the idea...</p>

Dead simple. (This is produced by my JavaScript code, where the book is represented as chapters = [{ title: "Foo", paragraphs: ["Our story...", "And then..."] }, { title: "Bar", paragraphs: ["Baz", "and so on"] }].)
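For illustration, the generation step described above might look roughly like the following sketch. The names `renderBookHtml` and `escapeHtml` are hypothetical and this is not the actual code behind the question; it only assumes the `chapters` shape shown above.

```javascript
// Hypothetical sketch: turn the chapters array into the HTML shown above.
const chapters = [
  { title: "Chapter One", paragraphs: ["Our story begins...", "And then he said..."] },
  { title: "Chapter Two", paragraphs: ["You get the idea..."] },
];

// Escape the characters that are special in HTML text content.
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// One <h1> per chapter, a blank line, then one <p> per paragraph.
function renderBookHtml(book) {
  return book
    .map((ch) =>
      [
        `<h1>${escapeHtml(ch.title)}</h1>`,
        "",
        ...ch.paragraphs.map((p) => `<p>${escapeHtml(p)}</p>`),
      ].join("\n")
    )
    .join("\n\n");
}

console.log(renderBookHtml(chapters));
```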

My goal is to create a .doc file from it, the old-school MS Word '97-'03 variety, using only the command line or software libraries (JavaScript, Go, or Ruby preferred, but I'll learn Python if necessary), no GUI. Obviously each h1 would be a Heading 1 and each p would be a paragraph following it.

soffice --convert-to doc:"MS Word 97" mybook.html works a treat, but... it doesn't support generating a table of contents, and I need that.

What should I do here? I asked on the #libreoffice IRC channel, and a helpful person recommended using PyUNO to script this -- but researching it, that seems a fairly big project, learning the tooling, the language, the OpenOffice API. I'm willing to put the time in if it's necessary, but I thought I'd ask here and get some advice just in case there's another simpler option.

Thanks to anyone who can offer advice!


Comments

Heh, a "helpful person" here ;-)

You didn't mention (or I missed) that the HTML is the product of your script. You might want to look at FODT files, which are plain XML files analogous to ODT. If your script generated those instead of HTML, with the ToC built in, you would usually only have to convert the FODT to DOC. To get an idea of what to generate, save a simple document in this format and inspect the resulting XML.

Mike Kaganski (2018-07-03 16:31:32 +0100)
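The FODT suggestion in the comment above could be sketched along these lines. This is a minimal, untested-against-LibreOffice sketch: `buildFodt` and `escapeXml` are hypothetical names, the element names are taken from the OpenDocument text schema, and the exact styles and attributes LibreOffice writes should be checked against a simple file saved as FODT, as the comment recommends. The index body here is filled with plain chapter titles and no page numbers; whether the conversion refreshes the index is an open assumption.

```javascript
// Hypothetical sketch: emit a flat ODF text document (FODT) with a
// table-of-contents element followed by one heading per chapter.
const chapters = [
  { title: "Chapter One", paragraphs: ["Our story begins...", "And then he said..."] },
  { title: "Chapter Two", paragraphs: ["You get the idea..."] },
];

// Escape characters that are special in XML text and attribute values.
function escapeXml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function buildFodt(book) {
  // Static ToC entries: titles only, no page numbers (assumption, see lead-in).
  const tocEntries = book
    .map((ch) => `          <text:p>${escapeXml(ch.title)}</text:p>`)
    .join("\n");

  // Each chapter becomes a level-1 heading followed by its paragraphs.
  const body = book
    .map((ch) =>
      [
        `      <text:h text:outline-level="1">${escapeXml(ch.title)}</text:h>`,
        ...ch.paragraphs.map((p) => `      <text:p>${escapeXml(p)}</text:p>`),
      ].join("\n")
    )
    .join("\n");

  return `<?xml version="1.0" encoding="UTF-8"?>
<office:document
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    office:version="1.2"
    office:mimetype="application/vnd.oasis.opendocument.text">
  <office:body>
    <office:text>
      <text:table-of-content text:name="Table of Contents1">
        <text:table-of-content-source text:outline-level="1"/>
        <text:index-body>
${tocEntries}
        </text:index-body>
      </text:table-of-content>
${body}
    </office:text>
  </office:body>
</office:document>
`;
}

console.log(buildFodt(chapters));
```

The output could be written to a file such as mybook.fodt and then passed to the same soffice --convert-to doc:"MS Word 97" command from the question in place of mybook.html; that this produces a usable ToC in the resulting .doc is an assumption to verify.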