odt/xml: Replace embedded image without lowriter

pbarill · April 24, 2020, 5:55pm

I want to update the image in a document by manipulating the data/XML under the hood (scripted approach because it’s the same task again and again…). So I have this in the file:

Pictures/1002B314000048A8000037DC206113FE1EF04F6E.emf

The emf file is generated somehow outside of LibreOffice. Obviously I did not give it that name. Now I generate a new one and want to update the document by replacing the old one (same dimensions and everything, just different content). Problem: it needs a “special” name (looks like a hash) which is calculated by LibreOffice.

I can’t just dump a file with the same name as above nor call it Pictures/0.emf and then in content.xml link it to this file and update META-INF/manifest.xml accordingly. Doing so will prompt “File is broken, repair it?” Everything is fine once repaired (the proper cryptic filename gets there), but I want to skip that trivial repair part. The good news is that the emf file in the repaired document has the same sha1sum as the one I generated, meaning that no conversion or modification is involved.

Question: Any idea on how this filename is generated? Quickly checked md5, sha1: No, that’s not it.

I checked the standard. Really hard to find, and there is nothing in there.

I glanced at odfpy (python API) to add pictures. There’s a thing to insert a file in the zipped document, but it generates a random identifier, which is unlikely to behave correctly.

ajlittoz · April 24, 2020, 6:26pm

It all depends on what you’re doing with the document file.

If it is for your own use, insert images as links instead of embedding them (embedding is what creates the unique hash used to identify them). You might need to organise your document directory a bit: since the “logical” document is no longer a single file but a set composed of the text file plus the various image files, you would be better off creating a dedicated sub-directory to store the document file and its companion image files.

With such an organisation, you copy the images into this subdirectory. Whenever one of them is replaced, keeping the same name, the link is still valid and you get the update when you next open the document file.

With such an organisation, the “logical” document is the subdirectory and can be also sent by email (with or without compression, zip and others).

If you don’t copy the images, i.e. they are scattered all over your computer’s file system, you can’t easily email the document.

This change of organisation is IMHO simpler and safer than trying to tweak the XML.

pbarill · April 24, 2020, 7:02pm

Interesting alternative workflow. Once everything is done I need to export the document from odt to docx (not my choice) and have everything embedded. I gave it a try (using --headless mode) and it seems that the conversion takes care of embedding everything.
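For a scripted workflow, the headless conversion mentioned above could be driven along these lines. This is only a sketch: it assumes the `soffice` binary is on the PATH, and the function names `build_convert_command` and `convert_to_docx` are made up for illustration. It only runs the conversion and says nothing about whether linked images end up embedded.

```python
import subprocess


def build_convert_command(odt_path, outdir, soffice="soffice"):
    """Build the argument list for a headless odt -> docx conversion.

    `soffice` is assumed to be on the PATH; pass a full path otherwise.
    """
    return [
        soffice,
        "--headless",
        "--convert-to", "docx",
        "--outdir", str(outdir),
        str(odt_path),
    ]


def convert_to_docx(odt_path, outdir, soffice="soffice"):
    """Run the conversion and raise CalledProcessError on a non-zero exit."""
    subprocess.run(build_convert_command(odt_path, outdir, soffice), check=True)
```

Calling `convert_to_docx("report.odt", "out")` would then write the converted file into `out/`, where `report.odt` and `out` are placeholder names.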

ajlittoz · April 24, 2020, 7:25pm

A variation on my scheme: instead of copying the files into the sub-directory, you can symlink them to the originals. Thus changing the original is immediately reflected in the document (on next open or Tools>Update>Update all).
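The symlink variation can be scripted too. A minimal sketch, assuming a platform where unprivileged symlinks work (e.g. Linux or macOS); the helper name `link_images` and its arguments are hypothetical:

```python
from pathlib import Path


def link_images(originals, doc_dir):
    """Create, in doc_dir, one symlink per original image (same file name).

    An existing file or link with that name in doc_dir is replaced.
    Returns the list of created link paths.
    """
    doc_dir = Path(doc_dir)
    doc_dir.mkdir(parents=True, exist_ok=True)
    links = []
    for src in map(Path, originals):
        link = doc_dir / src.name
        if link.is_symlink() or link.exists():
            link.unlink()
        # Point at the absolute path so the link survives a change of cwd.
        link.symlink_to(src.resolve())
        links.append(link)
    return links
```

Because the link keeps the original file name, the document's image link stays valid whenever the original is regenerated.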

pbarill · April 24, 2020, 7:43pm

Sorry, I spoke too fast. I must have clicked something else the first time. Conversion does not embed anything, but the link seems to work after conversion, and it’s possible to “manually” break the link to embed things later on.

mikekaganski · April 26, 2020, 10:12am

I can’t just dump a file with the same name as above nor call it Pictures/0.emf and then in content.xml link it to this file and update META-INF/manifest.xml accordingly. Doing so will prompt “File is broken, repair it?”

In fact, you can “just dump a file with the same name as above”. It works that way (removing the image file from Pictures/, then putting another image with the same name, without any changes elsewhere). The long name like “1002B314000048A8000037DC206113FE1EF04F6E” (which is the image hash) has no special meaning other than ensuring that we don’t needlessly store duplicate images. My suspicion is that your tooling somehow incorrectly repacks the zip package after the operation, e.g. not making sure that “mimetype” is the first file in the package and is not compressed (uses the “store” method).
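To illustrate the repacking point, here is a minimal sketch using Python's standard `zipfile` module. It writes “mimetype” first with the “store” method, then copies every other entry, swapping in the new image bytes under the unchanged name. The function names `replace_member` and `mimetype_is_first_and_stored` are made up for this example, and the sketch has not been checked against every LibreOffice version.

```python
import zipfile


def mimetype_is_first_and_stored(odt_path):
    """Return True if the first archive entry is an uncompressed 'mimetype'."""
    with zipfile.ZipFile(odt_path) as z:
        # infolist() follows the central directory order, which matches
        # the write order for archives produced by common zip tools.
        first = z.infolist()[0]
        return (first.filename == "mimetype"
                and first.compress_type == zipfile.ZIP_STORED)


def replace_member(src_odt, dst_odt, member, new_bytes):
    """Copy src_odt to dst_odt, replacing the content of one member.

    The member keeps its name, so content.xml and META-INF/manifest.xml
    need no changes.
    """
    with zipfile.ZipFile(src_odt) as zin:
        if member not in zin.namelist():
            raise KeyError(member)
        with zipfile.ZipFile(dst_odt, "w") as zout:
            # 'mimetype' goes first and must not be compressed.
            zout.writestr("mimetype", zin.read("mimetype"),
                          compress_type=zipfile.ZIP_STORED)
            for info in zin.infolist():
                name = info.filename
                if name == "mimetype":
                    continue
                data = new_bytes if name == member else zin.read(name)
                # Directory entries carry no data, so store them as-is.
                method = (zipfile.ZIP_STORED if info.is_dir()
                          else zipfile.ZIP_DEFLATED)
                zout.writestr(name, data, compress_type=method)
```

A call might look like `replace_member("in.odt", "out.odt", "Pictures/1002B314000048A8000037DC206113FE1EF04F6E.emf", open("new.emf", "rb").read())`, where `in.odt`, `out.odt` and `new.emf` are placeholder file names. Running `mimetype_is_first_and_stored` on the archive produced by the existing tooling is a quick way to test the suspicion above.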