Filesize doubled when saving odt file from 5.x version in 7.1 version - strange PNGs in archive

Hi, I have been using the old 5.x version with documents that contain images. Now I have installed 7.1.3.

When I open an old document, make no changes, and just save it, the file size doubles.

I found that this is caused by the embedded images. My workflow: take snapshots with the Windows Snipping Tool, paste them into MS Paint, and then use Paste Special > GDI in Writer (this resulted in smaller files than pasting as Bitmap, but the option is only available after going through MS Paint first).

I wonder what happened in the new version. Why does it double the document size?

I tried to investigate this further. Opening the .odt files with 7-Zip, I found that the original version stores only .svm files in the Pictures subdirectory.

Next, I opened the document in Writer 7.1.3 and just saved it.

Looking into the new, double-sized file with 7-Zip, I can see that .png files are stored in the archive in addition to the .svm files. Each image therefore exists in two variants, which doubles the storage. At least it is good to see that this does not seem to have anything to do with compression.
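
The same check can be done without 7-Zip, since an .odt file is a ZIP archive. Below is a minimal sketch, assuming Python 3 is available; the function name `summarize_pictures` is made up for illustration. It totals the entries under Pictures/ per file extension.

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def summarize_pictures(odt_path):
    """Return {extension: (count, uncompressed_bytes, compressed_bytes)}
    for all entries stored under Pictures/ in the given .odt archive."""
    totals = defaultdict(lambda: [0, 0, 0])
    with zipfile.ZipFile(odt_path) as archive:
        for info in archive.infolist():
            # Skip directory entries and everything outside Pictures/.
            if info.is_dir() or not info.filename.startswith("Pictures/"):
                continue
            extension = PurePosixPath(info.filename).suffix.lower()
            entry = totals[extension]
            entry[0] += 1
            entry[1] += info.file_size      # size before compression
            entry[2] += info.compress_size  # size inside the archive
    return {ext: tuple(values) for ext, values in totals.items()}
```

Calling it on the document before and after the re-save (for example `summarize_pictures("old.odt")`, where the file name is a placeholder) should show whether a `.png` group has appeared next to the `.svm` group and how many bytes each one takes.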

The question is: how can I avoid that? I will experiment and report my results…

Edit: deleting either the .svm or the .png files from the archive breaks the document. Both types are referenced in the content.xml file, and opening fails after deletion.
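
To see which picture files content.xml actually refers to, the references can be listed and compared with what is stored in the archive. This is only a sketch: it assumes the images are referenced through `xlink:href` attributes whose values start with `Pictures/`, and it looks at content.xml only, so references from styles.xml or the manifest are not covered. The function name `picture_references` is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# Attribute name of xlink:href in ElementTree's {namespace}local notation.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def picture_references(odt_path):
    """Compare Pictures/ files referenced in content.xml with those stored."""
    with zipfile.ZipFile(odt_path) as archive:
        root = ET.fromstring(archive.read("content.xml"))
        stored = {
            name for name in archive.namelist()
            if name.startswith("Pictures/") and not name.endswith("/")
        }
    # Assumption: image links are relative paths beginning with "Pictures/".
    referenced = {
        element.get(XLINK_HREF)
        for element in root.iter()
        if element.get(XLINK_HREF, "").startswith("Pictures/")
    }
    return {
        "referenced": referenced,
        "stored": stored,
        "missing": referenced - stored,       # referenced but not in the archive
        "unreferenced": stored - referenced,  # in the archive but not referenced
    }
```

A non-empty `missing` set would be consistent with the document failing to open after files were deleted by hand.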

Next, I created a completely new document in 7.1.3, added an image via Paste Special > GDI, saved it, and inspected it with 7-Zip:

This one contains only the .svm - there’s no .png in that case.

Then I opened the file again in Writer, made no change at all, and just saved it again.

Opened in 7-Zip, the .png version is back again.

So why does version 7.1.3 create that .png file when saving the file for the second time, and keep it from then on? Note: both files are written with the same timestamp, so the .png is not some kind of backup of a previous version.

EDIT: changed subject to include the PNG finding

I cannot reproduce using Version: 7.1.3.2 (x64) / LibreOffice Community
Build ID: 47f78053abe362b9384784d31a6e56f8511eb1c1
CPU threads: 12; OS: Windows 10.0 Build 19042; UI render: Skia/Raster; VCL: win
Locale: en-US (ru_RU); UI: en-US
Calc: threaded.

I paste something into MS Paint, then copy it there and use Paste Special > GDI in Writer. The document indeed contains only .svm on save, as you write. But opening it and saving again does not add any other copies of the image.

Is it something with your settings, or with the document? Could you attach a sample that contains only .svm so I can test?

Next Step:

Create a new document and paste the image, this time not with Paste Special > GDI, but just pasting it as Bitmap.

Result: no .svm file is present in the archive any more. The first save adds only the .png, and subsequent saves keep only this .png, so the .svm variant is gone completely.

Somehow I need to get rid of at least one of the two file types, and be able to save documents without the other one reappearing. The .svm seems to be the candidate of choice.

In practice, this would mean that I have to recreate all my documents: export or capture all images again and paste them as bitmap only, which gets rid of the .svm versions and reduces the file size as well. That would mean many hours of tedious cut-and-paste work.

Does anyone have an idea how to remove the .svm files from the archive without breaking the .odt structure? I will need to look into the XML that references them…

Hi Mike, thanks for replying. I will try to upload an example document. Unfortunately, I have to leave now for a few hours, but I will reply when I come back.

I have created some documents now. First of all, it is strange, but I am no longer able to create an ‘svm only’ version with 7.1.3. Creating a document and using Paste Special > GDI immediately results in a file that contains both .svm and .png. Doc1-pasted-as-GDI.odt

Creating a document and pasting as bitmap results in a file that contains only .png, as I found before. Doc2-pasted-as-Bitmap.odt

As I can’t reproduce a file with only .svm in 7.1.3 any more, here’s an old one that contains only .svm files. Open it in Writer 7.1.3 and just save it. The size doubles because both .png and .svm are now stored. OLD-doc-only-with-svm-from-v5x.odt

I believe I have found the root cause of the doubled file size. This seems to point in the right direction. The LibreOffice 6 release notes contain the following:

Metafiles which were previously saved in the internal SVM (Star View Metafile) format are now accompanied by a PNG fallback graphic. This makes it easier for other ODF readers to display the graphics.

That’s what I observe. All my documents were written in LO 5.x, with images pasted as GDI. Opening and saving them in 7.1 now adds that fallback graphic, which doubles the file size.

So it looks like a ‘works as designed’ issue, and I will need to recreate all images by pasting them as bitmap so that only the PNG versions are saved. That is very bad for me, but it looks like my only option here.

Actually it should be a bitmap, because I captured a screen. My understanding of the overall process is not really good, but to me GDI (Graphics Device Interface) is a kind of device-independent abstraction layer that can handle both vector graphics and pixel graphics. As you say, it looks like my screenshots are wrapped into that format. The difference is that LO5 kept the SVM only, while since LO6 that PNG copy is added on top. I worked with LO5 and made quite a few documents with it; my bad that I decided to paste as GDI. That’s something I found a long time ago with MS Word: pasting as GDI produced smaller files than pasting as Bitmap, and I got used to it.

Until now, when GDI is handled by storing two versions of the image, unfortunately with no option to switch off the PNG fallback.

Another thing I found: it is not just the file size, but even more the save time that is annoying when a document contains many images. Saving my current document, a 23MB file, takes more than 15 seconds. I did a quick test on a powerful machine with 12 cores and fast NVMe storage, and it still takes almost 10 seconds.

Looking into the archive shows that ALL images get saved with new names and timestamps. LO apparently goes through all the compression etc… for each image when saving, regardless of whether it has been touched or not.
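
Whether the image data itself changed between two saves can be checked independently of the new names and timestamps, because the ZIP directory records a CRC-32 and the uncompressed size for every entry. The following is a rough sketch with made-up function names; matching CRC and size only indicate that the stored bytes are the same, not how much work LO did to produce them.

```python
import zipfile


def picture_fingerprints(odt_path):
    """Map each Pictures/ entry name to (CRC-32, uncompressed size)."""
    with zipfile.ZipFile(odt_path) as archive:
        return {
            info.filename: (info.CRC, info.file_size)
            for info in archive.infolist()
            if info.filename.startswith("Pictures/") and not info.is_dir()
        }


def compare_saves(before_path, after_path):
    """Classify the pictures of the later save relative to the earlier one."""
    before = picture_fingerprints(before_path)
    after = picture_fingerprints(after_path)
    known_content = set(before.values())
    # Entries whose bytes already existed in the earlier save.
    same_content = sorted(n for n, fp in after.items() if fp in known_content)
    # Of those, the ones that are stored under a name not used before.
    renamed = sorted(n for n in same_content if n not in before)
    # Entries with content that was not in the earlier save at all.
    new_content = sorted(n for n, fp in after.items() if fp not in known_content)
    return {
        "same_content": same_content,
        "renamed": renamed,
        "new_content": new_content,
    }
```

If most entries end up in `renamed`, the images were written again under new names although their content is unchanged; added fallback graphics would show up under `new_content`.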

Wouldn’t it be a good idea to check whether images have been touched, and to optimize the save process so that only changed parts are rewritten?

Actually, I am now considering using images as links only, with the drawback that I have to keep directories to store them and have more work keeping things in sync. Again, that is a lot of work to extract the images and relink them (I found that there seems to be an add-on available for this).
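
The extraction half of that work can be scripted, since the embedded images are plain files inside the archive. A minimal sketch follows, assuming Python 3; the function name `extract_pictures` and its parameters are hypothetical, and it does not touch the document itself, so the relinking in Writer still has to be done separately.

```python
import zipfile
from pathlib import Path, PurePosixPath


def extract_pictures(odt_path, target_dir, extensions=(".png",)):
    """Copy the Pictures/ entries with the given extensions into target_dir.

    Returns the list of files written. Existing files with the same
    name in target_dir are overwritten.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(odt_path) as archive:
        for info in archive.infolist():
            name = PurePosixPath(info.filename)
            # Only regular entries directly under the Pictures/ folder.
            if info.is_dir() or name.parts[:1] != ("Pictures",):
                continue
            if name.suffix.lower() not in extensions:
                continue
            # Use only the base name so nothing is written outside target_dir.
            destination = target / name.name
            destination.write_bytes(archive.read(info))
            written.append(destination)
    return written
```

For example, `extract_pictures("report.odt", "report-images")` (both names are placeholders) would copy the PNG variants into a subdirectory next to the document.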

Just wanted to report some results. Starting point: the document was 12MB under LO5 and is now 23MB under LO7, with a save time of more than 15 seconds.

As said, I have now replaced all 140 images in this 90-page document with links. I found a workable way to do this manually within one hour, which is acceptable.

Result: the document is only 120K, plus about 13MB of PNG files in a subdirectory.

The result is rather disappointing: even on an SSD, the save time is still 7 seconds for that small file. The images in the subdirectory are not written at all; at least they all keep their timestamps.

CPU usage ramps up sharply, so I really wonder what is going on here just to produce these small XML files (content.xml is 660K). It looks like the XML libraries produce a lot of overhead. I had expected a much more noticeable effect.