Why does LibreOffice produce such bloated .odt files?

Hi @Uglyface200, the comparison is very interesting. Could you please attach the example files that you describe above so that we can see how they differ?

Thanks!

I don’t know how to attach files.

Hi @Uglyface200, you have enough karma to upload files, so that shouldn’t be an issue :slight_smile:

When creating a new Answer (or editing an existing one), please click the paperclip icon in the toolbar to upload.

Cheers!

I don’t have enough new info to merit an “answer”, so I’ll post this observation as a comment. I had a .docx file that I created in Word, brought home, and opened in LibreOffice. I ended up with a 4.5 MB file, and LibreOffice was crashing/recovering frequently whenever I went into the Formula Editor (doing math homework). When I opened a new Writer document and copy-pasted the contents from the Word document, the resulting file was 500 kB.

I took a look at the ODT files you provided. To see for yourself, rename them to .zip files and unpack them.
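If you’d rather not rename and unpack by hand: an ODT is an ordinary ZIP archive, so Python’s standard zipfile module can read it directly. A minimal sketch (describe_odt is just an illustrative name) that lists every entry with its uncompressed and compressed size:

```python
import io
import sys
import zipfile


def describe_odt(source):
    """Return (name, uncompressed_bytes, compressed_bytes) for each entry of an ODT.

    `source` may be a path, a binary file object, or the raw bytes of the file.
    No renaming to .zip is needed; zipfile does not care about the extension.
    """
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)
    with zipfile.ZipFile(source) as zf:
        return [(i.filename, i.file_size, i.compress_size) for i in zf.infolist()]


if __name__ == "__main__":
    # Usage: python describe_odt.py "Writer ODT.odt" "Word ODT.odt"
    for path in sys.argv[1:]:
        print(path)
        for name, raw, packed in describe_odt(path):
            print(f"  {raw:>9} {packed:>9}  {name}")
```

Running it on both attached files puts the per-entry sizes side by side, which is where the differences discussed below show up.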

  1. The Writer ODT contains an 8.1 KB preview image of the document; the Word ODT does not. Note that this is a feature: the image is what you may see instead of the Writer icon in a file manager.

That leaves only 6.5 KB of overhead. When I deleted the Thumbnails directory from the zip file, I got 13.9 KB for the Writer zip vs 12.3 KB for the Word zip, a 1.6 KB difference.

(A fun aside: with a completely white first page, the same document was 14.9 KB.)

  2. The other two notable file-size differences (Writer vs Word) are:
    styles.xml: 14.4 KB vs 8.1 KB
    settings.xml: 9.9 KB vs 1.4 KB

Writer’s styles.xml includes extra styles for graphics and tables, even when these are not used in the document. The largest chunk is the outline settings (how the 10 heading levels are positioned and numbered).

Writer’s settings.xml contains things like printer settings and where your cursor was when you saved. Word’s settings file is essentially empty; it contains nothing useful at all.

  3. The content.xml file in the Writer ODT is actually smaller: 29.5 KB vs 53.5 KB. This means that for longer documents, Writer ODT files will actually be smaller than Word ODT files, although the documents need to be fairly long (especially to overcome the size of the thumbnail). In other words, a Writer ODT’s base size is larger, mostly because of the included thumbnail image.
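The entry-by-entry comparison above can be reproduced with a short Python sketch. Assumptions: the sizes are compared uncompressed (zipfile’s file_size), which is my guess at how the figures above were measured, and entry_sizes / compare_odts are illustrative names, not an existing tool.

```python
import io
import zipfile


def entry_sizes(source):
    """Map each entry name in an ODT to its uncompressed size in bytes."""
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)
    with zipfile.ZipFile(source) as zf:
        return {i.filename: i.file_size for i in zf.infolist()}


def compare_odts(a, b):
    """Return {entry: (size_in_a, size_in_b)} for entries whose sizes differ.

    An entry that exists in only one file is reported as 0 on the other side.
    """
    sizes_a, sizes_b = entry_sizes(a), entry_sizes(b)
    diff = {}
    for name in sorted(sizes_a.keys() | sizes_b.keys()):
        pair = (sizes_a.get(name, 0), sizes_b.get(name, 0))
        if pair[0] != pair[1]:
            diff[name] = pair
    return diff
```

For example, compare_odts("Writer ODT.odt", "Word ODT.odt") lists every entry whose size differs between the two files, including entries such as Thumbnails/thumbnail.png that exist in only one of them.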

I don’t like the thumbnail feature. It’s not very useful, because it’s only ever seen at small sizes, at which the first pages of most files look much the same. Also, I suppose this partially explains why Word can’t directly open the ODT files I create with Writer.

Its usefulness is questionable and depends on your documents. But as for Word’s inability to open ODT files, that is just bad engineering on their side (or worse, if it is intentional). The reason is certainly not the previews: an extra file inside a ZIP cannot break anything if it isn’t used.

Is it possible to turn the thumbnail feature off?

@benny, not that I know of. Nowadays pretty much nobody cares about file sizes below a few MB. If you archive a lot of ODT files, you could remove the thumbnails yourself.
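For the archiving case, here is a minimal sketch of removing the thumbnail with Python’s zipfile; I haven’t checked the output against every LibreOffice version, so keep the originals. Assumptions: the ODF packaging convention that “mimetype” is the first entry and stored uncompressed, and that META-INF/manifest.xml may list the thumbnail, so the sketch drops that file-entry with a regular expression instead of a full XML parser. strip_thumbnail is a hypothetical helper name.

```python
import io
import re
import zipfile

# Matches a manifest <file-entry .../> whose full-path lies inside Thumbnails/.
# Assumption: the entry is a self-closing tag; if not, the manifest is left as-is.
_THUMB_ENTRY = re.compile(
    r'\s*<manifest:file-entry[^>]*manifest:full-path="Thumbnails/[^"]*"[^>]*/>'
)


def strip_thumbnail(src, dst):
    """Copy the ODT package `src` to `dst` without its Thumbnails/ directory.

    `src` may be a path, a binary file object, or raw bytes; `dst` must be a
    different path (or a writable binary file object), never `src` itself.
    """
    if isinstance(src, (bytes, bytearray)):
        src = io.BytesIO(src)
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        # ODF expects "mimetype" to be the first entry, stored uncompressed.
        if "mimetype" in zin.namelist():
            zout.writestr("mimetype", zin.read("mimetype"),
                          compress_type=zipfile.ZIP_STORED)
        for info in zin.infolist():
            name = info.filename
            if name == "mimetype" or name.startswith("Thumbnails/"):
                continue
            data = zin.read(info)
            if name == "META-INF/manifest.xml":
                # Drop the manifest's reference to the removed thumbnail.
                data = _THUMB_ENTRY.sub("", data.decode("utf-8")).encode("utf-8")
            zout.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
```

Usage: strip_thumbnail("Writer ODT.odt", "Writer ODT-nothumb.odt"). Writing to a new file rather than in place means a mistake never costs you the original.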

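Also, a fun fact: if you unzip several ODT files and store their contents together uncompressed (e.g. in a TAR), compressing that archive gives a better result than the individual ODTs, because the compressor can exploit the content the files share.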
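To try this yourself, the sketch below unpacks several ODTs into one uncompressed tar in memory and compresses it as a single xz stream with Python’s lzma module. odt_as_tar and solid_xz_size are illustrative names. The design point: ZIP compresses each member separately, so repeated styles.xml/settings.xml content across files is never shared, whereas xz at its default preset uses an 8 MiB dictionary and can find repeats across files.

```python
import io
import lzma
import tarfile
import zipfile


def odt_as_tar(sources):
    """Unpack each ODT and store all members, uncompressed, in one tar archive.

    Members are prefixed with doc0/, doc1/, ... so identical names don't clash.
    Each source may be a path, a binary file object, or raw bytes.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for idx, src in enumerate(sources):
            if isinstance(src, (bytes, bytearray)):
                src = io.BytesIO(src)
            with zipfile.ZipFile(src) as zf:
                for info in zf.infolist():
                    if info.is_dir():
                        continue
                    data = zf.read(info)
                    member = tarfile.TarInfo(f"doc{idx}/{info.filename}")
                    member.size = len(data)
                    tar.addfile(member, io.BytesIO(data))
    return buf.getvalue()


def solid_xz_size(sources):
    """Size in bytes of the combined tar compressed as a single xz stream."""
    return len(lzma.compress(odt_as_tar(sources)))
```

Compare solid_xz_size(paths) with the summed sizes of the original files. The gain depends on how much the documents have in common; for a single file, the tar headers can eat most of it.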

I thought I’d mention an interesting postscript to this question. I had a ten-megabyte .docx file of a book that I was working on. I saved it in .odt format from Word 2010, and the file size absolutely plummeted to 645 kilobytes! That is a decrease of about 94% (the new file is roughly 6.3% of the original size). So, by all means, use the .odt format, even if it is marginally inefficient at small sizes.
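For reference, the size change works out as follows, assuming 1 MB = 1024 KB (with 1 MB = 1000 KB the decrease is about 93.6%). A percentage decrease can never exceed 100%:

```latex
\frac{645\ \text{KB}}{10 \times 1024\ \text{KB}} \approx 0.063,
\qquad
1 - 0.063 = 0.937 \;\Rightarrow\; \text{a decrease of about } 94\%,
\qquad
\frac{10240}{645} \approx 15.9 \text{ times smaller}
```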

Off the top of my head, I can think of at least two reasons why the .docx file is so big: 1) it probably keeps a revision history even though none has been requested (I have seen this many times with .docx files sent by colleagues; obsolete content sometimes shows up at the end of the document even though it has been deleted); 2) OOXML’s tag names are shorter, but the tags are much more numerous, which makes them harder to compress efficiently (see Rob Weir’s blog for an explanation).

Have you analysed what happens at different document sizes?

Please take a look at this thread.

There are no images involved in either file. And performance is equal at all sizes.

This is not an answer to the question.

At qubit’s request, I created a document containing the Mozilla Public License in Microsoft Word and saved it in both .odt and .docx format.

Then I opened the .odt file with Writer and saved it as a new file, and it ballooned from 13 KB to 22 KB. Just for the heck of it, I opened the .docx file with LibreOffice and saved it as a new file. It shrank from 22 KB to 10 KB!

The involved .odt files are attached (I wanted to attach the others, but why on Earth does this site not allow .docx files?). No content was changed after the initial save.

Word ODT.odt

Writer ODT.odt

@Uglyface200, Good question about docx uploads. Let me run the idea past the other admins and see if there’s a specific reason why that support isn’t enabled.

Regarding the comparison of file sizes, lemme resolve the file-upload issue first, then we can get back to that :slight_smile:

The fair way to compare is to create a new document with the same content in both office suites. Opening an .odt document generated by MSO 2010 in LO 4 and then saving it as a new .odt is unfair, because some code generated by MSO gets embedded in the file.
I don’t have MSO 2010, so I can’t check how the result differs between creating a new document and using Save As .odt in MSO 2010.

Here’s a suggestion on how you could present the results:

So:

Hi @Uglyface200,

I’m not sure if/when we can enable docx file uploads on this site. Looks like things are pretty busy in IT land for the foreseeable future. I’m pretty sure that we can upload docx files to Bugzilla, so here are a couple of options:

  1. File an enhancement request bug and attach some example files. I’d concentrate on the same file type first – so if (given the same input) MS-Word creates smaller ODT files than LO-Writer, then report that.

  2. Punt on this question for a while, until we have some time to clear up our IT work queue.

Thanks!