Switching from Microsoft Word 97 - 2003 to OpenDocument Text

(Please note I’ve had to guess that the tag “writer x26381” above applies)
I’m new to LibreOffice and recently downloaded 7.3. I am really pleased that it is a perfect match for my old MS Word. (I’m so pleased with 7.3 that I’ve made a donation and will make another soon.)

Word has served me so well over the years to build up a huge archive of texts, which I am steadily updating to 7.3. However, there is one thing that has surprised me so much about 7.3 that it causes some unease. Texts I have updated from MS to 7.3 use far less disc space. On the face of it, it’s probably a good thing if it saves on disc space, but I’m uneasy that text might be lost in the process, even though there appear to be no signs at present. However, I’ve just updated a massive foreign language vocabulary of some 2,260 Kb and the new ODT file is just 180 Kb. Surely that can’t be right?
I’d be most grateful for your advice and/or suggestions.

In addition to the explanations have a look at this [Tutorial] Differences between Microsoft and AOO/LO files - Cheers

Dear Grantler,

Thank you very much for your reply. I will, indeed, take a look at
that tutorial, a new one to me. Thanks again,

Best regards,

Eric

Many years ago (15-20) I was tasked with converting a bunch of M$ Word documents into HTML format so they could be published on our group’s website. I used an automated conversion tool, provided by M$, that did the bulk of the work, but when I began investigating the HTML tags I discovered that I could clean up the final file by eliminating 70-80% of the tags that were generated by the automated conversion.

I don't know how much of that garbage was generated by the conversion, or how much had already been inside the Word file. But if it was in the original document then that would explain a lot of what you have seen in file size reduction.

Dear Mr Zimmerman,

Thank you very much for sharing your expertise with me so promptly.

I may be new to LO, and relatively new to the internet, but I find what
you have written fascinating. Thanks again.

I wish you a pleasant weekend.

Eric

It is normal for a DOC to ODT conversion to make files much smaller (note that this doesn’t hold for DOCX to ODT). DOC is a proprietary binary format with almost no compression. OTOH, both ODT and DOCX are ZIP-compressed formats, and thus for textual content they offer a very substantial size reduction.
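To see the compression at work: since an ODT (or DOCX) file is a ZIP archive, Python's standard zipfile module can report the stored versus unpacked size of each part. This is only a minimal sketch; the function name zip_size_report and the file name in the usage comment are made up for illustration, and the numbers say nothing about whether the conversion itself was faithful.

```python
import sys
import zipfile


def zip_size_report(path):
    """Return (compressed_total, uncompressed_total, members) for a ZIP-based file.

    members is a list of (name, compressed_size, uncompressed_size) tuples,
    one per entry in the archive.
    """
    with zipfile.ZipFile(path) as archive:
        members = [
            (info.filename, info.compress_size, info.file_size)
            for info in archive.infolist()
        ]
    compressed = sum(entry[1] for entry in members)
    uncompressed = sum(entry[2] for entry in members)
    return compressed, uncompressed, members


# Usage (hypothetical file name): python odt_sizes.py vocabulary.odt
if __name__ == "__main__" and len(sys.argv) > 1:
    packed, unpacked, parts = zip_size_report(sys.argv[1])
    for name, small, big in parts:
        print(f"{name}: {small} bytes stored, {big} bytes unpacked")
    print(f"total: {packed} bytes stored, {unpacked} bytes unpacked")
```

For a text-heavy document, most of the unpacked size usually sits in content.xml, and the unpacked total gives a fairer comparison with an uncompressed DOC file than the size of the ODT on disk.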

I do not claim that this means that everything was converted perfectly. Keep in mind that no conversion of this kind can be perfect; our import filters are imperfect because of bugs, incomplete support, and incompatibilities in document models. The first two items are improved gradually (the third one is almost impossible to eliminate); that’s why I always advocate not converting everything you have into the native format, but only those specific documents that you need at this moment. All the other documents should stay in their original formats until you need them; at that point you may be using an updated LibreOffice with an improved import filter, allowing you to lose less in conversion.

Dear Mr Kaganski,

Thank you so much for sharing your expertise with me so promptly.
It comes in the nick of time, before I intended to convert my MS Word
documents to ODT wholesale.

Thanks to you, I now have a far better understanding.

I wish you a pleasant weekend.

However, I’ve just updated a massive foreign language vocabulary of some 2,260 Kb and the new ODT file is just 180 Kb. Surely that can’t be right?

And what would be wrong, if all the content is still there?
Each format has its own characteristics when saving.


My Transition from MS-Office to LibreOffice


If you work with Writer you should always save in ODT format.

Professional text composition with Writer


If you have further questions, please post back here. Thank you.

Dear Hrbrgr,

Thank you very much for your email. It has helped me understand
better this, for me, new technology.

I wish you a pleasant weekend.

Several factors may contribute to this size decrease.

  • you saved your files as .odt
    This is the recommended practice after conversion. Applications are at their best when they are presented with files in their native format. This eliminates the need to convert from and to the foreign format and also spares the translation approximations, which have a cumulative effect over time.
    This also lets you benefit from all the higher-level abstract features in Writer.

  • you kept your files as .doc(x)
    DOC (not sure about DOCX because I dropped M$ Word usage a long long time ago) had a feature where edit history was kept inside the file, i.e. deleted and changed text was retained in the file. Though there was no user-accessible command to revert to a previous version, you could extract it with scavenging tools. This edit history occupies a substantial volume when a file is heavily edited. Writer has no such feature and wipes out the edit history when saving because it doesn’t know how to handle it.

You can reduce the file size on disk even more by fully styling the document(s). Word hardly knows of paragraph styles (and in a less sophisticated way than Writer), and everything else is direct formatting, notably what would be under character and page styles.

This results in many (zillions) single-use character, page and list styles which uselessly occupy space.
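As a rough way to gauge this in a converted file: in an ODT, formatting that is not attached to a named style is stored as "automatic styles" inside content.xml, and these can be counted with Python's standard library. The sketch below is only an assumption-laden indicator, not a measure of document quality; the function name count_automatic_styles and the file name in the usage comment are invented for illustration, and it looks at content.xml only (page layouts live in styles.xml and are not counted here).

```python
import sys
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace URIs defined by the OpenDocument format.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"


def count_automatic_styles(odt_path):
    """Count the automatic styles in content.xml, grouped by style family.

    Entries without a style:family attribute (for example list styles)
    are grouped under their element name instead.
    """
    with zipfile.ZipFile(odt_path) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    container = root.find(f"{{{OFFICE_NS}}}automatic-styles")
    counts = Counter()
    if container is None:
        return counts
    for child in container:
        family = child.get(f"{{{STYLE_NS}}}family")
        local_name = child.tag.rsplit("}", 1)[-1]
        counts[family or local_name] += 1
    return counts


# Usage (hypothetical file name): python odt_styles.py vocabulary.odt
if __name__ == "__main__" and len(sys.argv) > 1:
    for family, number in count_automatic_styles(sys.argv[1]).most_common():
        print(f"{family}: {number}")
```

A large count under "text" or "paragraph" suggests a lot of direct formatting was carried over from the Word file; running the script again after restyling shows whether the count went down.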

To get an idea about styles, read the Writer Guide and practice on examples before reformatting your documents.

Dear ajlittoz,

Thank you very much for your email, sharing your expertise with me.

It is most interesting and helpful and will be kept for future reference.

I wish you a pleasant weekend.