Problems with 10.7 MB DOCX file

1st problem

I have this 10.7 MB DOCX file that takes too long to open in Writer (portable). At a certain point the program freezes for a while, and if I click on it I get that “not responding” warning from Windows, and the only way to stop everything is to end the task via Task Manager.

2nd problem

After that file is finally open in Writer many minutes later, if I click on the File menu, it takes a little while to open, when normally it opens immediately. At this point I notice that some graphic elements in the file seem to be moving, as if the program hasn’t yet “rendered” it completely. Then I click the Export as option, it takes a while to open, then Export as PDF… and it also takes a little while to open.

After I start the conversion process it takes a very long time, I get that “not responding” warning, then nothing. I have to close Writer from Task Manager.

For comparison, I opened (and converted to PDF) a 22.2 MB DOCX file normally, that is, quickly without any delay whatsoever.

So I think the problem is that 10.7 MB DOCX file itself, its contents or whatever. So much so that I tried to convert it to PDF online on two different sites and both were able to upload the file but not to complete the conversion.

Would you please help me? May I upload the file so you can check it? TIA.

Info

Windows 10 Pro 21H1 19043.1266
Windows Feature Experience Pack 120.2212.3920.0
12 GB RAM

Version: 7.2.1.2 (x86) / LibreOffice Community
Build ID: 87b77fad49947c1441b67c559c339af8f3517e22
CPU threads: 8; OS: Windows 10.0 Build 19043; UI render: Skia/Raster; VCL: win
Locale: pt-BR (pt_BR); UI: pt-BR
Calc: threaded

May I upload the file so you can check it?

You can try it, but I’m afraid the file is too big to upload here.

So alternatively you should upload the file to a cloud somewhere and post the link here.


PS:
You did not specify how the file was created.
Did you get it from someone?
Did you create it yourself a long time ago?

Possible problems with DOCX files:

  • There could be very many and large images.
  • When the file was created, a lot of changes were made and it has a large number of styles that need to be converted in LibreOffice.

Have you tried to save the file as an ODT file after opening it?

Sorry, I didn’t express myself well. I meant uploading it to the cloud and then sharing the link here. My question about uploading the file had more to do with someone getting it to check what is wrong.

I downloaded the file in 2019 from a site whose name I can’t remember now. It’s a kind of dictionary, it doesn’t have images, just a lot of hyperlinks to content inside itself (from an index to the definitions).

BTW, I remember opening it with another word processor back then and if I clicked on these hyperlinks I was sent to the definitions. But now it doesn’t happen with Writer.

Here you are:

My question about uploading the file had more to do with someone getting it to check what is wrong.
Here you are:

Yes, that’s fine.

However, nothing can be selected at your link. Only two logos are visible; the rest is a blank white page. You can’t click anywhere.


…just a lot of hyperlinks to content inside itself (from an index to the definitions).

This could be the reason why the file takes a long time to open. If some link targets are missing, perhaps LibreOffice is waiting for a response (which never comes)?

I tried to upload that file four times to Google Drive but there was a persistent error, so I tried OneDrive and that’s the link I copied from the Share option there after the upload was completed. After your warning above I tried to open the file with the online version of Word and also got an error.

I thought of compressing the file to upload to Google Drive as a last resort and this time I was successful. I hope you can download the archive.

FYI I installed two portable word processors to see what would happen if I tried to open that file with them. The results:

Jarte warning

Abiword warning
This is Brazilian Portuguese and says “Error importing file”.

I thought this Office 2007 Compatibility Pack perhaps may lead us to some solution. What do you think?

I have been able to download the file.

It took about 20 minutes.

3156 pages are displayed in the status bar.

You are talking about:

At this point I notice that some graphic elements in the file seem to be moving,…

I’ll call it “fidgeting” for now.

After another 10 minutes, the status bar displays:
167,844 words and 1,062,341 characters.

I have the task manager open all the time.

After a total of about 60 minutes, the cursor flashes in the document and the CPU load shows 0% for LibreOffice.

I was then able to save the file as ODT (about 4 minutes).

At this point, only 2,052 pages, 325,633 words and 1,882,632 characters are displayed in the status bar.

The ODT takes about 50 minutes to open, until CPU load = 0%.

The top line of the document states that it was created with a third-party program:

CHM2PDF

It is not obvious why the file takes so long to open, except that it is very large.

It is not possible to work with either DOCX or ODT.

From my side I can’t help you, sorry.

Nevertheless good luck.


EDIT- @Hrbrgr 20211004-21.40 MESZ
And now, you can download the TXT file from my Dropbox folder:
Dropbox - English-idioms-sayings-and-slang - Simplify your life

EDIT- @Hrbrgr 20211005-10.00 MESZ
Besides the text file, you will now also find a PDF file under this link. It likewise opens very quickly (in Acrobat Reader) and can be scrolled without problems. However, the PDF has become very large, at 266 MB.

I hope you can cope with these possibilities now.

The files will be removed on 2021-10-06 at 12.00 MESZ.


At the end of the document you will find the author Wayne Magnuson (?).
I found the following website:
http://esl-bits.net/idioms/

It took me far less time than @Hrbrgr to open the file (Fedora 34, Linux 5.14.9, KDE Plasma), but still on the order of tens of minutes.

2059 pages, 167 844 words, 1 062 341 characters

Any action on the file (e.g. a simple click in one area or changing zoom ratio) takes minutes to complete, but I was able to read “This document is created with the unregistered version of CHM2PDF Pilot”.

From the Navigator, the file structure puts too much strain on Writer:

  • 124 tables
  • 4762+1 sections
    I think this is the critical factor.
  • 1568 page styles + Default Page Style
  • many graphics objects (yellow backgrounds) which are not even listed in the Navigator!

The document combines a very bad global structure with conversion from .docx.
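The Navigator figures can be roughly cross-checked outside Writer, without waiting for the file to open. What follows is only a minimal Python sketch, not the method used for the analysis above. It assumes the DOCX keeps its main body at `word/document.xml` (the usual location) and counts a few WordprocessingML elements: `w:sectPr` (section properties), `w:tbl`, `w:p` and `w:hyperlink`. The function name `count_docx_structures` and the labels in the result are made up for this illustration.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace of the main WordprocessingML vocabulary.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Elements to count, mapped to a readable label (labels are illustrative).
TAGS = {
    f"{{{W_NS}}}sectPr": "section_properties",
    f"{{{W_NS}}}tbl": "tables",
    f"{{{W_NS}}}p": "paragraphs",
    f"{{{W_NS}}}hyperlink": "hyperlinks",
}


def count_docx_structures(docx):
    """Count selected elements in word/document.xml.

    `docx` is a path or a binary file-like object.
    Assumption: the main document part is at word/document.xml.
    """
    counts = Counter({label: 0 for label in TAGS.values()})
    with zipfile.ZipFile(docx) as archive:
        with archive.open("word/document.xml") as xml_stream:
            # Stream through the XML instead of loading it all at once.
            for _event, elem in ET.iterparse(xml_stream, events=("end",)):
                label = TAGS.get(elem.tag)
                if label is not None:
                    counts[label] += 1
                # Drop the element's text and children once it is processed.
                elem.clear()
    return dict(counts)
```

Calling `count_docx_structures("dictionary.docx")` (a hypothetical file name) returns a dictionary of counts. The numbers need not match the Navigator exactly, because Writer maps the DOCX section properties onto its own sections and page styles during import, but thousands of `section_properties` in a single file would point to the same structural problem.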

From what I can analyse, formatting is equivalent to direct formatting with every page handled specifically. Each page is a 2-column section where text sync between columns is done with empty paragraphs.

Since M$ fonts are not installed on my Linux box, substitute fonts are used, but they do not have exactly the same metrics. Consequently, the yellow backgrounds (brought in by inserted graphics) are offset.

The KWIC index part looks like there is a hyperlink under the “citation” but there is none. The line is not styled Internet Link; it is only direct formatted to look similar. There is no associated cross-reference.

IMHO, if you really intend to use this file, you should consider rebuilding it. Save it as plain text (.txt) to get rid of all the rubbish caused by .docx conversion AND buggy structure. Unfortunately, due to this buggy structure, you’ll probably get a mess, though it can probably be split on a page basis.

Drop the idea of sections. They do not fit the content of the file. The nearest notion here would be a table. A single 2-column table with ~20k rows should probably be your base object. Ad hoc paragraph styles (one for the term with background and one for the KWIC “citations”) are a good start.

This should make your file manageable.
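As a rough illustration of the rebuild, here is a small Python sketch that prepares tab-separated rows, which Writer can then turn into a 2-column table with Table > Convert > Text to Table using tabs as the separator. The input layout is purely an assumption: it supposes that, after some manual cleanup of the .txt export, each entry is a block of lines separated from the next by a blank line, with the term on the first line and the “citation” text on the following lines. The real export will very likely need other splitting rules. The function name `entries_to_tsv` is invented for this example.

```python
import re


def entries_to_tsv(text):
    """Turn blank-line-separated entries into 'term<TAB>citation' lines.

    Assumed (hypothetical) input layout: first line of each block is the
    term, the remaining lines of the block are its citation text.
    """
    rows = []
    # Normalise Windows line endings, then split on one or more blank lines.
    for block in re.split(r"\n\s*\n", text.replace("\r\n", "\n")):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if not lines:
            continue
        term = lines[0]
        citation = " ".join(lines[1:])
        # A stray tab would create an extra column, so replace it by a space.
        rows.append(term.replace("\t", " ") + "\t" + citation.replace("\t", " "))
    return "\n".join(rows)
```

The result can be written to a new text file, opened in Writer, selected, and converted to a table. The two paragraph styles mentioned above can then be applied per column.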

As a lesson, automatic converters which do such a bad formatting job are tolerable for small files but not for such giant documents.

What is weird in this case is that it used to open quickly back then (2019).

I thank you very much for your attention and patience, and if you allow me I have some more questions:

  • Could it be that the DOCX file is corrupted somehow? If so, is there a way I can check that and fix it? Preferably some software I can download and install, if any, considering the problems I had uploading that file?

  • My system is 64-bit and the portable LO I have is 32-bit. Could this relate to the issue in any way?

BTW, I wonder why it doesn’t install the 64-bit version here.

In the meantime I’ll try to find that file online again. TIA.
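On the corruption question: a DOCX file is a ZIP archive of XML parts, so a first, coarse check needs no special software beyond Python. The sketch below only tests the container (readable archive, checksums of the members, presence of two parts that a Word document normally contains). It cannot tell whether the XML inside is sensible, and since the file did eventually open in Writer, the container is probably intact. The function name `check_docx_container` is made up for this example, and treating `word/document.xml` as required is an assumption based on the usual layout.

```python
import zipfile

# Parts normally found in a Word document (assumption: conventional layout).
EXPECTED_PARTS = ("[Content_Types].xml", "word/document.xml")


def check_docx_container(docx):
    """Return a list of problems; an empty list means no problem was found.

    `docx` is a path or a binary file-like object.
    """
    if not zipfile.is_zipfile(docx):
        return ["not a ZIP archive (or too damaged to be recognised)"]

    problems = []
    try:
        with zipfile.ZipFile(docx) as archive:
            # testzip() reads every member and returns the first bad one.
            bad_member = archive.testzip()
            if bad_member is not None:
                problems.append(f"checksum error in member: {bad_member}")
            names = set(archive.namelist())
            for part in EXPECTED_PARTS:
                if part not in names:
                    problems.append(f"missing part: {part}")
    except zipfile.BadZipFile as exc:
        problems.append(f"unreadable archive: {exc}")
    return problems
```

An empty list from `check_docx_container("dictionary.docx")` (a hypothetical file name) would suggest that the slowness comes from the document structure and not from a damaged download.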

I assume that your comment overlapped with the one from @ajlittoz.

I can only agree with @ajlittoz 's analysis.
The essential facts have been confirmed.

I can also only recommend the further procedure as described by @ajlittoz (make it new).

Unless you find the CHM2PDF program and try it with that.

I don’t think that if you find the file on the Internet and download it again, there will be a better result.

See my new edited comment [20211005-10.00 MESZ] :
Problems with 10.7 MB DOCX file - #7 by Hrbrgr