Spreadsheet Error/Recovery

I attempted to open a LibreOffice spreadsheet and I received the following error message:

Read error.
Format error discovered in the file in sub-document content.xml at 1,3521076(row,col).

I found this (and other similar) guidance here, in the LibreOffice forum:
https://ask.libreoffice.org/t/i-can-not-open-my-spreadsheet-it-comes-up-with-this-error-read-error-format-error-discovered-in-the-file-in-sub-document-content-xml-at-2-72157-row-col/8576

So I made a copy of the spreadsheet and I changed the file extension to .zip. When I opened the content.xml file in the archive manager, I received the following error message:

This page contains the following errors:
error on line 1 at column 3521042: EntityRef: expecting ';'

When I go to line 1, column 3521042 this is what I see:
printthread.php?t=5866&pp=10&page=405" xlink:type="simple">To get a reliable BIOS mod go

Column 3521042 falls between the “p” and the “=” as highlighted here:
printthread.php?t=5866&p**p=**10&page=405" xlink:type="simple">To get a reliable BIOS mod go
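
For a single-line file of several megabytes, jumping to a column in an editor is awkward. A minimal Python sketch for printing the text around a reported position follows; the function name `context_at` and its `width` parameter are made up for illustration, and it assumes the tool counts columns in characters starting from 1 (some tools count bytes or start from 0, so the window may be off by a little).

```python
def context_at(text: str, line: int, column: int, width: int = 40) -> str:
    """Return the characters surrounding a 1-based line/column position."""
    rows = text.split("\n")
    row = rows[line - 1]          # convert 1-based line to 0-based index
    idx = column - 1              # convert 1-based column to 0-based index
    start = max(0, idx - width)   # do not run off the start of the line
    return row[start: idx + width]
```

Called as `context_at(open("content.xml", encoding="utf-8").read(), 1, 3521042)`, it would return roughly 80 characters around the reported column.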

So, being an XML noob, I don’t know whether I should insert a “;” between the “p” and the “=”, or delete the second “p”, or whether I am completely misunderstanding the error message and the aforementioned guidance. Perhaps someone who is familiar with XML can spot the syntax error and offer a suggestion?

Thanks in advance!

The error will be at an earlier spot.
This is only where the XML parser notices that something is wrong. I guess it expects a ; to close a character entity like &ouml; for ö. But I (unlike the XML parser) can see that the whole passage seems to be part of a hyperlink/web address. Maybe some quoting is missing at an earlier position, which results in the parser reading this as XML, or even more is lost…

See the standalone & characters there. They are invalid in XML; each one needs to be replaced with &amp;, like this:

printthread.php?t=5866&amp;pp=10&amp;page=405" xlink:type="simple">To get a reliable BIOS mod go
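
If there are many such links, fixing them by hand gets tedious. A small Python sketch of the replacement is below; `escape_bare_ampersands` is a hypothetical helper name, and the pattern is an assumption about what counts as a "standalone" &: any & that is not followed by something shaped like an entity or character reference (`&name;`, `&#38;`, `&#x26;`). It works on the raw text, so it would also touch CDATA sections if the file had any.

```python
import re

# "&" that does NOT start something shaped like &name; or &#123; or &#x1F;
BARE_AMP = re.compile(r"&(?![A-Za-z_][\w.-]*;|#[0-9]+;|#x[0-9A-Fa-f]+;)")


def escape_bare_ampersands(xml_text: str) -> str:
    """Replace standalone ampersands with &amp;, leaving existing references alone."""
    return BARE_AMP.sub("&amp;", xml_text)
```

Running it twice gives the same result as running it once, because an already escaped `&amp;` is recognised and skipped.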

It would be great if you upload the problematic file here for inspection.

Also note: your “archive manager” (?) gives a different location from the one you quoted earlier in your question:
3521076(row,col) and
3521042

… which is perfectly OK. The error message that LibreOffice gives you has a peculiarity: the position it reports is the point up to which the file stream had been read into a buffer, so the exact position is only known to be earlier than the reported one. The “archive manager” (in fact, the program associated with XML on your system; likely a browser) gives a more accurate position, which agrees both with the problem reported and with the position reported by LibreOffice.
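
To get a parser's own view of the position without renaming the file or opening a browser, something like the following Python sketch could be used. It reads `content.xml` straight out of the ODS (which is a ZIP archive) and feeds it to the expat parser from the standard library. The function name `first_xml_error` is made up here, and the column it returns may differ slightly from what a browser or LibreOffice reports, since each tool counts in its own way.

```python
import zipfile
from xml.parsers import expat


def first_xml_error(ods, member="content.xml"):
    """Return (line, column, message) for the first well-formedness error, or None."""
    with zipfile.ZipFile(ods) as archive:
        data = archive.read(member)
    parser = expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True: this is the final (only) chunk
    except expat.ExpatError as err:
        # expat numbers columns from 0; add 1 to get a 1-based column
        return err.lineno, err.offset + 1, expat.ErrorString(err.code)
    return None
```

`first_xml_error("Laptop_Upgrades (copy).ods")` would return `None` for a well-formed file and a position tuple otherwise (the file name here is only an example).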

Further analysis.

The error looks like what both Google Chrome and MS Edge emit.

For a test, I have created (originally using LibreOffice) this sample document with an intentionally (manually!) modified content.xml:

linkWithStandaloneAmp.ods (7.9 KB)

This is what MS Edge shows:

Note how it reports the problem in line 2! This is important, since documents created using LibreOffice always have the XML declaration on the first line of the document, and the rest on the second.

I have also tested an ODS created using MS Excel 2016, and I can tell that Excel both produces correct XML for such URLs and uses the same two-line XML layout (XML declaration on the first line, the rest on the second) as LibreOffice-generated files.

So the file that you see was likely generated by some third-party software that has a bug; it would be nice if you identified the source of such files and informed the respective team (file a bug with them).

A wee bit more information:
I’m using Manjaro Budgie Linux, so the archive manager is GNOME Archive Manager v3.42.0.
I am using Code OSS to find the line/column reference point.
As I already mentioned, I’m not a developer and I don’t know XML syntax. Does Code OSS have an extension that you can recommend that might help to identify the XML syntax problem?

My content.xml file is 7.2 MB, so is it OK to upload such a large file to this forum? Also, please note that this is my first post to the forum, so I’m not familiar with the etiquette here. I don’t subscribe to any file sharing services. The maximum upload size for pastebin is 0.5 MB. I don’t know what the maximum upload size is for hastebin.

The spreadsheet content is a long list of ThinkPad laptop model numbers and potential laptop upgrade information, so I don’t think that there is any real concern about privacy/personal information.

Methodology:
Yes, I did notice that there are two different error messages, with two different column references. I assumed (perhaps incorrectly) that the second was specifically referencing the content.xml file, whereas the first was a reference to the entire spreadsheet. The first is generated by LibreOffice, itself, when I attempt to open the spreadsheet. The second is generated by opening the content.xml file with the Opera browser. Once I had the second reference from the Opera browser, I then opened content.xml in Code OSS to find the column reference location. I then copy/pasted an excerpt from the content.xml file from Code OSS into my original post.

I’m groping in the dark, here, so please feel free to criticize this methodology and propose an alternate approach.

Note that when I open content.xml in Firefox I receive the following error, but with the same row/column reference as provided by Opera, followed by a long stream of content:

XML Parsing Error: not well-formed
Location: file:///home/dwhite/.cache/.fr-IexSPP/content.xml
Line Number 1, Column 3521042:
<?xml version="1.0" encoding="UTF-8"?><office:document-content xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0"

IIRC the limit is around 4 MB, but please don’t upload the XML, please upload the ODS.

As requested, here is the original copy of the spreadsheet that I made.

Thanks for your interest and assistance with my dilemma!

Oops, I think that my trust level prevented the file upload. I’ll try again.

OK, never tried this service before, but here’s the link to the .ODS on Google Drive:

https://drive.google.com/file/d/1eKzD0oR52KIeSCM26h49qC9b9XJPDjQF/view?usp=sharing

Ah, “ONLYOFFICE/7.1.1.57”. Fun.
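
For anyone wondering where such a string comes from: ODF files normally record the producing application in the `meta:generator` element of `meta.xml` inside the archive. A minimal Python sketch for reading it is below; `read_generator` is a hypothetical helper name, and it assumes `meta.xml` exists and is itself well-formed.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace URI of the ODF "meta" prefix
META_NS = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0"


def read_generator(ods):
    """Return the text of meta:generator from meta.xml, or None if absent."""
    with zipfile.ZipFile(ods) as archive:
        root = ET.fromstring(archive.read("meta.xml"))
    node = root.find(f".//{{{META_NS}}}generator")
    return node.text if node is not None else None
```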

Anyway, here it is:
Laptop_Upgrades (copy)-fixed.ods (669.9 KB)
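
For reference, one way such a repair could be scripted is sketched below in Python. This is not necessarily how the fixed file above was produced. The function name `rewrite_member` and its `transform` parameter are made up for illustration: it copies every entry of the archive unchanged except one, whose text is passed through a caller-supplied function. It keeps the original entry order and per-entry compression, on the assumption that the source file already follows the ODF packaging convention of a leading, uncompressed `mimetype` entry.

```python
import zipfile


def rewrite_member(src, dst, transform, member="content.xml"):
    """Copy the archive src to dst, applying transform(text) to one member."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in zin.infolist():  # entries in their stored order
            data = zin.read(info.filename)
            if info.filename == member:
                # Assumes the member is UTF-8 text, as content.xml normally is
                data = transform(data.decode("utf-8")).encode("utf-8")
            # Keep each entry's original compression method
            zout.writestr(info, data, compress_type=info.compress_type)
```

A call might look like `rewrite_member("broken.ods", "fixed.ods", fix)`, where `fix` is any function that takes the XML text and returns corrected text (both file names are placeholders). Always work on a copy of the spreadsheet.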

Only Office?!?!?!
What the hell is that? I’ve never even heard of Only Office.

I don’t recall ever using Only Office; I’ve been a long term Open Office and then a LibreOffice user. Decades ago I used Lotus 123, but that was waaaay back when OS/2 was still a thing. That’s not to say that I haven’t saved Open Office and LibreOffice spreadsheets in a Microsoft format, before sharing them with Windows folks.

Could I have done something stupid that would have caused this problem, or should I be worried about the drives in my NAS getting ready to crap the bed?

Anywho … the file opens now and I can’t thank you enough for your help!
You will receive total consciousness for your efforts!
https://www.youtube.com/watch?v=X48G7Y0VWW4

That may be that.

IIRC, some solutions like OwnCloud/NextCloud offered OnlyOffice as their integrated online editing solution. Maybe you edited your file using the web interface once, before downloading; that could explain the generator software and the bug.

Third attempt to upload a file … it doesn’t appear to work.
The .ODS file is only 684 kB.

I click on “Upload File”, it allows me to select the file, I click on “Open”, and then nothing. It doesn’t appear to be uploading, nor attaching to my post.