I repaired a dreaded "Format error in content.xml" in an ODS file on a Macintosh

RogerB · April 26, 2022, 3:32pm

Recently, I began to get the dreaded “Read Error. Format error discovered in the file in sub-document content.xml at 2,2209664(row,col)” with a current document, then with backups, even one several weeks old, all without my having made any changes to them. Well, except that 2 weeks’ worth of changes went bye-bye.

After much wailing and gnashing of teeth, I zeroed in on an error within a single tab of my spreadsheet, on a single row with 5 cells of text and numbers. Unformatted text and numbers, at that. Not being much of an XML editor (I can spell it, that’s it), I did the best thing I could: I opened an earlier, still-functioning copy of the spreadsheet with that section intact, downloaded half a dozen XML editors (settling on Ximplify), and compared the two sections of code.

ZERO difference.
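For anyone who wants to make the same comparison without an XML editor, here is a minimal sketch as a POSIX sh function. The name `ods_content_diff` is hypothetical, and it assumes Info-ZIP `unzip` and libxml2's `xmllint` are installed (both usually ship with macOS). Because content.xml is typically stored as a couple of enormous lines, it is pretty-printed first so that `diff` reports individual elements instead of one giant changed line.

```shell
# Hypothetical helper: diff the content.xml of a good and a bad copy.
# Requires Info-ZIP unzip and libxml2's xmllint.
ods_content_diff() {
  good=$1
  bad=$2
  work=$(mktemp -d) || return 2
  # unzip -p streams the member to stdout; xmllint --format puts each
  # element on its own line. --recover makes xmllint output whatever it
  # could parse from a damaged file instead of nothing; its complaints
  # on stderr are discarded here.
  unzip -p "$good" content.xml | xmllint --format --recover - > "$work/good.xml" 2>/dev/null
  unzip -p "$bad" content.xml | xmllint --format --recover - > "$work/bad.xml" 2>/dev/null
  # diff exits 0 if the files are identical, 1 if they differ, 2 on trouble
  diff "$work/good.xml" "$work/bad.xml"
  rc=$?
  rm -rf "$work"
  return $rc
}
```

Usage would be something like `ods_content_diff older-good.ods current-bad.ods | less`. Because of `--recover`, a syntax error in the bad copy may show up as missing or oddly nested elements rather than as one changed line.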

So I deleted the offending lines of code, but I still got errors. New ones. Whatever was wrong kept moving around the entire file. I felt like Carl the groundskeeper trying to kill the gopher: a digital version of whack-a-mole, but without the plastic explosives.

I quit chasing it, found the intact section of XML with the most current version of the tab I needed, and copied and pasted it into the slightly older version of the file. I saved it, zipped it all back up, and renamed it from .zip to .ods. Bang, I was back where I wanted to be, with just a few lines missing from the tab where the error was, easily recreated. I then went back and deleted the rows where the bad data seemed to want to crop back up. And backed it up. Twice.
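Renaming a .zip back to .ods evidently worked here, but the ODF packaging rules expect the `mimetype` file to be the first entry in the archive, stored without compression and without extra header fields, and a zip made with the Finder's Compress command does not take care of that. Below is a sketch of a more careful re-zip, assuming the unzipped parts (mimetype, content.xml, META-INF/, ...) sit together in one folder and Info-ZIP `zip` is available. The function name `repack_ods` is hypothetical.

```shell
# Hypothetical helper: zip a folder of unzipped ODS parts back into an .ods.
# Requires Info-ZIP zip.
repack_ods() {
  src_dir=$1
  out=$2
  # zip runs inside src_dir below, so turn a relative output path into
  # an absolute one first
  case $out in
    /*) ;;
    *) out="$PWD/$out" ;;
  esac
  # start from scratch rather than updating an existing archive
  rm -f "$out"
  (
    cd "$src_dir" || exit 1
    # mimetype first, stored uncompressed (-0), no extra attributes (-X);
    # then everything else compressed, skipping mimetype (already added)
    # and any .DS_Store files the Finder left behind
    zip -q -X -0 "$out" mimetype &&
      zip -q -X -r "$out" * -x mimetype '*.DS_Store'
  )
}
```

For example, `repack_ods budget-unzipped budget-fixed.ods` (both names made up) would write a fresh budget-fixed.ods next to the folder, leaving the original file untouched.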

But this is more than a fix-it post or a rant; I have some suggestions.

A single row of 5 plaintext cells in my spreadsheet rendered the entire document unreadable and unrecoverable. Quite frankly, this is unacceptable, especially for a mature piece of software like LibreOffice. A single row of historical data out of thousands is expendable; I can live without knowing exactly what happened on December 2nd, 2020.
If an XML editor can find errors, so can LibreOffice. And if they can be found, there should be an option to delete them, even at the cost of minor data loss.
The “So sorry, so sad” attitude of the community toward users who don’t have the knowledge to edit XML files has to go. I’ve worked in IT for over 25 years; COBOL, Assembler, and Pascal were a LONG time ago, and I don’t miss them. The vast majority of XML editors aren’t written with the n00b in mind, and a file of 2 rows by 2 million-plus columns is daunting.
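On the point that XML errors can be found mechanically: a plain syntax (well-formedness) check needs no XML editor at all. A minimal sketch, assuming `unzip` and `xmllint` are installed; `check_ods_xml` is a hypothetical name, and the list of parts is the usual set in an .ods, not an exhaustive one. As the replies below point out, a well-formed file can still contain values LibreOffice rejects, and this check will not catch those.

```shell
# Hypothetical helper: report whether each XML part of an .ods is well-formed.
# Requires Info-ZIP unzip and libxml2's xmllint.
check_ods_xml() {
  ods=$1
  bad=0
  # the usual XML parts of a spreadsheet package (not an exhaustive list)
  for part in content.xml styles.xml meta.xml settings.xml; do
    # unzip -l fails if the package has no such member; skip it then
    unzip -l "$ods" "$part" >/dev/null 2>&1 || continue
    if unzip -p "$ods" "$part" | xmllint --noout - 2>/dev/null; then
      echo "$part: well-formed"
    else
      echo "$part: NOT well-formed"
      # xmllint's first message: line number and error text, then an
      # excerpt of the offending spot with a caret under it
      unzip -p "$ods" "$part" | xmllint --noout - 2>&1 | head -n 3
      bad=1
    fi
  done
  return $bad
}
```

Running `check_ods_xml document.ods` on a copy of the broken file gives a quick yes/no per part, and for a syntax error the excerpt is often enough to spot the damage.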

There are a lot of really great people on here that bend over backward to help and I applaud them. Writing how-tos is an amazing and thankless task, please keep it up.

Hrbrgr · April 26, 2022, 3:50pm

Where is your question?

RogerB · April 26, 2022, 4:34pm

It’s a “how I fixed it” and a feature(s) request, not a question. Most of the questions regarding this issue are either very old, the topic is closed, or downright uninformative.

I can’t get into examples or specifics of exactly what I did because it involves my financial information.

Hrbrgr · April 26, 2022, 4:41pm

…feature(s) request…

Unfortunately, that is out of place here.
You are here on a question/answer page where users ask and users answer.

Please create a feature request on Bugzilla. Thank you very much.

How to Report Bugs (and feature requests) in LibreOffice

erAck · April 26, 2022, 4:46pm

Yes, and for that, use an XML tool like xmllint to pretty-print the content.xml extracted from the zip, freshen the zip archive with the result, and reload the document in LibreOffice, so that it can report a more useful position than 2,2209664(row,col). For example:

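```shell
# work on a copy; the commands below modify document.ods in place
cp document.ods document-backup.ods
unzip document.ods content.xml
# only replace content.xml if xmllint managed to parse and reformat it
xmllint --format content.xml > new.xml && mv new.xml content.xml
# -f (freshen) replaces the archive entry only if the file on disk is newer
zip -f document.ods content.xml
```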

It could have been helpful if you had made the exact content of that portion available, so that someone could take a look at what exactly went wrong.

Anyway, glad you solved it.

mikekaganski · April 26, 2022, 5:16pm

Unfortunately, the 2,2209664(row,col) doesn’t point anywhere close to the actual error. It only tells you the position of the last character read from the input stream into the buffer (16K) at the moment the parser detected the XML error. In the best case, the error is somewhere in those last 16K characters, which is already not a small chunk to check. But there are cases where the error is in a completely different chunk. For example, there may be a start/end tag mismatch, in which case you need to check all the way back to find the unmatched tag. Or, much harder to find, the error could be an invalid value of some property, which an XML editor’s automatic validation cannot detect (when there is no schema).
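To inspect the best-case window described here, the chunk of content.xml that ends at the reported position can be cut out with standard tools. This is a sketch resting on two assumptions the error dialog does not confirm: that the column counts bytes starting at 1, and that the 16K buffer is 16384 bytes. The function name `show_error_window` is hypothetical, and it cannot help with a tag mismatch that starts earlier or with an invalid attribute value.

```shell
# Hypothetical helper: print the bytes of one row of a file that end at the
# given column. Assumes the column is a 1-based byte offset and that the
# parser buffer is 16384 bytes; pass a 4th argument to change the window.
show_error_window() {
  file=$1
  row=$2
  col=$3
  window=${4:-16384}
  start=$((col - window + 1))
  if [ "$start" -lt 1 ]; then
    start=1
  fi
  # sed selects the row, cut -b slices the byte range out of it
  sed -n "${row}p" "$file" | cut -b "${start}-${col}"
}
```

For the error in this thread that would be `unzip -o document.ods content.xml` followed by `show_error_window content.xml 2 2209664 > window.txt`, which leaves a 16K excerpt that an ordinary text editor can open.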

It’s wrong to imagine that it’s possible to “just drop 5 plaintext cells”. We are not reading cells at that point; we are parsing XML. And since version 5.4 (tdf#104718), we do try to resume after such errors (when that is possible at all), hoping that reading the good part is better than just erroring out. We restored this behavior, which OOo also had, but we show a warning, whereas OOo simply opened the corrupt file silently with only partial content.