Odt file is now corrupted - guidance needed to repair the content.xml file

davidcorkindale · June 23, 2025, 6:52pm

Hi,
I had a single .odt text file produced in LibreOffice Writer (no backup!).
When I tried to open it today I got the following Error message:
Format error discovered in the file in sub-document content.xml at 2,59836 (row,col)
I have already extracted the file content.xml but now I need to know what I need to do with it, or can do with it, in order to repair it.
I have opened it in different XML editors, and position (2, 59836) lands at a slightly different place within the ‘style’ tag in each of them, but the issue seems to be a missing equals sign at ofoperties offCrsid=“004ab33a”:
</style:style><style:style style:name=“T19” style:family=“text”><style:text-properties officeooo:rsid=“00b34e0716596” officeooo:paragraph-rsid=“00716596” tyleo:textne-pdfont-size=“10pt” ofoperties offCrsid=“004ab33a” tyle:ai0b34e>:style:name=“T10” style:family=“text” style:text-underline-width=“auto” style:tet-siz"/>.

(1) Can I just delete that whole section between the start at </style and the end at /> ?
(2) Otherwise, can anyone suggest what may be wrong with: ofoperties offCrsid=“004ab33a”
(3) Can I just delete ALL the styles in the whole XML file and replace them with one simple style that will at least get my basic text back? (I suspect that even if I manage to repair this particular error, there will still be another pile of errors to fix after it.)

Thanks in advance.
David

Villeroy · June 23, 2025, 7:09pm

Try out with a copy of that file.

mikekaganski · June 23, 2025, 7:46pm

These figures do not give you reliable information. They only show how far the stream had been read (using a fixed-size buffer) at the moment the parser discovered a format error in its contents.

In the best case, the error is somewhere in the last ~32000 bytes before the reported point. In the worst case, it may even be at the very beginning. Consider this XML:

<xml>
  <wrong_tag_name>
    ... A very long data, maybe several megabytes long
  </correct_tag_name>
  ... something else ...
</xml>

Chances are that the problem will only be discovered when the reader reaches </correct_tag_name>, which is very far from the actual problem location. And even that tag's location will not be reported to you, because the reported position will very likely be somewhere in the “… something else …”.
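
The effect can be reproduced outside LibreOffice. The following is a minimal sketch using Python's standard-library expat bindings, which is an assumption for illustration and not the parser LibreOffice itself uses; the helper name first_error and the 100 000-byte filler are hypothetical. It builds the XML above with a long filler and checks where the parser first complains.

```python
import xml.parsers.expat


def first_error(data: bytes):
    """Return (line, column, byte_offset, message) for the first
    well-formedness error in data, or None if it parses cleanly."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True: this is the final chunk
    except xml.parsers.expat.ExpatError as err:
        message = xml.parsers.expat.ErrorString(err.code)
        return err.lineno, err.offset, parser.ErrorByteIndex, message
    return None


if __name__ == "__main__":
    # The real mistake is the opening tag near byte 5, but the parser can
    # only notice it once the non-matching closing tag shows up.
    filler = b"x" * 100_000
    doc = b"<xml><wrong_tag_name>" + filler + b"</correct_tag_name></xml>"
    print(first_error(doc))
```

Unlike the buffered position in the LibreOffice message, expat reports the point where it detected the problem, but as the sketch shows, that point can still be far from the place that actually needs fixing.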

Of course, you can find it by trial and error: remove whole blocks and find out which big block's removal makes the file load; then restore that block and remove smaller blocks from inside it, and so on. You are also lucky that the reported position is not far from the start: you only have ~60 000 bytes to check (line 1 is ~empty, holding only the XML declaration, and ~all the data is usually on a single line 2).
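
To narrow down where to start removing blocks, a standalone XML parser can point at the first place it gives up and show the bytes around it. This is a rough sketch, again assuming Python's standard-library zipfile and expat modules rather than anything LibreOffice provides; the function names, the radius parameter and the file name in the usage note are hypothetical. Run it on a copy of the file.

```python
import xml.parsers.expat
import zipfile


def read_content_xml(odt_path):
    """Read content.xml out of an .odt file, which is a ZIP archive.
    May raise zipfile.BadZipFile if the archive itself is damaged."""
    with zipfile.ZipFile(odt_path) as archive:
        return archive.read("content.xml")


def error_context(data: bytes, radius: int = 200):
    """Return (byte_offset, message, snippet) for the first
    well-formedness error, or None if data parses cleanly."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as err:
        pos = parser.ErrorByteIndex
        start = max(0, pos - radius)
        # Decode leniently: a damaged file may contain invalid UTF-8.
        snippet = data[start:pos + radius].decode("utf-8", errors="replace")
        return pos, xml.parsers.expat.ErrorString(err.code), snippet
    return None
```

A possible use would be error_context(read_content_xml("copy-of-document.odt")), or error_context(open("content.xml", "rb").read()) for an already extracted file. After each manual repair, running it again shows whether the parser now gets further or stops at the next damaged spot.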

davidcorkindale · June 23, 2025, 7:56pm

Thank you to Villeroy and mikekaganski for your replies so far.
I think it will have to be trial and error, and if any particular attempt works I will report back. Incidentally, lines 1 and 2 are as you suggest; however, the complete file is over 200,000 bytes, so there could well be many ‘xml errors’ to address.
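
If repairing every error proves impractical, one last-resort option is to ignore the styles entirely and pull only the text out of content.xml. The following is a crude sketch under stated assumptions: it uses regular expressions instead of an XML parser so that it does not care whether the file is well-formed, it assumes the text sits in the usual text:p and text:h elements after the office:body tag, and the function name salvage_text is hypothetical. All formatting is lost, and any damage inside the body itself (for example a stray < or >) can swallow or garble some text.

```python
import html
import re


def salvage_text(xml_text: str) -> str:
    """Strip markup from (possibly malformed) content.xml text and
    return an approximation of the document's plain text."""
    # Skip the style definitions if the body marker can be found.
    body_start = xml_text.find("<office:body")
    if body_start != -1:
        xml_text = xml_text[body_start:]
    # Paragraph and heading ends become line breaks.
    xml_text = re.sub(r"</text:(?:p|h)>", "\n", xml_text)
    xml_text = re.sub(r"<text:line-break(?:\s[^>]*)?/>", "\n", xml_text)
    xml_text = re.sub(r"<text:tab(?:\s[^>]*)?/>", "\t", xml_text)
    # <text:s/> stands for one or more spaces; approximate with one space.
    xml_text = re.sub(r"<text:s(?:\s[^>]*)?/>", " ", xml_text)
    # Remove every remaining tag, then decode entities such as &amp;.
    xml_text = re.sub(r"<[^>]*>", "", xml_text)
    return html.unescape(xml_text)
```

Reading the extracted file with open("content.xml", encoding="utf-8", errors="replace").read() and passing the result to salvage_text would give text that can be pasted into a fresh document.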