Can I recover uncorrupted macros from a corrupt spreadsheet?

I have a corrupt Calc 7.5 spreadsheet on Linux (Read Error. / Format error discovered in the file in sub-document content.xml at 2,75207912(row,col)., 4.2 MiB, 262,045 cells). Ideally I would like a tool that that recovers a whole lot of my document, but I suspect that is not possible, so I am asking about recovering the macros. Since content.xml unpacks to 76,183,848 bytes, it sounds as though the error is almost at the end of a 72 MiB file, perhaps making it rather hard to correct.

I was able to open the document as an archive (with Ark), and extract the files. It seems that there the BASIC macros are probably not corrupted, and I would like to import them into a somewhat older backup of the document. The macros are in the same library (Standard) and the same modules in both documents.
The extracted files are, however, in XML, and if I try to import them I end up with XML in my macro document. I would, of course, work on copies of the documents.

I can think of these approaches:

  • Use an existing tool to import them directly from XML.
    • This sounds ideal, but seems impossible.
  • Use an existing tool to convert them to BASIC text, and import those into the IDE, replacing the entire contents of each current module.
    • This sounds like the most reliable plausible approach, if there is such a tool.
  • Convert them to BASIC myself and then Import them: It seems one has to remove a little fairly redundant XML packaging and search-replace a limited number of XML constants, e.g. ' → '.
    • This sounds fairly doable, perhaps with a sed script, but I am a little concerned they may be some gotchas.
  • Extract the backup document as well, replace the macros directory (Basic/Standard) with the newer version, and pack everything into a new archive.
    • This also sounds doable, and may offer other opportunities to recover more, but I do not know what format to pack it in. I could perhaps find out by asking Ark; I see if say compression method = Deflate and it shows MDS, SHA-1 and SHA-256 hashes.
  • Use Ark (or another archive manager) to replace the files in the archive.
    • I am not sure whether this works, but it sounds perhaps a little better than the previous.

What is most likely to succeed, of these or any other approach?

I don’t know a “tool” for this, but you are not the first. So you may try to repair a copy of your file, before you try to rescue only parts:

Thanks, I shall take a look.

I could try uploading the file, as in the issue you refer to, but it is 4MiB and content.xml is 72.7 MiB, making it quite hard to process with many editors (Kate says the longest line is so-and-so long, and offers to reopen it with a temporarily raised limit, but does not succeed in showing me the line).

I have recovered my macros (which represented most of the effort recently put into the sheet) by applying the following sed commands to all modules (4) and importing them (manually) into my backup document:

sed '
s|^[<]script:module.*>||
s|^[<].*||
s|^</script:module>$||

s|&quot;|"|g
s|&lt;|<|g
s|&gt;|>|g

'							\
	|	sed "
s|&apos;|'|g
s|&amp;|\&|g
"

This leaves 2 blank lines at the start and Âż1 at the end?, but that does not bother me!
No doubt in other cases other XML constants may appear (so my bash script displays any remaining ampersands).
I am not sure why the output ampersand has to be quoted.
There are two sed commands because quoting the single quote inside single quotes did not seem to work.

1 Like