SAXException: [word/document.xml line 2]: Attribute w:eastAsiaTheme redefined

I have a long document I have been working on for the last year. When I tried to open it on Monday it came up with the following error message:
An error occurred during opening the file. This may be caused by incorrect file contents.
The error details are: SAXException: [word/document.xml line 2]: Attribute w:eastAsiaTheme redefined
Proceeding with import may cause data loss or corruption, and application may become unstable or crash.
Do you want to ignore the error and attempt to continue loading the file?
When I click to say I don’t want to continue, the following message comes up:
File format error found at
SAXParseException: ‘[word/document.xml line 2]: Attribute w:eastAsiaTheme redefined’, Stream ‘word/document.xml’, Line 2, Column 242204(row,col).
Looking on the forums I find I am not the only person faced with this problem. I’ve tried doing what’s suggested, insofar as I can understand the advice, but have not solved the problem.
Please help!

Woke in the middle of the night, thinking of other things I should have told you. The document was written using an earlier version of LibreOffice Writer; I’ve tried to find out which version, but haven’t been able to. As advised in one of the posts, I have now updated to 6.1. I have tried to Restore Previous Versions, but none are available: I only realised after the error message appeared that this feature has to be ticked to be activated. I have read other posts from people smitten with the same problem, including someone who suffered a similar disaster just before he was due to submit his thesis. I understand the problem is with the XML, but that’s something I know nothing about. When I ignore the warning and open the file anyway (I have saved other copies, hoping to resolve this), I get only 13 pages of the 950+ pages I was hoping to see. Is it possible to submit my corrupted file to someone who could correct this? Is it a bug? If so, does the community know how to resolve it? In case it makes a difference, I’m working in Windows 7 and saving in .docx format.

Did you also check the topic “[Solved] LibreOffice File format error found at SAXParse” on the Apache OpenOffice Community Forum? If you are not familiar with zipping, unzipping and editing .xml files, you may need to find a trustworthy person to do that with your original file, since all the solutions I’ve read about so far require modifying the document.xml file stored in the zip (a .docx file is a zip file).

And btw, this question and issue is another reminder to only use the native ODT format in LibreOffice Writer.

Did you see [Tutorial] Fixing .docx files with SAXParse error ?

And please note that, unfortunately, that tutorial is only useful in the part covered by my comment there. See the regex in that comment.

Dear Mike, I don’t understand your comment. Which comment is it you are referring to? And what is a regex?

I’ve just found out how to save my document as a .zip file, which lets me view all the XML code. I’m concerned that, with my 900+ page file, it will be so long that finding the glitch causing the problem will be extremely difficult. Can I do a word search in the .xml version? Does ‘[word/document.xml line 2]’ help narrow things down? Or Column 242204(row,col)? Or could I just search for ‘w:eastAsiaTheme’? Is it just a question of removing that line, or do I need to ‘redefine’ it, and if so, how?
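They can. Because a .docx is a zip archive, word/document.xml can be read without unpacking anything by hand. The Python sketch below (standard library only, a hypothetical helper rather than anything from this thread) prints the text around a reported line and column, and lists every tag in which the same attribute name appears twice, which is exactly what “Attribute w:eastAsiaTheme redefined” complains about. It assumes the parser counts lines on line feeds and columns in characters starting from 1 (its exact convention may differ slightly, hence the window of context), and that attribute values are double-quoted, as they normally are in Word XML.

```python
import re
import zipfile


def read_document_xml(docx_path: str) -> str:
    """Return word/document.xml from a .docx file (a .docx is a zip archive)."""
    with zipfile.ZipFile(docx_path) as z:
        return z.read("word/document.xml").decode("utf-8")


def context_at(xml: str, line: int, column: int, width: int = 200) -> str:
    """Return up to `width` characters on either side of (line, column).

    Assumes 1-based positions counted in characters; the parser's exact
    convention may differ slightly, hence the surrounding window.
    """
    text = xml.split("\n")[line - 1]  # split on \n only, as XML line counting does
    i = column - 1
    return text[max(0, i - width): i + width]


_TAG = re.compile(r"<[^>]+>")
# Match whole name="value" pairs, so text inside a value is never taken for a name.
_ATTR = re.compile(r'([\w.:-]+)\s*=\s*"[^"]*"')


def duplicate_attributes(xml: str) -> list[tuple[int, str]]:
    """Return (offset of the tag, attribute name) for each name repeated in a tag."""
    found = []
    for tag in _TAG.finditer(xml):
        seen = set()
        for name in _ATTR.findall(tag.group()):
            if name in seen:
                found.append((tag.start(), name))
            seen.add(name)
    return found
```

For example, `context_at(read_document_xml("MyDocument.docx"), 2, 242204)` shows the neighbourhood of the reported error; the file name is a placeholder.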

@Brazilnut: sorry for being unclear; I referred to the tutorial mentioned by @anon73440385 just above. Your document can be fully restored, without any loss, only by manually editing the docx; otherwise, part of the document might be lost.

In the mentioned tutorial, I wrote a comment with a proper regular expression that fixes this error automatically.

Sorry to be a pest, but this is all very new to me. What do you mean by a proper regular expression? May I also refer you to the second paragraph I added to my initial enquiry this morning, which may or may not help, and to my most recent comment about XML editing? Any help very gratefully received!

I refer to this comment from the tutorial:

So please install Notepad++ to edit your word/document.xml unpacked from the archive; search for (\<[^>]+)([\w]+:[\w]+="[^"]+")([^>]+)\2, and replace with $1$2$3, making sure that the regular expressions option is active in the Replace dialog.

Then put the edited xml back into the archive, replacing old file there, and rename the archive to .docx; then open it.
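For anyone who would rather run a script than edit by hand, the same search-and-replace can be sketched in Python. This is an adaptation of the regex above to Python's `re` module (a literal `<` starts the match, and $1$2$3 becomes \1\2\3), not a tool from this thread. Like the original expression, it assumes the duplicate has exactly the same name and value as the first copy; a repeated attribute with a different value is left alone. `fix_docx` writes a repaired copy and leaves the original file untouched.

```python
import re
import zipfile

# The forum regex adapted to Python's re module. It works on raw bytes so the
# file's UTF-8 encoding is left untouched.
DUP_ATTR = re.compile(rb'(<[^>]+)(\w+:\w+="[^"]+")([^>]+)\2')


def remove_duplicate_attributes(xml: bytes) -> bytes:
    """Delete the later copy of a prefixed attribute that occurs twice, with
    the same value, inside one tag. One pass removes at most one duplicate
    per tag, so repeat until the text stops changing."""
    while True:
        fixed = DUP_ATTR.sub(rb"\1\2\3", xml)
        if fixed == xml:
            return fixed
        xml = fixed


def fix_docx(src: str, dst: str) -> None:
    """Write a repaired copy of src to dst; every other entry is copied as is."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "word/document.xml":
                data = remove_duplicate_attributes(data)
            # Reusing the ZipInfo keeps each entry's name and compression method.
            zout.writestr(item, data)
```

For example, `fix_docx("MyDocument.docx", "MyDocument-fixed.docx")`, with placeholder file names; keep a backup of the original either way.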

Or you may provide your document here, I may do that for you.

If you could, that would be absolutely great: I’m wary of venturing too deeply into unknown waters. How would I go about getting a copy of my document to you, preferably without making it viewable to everybody else?

You may send it to mikekaganski@hotmail.com

Hi Mike,
I think I have a similar problem when I try to open a document in LibreOffice. There is always the following error message: “SAXException: [word/document.xml line 2]: PCDATA invalid Char value 65534”. After that, another error message comes up: SAXParseException: ‘[word/document.xml line 2]: PCDATA invalid Char value 65534’, Stream ‘word/document.xml’, Line 2, Column 49932(row,col).
I tried to follow the instructions in this thread, and in the end I deleted the XML tags in the document.xml file. But I don’t know how to put the edited document.xml back into the original file. Honestly, I am not much of a technical expert, so I don’t know whether I did everything right. I hope you can help me with this, so that I can open my document again without any data loss. That would be so great.
Regards Paul

@PaulK1 I have never seen this specific problem (invalid char value), and its cause and fix need to be investigated (but a problematic file is needed for that).
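Some background that may help whoever investigates: XML 1.0 only allows tab, line feed, carriage return and the ranges U+0020 to U+D7FF, U+E000 to U+FFFD and U+10000 to U+10FFFF, so character 65534 (U+FFFE) is illegal wherever it appears. A hypothetical Python helper like the one below can at least locate every such character in the decoded text of word/document.xml, whether it is written literally or as a numeric reference such as &#65534;. It is a diagnostic sketch, not a fix: it assumes the file decodes as UTF-8, and deleting what it finds may still lose whatever that character was meant to represent.

```python
import re

# Characters outside the XML 1.0 "Char" production, written literally.
_INVALID_RAW = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]")
# Numeric character references: &#65534; or &#xFFFE;
_CHAR_REF = re.compile(r"&#(x[0-9A-Fa-f]+|[0-9]+);")


def _allowed(cp: int) -> bool:
    """True if code point cp is a legal XML 1.0 character."""
    return (cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF)


def invalid_chars(xml: str) -> list[tuple[int, int]]:
    """Return (offset, code point) for every character XML 1.0 forbids."""
    found = [(m.start(), ord(m.group())) for m in _INVALID_RAW.finditer(xml)]
    for m in _CHAR_REF.finditer(xml):
        ref = m.group(1)
        cp = int(ref[1:], 16) if ref[0] == "x" else int(ref)
        if not _allowed(cp):
            found.append((m.start(), cp))
    return sorted(found)
```

For example, `invalid_chars(xml)`, where `xml` holds the decoded text of word/document.xml, returns the offsets to inspect.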

For the specific task of re-inserting modified XML into a zip-based document (an ODF file, or a DOCX like yours): personally, I use the 7-Zip file manager on Windows. I open the document in it (right-clicking the file and choosing 7-Zip|Open Archive), open the XML file in question (right-clicking it in the 7-Zip window and choosing Edit), make the required edits, and then save and close the editor application (I use Notepad++). 7-Zip then detects that the file was modified and asks whether I want to update it in the archive. I accept; that’s all.
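The same update can also be scripted. The Python sketch below is an assumed alternative to the 7-Zip route, not what is described above: it writes a copy of a zip-based document (DOCX or ODF) with new contents for one entry, such as an edited word/document.xml. Entries keep their original order and compression, which matters for ODF, where the mimetype entry must be the first one and stored uncompressed.

```python
import zipfile


def replace_member(src, dst, member: str, new_data: bytes) -> None:
    """Copy the zip archive src to dst, replacing the contents of `member`.

    src and dst may be paths or binary file objects. Entries are written in
    their original order with their original compression method.
    """
    with zipfile.ZipFile(src) as zin:
        zin.getinfo(member)  # raises KeyError early if the entry is missing
        with zipfile.ZipFile(dst, "w") as zout:
            for item in zin.infolist():
                # Read the old data before writestr() updates the shared ZipInfo.
                data = new_data if item.filename == member else zin.read(item.filename)
                zout.writestr(item, data)
```

For example, `replace_member("broken.docx", "repaired.docx", "word/document.xml", edited_bytes)`, where the file names are placeholders and `edited_bytes` holds the edited XML read as bytes.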

To have the question answered:

The problematic file was emailed to me, and fixed by editing the XML source (word/document.xml) using the regex (\<[^>]+)([\w]+:[\w]+="[^"]+")([^>]+)\2 with the replacement $1$2$3.

The problematic file was generated by LibreOffice 5.4.3.2 (WinX86_64). Resaving the fixed file using 6.2.4.2 (x64) does not reproduce the invalid XML, so this was likely fixed in the meantime. Advice was given to always store files in the native ODF format, and to export to DOCX only when sending them to people unable to read ODF.