After searching and reading through many posts, I found that some of the explanations pointed in the right direction, but did not explain what the real problem was. Here’s the problem description.
It doesn’t matter which XML element is causing the problem. The real issue is that an attribute is defined twice within the same XML element, which is not well-formed XML. That is what OpenOffice/LibreOffice complain about - the second definition of the attribute. The problem is usually in the word/styles.xml file (after you’ve unzipped the DOCX file), but may be in the word/document.xml file too.
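To see the complaint outside OO/LO, here is a minimal Python sketch that feeds a made-up element with a repeated attribute to the standard library XML parser. The element, its values and the `parse_error` helper are hypothetical and only for illustration. They are not taken from a real document.

```python
import xml.etree.ElementTree as ET

# Made-up element: w:cstheme is defined twice, which is not well-formed XML.
BAD = ('<w:rFonts xmlns:w="urn:example" '
       'w:cstheme="minorBidi" w:cstheme="minorHAnsi"/>')


def parse_error(xml_text):
    """Return the parser's error message, or None if the text parses cleanly."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return str(exc)
    return None


print(parse_error(BAD))
```

A strict parser refuses the element for the same reason OO/LO does, and the message includes the line and column of the second definition.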
To fix this:
- Unzip the .docx file into a temporary directory
- Open the problematic XML file in an XML editor - I used NetBeans since I have it on my computer
- Look for the offending attribute appearing twice in the same XML element - the error message gives you a hint: in this thread, the attribute is w:cstheme, but it could be anything else (in my case it was wval)
- Remove the second attribute with that name from that line - you should then have only one attribute with that name in that XML element
- Save the XML file
- Zip up the contents of that temporary folder into a .docx file outside the temporary directory
- Now open the file again in OO/LO. If the error shows up again with another attribute name, repeat this process for that attribute, ad infinitum, until it works.
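If there are many duplicates, the manual steps above can be scripted. What follows is only a rough Python sketch, not a tested tool: `dedupe_attributes` and `fix_docx` are names I made up, the code assumes the XML parts are UTF-8, and it uses regular expressions because a real XML parser rejects the broken file outright. It keeps the first definition of each attribute and drops the later ones, as in the manual fix. Tag-like text inside comments or CDATA could confuse it, so keep the original file.

```python
import re
import zipfile

# A start tag (or empty-element tag) that carries at least one attribute.
TAG_RE = re.compile(
    r'<([A-Za-z_][\w:.-]*)((?:\s+[\w:.-]+\s*=\s*(?:"[^"]*"|\'[^\']*\'))+)\s*(/?)>'
)
# One attribute inside that tag: name and quoted value.
ATTR_RE = re.compile(r'\s+([\w:.-]+)\s*=\s*("[^"]*"|\'[^\']*\')')


def dedupe_attributes(xml_text):
    """Return (fixed_text, removed); the first definition of an attribute wins."""
    removed = []

    def fix_tag(match):
        name, attrs, slash = match.groups()
        seen, kept, dupes = set(), [], []
        for attr in ATTR_RE.finditer(attrs):
            attr_name, value = attr.groups()
            if attr_name in seen:
                dupes.append((name, attr_name))
            else:
                seen.add(attr_name)
                kept.append(f" {attr_name}={value}")
        if not dupes:
            return match.group(0)  # leave untouched tags byte-for-byte
        removed.extend(dupes)
        return f"<{name}{''.join(kept)}{slash}>"

    return TAG_RE.sub(fix_tag, xml_text), removed


def fix_docx(src_path, dst_path):
    """Copy a DOCX, removing duplicate attributes from its XML parts."""
    removed_by_part = {}
    with zipfile.ZipFile(src_path) as src, \
         zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename.endswith((".xml", ".rels")):
                text, removed = dedupe_attributes(data.decode("utf-8"))
                if removed:
                    removed_by_part[item.filename] = removed
                    data = text.encode("utf-8")
            dst.writestr(item, data)
    return removed_by_part
```

Something like `fix_docx("broken.docx", "fixed.docx")` (both file names are placeholders) returns a dictionary listing which attributes were dropped from which part, so you can check what it did before trusting the result.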
If your DOCX file is password-protected, you have another problem - OO/LO will prompt you for the password, but it will then fail with the same “File format error”. Trying to unzip the .docx file fails because the content is encrypted, so you can’t fix it directly with the method above. To solve this:
- Attempt to open the password-protected DOCX file - it will prompt for the password - supply it
- When the “File format error” appears, do NOT click on OK
- Open up a Shell window - you ARE doing this on a Linux desktop, aren’t you???
- Change directory to the /tmp folder
- Execute the ls -ltr command
- The last entry will be a directory with a funky name like lu345434p0c.tmp
- Go into that directory and list the files
- You will find two files with similar-looking names - one is a zero-length file (which you can ignore), but the other is the DOCX file that has your unencrypted content
- Copy that file to a temporary folder outside the directory where you are
- Create another temporary directory and unzip the contents of the DOCX file you just copied - you are now at Step #1 of the process described above
- Fix your offending XML attribute problem and - voila - you have your DOCX file in OO/LO again.
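The hunt through /tmp can also be scripted. This is a hedged Python sketch of those steps, with several assumptions of mine: the `lu*.tmp` pattern is guessed from the example directory name above, the helper names and the `rescued.docx` file name are invented, and the decrypted copy is recognised simply as a non-empty file that opens as a zip archive. Run it while the error dialog is still open.

```python
import shutil
import zipfile
from pathlib import Path


def newest_lo_temp_dir(tmp_root="/tmp", pattern="lu*.tmp"):
    """Return the most recently modified matching directory, or None."""
    dirs = [p for p in Path(tmp_root).glob(pattern) if p.is_dir()]
    return max(dirs, key=lambda p: p.stat().st_mtime, default=None)


def rescue_decrypted_copy(dest_dir, tmp_root="/tmp", pattern="lu*.tmp"):
    """Copy the unencrypted DOCX out of the newest temp directory.

    Returns the path of the copy, or None if nothing suitable was found.
    """
    lo_dir = newest_lo_temp_dir(tmp_root, pattern)
    if lo_dir is None:
        return None
    # Skip the zero-length file; keep only files that look like zip archives.
    candidates = [
        p for p in lo_dir.iterdir()
        if p.is_file() and p.stat().st_size > 0 and zipfile.is_zipfile(p)
    ]
    if not candidates:
        return None
    source = max(candidates, key=lambda p: p.stat().st_mtime)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    target = dest / "rescued.docx"  # hypothetical output name
    shutil.copy2(source, target)
    return target
```

Calling `rescue_decrypted_copy("/home/me/rescue")` (the path is a placeholder) gives you a plain DOCX that you can unzip and fix as described in the first list.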
Moral of this story?
NEVER USE DOCX FILES!! Why would you do this when ODT just works??
Hope this helped.