ODT and OTT files contain hidden/sensitive data that are not accessible from the editable documents/templates


I tried to convert a template file into another one by replacing some images and changing some custom properties/values. However, I noticed that the resulting ott (and odt) files were larger than expected. When I unzipped them, I found that the ‘Pictures’ folder still contained images from the original files, which I had deleted in LibreOffice Writer; they should not be there, as they are no longer accessible via Writer. I also found references in styles.xml and manifest.xml to old values that I had deleted or replaced with new ones. Note that I had not enabled change tracking, and I can’t find a way to save the files without the old data inside them!

I find this very disturbing, as those data may be sensitive. It seems that although I delete and replace them in Writer, they are not deleted from the resulting ott and odt files, and they are easily accessible to anyone who knows that you can just unzip those files and inspect their contents with simple image viewers and text editors.
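For readers who want to repeat this check without unzipping by hand, here is a minimal Python sketch that lists everything stored inside an odt/ott package along with its uncompressed size. It relies only on the fact that these files are ZIP archives; the function name `list_package_entries` is made up for this illustration.

```python
import zipfile


def list_package_entries(path):
    """Return (entry name, uncompressed size in bytes) for each entry in the package."""
    with zipfile.ZipFile(path) as package:
        return [(info.filename, info.file_size) for info in package.infolist()]


# Example usage (hypothetical file name):
# for name, size in list_package_entries("template.ott"):
#     print(f"{size:>10}  {name}")
```

Unexpectedly large entries under Pictures/ are a good first hint that something is still embedded.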

So my question is whether there is any way to clean LibreOffice-produced files of old records/data that are no longer visible or accessible from Writer, not only to reduce file size but also to avoid sharing sensitive data.

Any solution?


Version: (x64) / LibreOffice Community
Build ID: 0f246aa12d0eee4a0f7adcefbf7c878fc2238db3
CPU threads: 4; OS: Windows 10.0 Build 19044; UI render: Skia/Raster; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: threaded

Please provide such a file so that the problem can be seen. E.g., maybe those images and other data relate to unused but still present styles?

Unfortunately I can’t share the file I have, due to its sensitive data, but based on your feedback I started removing elements of the document and checking at every step whether the sensitive data were still in the resulting saved file.

Interestingly, even after I had reduced the document to a single empty page, the sensitive data were still in the saved file. Then I started deleting styles. Through experimentation I found out that the data were related to a Page Style. This page style was no longer in use in the original document. When I finally deleted it, the file size dropped and the sensitive data disappeared.

Having fixed the issue with your hint, I am also providing some notes for anyone who encounters similar issues. The sensitive text data were in the file styles.xml, which implies that they were stored in a style. More specifically, they were located in an XML element called style:master-page, whose name was the name of the style I deleted to remove them. So instead of searching by trial and error for the data/style to remove, as I did, someone can use this information to identify directly where the sensitive data are.
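To make that lookup concrete, here is a rough Python sketch that reads styles.xml from the package and reports every style:master-page element together with any text stored inside it (for example header or footer text). The function name `master_pages` is hypothetical; the namespace URI is the standard OpenDocument style namespace.

```python
import zipfile
import xml.etree.ElementTree as ET

# Standard OpenDocument namespace for the "style:" prefix.
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"


def master_pages(path):
    """Map each page style name found in styles.xml to the text stored inside it."""
    with zipfile.ZipFile(path) as package:
        root = ET.fromstring(package.read("styles.xml"))
    result = {}
    for page in root.iter(f"{{{STYLE_NS}}}master-page"):
        name = page.get(f"{{{STYLE_NS}}}name")
        # Collect all text nodes nested in this page style (headers, footers, ...).
        pieces = [t.strip() for t in page.itertext() if t.strip()]
        result[name] = " ".join(pieces)
    return result


# Example usage (hypothetical file name):
# for name, text in master_pages("template.ott").items():
#     print(name, "->", text or "(no text)")
```

Note that style:name is the internal name, which can differ slightly from the name shown in Writer's style list (spaces, for instance, are encoded), so treat the output as a pointer rather than an exact match.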

Some ideas: this issue may also point to a need for other tools in LibreOffice: a style usage counter, to identify styles that are not used in the current document directly from their count, and/or a resource reviewer, if one doesn’t already exist, to identify resources (like images and styles) that are embedded in the saved files but not used in the current state of the document.
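Until something like that exists, the resource reviewer idea can be roughly approximated outside LibreOffice. The sketch below is only an assumption about how such a check might look: for each file under Pictures/ it reports which XML parts mention its path, using a plain substring search. The function name `picture_references` is hypothetical.

```python
import zipfile


def picture_references(path, parts=("content.xml", "styles.xml")):
    """For each file under Pictures/, list which XML parts mention its path."""
    with zipfile.ZipFile(path) as package:
        names = package.namelist()
        # Read only the parts that actually exist in this package.
        texts = {
            part: package.read(part).decode("utf-8", errors="replace")
            for part in parts
            if part in names
        }
        pictures = [
            n for n in names if n.startswith("Pictures/") and not n.endswith("/")
        ]
    return {
        pic: [part for part, text in texts.items() if pic in text]
        for pic in pictures
    }


# Example usage (hypothetical file name):
# for pic, used_in in picture_references("template.ott").items():
#     print(pic, "->", used_in or "not referenced")
```

An image that is mentioned only in styles.xml and not in content.xml is worth a closer look, since that matches the case described here, where the data lived in a page style rather than in the document body.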

Anyway, I think the problem is solved: the data were indeed in an unused Page Style, and they were removed when that page style was deleted.

Thanks for your hint!