Xls file format: data changes without adjustment of modification time

hello @all,

while checking old backups i came across an oddity. some *.xls files have slightly changed binary content while the timestamp is unchanged (mod time identical; last access and creation time differ, of course).

since the same kind of small change shows up in several *.xls files (see details below), and only in them, i think it’s not the result of a hardware or software malfunction, but was made intentionally by a program which accessed the files.

the files were created with excel (2007?) under windows, copied for backup purposes now and again at different times, maybe accessed in between, maybe with excel, maybe with various versions of LO calc, and moved around on different usb sticks (mostly formatted ‘fat32’). the history is not documented.

i’d like to know when and why such alterations may occur, and what they result from, both to improve my knowledge and to avoid them in the future.

(i like quick backups with hardlinked files done by rsync, i’d like to know when space is wasted because of changes in files, and i’d like to have backups which say ‘binary identical’ on later diffs rather than ‘binary files differ’.)

the changes in detail: each file contains one block starting with ‘Root Entry’ written in a two-byte encoding, in hex: ‘52 00 6F 00 6F 00 74 00 …’, spanning 8 lines of 16 bytes (128 bytes), mostly ‘00’. after that block the file continues with ‘Workbook’ in the same notation, hex: ‘57 00 6F 00 72 00 6B 00 …’. bytes 109 to 116 (decimal) of the ‘Root Entry’ block, counting the ‘R’ as byte 1 (offsets 0x6c to 0x73 when counting from 0x00), are subject to the changes. while often ‘00 00 00 00 00 00 00 00’, they are changed to some non-alphanumeric value in some of the files, while the modification time stays unchanged!
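to check many files without a hex editor, the bytes in question can be dumped with a few lines of python. this is only a rough sketch: it finds the block by searching for the two-byte encoded name, which is a heuristic and could in principle hit the same byte sequence elsewhere in a file. the names `root_entry_field` and `report` are made up for this example.

```python
from pathlib import Path
from typing import Optional

ROOT_NAME = "Root Entry".encode("utf-16-le")  # 52 00 6F 00 6F 00 74 00 ...
BLOCK_SIZE = 128                              # 8 lines of 16 bytes
FIELD = slice(0x6C, 0x74)                     # bytes 109..116 when counting from 1


def root_entry_field(data: bytes) -> Optional[bytes]:
    """Return the 8 bytes at offsets 0x6C..0x73 of the 'Root Entry' block.

    Heuristic: the block is located by a plain search for its name,
    not by walking the file structure. Returns None if nothing is found.
    """
    start = data.find(ROOT_NAME)
    if start < 0 or start + BLOCK_SIZE > len(data):
        return None
    return data[start:start + BLOCK_SIZE][FIELD]


def report(path: str) -> str:
    """One line per file: the 8 bytes as hex, or a note if not found."""
    field = root_entry_field(Path(path).read_bytes())
    if field is None:
        return f"{path}: no 'Root Entry' block found"
    return f"{path}: {field.hex(' ')}"
```

calling `print(report(name))` for each backup copy of a file shows at a glance which copies carry zeroes and which carry other values.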

see screenshot below.

anybody any idea? perhaps metadata like hidden access time entry? selected sheet and cursor position? NSA stamp ‘already copied’?

pls. avoid answers that recommend better backup procedures (i can’t change the past), that argue against usb sticks (same reason), or that assume accidental errors (the changes are far too consistent and isolated for that).

tia,

reg.

b.

The fragment shown in your screenshot looks like part of a compound file directory entry; a compound file is basically a system of storages and streams within a single file. If you really want to dig into that, there’s the [MS-CFB]: Compound File Binary File Format documentation. Take a look at section 2.6.1 Compound File Directory Entry. From a short glance (I’m too lazy now to count bytes in a screenshot) it could be the Modified Time entry. It might be that some Excel version (or some other Windows tool) sets it to 00 bytes (or vice versa) when opening the file to access its streams’ content. The file system’s modified time does not necessarily correspond to it.
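For anyone who wants to count bytes programmatically rather than in a screenshot, here is a minimal sketch of unpacking one 128-byte directory entry. The field order and sizes are my reading of [MS-CFB] section 2.6.1, so check them against the document before relying on them; the function names `parse_entry` and `filetime_to_datetime` and the dictionary keys are invented for this example. Under that layout the Modified Time field sits at offsets 0x6C to 0x73 and holds a Windows FILETIME (100-nanosecond ticks since 1601-01-01 UTC).

```python
import struct
from datetime import datetime, timedelta, timezone

# Assumed layout of a directory entry (my reading of [MS-CFB] 2.6.1):
# name (64), name length (2), object type (1), color flag (1),
# left/right/child IDs (4 each), CLSID (16), state bits (4),
# creation time (8), modified time (8), start sector (4), stream size (8).
ENTRY = struct.Struct("<64sHBBIII16sIQQIQ")  # 128 bytes in total
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(ticks):
    """Convert a FILETIME value to a UTC datetime; all zeroes means 'not set'."""
    if ticks == 0:
        return None
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


def parse_entry(raw):
    """Unpack one 128-byte directory entry into a small dict."""
    (name, name_len, obj_type, _color, _left, _right, _child,
     _clsid, _state, created, modified, _start, size) = ENTRY.unpack(raw)
    # The name length is given in bytes and includes the terminating null.
    text = name[:max(name_len - 2, 0)].decode("utf-16-le", errors="replace")
    return {
        "name": text,
        "object_type": obj_type,
        "created": filetime_to_datetime(created),
        "modified": filetime_to_datetime(modified),
        "stream_size": size,
    }
```

Feeding it the 128 bytes that start at the ‘R’ of ‘Root Entry’ shows whether the changed bytes decode to a plausible date, which would support the Modified Time reading.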

That btw isn’t the only place that can get changed when opening an .xls file with MS-Excel without modifying it, there’s also a username record that (for sharing purposes) indicates the current user having opened the file.

as far as i can see that is it, i counted the bytes. thanks for the hint.

imho it’s a failure of whatever program does things like this. it brings a small, questionable benefit, but has immense potential to irritate users. you run into a trap and waste time. nobody wants deviations in backups, so i had to invest hours in investigation once it was noticed.

from the documentation: ‘For a root storage object, this field MAY<2> be set to all zeroes, and the modified time is retrieved or set on the compound file itself.’

it’s not certain whether the date was first set and then erased or vice versa. with a philosophy of leaving internal structures untouched until the data actually changes, neither would have happened.
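for the backup comparison itself, a rough way to tell ‘differs only in that field’ from ‘really differs’ is to blank the 8 bytes at offsets 0x6c to 0x73 of the ‘Root Entry’ block in both copies before comparing. this is a sketch under assumptions: the block is found by a simple search for its two-byte encoded name (a heuristic), and the helper names `masked` and `same_apart_from_root_mtime` are made up here.

```python
ROOT_NAME = "Root Entry".encode("utf-16-le")  # 52 00 6F 00 6F 00 74 00 ...
MTIME = slice(0x6C, 0x74)                     # the 8 bytes that were seen to change


def masked(data: bytes) -> bytes:
    """Return a copy of data with the root entry's 8 changing bytes zeroed.

    If no 'Root Entry' block is found, the data is returned unchanged.
    """
    start = data.find(ROOT_NAME)
    if start < 0 or start + MTIME.stop > len(data):
        return data
    buf = bytearray(data)
    buf[start + MTIME.start:start + MTIME.stop] = bytes(8)
    return bytes(buf)


def same_apart_from_root_mtime(a: bytes, b: bytes) -> bool:
    """True if the two files are identical once those 8 bytes are ignored."""
    return masked(a) == masked(b)
```

any other difference, for example the username record mentioned above, still makes the comparison report the copies as different.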