Old Macintosh Word(?) file

JoriMäntysalo · May 13, 2016, 11:34am

I was asked to open some old files, mostly made with Macintosh around 1990. As an example I have a file that has the string “WDBNMSWD” starting from location 0x20. It is only 2175 bytes long - can it be a real MS Word file?

If so, how can I tell LibreOffice to open it? By Googling I found that libmwaw should open it, and that LO should have it integrated. Is that so? LO does open it, but the content is only garbage.

ajlittoz · May 14, 2016, 7:36am

Old Macintosh files are a kind of small directory containing 2 sub-files: one, called the “data fork”, is equivalent to a traditional file; the other, called the “resource fork”, contains a small poor man’s DB with lots of metadata.

When you open a Mac file under another OS, this OS sees only the data fork. Unfortunately, your Word file is stored in a binary format in the “data fork” and cannot be decoded without the help of the “resource fork” metadata.

To be able to read your file, you must first transform it into a “data fork”-only format. You can do it only if you still have access to a vintage Mac and the appropriate Word version. In this case, you can try text format (you lose all styling) or Rich Text Format RTF (though styling will be distorted anyway).

Then, the file can be processed by LO, BUT expect some more difficulty due to the MacRoman character set, which is different from the Unicode used by LO. MacRoman is a single-byte character set covering the full 0-255 range. Its upper half therefore conflicts with the 0x80-0xFF byte range that UTF-8 uses for multi-byte sequences: this results in funny glyph display. However, these artefacts are deterministic and you can map them unambiguously to the original character.
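To illustrate the mapping: a minimal Python sketch, assuming the bytes came off the Mac unaltered. It uses Python's built-in `mac_roman` codec; the sample bytes are made up for the example.

```python
# Hypothetical sample: "Mäntysalo" as stored by a MacRoman-era program
# (0x8A is "ä" in MacRoman).
raw = b"M\x8antysalo"

# Decoding with the right codec recovers the original character.
text = raw.decode("mac_roman")

# Decoding with the wrong single-byte codec gives the "funny glyph" effect:
# in Latin-1, 0x8A is a control character, not a letter.
wrong = raw.decode("latin-1")

# Re-encode as UTF-8 for modern tools.
utf8_bytes = text.encode("utf-8")
print(text)
```

Because every MacRoman byte maps to exactly one character, the conversion is lossless as long as the bytes were not already "translated" somewhere along the way.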

Note:

The “string” WDBNMSWD is a format/creator signature telling the file was created by M$ Word (MSWD) and stored in Word binary format (WDBN). This information is used by MacOS to launch the required application in response to a double-click on the file icon, or by Finder (the file manager) to display an ad-hoc icon.
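A small Python sketch of how such an 8-byte signature could be split into its type and creator halves. The function name `read_type_creator` and the sample bytes are hypothetical, and offset 0x20 is only where the string was found in the file from the question, not a general rule.

```python
def read_type_creator(data: bytes, offset: int = 0x20):
    """Return (type, creator) read at `offset`, or None if the data is too short."""
    chunk = data[offset:offset + 8]
    if len(chunk) < 8:
        return None
    # Type and creator codes are 4 bytes each, usually plain ASCII letters.
    file_type = chunk[:4].decode("mac_roman", errors="replace")
    creator = chunk[4:].decode("mac_roman", errors="replace")
    return file_type, creator


# Hypothetical stand-in for the start of the file: 0x20 filler bytes, then the signature.
sample = bytes(0x20) + b"WDBNMSWD" + bytes(16)
print(read_type_creator(sample))
```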

I could possibly do the conversion, provided you manage to transfer the original file unaltered (i.e. with its data and resource forks), but this probably means you have access to a vintage Mac and can do the conversion yourself. In case you can’t convert, use CompactPro or Stuffit to create a compressed data fork-only version of your file before transmitting it through the Internet.

If you want to do it all by yourself, the tricky part is exporting the converted file to “modern” OSes. Vintage Macs can only use diskettes (I’ve tried Ethernet LAN but FTP servers on Mac erroneously attempt a MacRoman to ISO-8859-1 translation which causes havoc on the receiving computer). Create a tar diskette for the converted file. Next, use an old diskette-capable computer to upgrade to modern technology: I use a Linux IBM ThinkPad T20 to recover the file and have it exposed on my LAN.

I was able to recover 20+ years old Word documents in a state where they could be read though quite “distorted”. I have not yet attempted to recover Excel files but the procedure should be similar.

JoriMäntysalo · May 16, 2016, 10:53am

About transferring: I had a Mac with an external floppy drive and an external SCSI disk, and an older PC with a SCSI card. That was quite an easy phase.

ajlittoz · May 17, 2016, 6:22am

Unfortunately, I have no SCSI card on my PCs, hence my trials with LAN to end up with physical diskette transfer.

JoriMäntysalo · May 16, 2016, 10:47am

I do remember the “forks” thing in Mac OS. Is it impossible to even get plain text out of the file without the resource fork? Unfortunately I think that the original floppies are now gone. :=\

The data was located on 5.25" floppies (both ~800KB and 1.2MB), HD 3.5" floppies and 800KB 3.5" floppies. For the other types I was able to read the raw image with a PC and Linux, but the 800KB floppies worked only in an old Mac with an external 800KB floppy drive. The MacRoman charset is not a problem, as recode (or even tr) can convert it.

Btw, there is another example file. It is about 67 KB long, contains human-readable strings like “Human-Computer Interaction. Erlbaum, 1983”, but is not plain text, and starts with

00000000  0b ad de ed 00 00 00 02  00 00 00 18 00 92 00 04  |................|
00000010  00 01 05 58 00 10 00 91  00 00 00 54 00 00 04 a8  |...X.......T....|
00000020  00 00 04 30 00 00 04 fc  00 00 00 1c 00 00 09 2c  |...0...........,|
00000030  00 00 00 00 00 00 09 48  00 00 00 3a 00 00 09 48  |.......H...:...H|
00000040  00 00 00 38 00 00 09 82  00 00 02 72 00 00 09 ba  |...8.......r....|
00000050  00 00 00 1c 00 00 0c 2c  00 00 00 00 00 00 0c 48  |.......,.......H|

The file command just says “data”. Any ideas about this? It may not be a Macintosh file.

ajlittoz · May 17, 2016, 5:59am

About MacRoman: you’re right, once the file has been imported unaltered in a modern box, there are no more problems.

ajlittoz · May 17, 2016, 6:12am

About binary dump: this dump does not ring any bells. The most interesting piece of information would be the type/creator data, like WDBN/MSWD mentioned in OP. But getting it means you can still put your hands on a Mac.

The extract is a bit short, but since there are many zeroes, I’d bet it is some kind of preamble with indexes to subsequent parts. 00010558 at offset 0x10 looks like the file length (66904 bytes, which fits “about 67 KB”). Usually, the first file sector is special and the real contents start at sector 1.
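The guess about the length field can be checked by reading the start of the dump as big-endian 32-bit words. This is only a sketch: the big-endian, 4-byte layout is an assumption (it is the natural one for a classic Mac file, but nothing in the dump proves it).

```python
import struct

# First two rows of the posted dump (offsets 0x00 to 0x1f).
header = bytes.fromhex(
    "0baddeed 00000002 00000018 00920004"
    "00010558 00100091 00000054 000004a8"
)

# Assumption: the preamble is a sequence of big-endian unsigned 32-bit words.
words = struct.unpack(">8I", header)
for i, w in enumerate(words):
    print(f"0x{i * 4:02x}: 0x{w:08x} = {w}")

# The word at offset 0x10 is the candidate file length.
size_guess = words[4]
```

Comparing `size_guess` with the real size of the file on disk would confirm or rule out the "file length" reading.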

ajlittoz · May 17, 2016, 6:19am

About plain text: if the type/creator signature is TEXT/xxxx (xxxx = don’t care), the file really is plain text; the optional resource fork only contains “comfort” hints (like tab width, font name, …) and the file can be recovered without it.

With other signatures, the resource fork is required to decode the data fork. For example, original MacWrite resource fork provided a string of letters in frequency order for Huffman compression (esanti…).

osnola · May 17, 2016, 11:24am

Hello,
I have a lot of Mac Word files v1.0 to v5.0 and they all begin with 0xfe then a number between 0x32 and 0x37 :-~
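That observation translates into a quick check like the following Python sketch. The function name `looks_like_mac_word` is made up, and the rule is only the pattern described above (0xfe followed by a byte from 0x32 to 0x37), not an official format specification.

```python
def looks_like_mac_word(data: bytes) -> bool:
    """Heuristic: first byte 0xfe, second byte between 0x32 and 0x37 inclusive."""
    return len(data) >= 2 and data[0] == 0xFE and 0x32 <= data[1] <= 0x37


# Hypothetical examples: a matching header, and the start of the second file from the thread.
print(looks_like_mac_word(b"\xfe\x34\x00\x00"))
print(looks_like_mac_word(bytes.fromhex("0baddeed")))
```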

Concerning the first file, if its header contains WDBNMSWD, it is probably a Mac Word file which was compressed by some utility, maybe MacBinary/CompactPro/Stuffit or something similar…

Concerning the second file, I have never seen any file with such a header, so I have no idea.

Note:
If these files are not confidential, you can send them to me ( you can find my email in http://www.loria.fr/~alonso/ ), I can guess more…

ajlittoz · May 18, 2016, 6:13am

Signature (type/creator) WDBN/MSWD tells it is a native M$ Word file. If it had been compressed by any utility, its signature would have been changed. The precise file format version is encoded somewhere in the file preamble (sector 0) and is checked by the application (though I doubt Word 1.0 would correctly reject a file created by Word 6.0).

osnola · May 18, 2016, 7:03am

Yes, but normally, this information is stored in the disk’s catalog not in the file’s data fork. So if it is present in the file data fork, this probably means that some utility/compressor has merged/saved the file information, the data fork (and probably the rsrc fork) in a new data fork, …

Note: the second file’s signature “bad deed” is quite original, but I did not find it on the web…

JoriMäntysalo · May 20, 2016, 10:35am

Got it! Linux command unar was able to uncompress (also) files containing WDBNMSWD and then LibreOffice 5 opens it. Thanks!
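For a batch of such files, the same step could be scripted. A rough Python sketch, assuming unar is installed and on the PATH; the helper names and the file names are hypothetical, and `-o` is unar's output-directory option.

```python
import shutil
import subprocess


def build_unar_command(archive: str, out_dir: str) -> list:
    """Build the unar command line that extracts `archive` into `out_dir`."""
    return ["unar", "-o", out_dir, archive]


def extract(archive: str, out_dir: str) -> bool:
    """Run unar if it is available; return False when it is not installed."""
    if shutil.which("unar") is None:
        return False
    subprocess.run(build_unar_command(archive, out_dir), check=True)
    return True


# Hypothetical file and directory names.
print(build_unar_command("oldfile.bin", "extracted"))
```

The extracted file can then be opened in LibreOffice as usual.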

AlexKemp · September 5, 2020, 12:48am