Cwk file headers

The latest version of LibreOffice (4.1) opens cwk text files (sometimes called word processing files) properly, with a few restrictions. It also opens cwk spreadsheet files, but the contents appear as a string of meaningless characters. It may open the other cwk types (Draw, Paint, and Database) in the same way, but I do not have any of these to try.

I note that the long text string displayed for cwk spreadsheet files always starts with the same 8-character string on my PC running LO 4.1 on Windows 8.1. The string is displayed as ‘##BOBO’. These translate to ASCII 35, 35, 225, 35, 66, 79, 66, 79. On a Mac running LO 4.1 on OS 10.7.5 the string is the same except for character position 3, which appears in different ways in different applications. One was ASCII 240, different from ASCII 225 on my PC.

I was hoping the string would be the same on both platforms, but that does not seem to be the case. I am interested in how the string appears for the other CWK types and whether anyone knows what these strings mean. I am trying to identify the various cwk types when opened by LO 4.1, even though the displayed contents are meaningless.

Any help would be appreciated.

For the clarity of others, this discussion relates to the threads here and here on this forum. One other thing I would like to point out is that hexadecimal rather than decimal values should be used when quoting file contents as this is the standard and makes it much easier for others to interpret what is being indicated. It is also not possible to determine anything from content displayed erroneously in Writer (e.g., “##”) as this is translated.
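As a minimal sketch of how to quote file contents in hexadecimal, the following Python reads the raw bytes of a file rather than the text Writer displays. The names hex_bytes and dump_file_header are illustrative only, and the default of 12 bytes is an arbitrary choice.

```python
def hex_bytes(data: bytes, count: int = 12) -> str:
    """Return the first `count` bytes as space-separated two-digit hex."""
    return " ".join(f"{b:02x}" for b in data[:count])


def dump_file_header(path: str, count: int = 12) -> str:
    """Read the first `count` raw bytes of a file and format them as hex."""
    # Open in binary mode so no character translation is applied.
    with open(path, "rb") as handle:
        return hex_bytes(handle.read(count), count)
```

Calling dump_file_header on a CWK file gives the untranslated byte values, which are the same on every platform.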

Unfortunately there is no easy answer to the question asked here, as ClarisWorks, which later became AppleWorks[1], underwent several changes of version (in unrelated series numbering) across the MacOS and later Windows platforms. During these changes the components offered expanded from word processing, spreadsheet, and database, to later include page layout, graphics/drawing/painting, and equation editing, all of which used the CWK file extension. A CWK file therefore can originate from different platforms, versions, and components. Generally speaking, most CWK files will be from the later v5.x or v6.x series of the product for MacOS.

The Java source used by Terrence Curran indicates this byte format for the header in CWK files:

  • 01-04: Version e.g., I have seen 05 02 7d 00, 05 02 91 00, 05 02 99 00 = ClarisWorks v5.x; 06 07 d0 00, 06 07 e1 00 = AppleWorks v6.x.[2]
  • 05-08: File Creator ID e.g., 42 4f 42 4f = “BOBO” the reason for which is explained in the Notes at the foot of the page here by one of the creators of ClarisWorks.
  • 09-12: Previous Version e.g., I have seen 04 07 97 00, 04 07 9e 00 = ClarisWorks v4.x; 05 02 91 00, 05 02 99 00, 05 07 ad 00 = ClarisWorks v5.x; 06 07 d0 00 = AppleWorks v6.x.
  • 13-20: unknown e.g., appears to always be 00 00 00 00 00 00 00 00.
  • 21-22: unknown e.g., appears to always be 00 01.
  • 23-24: unknown, possible marker e.g., I have seen 00 0f, 00 ba, 00 bb, 00 b7, 00 c0, 01 d0.
  • 25-26: unknown e.g., I have seen 0b f0, 10 8c, 31 6c, 42 c4, 80 ac, 86 a6, 88 32, 8f 04, 8f 12, dd 06.[3]
  • 27-30: unknown e.g., appears to always be 00 00 00 00.
  • 31-32: Page Height[4] e.g., 02 53 = 595pt; 02 64 = 612pt; 03 18 = 792pt; 03 4a = 842pt.
  • 33-34: Page Width e.g., as above for Page Height.
  • 35-46: Page Margins e.g., six two-byte values such as 00 12 = 18pt; 00 3a = 58pt; 00 48 = 72pt.
  • 47-??: unknown

[1] Wikipedia.

[2] The “##” value quoted (decimal 35,35,225,35) is a translated value, as previously indicated. There are several example CWK files here. The 1998 Roster may be a spreadsheet form, as these four bytes display in LO Writer v4.1.3.2 as ##�# (decimal 35,35,65533,35), but the four bytes in the file are actually 05 02 99 00 (i.e., v5.x). The third byte changes because its value is above the ASCII range, so it is possibly interpreted differently in Windows and MacOS.

[3] Seems highly variable.

[4] A4 = 595x842pt; US Letter = 612x792pt.
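The byte layout listed above can be sketched in Python as follows. This is only an illustration under stated assumptions: the positions in the list are taken as 1-based, multi-byte values are read as big-endian unsigned integers (which matches 02 53 = 595pt), and the names CwkHeader and parse_cwk_header are hypothetical. The fields marked "unknown" are skipped rather than interpreted.

```python
import struct
from typing import NamedTuple, Tuple

# Bytes 01-46 of the list: three 4-byte fields, the unknown block (8 bytes),
# three 2-byte unknowns, a 4-byte unknown, height, width, six margins.
HEADER_FORMAT = ">4s4s4s8sHHH4sHH6H"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 46 bytes


class CwkHeader(NamedTuple):
    version: bytes            # bytes 01-04
    creator: bytes            # bytes 05-08, e.g. b"BOBO"
    previous_version: bytes   # bytes 09-12
    page_height_pt: int       # bytes 31-32
    page_width_pt: int        # bytes 33-34
    margins_pt: Tuple[int, ...]  # bytes 35-46, six values


def parse_cwk_header(data: bytes) -> CwkHeader:
    """Unpack the documented fields from the start of a CWK file."""
    if len(data) < HEADER_SIZE:
        raise ValueError(f"need at least {HEADER_SIZE} bytes, got {len(data)}")
    fields = struct.unpack(HEADER_FORMAT, data[:HEADER_SIZE])
    # fields[3] to fields[7] are the "unknown" entries and are ignored here.
    return CwkHeader(
        version=fields[0],
        creator=fields[1],
        previous_version=fields[2],
        page_height_pt=fields[8],
        page_width_pt=fields[9],
        margins_pt=tuple(fields[10:16]),
    )
```

In the examples quoted above, the first byte of the version field matches the major version (05 for v5.x, 06 for v6.x), so header.version[0] may serve as a rough version check.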

Many thanks for the answers. I will be more careful in the future to use hex instead of decimal.
Thanks again.

There are now compiled versions of libmwaw referred to above available from: http://sourceforge.net/projects/libmwaw
They have worked for me on both text and spreadsheets. Complex texts with many different fonts were completely recovered. They will convert far more than just CWK text files. See: http://sourceforge.net/p/libmwaw/wiki/Home/

Hello,
just for clarity: LibreOffice 4.1 mainly converts word processing files. It will also try to convert drawing and painting files, but the results of the conversion are often poor (this will be a little better in LibreOffice 4.2, but still not perfect) …

Now, if you want to retrieve the type of an AppleWorks/ClarisWorks file, you can also take a look at the function CWParser::checkHeader (in https://sourceforge.net/p/libmwaw/libmwaw/ci/master/tree/src/lib/CWParser.cxx). The type is stored at position 243 or 249 or … 278, depending on the file’s version (which appears in the first bytes of the file). Unfortunately, this type will be displayed as ‘#’ if you try to open a Database/Presentation/Spreadsheet file with LibreOffice…
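For inspection only, here is a small Python sketch that reports the raw byte found at the positions named above. It assumes the positions are zero-based file offsets, which the post does not state. It lists only the three values quoted (the others are elided) and does not reproduce the version-to-position mapping or the meaning of each type value; those are in CWParser::checkHeader. The names CANDIDATE_OFFSETS and bytes_at_candidates are illustrative.

```python
from typing import Dict, Iterable, Optional

# Only the positions quoted in the post; the elided ones are not guessed.
CANDIDATE_OFFSETS = (243, 249, 278)


def bytes_at_candidates(
    data: bytes, offsets: Iterable[int] = CANDIDATE_OFFSETS
) -> Dict[int, Optional[int]]:
    """Map each offset to the byte stored there, or None if the data is too short."""
    return {off: (data[off] if off < len(data) else None) for off in offsets}
```

Comparing the result across files of known type and version may help show which position applies to which version.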

osnola.

Thanks. I tried (and failed) to find the corresponding code in libmwaw. I did not expect the byte offset of the file type to be version-dependent.

Hello,
CWParser is the class that manages the parsing of ClarisWorks/AppleWorks files: it reads some general zones and distributes the work to the other CWXXX classes. To parse a file, it first calls checkHeader, then createZones, …

Concerning the header, CWParser::readDocHeader is the function that tries to parse the first bytes of a file (bytes 8 and after), but there remain many things that I do not understand, so it may be hard to read.