
cwk file headers [closed]

asked 2013-12-12 00:22:23 +0200

RussR

updated 2015-08-30 00:24:29 +0200

Alex Kemp

The latest version of LibreOffice (4.1) will open CWK text files (sometimes called word-processing files) properly, with a few restrictions. It will also open CWK spreadsheet files, but they appear as a string of meaningless characters. It may open the other CWK types (Draw, Paint, and Database) in the same way, but I do not have any of these to try.

I note that the long text string displayed for CWK spreadsheet files always starts with the same 8-character string on my PC running LO 4.1 on Windows 8.1. The string is displayed as '##á#BOBO'. These characters have the decimal codes 35, 35, 225, 35, 66, 79, 66, 79. On a Mac running LO 4.1 on OS X 10.7.5, the string is the same except for the character in position 3, which different applications display in different ways. One showed it as code 240, rather than the 225 seen on my PC.
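Written as hexadecimal bytes, which is how header contents are quoted in the answers below, the codes above look like this. A short Python conversion (the list simply reproduces the values quoted in the question):

```python
# Decimal character codes quoted above for the start of the text that
# LibreOffice 4.1 on Windows 8.1 displays for a CWK spreadsheet file.
observed = [35, 35, 225, 35, 66, 79, 66, 79]

# The same values as two-digit hexadecimal bytes.
hex_bytes = " ".join(f"{b:02x}" for b in observed)
print(hex_bytes)  # 23 23 e1 23 42 4f 42 4f

# Characters 5-8 (0-based slice 4:8) decode as plain ASCII.
print(bytes(observed[4:8]).decode("ascii"))  # BOBO
```

Note that 0x23 is just the '#' character, so these values describe what Writer displays, not necessarily the bytes actually stored in the file.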

I was hoping the string would be the same on both platforms, but that does not seem to be the case. I am interested in how the string appears for the other CWK types and whether anyone knows what these strings mean. I am trying to identify the various CWK types when they are opened in LO 4.1, even though the displayed content is meaningless.

Any help would be appreciated.


Closed for the following reason: "question is not relevant or outdated" by Alex Kemp
close date 2015-11-16 16:54:24.936872

2 Answers


answered 2013-12-15 11:58:57 +0200

osnola

updated 2013-12-15 11:59:59 +0200

Hello, just for clarity: LibreOffice 4.1 mainly converts word-processing files; it also tries to convert drawing and painting files, but the results of the conversion are often not good (they will be a little better in LibreOffice 4.2, but still not perfect)...

Now, if you want to retrieve the type of an AppleWorks/ClarisWorks file, you can also look at the function CWParser::checkHeader (in https://sourceforge.net/p/libmwaw/libmwaw/ci/master/tree/src/lib/CWParser.cxx): the type is stored at position 243 or 249 or ... 278, depending on the file's version (which appears in the first bytes of the file). Unfortunately, this type will be displayed as '#' if you try to open a Database/Presentation/Spreadsheet file with LibreOffice...
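As a rough illustration of the check described above, the Python sketch below reads the first byte of the file as the major version (matching the leading 05/06 bytes in the header list in the other answer) and returns the byte at a type offset that the caller must supply. The offset is deliberately a parameter: only 243, 249, ... 278 are given here, and the version-to-offset mapping has to be taken from CWParser::checkHeader itself. Treating the position as a 0-based offset is an assumption, and the helper name is hypothetical.

```python
def cwk_version_and_type(data: bytes, type_offset: int) -> tuple[int, int]:
    """Return (major_version, type_byte) from the raw bytes of a CWK file.

    Hypothetical helper. type_offset is NOT computed here: it depends on the
    file's version (243, 249, ... 278) and must be looked up in libmwaw's
    CWParser::checkHeader. It is treated as a 0-based offset (assumption).
    """
    if len(data) <= type_offset:
        raise ValueError("data too short for the requested type offset")
    major_version = data[0]  # e.g. 0x05 or 0x06 in the headers listed for v5.x/v6.x
    return major_version, data[type_offset]
```

It would be called on the file's raw bytes, e.g. `cwk_version_and_type(open(path, "rb").read(), offset)`, rather than on anything copied out of a Writer window, where the type byte shows up only as '#'.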

osnola.


Comments

Thanks. I tried (and failed) to find the corresponding code in libmwaw. I did not expect the byte offset of the file type to be version-dependent.

oweng (2013-12-16 07:16:46 +0200)

Hello, CWParser is the class that manages the parsing of ClarisWorks/AppleWorks files: it reads some general zones and distributes the work to the other CWXXX classes. To parse a file, it first calls checkHeader, then createZones, ...

Concerning the header, CWParser::readDocHeader is the function that tries to parse the first bytes of a file (bytes 8 and after), but there remain many things that I do not understand, so it may be hard to read.

osnola (2013-12-16 08:55:29 +0200)

answered 2013-12-13 07:03:50 +0200

oweng

updated 2013-12-13 07:04:32 +0200

For the clarity of others, this discussion relates to the threads here and here on this forum. One other thing I would like to point out is that hexadecimal rather than decimal values should be used when quoting file contents, as this is the standard and makes it much easier for others to interpret what is being indicated. It is also not possible to determine anything from content displayed erroneously in Writer (e.g., "##á#"), as the raw bytes have been translated into display characters.
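To quote file contents in hexadecimal, the bytes need to be read from the file itself rather than from what Writer displays. A minimal Python hex-dump helper might look like this (the name is hypothetical; it is a simplified cousin of tools such as `xxd`). Offsets are printed 0-based, whereas the byte positions in the list below count from 1.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as a 0-based hex offset followed by hex byte columns."""
    lines = []
    for start in range(0, len(data), width):
        chunk = data[start:start + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        lines.append(f"{start:08x}  {hex_part}")
    return "\n".join(lines)
```

For example, `print(hex_dump(open("file.cwk", "rb").read(64)))` would show the first 64 bytes of a (hypothetical) file.cwk in the form used in the list below.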

Unfortunately there is no easy answer to the question asked here, as ClarisWorks, which later became AppleWorks[1], went through several versions (with unrelated series numbering) across the MacOS and, later, Windows platforms. Over these versions the components offered expanded from word processing, spreadsheet, and database to include page layout, graphics/drawing/painting, and equation editing, all of which used the CWK file extension. A CWK file can therefore originate from different platforms, versions, and components. Generally speaking, most CWK files will be from the later v5.x or v6.x series of the product for MacOS.

The Java source used by Terrence Curran indicates the following byte format for the header of CWK files (byte positions counted from 1):

  • 01-04: Version e.g., I have seen 05 02 7d 00, 05 02 91 00, 05 02 99 00 = ClarisWorks v5.x; 06 07 d0 00, 06 07 e1 00 = AppleWorks v6.x.[2]
  • 05-08: File Creator ID e.g., 42 4f 42 4f = "BOBO" the reason for which is explained in the Notes at the foot of the page here by one of the creators of ClarisWorks.
  • 09-12: Previous Version e.g., I have seen 04 07 97 00, 04 07 9e 00 = ClarisWorks v4.x; 05 02 91 00, 05 02 99 00, 05 07 ad 00 = ClarisWorks v5.x; 06 07 d0 00 = AppleWorks v6.x.
  • 13-20: unknown; appears always to be 00 00 00 00 00 00 00 00.
  • 21-22: unknown; appears always to be 00 01.
  • 23-24: unknown, possibly a marker; e.g., I have seen 00 0f, 00 ba, 00 bb, 00 b7, 00 c0, 01 d0.
  • 25-26: unknown; e.g., I have seen 0b f0, 10 8c, 31 6c, 42 c4, 80 ac, 86 a6, 88 32, 8f 04, 8f 12, dd 06.[3]
  • 27-30: unknown; appears always to be 00 00 00 00.
  • 31-32: Page Height[4] e.g., 02 53 = 595pt ...
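The fields whose meaning is given above can be pulled out of the first 32 bytes with Python's `struct` module. This is a sketch built only from the list above: the names are hypothetical, the page height is read as a big-endian 16-bit unsigned integer because that is what makes 02 53 equal 595, and the sample header is synthetic, assembled from example values in the list rather than taken from a real file.

```python
import struct
from typing import NamedTuple


class CwkHeader(NamedTuple):
    version: bytes           # bytes 01-04 (1-based), e.g. 06 07 d0 00
    creator: str             # bytes 05-08, expected "BOBO"
    previous_version: bytes  # bytes 09-12
    page_height: int         # bytes 31-32, in points


def parse_cwk_header(data: bytes) -> CwkHeader:
    """Extract the documented fields from the first 32 bytes of a CWK file."""
    if len(data) < 32:
        raise ValueError("need at least 32 header bytes")
    version = data[0:4]
    creator = data[4:8].decode("ascii", errors="replace")
    previous_version = data[8:12]
    # ">H" = big-endian unsigned 16-bit; 0-based slice 30:32 is 1-based 31-32
    (page_height,) = struct.unpack(">H", data[30:32])
    return CwkHeader(version, creator, previous_version, page_height)


# Synthetic header built from example values in the list (not a real file).
sample = bytes.fromhex(
    "0607d000"          # 01-04 version (AppleWorks v6.x example)
    "424f424f"          # 05-08 creator "BOBO"
    "05029100"          # 09-12 previous version
    "0000000000000000"  # 13-20
    "0001"              # 21-22
    "000f"              # 23-24
    "0bf0"              # 25-26
    "00000000"          # 27-30
    "0253"              # 31-32 page height
)
header = parse_cwk_header(sample)
print(header.creator, header.page_height)  # BOBO 595
```

The unknown fields are skipped rather than guessed at; the version bytes are returned raw, since their finer structure is not explained above.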

Comments

Many thanks for the answers. In future I will be more careful to use hex instead of decimal. Thanks again.

RussR (2013-12-16 19:54:49 +0200)

Compiled versions of the libmwaw library referred to above are now available from: http://sourceforge.net/projects/libmwaw. They have worked for me on both text and spreadsheet files; complex texts with many different fonts were completely recovered. libmwaw will convert far more than just CWK text files. See: http://sourceforge.net/p/libmwaw/wiki...

RobBW (2015-11-24 10:58:10 +0200)
