Help! Enable LibreOffice to import/export a proprietary file format

I have a proprietary file format (.udf) that I want my clients to open, edit and save in LibreOffice Writer simply by double-clicking on it from the desktop.
A file in this format (.udf) is a “zip” archive that contains a “content.xml” file that has all the document details.

1. How can I achieve this?
2. What methods does LibreOffice currently support for creating filters?
3. If possible, I want to write the filter in Python.

Please tell me if you need any more info. Thanks for your help.

Edit: I have uploaded a sample file. This website doesn’t allow my file extension to be uploaded, so I changed it to (.doc) in order to upload it here. Please change it back to (.udf) after downloading.
sample 1.doc (5.8 KB)

What is that .udf FILENAME extension?

UDF is a disc image format:

LibreOffice (LO) uses the ODF file formats.


Do you have a link to the documentation of the file format used? Anybody can invent and reuse extensions like .udf for anything. Can you provide a small sample file?

If you are very lucky, it is only a renamed OpenDocument file. (Google shows a udf2pdf converter on a Turkish website.) Otherwise it depends on the actual format…

What else is in the archive? Only content.xml? If your zip listing looks like the following, then you are dealing with Open Document Format (ODF), which is the native file format of this office suite. The correct file name extensions are odt for text documents, ods for spreadsheets, odp for presentations, odg for vector graphics, and odb for databases.


mimetype
styles.xml
layout-cache
Thumbnails/thumbnail.png
Configurations2/popupmenu/
Configurations2/statusbar/
Configurations2/accelerator/
Configurations2/progressbar/
Configurations2/toolpanel/
Configurations2/menubar/
Configurations2/images/Bitmaps/
Configurations2/floater/
Configurations2/toolbar/
meta.xml
settings.xml
manifest.rdf
META-INF/manifest.xml
content.xml
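
The comparison with the listing above can also be scripted. Here is a minimal sketch using only Python's standard zipfile module. The function name and the choice of three marker entries are my own assumptions for illustration, not an official detection routine.

```python
import zipfile

# Entries that an ODF package normally contains (taken from the listing above).
ODF_MARKERS = {"mimetype", "META-INF/manifest.xml", "content.xml"}


def inspect_package(path):
    """Return (member_names, looks_like_odf, mimetype) for a zip-based file.

    looks_like_odf is only a heuristic: it is True when all marker
    entries are present. mimetype is None when the entry is missing.
    """
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()
        looks_like_odf = ODF_MARKERS.issubset(names)
        mimetype = None
        if "mimetype" in names:
            raw = archive.read("mimetype")
            mimetype = raw.decode("ascii", "replace").strip()
    return names, looks_like_odf, mimetype
```

Calling `inspect_package("sample 1.udf")` on a file that holds nothing but content.xml should report that it does not look like an ODF package.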

  • No, it is not a disc image. It is a document file opened by a proprietary document editor.
  • I edited the post and uploaded an example.
  • content.xml is the only file inside.

I had a look at the sample file. I doubt it was composed by a document processor because there is no metadata besides content.xml. This content.xml seems to me closer to HTML than to any office format. You should be able to display it in a web browser after making it more HTML-like. Notably, you should add a CSS stylesheet, so that tweaking the CSS styles renders the document approximately as intended. Once done, you can export this HTML to something more usable.

If you want LibreOffice to support any file format, you need to create an import+export filter for this format. You are welcome to join development.

Can I create an import+export filter using Python?

Yes you can.

How? Can you provide more details?

You connected to #libreoffice-dev in the morning, and unfortunately didn’t stay there long enough to get any replies (as its topic says, it’s most active during European office hours).

Anyway, the “Filter extensions” page on the Apache OpenOffice Wiki might give you a starting point.

It is an XML file. In theory it should be possible to use an XSLT filter, but I have never done it. The UI for adding and testing such an XSLT filter is in Tools > Macros > XML Filter Settings.
Your document has no real content, only structure information, and it has nested tables. So perhaps a Writer template is a suitable import target.
But because of the CDATA part, I think the way mikekaganski pointed out is more suitable. Read the wiki page that he linked.
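
For what it is worth, if you go the Python route instead, a standard XML parser returns a CDATA section as ordinary element text. Below is a minimal sketch, assuming the text sits in an element called `content`. That element name is a guess for illustration only and has to be adapted to what your content.xml really contains.

```python
import zipfile
import xml.etree.ElementTree as ET


def read_cdata_text(udf_path, element_name="content"):
    """Return the text (including CDATA) of the first matching element.

    element_name is a hypothetical tag name, not taken from the real format.
    An empty string is returned when the element is missing or empty.
    """
    with zipfile.ZipFile(udf_path) as archive:
        # fromstring accepts bytes and honours the XML encoding declaration.
        root = ET.fromstring(archive.read("content.xml"))
    if root.tag == element_name:
        node = root
    else:
        node = root.find(f".//{element_name}")
    if node is None or node.text is None:
        return ""
    return node.text
```

The parser merges the CDATA block into `node.text`, so markup-like characters inside it arrive unescaped as plain text.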


I prefer to use Python rather than XSLT. If there is a way to make a Python filter, please let me know.

This is the specification of the target format: Open Document Format for Office Applications (OpenDocument) Version 1.3. Part 3: OpenDocument Schema
Now you need the specification of the source format, or you have to reverse engineer the source format from the files you have. Transforming one XML format into another is what XSLT was invented for. You add your own XSLT to the office suite and open your udf files with LibreOffice.
It should be doable in Python, though.
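
As a rough idea of the target side in Python, here is a minimal sketch that builds a bare ODF content.xml from a list of paragraph strings. The two namespace URIs come from the OpenDocument specification. The function name and the "one `text:p` per string" mapping are assumptions for illustration, and a real converter would also have to handle styles and the nested tables.

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Make the serializer emit the conventional "office:" and "text:" prefixes.
ET.register_namespace("office", OFFICE_NS)
ET.register_namespace("text", TEXT_NS)


def build_odf_content(paragraphs):
    """Return the bytes of a minimal content.xml, one text:p per string."""
    root = ET.Element(
        f"{{{OFFICE_NS}}}document-content",
        {f"{{{OFFICE_NS}}}version": "1.3"},
    )
    body = ET.SubElement(root, f"{{{OFFICE_NS}}}body")
    text = ET.SubElement(body, f"{{{OFFICE_NS}}}text")
    for paragraph in paragraphs:
        element = ET.SubElement(text, f"{{{TEXT_NS}}}p")
        element.text = paragraph  # special characters are escaped on output
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)
```

The `xml_declaration` argument of `ET.tostring` needs Python 3.8 or newer.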

Do you know how to do it in Python? For example, where do I put the script?

???
Your script has nothing to do with LibreOffice. Just write a program that parses the source file (content.xml from the udf) and writes to some content.xml. Then zip the content.xml output into a prepared template file. IMHO, this should give a valid office document. You will definitely fail if you are not familiar with this office suite, its style families and features, but your Python or XSLT program will not interface with any office suite.
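
The "zip into a prepared template" step could look roughly like the sketch below, using only the standard zipfile module. The function name and paths are placeholders, and it assumes the template is an .odt saved from Writer, so that its `mimetype` entry is already first and uncompressed.

```python
import zipfile


def replace_content_xml(template_path, output_path, new_content):
    """Copy an ODF template to output_path with content.xml swapped out.

    new_content is the bytes (or str) produced by your own converter.
    """
    with zipfile.ZipFile(template_path) as src, \
            zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename == "content.xml":
                continue  # skip the template's own content
            # Passing the ZipInfo keeps each entry's name, order and
            # compression type, so "mimetype" stays first and stored.
            dst.writestr(info, src.read(info.filename))
        dst.writestr("content.xml", new_content)
```

Usage would be something like `replace_content_xml("template.odt", "result.odt", converted_bytes)`, after which result.odt can be opened in Writer to see whether the generated content is accepted.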