The short answer to your question is yes: this is very much possible, and it is one of the main advantages of XML. You can combine any XML data with any other XML data, given the right transformations.

The transformations are the harder part: getting your XML data into a useful format. A distinction helps here. XML documents can have either a narrative focus or a records-based focus:

  • What I mean by narrative is a document that has one main run of text organized more or less linearly. XHTML, OpenDocument Text, TEI, DocBook and RSS all have a narrative focus, even though they all have different specific purposes.

  • What I mean by records-based focus is a document that is not primarily linear and is intended for record-keeping, comparison, calculation, analysis or reference. OpenDocument Spreadsheet, OpenDocument Database and XML configuration files in general have a records-based focus.

That being said, any well-formed XML document can, in principle, be imported into LibreOffice and recognized as XML, whether the document has a narrative or a records-based focus. However, by default, any imported XML document that is not specifically handled by an import filter will be opened in LibreOffice Calc as though it were records-based. One interesting case is opening the content.xml or the styles.xml from an OpenDocument Text document: even though they are part of a format that is handled, individually they present themselves as records-based documents when imported.

When you import such a document in Calc, the application will offer to “map” the elements and attributes of the document to cells and properties in a spreadsheet table. Depending on the document type, this may or may not be what you want. If it is, you simply need to take the time to decide how you want to organize the document’s information in a table or set of tables.
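To make the idea of mapping concrete, here is a minimal Python sketch that flattens a small records-based document into table rows. It only illustrates the concept and is not how Calc does it internally; the sample document and the `xml_to_rows` helper are invented for this example, and it assumes each repeating element holds simple attributes and single-level child elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical records-based document, invented for this example.
SAMPLE = """<inventory>
  <item sku="A-100"><name>Bolt</name><qty>250</qty></item>
  <item sku="B-200"><name>Washer</name><qty>900</qty></item>
</inventory>"""


def xml_to_rows(xml_text, record_tag):
    """Return a header row followed by one row per repeating record element."""
    root = ET.fromstring(xml_text)
    records = root.findall(record_tag)

    # Collect column names in first-seen order; within each record,
    # attributes come before child elements.
    columns = []
    for rec in records:
        for name in list(rec.attrib) + [child.tag for child in rec]:
            if name not in columns:
                columns.append(name)

    rows = [columns]
    for rec in records:
        values = dict(rec.attrib)
        values.update({child.tag: (child.text or "").strip() for child in rec})
        # Records that lack a column get an empty cell.
        rows.append([values.get(col, "") for col in columns])
    return rows


rows = xml_to_rows(SAMPLE, "item")
for row in rows:
    print("\t".join(row))
```

Each attribute and child element becomes a column and each repeating element becomes a row, which is roughly the decision you make by hand when you set up the mapping in the spreadsheet.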

If your document has a narrative focus and you want to use its content in an OpenDocument Text document, you have multiple options that all involve some work on your part.

  • If a tool already exists to convert the given document type to a format that LibreOffice already handles, then by all means use this tool. However, the tool might not have the desired granularity, i.e. it might not offer you enough options in deciding how to treat the elements, attributes and character data of the source document.

  • If no tool exists yet and the document is relatively simple, you can modify the document semi-manually in order to extract what you need from it. What I mean by semi-manually is that you should try to automate as much of the process as possible, e.g. by using complex search-and-replace operations with regular expressions. Always test the regular expression operations before actually applying them to your document. Also, always operate on a copy of the document and keep a reference copy untouched and safe.

  • If no tool exists yet and the document is complex, you will have to do some programming. You have many options here: Java, Python or any good general-purpose programming language with appropriate XML libraries or, alternatively, XSLT, which is a specialized programming language meant specifically for XML document transformation. There are some tutorials on the Web and many books about XSLT. Depending on whom you ask, it is considered either very complex and abstract or very simple and straightforward. Opinions notwithstanding, it is powerful, is purpose-built for this kind of work and can be processed directly by LibreOffice (i.e. LibreOffice contains an XSLT processor). XSLT “filters” can be added to LibreOffice and even made available in the document export dialog. This means that no programming or scripting is required to use them once they have been developed.
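The last two options can be sketched in Python using only the standard library. This is a rough illustration under stated assumptions, not a finished converter: the source document, its element names (`article`, `title`, `para`, `term`) and the helpers `preview_rename` and `to_xhtml` are all invented for the example. The target here is a minimal XHTML page, on the assumption that (X)HTML is a format your LibreOffice installation can open in Writer.

```python
import re
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical narrative document; the element names are invented for this sketch.
SOURCE = """<article>
  <title>Field notes</title>
  <para>First <term>observation</term> of the day.</para>
  <para>Second observation.</para>
</article>"""


def preview_rename(text, old, new):
    """Semi-manual route: rename an element with a regular expression.

    Works on the string passed in (a copy of the document) and returns the
    modified text plus the number of replacements, so the result can be
    checked before anything is written back to disk.
    """
    # Match "<old" or "</old" only when followed by whitespace, ">" or "/",
    # so that longer names such as "paragraph" are left alone.
    pattern = re.compile(r"<(/?)" + re.escape(old) + r"(?=[\s>/])")
    return pattern.subn(r"<\g<1>" + new, text)


def to_xhtml(xml_text):
    """Programmatic route: walk the parsed tree and emit a minimal XHTML page."""
    root = ET.fromstring(xml_text)
    parts = ["<html><body>"]
    title = root.findtext("title")
    if title:
        parts.append("<h1>" + escape(title.strip()) + "</h1>")
    for para in root.iter("para"):
        # itertext() flattens inline markup such as <term> into plain text.
        text = "".join(para.itertext()).strip()
        parts.append("<p>" + escape(text) + "</p>")
    parts.append("</body></html>")
    return "\n".join(parts)


renamed, count = preview_rename(SOURCE, "para", "p")
print(count, "tags renamed")
page = to_xhtml(SOURCE)
print(page)
```

The regular expression route is quick but knows nothing about XML structure, which is why the replacement count is returned for inspection. The parsed-tree route is more work up front but lets you decide element by element what to keep, which is the same kind of control an XSLT stylesheet would give you.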

Welcome to the world of XML.