Font style declaration in source code

I am trying to write a Kruti Dev to Unicode converter and have two problems.

  1. The style declaration is not consistent:
    To select the Kruti Dev font, I checked the style tags but found that the font is declared in different ways for different text. For example:

<style:text-properties fo:font-family="'Kruti Dev 055'" style:font-pitch="variable"/>

<style:font-face style:name="Kruti Dev 055" svg:font-family="'Kruti Dev 055'" style:font-pitch="variable"/>

<style:font-name="Kruti Dev 055" officeooo:rsid="00122578" officeooo:paragraph-rsid="00122578"/>

  2. The user is using fonts from the Kruti Dev family, so the text may be in Kruti Dev 055, Kruti Dev 050 or Kruti Dev 011.

I am using a Python script to modify the content.xml file. I need to convert only the text typed in that font; English text must not be modified. Any help will be appreciated.
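A minimal sketch of how the three declaration shapes above could be reduced to a single set of style names, using only Python's standard `xml.etree.ElementTree`. It assumes that any family name matching `Kruti Dev <digits>` counts, that the styles of interest are `style:style` elements in the parsed tree (automatic styles in content.xml; named styles live in styles.xml and would need the same treatment), and it does not follow `style:parent-style-name` inheritance. The function name `kruti_styles` and the sample document are illustrative only.

```python
import re
import xml.etree.ElementTree as ET

# ODF namespace URIs used by the declarations shown in the question
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "fo": "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0",
    "svg": "urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0",
}

# Assumption: "Kruti Dev 055", "Kruti Dev 050", "Kruti Dev 011", ... all qualify
KRUTI_RE = re.compile(r"kruti\s*dev\s*\d+", re.IGNORECASE)


def q(prefix, local):
    """Build ElementTree's '{uri}local' form of a prefixed name."""
    return "{%s}%s" % (NS[prefix], local)


def kruti_styles(root):
    """Return the names of styles whose text properties select a Kruti Dev font."""
    # style:font-face maps a font name to its real family (svg:font-family)
    face_family = {
        face.get(q("style", "name")): face.get(q("svg", "font-family"), "")
        for face in root.iter(q("style", "font-face"))
    }
    result = set()
    for st in root.iter(q("style", "style")):
        props = st.find("style:text-properties", NS)
        if props is None:
            continue
        # Direct form: fo:font-family="'Kruti Dev 050'"
        family = props.get(q("fo", "font-family"), "")
        # Indirect form: style:font-name="Kruti Dev 055", resolved via font-face
        font_name = props.get(q("style", "font-name"))
        if font_name is not None:
            family += " " + face_family.get(font_name, font_name)
        if KRUTI_RE.search(family):
            result.add(st.get(q("style", "name")))
    return result


SAMPLE = """<office:document-content
  xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
  xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
  xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
  xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"
  xmlns:officeooo="http://openoffice.org/2009/office">
 <office:font-face-decls>
  <style:font-face style:name="Kruti Dev 055" svg:font-family="'Kruti Dev 055'"/>
 </office:font-face-decls>
 <office:automatic-styles>
  <style:style style:name="T1" style:family="text">
   <style:text-properties fo:font-family="'Kruti Dev 050'"/>
  </style:style>
  <style:style style:name="T2" style:family="text">
   <style:text-properties style:font-name="Kruti Dev 055" officeooo:rsid="00122578"/>
  </style:style>
  <style:style style:name="T3" style:family="text">
   <style:text-properties style:font-name="Liberation Serif"/>
  </style:style>
 </office:automatic-styles>
</office:document-content>"""

print(sorted(kruti_styles(ET.fromstring(SAMPLE))))  # ['T1', 'T2']
```

Matching on the resolved family rather than on the raw attribute text is what lets the quoted (`'Kruti Dev 055'`) and unquoted (`Kruti Dev 055`) spellings be treated the same; the `officeooo:rsid` attributes are simply never looked at.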

The officeooo:xxx-rsid attributes are auxiliary data inserted to help track changes. You can ignore them. Unfortunately, they are inserted the same way as direct formatting, which makes the XML more complex to parse. You can disable the feature, but this does not remove existing tagging. To get rid of the “rsid” tagging, try File > Save As and check the result.

How is your document formatted? It will be easier to parse if it is fully styled (no direct formatting). However, I am not sure that bare Python is sufficient for this: Python provides regular expressions, which are not powerful or versatile enough to match a more abstract grammar. See if you can find LL or LR parsers in libraries; otherwise, use tools like lex and yacc. The grammar to write should be rather simple.

Python has built-in modules to read and write XML data (e.g. xml.etree.ElementTree). In addition, there is the more convenient third-party package lxml.
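For example, with only the standard library (`zipfile` plus `xml.etree.ElementTree`), the paragraph text of an .odt could be read roughly like this, since an .odt file is a ZIP archive containing content.xml. This is a read-only sketch: the helper names are made up for illustration, and nested content such as footnotes is simply folded into the text of the paragraph that contains it.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
PARA_TAGS = ("{%s}p" % TEXT_NS, "{%s}h" % TEXT_NS)


def load_content(odt):
    """Parse content.xml out of an .odt (path or binary file-like object)."""
    with zipfile.ZipFile(odt) as zf:
        return ET.fromstring(zf.read("content.xml"))


def paragraph_texts(root):
    """Yield the plain text of every text:p and text:h, in document order."""
    for el in root.iter():
        if el.tag in PARA_TAGS:
            # itertext() joins the paragraph's own text with that of its spans
            yield "".join(el.itertext())


# Demo on a tiny in-memory archive (ZipFile also accepts file-like objects)
SAMPLE = ('<office:document-content'
          ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
          ' xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
          '<office:body><office:text>'
          '<text:h>Title</text:h>'
          '<text:p>ab<text:span>cd</text:span>ef</text:p>'
          '</office:text></office:body></office:document-content>')
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("content.xml", SAMPLE)
print(list(paragraph_texts(load_content(buf))))  # ['Title', 'abcdef']
```

Writing changes back takes more care: ElementTree renames namespace prefixes to `ns0`, `ns1` and so on unless each one is registered with `ET.register_namespace`, and the `mimetype` entry should remain the first, uncompressed entry of the archive. lxml keeps the original prefixes, which is one reason it is more convenient here.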

However, I seriously doubt whether @shantanuo has sufficient knowledge of Python, and even more so, an understanding of the complexity of XML.

I think that reading XML is not sufficient. ODF adds its own semantics on top of XML, and you may encounter “improper” nesting (from an XML point of view) because paragraph and character styles live in different layers. This is why I suggested a more powerful parsing strategy than “simple” regexps.
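To illustrate the layering point, here is a sketch that converts only the text whose *effective* font is Kruti Dev. A span whose style sets a font overrides the paragraph's font, a span whose style sets no font (e.g. only bold) inherits it, and text following a span (its XML "tail") belongs to the enclosing element again. It assumes a dict `font_is_kruti`, mapping each style name that sets a font to `True` or `False`, was built beforehand; paragraph-style parents, default styles and list styles are ignored, and `str.upper` merely stands in for a real Kruti Dev to Unicode table. The function name `convert_in_place` is hypothetical.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
SPAN = "{%s}span" % TEXT_NS
LINK = "{%s}a" % TEXT_NS
STYLE_NAME = "{%s}style-name" % TEXT_NS


def convert_in_place(el, font_is_kruti, convert, inherited=False):
    """Apply `convert` to the text of `el` (text:p, text:h or text:span)
    wherever the effective font is Kruti Dev; other text is left untouched."""
    # A style that sets a font overrides the enclosing one; a style that
    # sets none (absent from the dict) keeps the inherited setting.
    current = font_is_kruti.get(el.get(STYLE_NAME), inherited)
    if el.text and current:
        el.text = convert(el.text)
    for child in el:
        if child.tag in (SPAN, LINK):
            convert_in_place(child, font_is_kruti, convert, current)
        # Other children (text:s, text:tab, notes, ...) are not descended into.
        # The tail (text after the child) is in *this* element's font.
        if child.tail and current:
            child.tail = convert(child.tail)


# Illustrative data: P1 uses Kruti Dev, T1 switches to another font,
# T2 only sets bold and is therefore absent from the dict.
sample = ('<text:p xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" '
          'text:style-name="P1">kd<text:span text:style-name="T1">English</text:span>'
          'ja<text:span text:style-name="T2"> bold</text:span></text:p>')
p = ET.fromstring(sample)
convert_in_place(p, {"P1": True, "T1": False}, str.upper)
print("".join(p.itertext()))  # KDEnglishJA BOLD
```

Converting run by run keeps the English spans untouched, but a real Kruti Dev mapping reorders some characters (the short i sign, for instance, is typed before its consonant), so a syllable split across two spans may still need special handling.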

My post was not exclusively about reading XML!