I need to update a bilingual dictionary written in Writer. The first step is to parse every entry into its parts, e.g.:
- main word (font 1, bold)
- foreign equivalent, transliterated (font 1, italic)
- foreign equivalent (font 2, bold)
- part of speech (font 1, italic)
 
Each line of the document consists of the main word followed by the other parts listed above, separated by spaces or punctuation.
I need to automate walking through the whole file line by line and placing a delimiter between the parts, ignoring spaces and punctuation, so that I can mass-import the result into a Calc file. In other words, a “part” is a sequence of characters (ignoring spaces and punctuation) that share the same font AND font style.
I have tried the standard Find & Replace feature and the AltSearch extension, but neither can do this. The main problem is that I cannot write a search query that says:
Find: consecutive characters with the same font AND font style, ignoring spaces and punctuation
Replace: the term found above + “delimiter”
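The Find/Replace rule above can be expressed as a small grouping function. Below is a minimal Python sketch, assuming each line is already available as a list of `(text, font, style)` runs; that representation and the names `split_parts` and `to_delimited_line` are hypothetical. A part ends whenever a non-separator character's font or style differs from the previous one; separators at part boundaries are dropped, while those inside a part (e.g. a multi-word phrase) are kept.

```python
import unicodedata


def is_separator(ch: str) -> bool:
    """Spaces and punctuation never start or end a part."""
    return ch.isspace() or unicodedata.category(ch).startswith("P")


def split_parts(runs):
    """Group (text, font, style) runs into parts of uniform font + style.

    `runs` is a hypothetical per-line representation: an iterable of
    (text, font, style) tuples, e.g. one tuple per formatting run.
    """
    parts, current, pending = [], [], []
    current_key = None
    for text, font, style in runs:
        key = (font, style)
        for ch in text:
            if is_separator(ch):
                pending.append(ch)  # inner gap or boundary; decided later
                continue
            if current and key != current_key:
                parts.append("".join(current))  # formatting changed: close part
                current = []
            elif current:
                current.extend(pending)  # same formatting: keep inner separators
            pending = []  # separators at a boundary (or line start) are dropped
            current.append(ch)
            current_key = key
    if current:
        parts.append("".join(current))  # trailing separators are dropped
    return parts


def to_delimited_line(runs, delimiter="|"):
    """Join the parts with the delimiter, ready for Calc's text import."""
    return delimiter.join(split_parts(runs))


if __name__ == "__main__":
    line = [
        ("dog", "Font 1", "bold"),
        (" ", "Font 1", "regular"),
        ("skylos", "Font 1", "italic"),
        (" ", "Font 1", "regular"),
        ("σκύλος", "Font 2", "bold"),
        (", ", "Font 1", "regular"),
        ("n.", "Font 1", "italic"),
    ]
    print(to_delimited_line(line))  # dog|skylos|σκύλος|n
```

Because separators at part boundaries are discarded, a trailing period such as the one in "n." is dropped as well; adjust `is_separator` if that matters. The resulting lines can be opened in Calc with "|" set as the "Other" separator in the text import dialog.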
Any suggestions on how to write a script for this, or is there an existing tool that can solve the problem?
Thanks!
Pseudocode for the desired effect:
var $delimiter = "|"
Go to beginning of document
While not end of document do:
     var $currLine = get next line from doc
     var $currChar = first character of $currLine that is not space or punctuation
     var $font = $currChar.font
     var $font_style = $currChar.font_style   // e.g. bold, italic, normal
     print $currChar
     While not end of $currLine do:
          $currChar = next character of $currLine
          if ($currChar is space or punctuation) {   // never starts a new part
               print $currChar
               continue
          }
          if ($currChar.font != $font || $currChar.font_style != $font_style) {   // font or style has changed
               print $delimiter
               $font = $currChar.font
               $font_style = $currChar.font_style
          }
          print $currChar
     end While
     print newline
end While
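The pseudocode above maps fairly directly onto a LibreOffice Python macro: Writer already splits each paragraph into text portions wherever character formatting changes, so a macro can read each portion's font and style and apply the same grouping rule. This is an untested sketch that assumes each dictionary entry is its own paragraph (not separated by Shift+Enter line breaks) and that the Western-script properties `CharFontName`, `CharWeight` and `CharPosture` are the relevant ones; for Asian or right-to-left text the `CharFontNameAsian`/`CharFontNameComplex` (and matching weight/posture) variants would be needed instead. Rather than inserting a delimiter, it writes each part into its own cell of a new Calc sheet. The function name `export_dictionary_to_calc` is hypothetical.

```python
import unicodedata

# Assumption: this file runs as a LibreOffice Python macro, so the
# XSCRIPTCONTEXT global is provided by the script framework at run time.


def is_separator(ch):
    """Spaces and punctuation never start or end a part."""
    return ch.isspace() or unicodedata.category(ch).startswith("P")


def split_parts(runs):
    """Group (text, font, style) runs into parts of uniform font + style."""
    parts, current, pending = [], [], []
    current_key = None
    for text, font, style in runs:
        key = (font, style)
        for ch in text:
            if is_separator(ch):
                pending.append(ch)
                continue
            if current and key != current_key:
                parts.append("".join(current))  # formatting changed
                current = []
            elif current:
                current.extend(pending)  # keep separators inside a part
            pending = []
            current.append(ch)
            current_key = key
    if current:
        parts.append("".join(current))
    return parts


def paragraph_runs(paragraph):
    """Yield (text, font, style) for each text portion of a Writer paragraph."""
    portions = paragraph.createEnumeration()
    while portions.hasMoreElements():
        portion = portions.nextElement()
        if portion.TextPortionType != "Text":
            continue  # skip bookmarks, fields, footnote anchors, etc.
        # CharPosture is a FontSlant enum; .value is its name ("NONE", "ITALIC").
        style = (portion.CharWeight, portion.CharPosture.value)
        yield portion.getString(), portion.CharFontName, style


def export_dictionary_to_calc(*args):
    """Hypothetical entry point: one Writer paragraph -> one Calc row."""
    doc = XSCRIPTCONTEXT.getDocument()
    calc = XSCRIPTCONTEXT.getDesktop().loadComponentFromURL(
        "private:factory/scalc", "_blank", 0, ())
    sheet = calc.Sheets.getByIndex(0)
    row = 0
    paragraphs = doc.Text.createEnumeration()
    while paragraphs.hasMoreElements():
        paragraph = paragraphs.nextElement()
        if not paragraph.supportsService("com.sun.star.text.Paragraph"):
            continue  # skip tables
        parts = split_parts(paragraph_runs(paragraph))
        for col, part in enumerate(parts):
            sheet.getCellByPosition(col, row).setString(part)
        if parts:
            row += 1  # empty paragraphs do not produce blank rows


g_exportedScripts = (export_dictionary_to_calc,)
```

To try it, save the file in the `Scripts/python` folder of the LibreOffice user profile, open the dictionary in Writer, and run the function from Tools > Macros > Run Macro. Grouping on the portion properties, rather than character by character, also merges adjacent portions that differ only in attributes the rule ignores (size, colour, and so on).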