I need to update a bilingual dictionary written in Writer by first parsing all entries into their parts e.g.
- main word (font 1, bold)
- foreign equivalent transliterated (font 1, italic)
- foreign equivalent (font 2, bold)
- part of speech (font 1, italic)
Each line of the document is the main word followed by the parts listed above, each separated by a space or punctuation.
I need to automate the process of walking through the whole file, line by line, and place a delimiter between each part, ignoring spaces and punctuation, so I can mass import it into a Calc file. In other words, “each part” is a sequence of character (ignoring spaces and punctuation) that have the same font AND font-style.
I have tried the standard Search&Replace feature, and AltSearch extension, but neither are able to complete the task. The main problem is I am not able to write a search query that says:
Find: consecutive characters with the same font AND font_style, ignore spaces and punctuation
Replace: term found above + “delimiter”
Any suggestions how I can write a script for this, or if an existing tool can solve the problem?
Thanks!
Pseudo code for desired effect:
var delimiter = "|"
Go to beginning of document
While not end of document do:
var $currLine = get line from doc
var $currChar = get next character which is not space or punctuation;
var $font = currChar.font
var $font_style - currChar.font_style (e.g. bold, italic, normal)
While not end of line do:
$currChar = next character which is not space or punctuation;
if (currChar.font != $font || currChar.font_style != $font_style) { // font or style has changed
print $delimiter
$font = currChar.font
$font_style - currChar.font_style (e.g. bold, italic, normal)
}
end While
end While