How to compare/replicate document styles between documents in different languages?

Hi,
I want to compare a pair of documents: the original in German and its translation into Spanish.

The German original will have:

  • the correct paragraph styles (H1, H2, H3, etc., quotations, default paragraph style, etc.)
  • the correct character styles (superscript, small caps, emphasis/italics; italics/emphasis being the most important)
  • blank lines between paragraphs where they appear in the original

I am looking for a way to

  • compare the styles/structure of the two documents (e.g. whether the blank lines have been put in the right places)
    and either
  • show where the differences are
    and/or
  • help to replicate the style of the original in the translation

Do you know whether there is a way to do this in LO?
Thanks a lot in advance,
Daneb

Thank you, @ajlittoz, for this very detailed answer.

Styling

Thank you for suggesting that both documents use the same template. I had nearly forgotten about that.
I will explain things in more detail; maybe you can suggest improvements to the way we work.

Template on a shared Google Drive

We share a template in a subfolder of the translation folder on a shared Google Drive. At the moment, I create a document and then assign that template to it with the Change Template extension. Question: is there a way to tell LO to use that template as my default template?
At the moment, that template has no language set, since each translator localizes into their own language. In a Spanish document, setting Language to Spanish in Default Paragraph Style, Font tab, did the trick. This is a lot quicker than what I was doing before (Preferences, Language Settings, change the default language, Apply). Thanks!

Blank lines

I agree with what you say about blank lines. Both authors we are translating use them deliberately between certain paragraphs; moreover, we need them for the further file conversions we offer on our website.

Structure and Styles

We’ve been doing some testing and found a method that provides either

  • a structure for a new translation
    or
  • a way to compare existing translations (as far as the structure goes, not for checking completeness; for that, see the last section)

without eliminating styles. First:

  • Set the view to Web.

Then:

For new translations:

  1. Convert the structured and styled original to a table, separating text at paragraphs.
  2. Add a column to the right.

The translator then only has to reproduce the styles and structure shown on the left while translating.

For revisions:

  • Do steps 1 and 2 for the original.
  • Convert the structured and styled translation separately to a table.
  • Paste it into the column on the right.

Then start the structural revision with the search-by-paragraph-style function.

Drawbacks

Language, grammar and spelling (we are using the LanguageTool extension, LT)

  1. Changing the language of Default Paragraph Style works for only one of the two languages (the target language, i.e. Spanish), so the source text is underlined in red. Since we focus on the target, this is easily overcome by setting the source text to German manually (select the source column, press Ctrl + A, then choose the language in Format > Character). Question: is there a cleaner way to do this for the source text?
  2. Also, with long two-column documents, LT behaves erratically and grammar checking does not work well: strange underlining, and no underlining at all towards the end of the document. I had two crashes, probably due to this. I am using LT 5.3 because the latest update was worse (more error messages), but I will do more testing to see whether they have released any fixes.

I have no solution for this at the moment.

Footnotes

  1. Footnotes are numbered starting from the left column (i.e. the German source: footnote 1, 2, 3, 4, 5, 6, etc.) and the numbering then continues with those on the right (i.e. the Spanish target: footnote 7, 8, 9, 10, 11, 12, etc.), so the numbering is not parallel in source and target. This is not a real problem: the translator can just copy the note and translate it, and the reviser can click on both notes to check them. Unless there is a better way?

Comparison

What you suggested is very interesting for focusing on matching blank lines between original and target. I’ll definitely give it a try. Thank you!

Segment comparison (Alignment) resources

For segment comparison (that is, at the sentence level), there are other tools out there which you are surely familiar with.
Here are those we have been testing:

  • Matecat aligner: quick but not always precise.
  • webAlignToolkit (choosing the YASA engine seems best): lets you paste text and is pretty precise.
  • Freetm - Autoaligner: very precise, with a more professional output in XLSX that signals missing segments (but you’ll have to convert your texts to .txt or .doc first).
  • LFaligner: looks outstanding as far as alignment goes.

Questions on your last suggestions

This is an excellent suggestion. Thanks!

How do I do that? Thank you in advance!

Select the text and press Ctrl + M to remove direct (ad hoc) formatting.

Yes: File > Templates > Manage Templates. “Import” the template so that LO knows about it, then make it your default template.

Using a Writer table is basically the same idea as working in Calc, except that you don’t lose styles. However, you end up with a bilingual document, which is tricky to manage correctly because, as you saw, a style can be associated with only one language.

You can handle this by duplicating the applied styles, one set per language. In principle, duplicating paragraph styles should be enough, because character styles generally only add a semantic nuance without changing the current language (unless you want to mark quotations in another language).
How you duplicate is a matter of personal taste. Either:

  • you start from a separate “default paragraph style” and build a full style tree from there, which makes the language easy to change, at the cost of manually keeping the styles in sync regarding indents, spacing, alignment, …
    or
  • you duplicate the styles individually, changing only the language, which guarantees that both versions always stay consistent regarding indents, spacing, alignment, … but means tracking down all the styles should you change language.

I would recommend the latter solution if the document is really bilingual, i.e. both languages side by side in the final copy.

I have always been very cautious about translation/grammar tools because I feel they don’t address the problem correctly. Parsing a natural language is very difficult because you can’t do it purely morphologically, as with computer languages. There is constant interaction between semantics, grammatical structure and syntax. This means you need a dictionary with semantic information so that this information can change the parsing path (natural languages are highly context-dependent, while nearly all common tools are context-free).
I tried LT but was very disappointed with the result, for two reasons. First, I worked on a document with a highly elaborate use of grammatical constructs (a “classical” literary work) and LT flagged nearly every sentence as faulty. Clearly, its grammar design is oriented towards simplified (if not simplistic) everyday documents such as letters, orders, invoices and contracts. Second, when translating, the resulting document was polluted with excessive direct formatting, which made it difficult to “polish” the final copy.
In the end, I dropped LT.

There is a single internal footnote counter, so you can’t define a second note counter as you can for headings, figures, tables, …
I found an ugly workaround that is valid for this kind of work (parallel texts with identical note references), but it is a bit long for a comment. Please ask a separate question and I’ll answer there. For the benefit of the community, be very descriptive in the question text.

Style comparison with colour change

The problem here is that every change to a style creates an *override*, which cuts inheritance from the ancestor style for that attribute until it is explicitly reset. Consequently, if the font colour in this style was the same as in the ancestor, simply press Reset to Parent (Standard in older versions). If the font colour was explicitly set in this style, restore the previous colour.

Styling

I suggest you base both documents on the same template and use the same style names in both. If the structure is to be preserved in the translation, you should have a one-to-one correspondence between versions, and it makes sense to use the same styles, since paragraphs keep the same semantics. (Remember that styles do not denote typographical appearance but convey the semantic significance of paragraphs, or of character sequences for character styles.)

Of course, your template will have a language attached to it. Keep your OS default language; this is most convenient when this template is your favourite one and you use it for general purposes.

In each language version, customise Default Paragraph Style (Font tab) to set Language to German or Spanish. Since Default Paragraph Style is the ultimate ancestor of all styles, this setting is automatically forwarded to all the others, unless you forced the language in some style, cutting the inheritance.
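To find styles where the language was forced, one could look inside the .odt file itself, which is a ZIP archive whose named styles are stored in styles.xml. The Python sketch below is only an illustration, not a LibreOffice feature. It assumes that Default Paragraph Style is stored under its internal name "Standard" and checks only the Western language attributes (fo:language, fo:country), not the Asian or CTL ones. Any style other than Standard in its output sets the language itself and therefore cuts the inheritance.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# ODF namespaces used in styles.xml
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "fo": "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0",
}
STYLE = "{%s}" % NS["style"]
FO = "{%s}" % NS["fo"]


def explicit_languages(styles_xml):
    """Map each named style that sets its own Western language to 'lang-COUNTRY'."""
    root = ET.fromstring(styles_xml)
    found = {}
    for st in root.iterfind("office:styles/style:style", NS):
        props = st.find("style:text-properties", NS)
        if props is None or not props.get(FO + "language"):
            continue  # this style does not set a language itself
        lang, country = props.get(FO + "language"), props.get(FO + "country")
        found[st.get(STYLE + "name")] = f"{lang}-{country}" if country else lang
    return found


if __name__ == "__main__" and len(sys.argv) == 2:
    with zipfile.ZipFile(sys.argv[1]) as odt:
        languages = explicit_languages(odt.read("styles.xml"))
    for name, lang in sorted(languages.items()):
        # Assumption: "Standard" is the internal name of Default Paragraph Style
        marker = "" if name == "Standard" else "  <- language forced in this style"
        print(f"{name}: {lang}{marker}")
```

Run it on one language version at a time, e.g. `python check_languages.py Translation_ES.odt` (hypothetical script and file names). Style names appear in their internal encoded form, e.g. Heading_20_1 for “Heading 1”.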

Blank lines

Blank lines are usually misused to space paragraphs vertically. This vertical space should be defined in paragraph styles.

Empty paragraphs have no content and therefore no semantic significance; they should be removed. The only use I see for empty paragraphs in your translation context is when diverging grammar rules in the target language require splitting or merging a paragraph, or when translation constraints require adding a comment, note, … between two original paragraphs. In that case, empty paragraphs in the source or target do have semantic significance, denoting an addition or deletion in the other version.

Comparison

I am helping a translator/editor/author on such a task. The easiest procedure he found is as follows:

  • export both versions as plain text (.txt)
  • import both plain-text files into Calc, each version in its own column, one paragraph per cell
    This can be done with Sheet > Insert Sheet from File, putting each file in a separate sheet (in the Text Import dialog, clear the separator options so that each paragraph stays in a single cell). Paragraphs are stored in successive cells of a column. Then copy the column from the second sheet and paste it into the second column of the first sheet.

You now have a side-by-side document and any shift becomes obvious.
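If you prefer to script the pairing step, here is a minimal Python sketch. It assumes both versions were exported as UTF-8 plain text (for example with the “Text - Choose Encoding” filter), so that each paragraph, including each empty one, is one line; script and file names are hypothetical. It writes a CSV file you can open in Calc, with a last column flagging blank lines present in only one version and paragraphs without a counterpart.

```python
import csv
import itertools
import sys


def side_by_side(source_lines, target_lines):
    """Pair the paragraphs of two plain-text exports (one paragraph per line).

    Returns rows (position, original, translation, check); 'check' flags a
    blank line present in only one version, or a paragraph with no counterpart.
    """
    rows = []
    pairs = itertools.zip_longest(source_lines, target_lines)  # pads with None
    for pos, (src, tgt) in enumerate(pairs, start=1):
        if src is None or tgt is None:
            check = "MISSING"
        elif (src.strip() == "") != (tgt.strip() == ""):
            check = "BLANK MISMATCH"
        else:
            check = ""
        rows.append((pos, src or "", tgt or "", check))
    return rows


def read_lines(path):
    # Assumes a UTF-8 export; "utf-8-sig" also strips a leading byte-order mark
    with open(path, encoding="utf-8-sig") as f:
        return f.read().splitlines()


if __name__ == "__main__" and len(sys.argv) == 4:
    source_path, target_path, csv_path = sys.argv[1:4]
    with open(csv_path, "w", encoding="utf-8", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["#", "Original", "Translation", "Check"])
        writer.writerows(side_by_side(read_lines(source_path), read_lines(target_path)))
```

Usage: `python pair_paragraphs.py Original_DE.txt Translation_ES.txt compare.csv`. As with the manual method, one missing or extra paragraph shifts everything after it, so look at the first flagged row first.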

I know of no trick to compare styling. In principle, if you worked with styles from the start, your document should be pretty clean. To check, you can temporarily change the font colour of the corresponding styles in both Writer documents and verify that the same sequences have been styled the same way.

CAUTION! Be sure to undo your change in such a way that it is no longer considered a document-local override, so that later changes in the template are still automatically forwarded to the documents.
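A check that needs no temporary colour change (and so leaves no override behind) could be done outside LibreOffice by comparing the sequence of styles. The Python sketch below is an illustration under assumptions, not a LibreOffice feature. It reads content.xml from each .odt (a ZIP archive), maps automatic styles created by direct formatting (P1, T1, …) to their parent named style when they have one, and lists for each paragraph: its paragraph style, whether it is empty (a blank line), and the character styles of its spans (e.g. Emphasis). Paragraphs in tables and footnotes are included in document order.

```python
import itertools
import sys
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}
TEXT = "{%s}" % NS["text"]
STYLE = "{%s}" % NS["style"]


def paragraph_styles(content_xml):
    """Return (paragraph_style, is_empty, character_styles) for each paragraph/heading."""
    root = ET.fromstring(content_xml)
    # Automatic styles come from direct formatting; map them to their parent if any
    parents = {
        st.get(STYLE + "name"): st.get(STYLE + "parent-style-name")
        for st in root.iterfind("office:automatic-styles/style:style", NS)
    }

    def resolve(name):
        return parents.get(name) or name

    body = root.find("office:body/office:text", NS)
    result = []
    for el in body.iter():  # document order, nested paragraphs included
        if el.tag not in (TEXT + "p", TEXT + "h"):
            continue
        name = resolve(el.get(TEXT + "style-name", ""))
        is_empty = "".join(el.itertext()).strip() == ""
        spans = tuple(resolve(s.get(TEXT + "style-name", "")) for s in el.iter(TEXT + "span"))
        result.append((name, is_empty, spans))
    return result


def load(path):
    """Read and analyse content.xml from an .odt file."""
    with zipfile.ZipFile(path) as odt:
        return paragraph_styles(odt.read("content.xml"))


def compare(source, target):
    """Yield (position, source_entry, target_entry) wherever the sequences differ."""
    for pos, (a, b) in enumerate(itertools.zip_longest(source, target), start=1):
        if a != b:
            yield pos, a, b


if __name__ == "__main__" and len(sys.argv) == 3:
    for pos, a, b in compare(load(sys.argv[1]), load(sys.argv[2])):
        print(f"{pos}: original={a} translation={b}")
```

Usage: `python compare_styles.py Original_DE.odt Translation_ES.odt` (hypothetical names). The comparison is strict: a sentence legitimately rendered with a different number of emphasised spans is reported too, and an unresolved automatic name such as T1 usually signals direct formatting. Style names appear in their internal encoded form (e.g. Heading_20_1).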


Thank you, @ajlittoz. I have encountered two difficulties, one with converting text to a table and another with importing the template. I will open two separate questions if I do not find answers on Ask LibreOffice.