Regex to find any space between Roman numerals followed by a dot and digits

michaeless · April 20, 2024, 4:31pm

Hi.

I’m editing a text of mine where I need to find any space between a Roman numeral followed by a dot and the digits after it, like this:

I. 6
II. 7
III. 10

and so on…

I tried:

^([IVXLCDM])\s(\d)

but it is not working. Do you have any suggestions? Thanks in advance.

Lupp · April 20, 2024, 4:58pm

Since “Roman numerals” are written with ordinary Latin letters, there is no simple solution, and, so I’m afraid, no definitely reliable one.

I don’t know of any clear and unambiguous syntax for “Roman numerals”, in particular not for the “subtractive notation”. It’s a mess and should finally be trashed after 2000 years.

Sorry, your attempt is (imo) too unspecific and, mainly, does not look for the dot.
Depending on what kinds of “words” may occur in your text, and if you definitely only look for occurrences at the beginning of paragraphs, you may try
(?-i)[IVXLCDM]+\. (?=\d)
where (?-i) switches off the default case insensitivity, and the final parenthesis gives a lookahead assertion (which does not become part of the match). Your attempt to use ^ for “start of line” must fail. It means “start of paragraph”.
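As a rough illustration outside LibreOffice, here is the same pattern tried in Python. This is only a sketch: Python's `re` module is not the ICU engine that LibreOffice uses, it is case-sensitive by default (so the `(?-i)` prefix is dropped), and the sample text, including the `DVD. 2` line, is made up for the demonstration.

```python
import re

# Sample lines from the question, plus an invented one ("DVD. 2")
# to show why a plain character class is not fully reliable.
text = "I. 6\nII. 7\nIII. 10\nDVD. 2"

# Letters from the Roman set, a literal dot, one space,
# then a lookahead that requires a digit without consuming it.
pattern = re.compile(r"[IVXLCDM]+\. (?=\d)")

matches = [m.group(0) for m in pattern.finditer(text)]
print(matches)  # ['I. ', 'II. ', 'III. ', 'DVD. ']
```

The digit stays out of each match because of the lookahead, but the numeral and the dot are still part of it, and any word built only from those seven letters is accepted as well.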

Going into more detail: Try
(?-i)^(I|II|III|IV|V|IX|X|XX|XXX|XL|L|XC|CD|D|CM|M+)\. (?=\d)
with or without the ^ and with or without the (?-i) depending on your situation.
Sorry. Jumped too short. You would need something more complicated that allows only specific concatenations in the right order. It might start like
(?-i)^M+?((CM)|(D(CD)?))? and continue with …
Regular expressions were originally created by mathematicians to describe “regular syntaxes”, which had to be “context-free”. …
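For reference, a commonly cited stricter pattern handles thousands, hundreds, tens, and units in that fixed order. The sketch below uses Python's `re` rather than LibreOffice's engine, covers only 1 to 3999, and the names `ROMAN`, `pattern`, and `is_numbered` are illustrative assumptions, not anything from this thread.

```python
import re

# Thousands, hundreds, tens, units, each in its only valid form.
# The leading lookahead rejects the empty string, which the four
# optional parts would otherwise accept.
ROMAN = (
    r"(?=[IVXLCDM])"
    r"M{0,3}"
    r"(?:CM|CD|D?C{0,3})"
    r"(?:XC|XL|L?X{0,3})"
    r"(?:IX|IV|V?I{0,3})"
)

# Numeral at the start of a line, a dot, one space, and a digit ahead.
pattern = re.compile("^" + ROMAN + r"\. (?=\d)", re.MULTILINE)


def is_numbered(line: str) -> bool:
    """Return True if the line starts with 'numeral. ' followed by a digit."""
    return pattern.match(line) is not None


for sample in ["XIV. 3", "MCMXCIV. 1", "IIII. 3", "DVD. 2", "IV 3"]:
    print(sample, is_numbered(sample))
```

With these samples, only `XIV. 3` and `MCMXCIV. 1` are accepted. Malformed sequences such as `IIII` and letter runs such as `DVD` are rejected because each position allows only its valid forms.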

michaeless · April 20, 2024, 5:27pm

Both with and without

(?-i)

it does find the space, but it also matches the numeral before it.
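One way to isolate just the space is to capture the numeral and dot in one group and the space in another, then put the first group back unchanged when replacing. The following is a minimal sketch in Python, not in LibreOffice itself. The tab used as the replacement and the variable names are assumptions for illustration. The same idea, a capturing group re-inserted through a back-reference in the replacement, should carry over to other regex engines, though the replacement syntax differs between tools.

```python
import re

text = "I. 6\nII. 7\nIII. 10"

# Group 1 = numeral and dot, group 2 = the space.
# The digit is only looked at, never consumed.
pattern = re.compile(r"^([IVXLCDM]+\.)( )(?=\d)", re.MULTILINE)

# Positions of the spaces alone, taken from the span of group 2.
space_spans = [m.span(2) for m in pattern.finditer(text)]
print(space_spans)  # [(2, 3), (8, 9), (15, 16)]

# Replace only the space (here with a tab) by re-emitting group 1.
fixed = pattern.sub(lambda m: m.group(1) + "\t", text)
print(repr(fixed))  # 'I.\t6\nII.\t7\nIII.\t10'
```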

Lupp · April 20, 2024, 5:31pm

I obviously didn’t read your question thoroughly enough.

If you want to keep the “Roman nonsense”, all the advice is in vain. You won’t get anything reasonable.

Anyway: See also the attached example, where you will find a two-step solution described.
disask104966_VerySpecialRegExQuestion.odt (70.8 KB)

ajlittoz · April 20, 2024, 6:19pm

Perhaps you would do better to replace your hand-written “numbering” with an automatic one, using a list style or heading numbering. With such automatic numbering, your result is always consistent.

In Writer, lists are always multi-level. Consequently, your numbers above are Roman at level 1 and standard Western at level 2 (I intentionally avoid writing “Arabic” because this word is ambiguous: real Arabic digits are ٠١٢٣٤٥٦٧٨٩, while what we usually call “Arabic” digits in fact originate from India). Your mixed numbering is easily configured, and you no longer need to care about it.

erAck · April 20, 2024, 7:13pm

Maybe these help:
regex - How do you match only valid roman numerals with a regular expression? - Stack Overflow
6.9. Roman Numerals - Regular Expressions Cookbook [Book]