Regex to find any space between Roman numerals followed by a dot and digits

Hi.

I’m editing a text in which I need to find every space between a Roman numeral followed by a dot and the digits after it, like this:

I. 6
II. 7
III. 10

and so on…

I tried:

^([IVXLCDM])\s(\d)

but it is not working. Do you have any suggestion? Thanks in advance

Since “Roman numerals” are written with ordinary Latin letters, there is no simple solution, and, so I’m afraid, no definitely reliable one.

I don’t know of any clear and unambiguous syntax for “Roman numerals”, in particular not for the “subtractive notation”. It’s a mess and should finally be trashed after 2000 years.

  • Sorry, your attempt is (imo) too unspecific, and (mainly) it does not look for the dot.
  • Depending on what kinds of “words” may occur in your text, and if you definitely only look for occurrences at the beginning of paragraphs, you may try
    (?-i)[IVXLCDM]+\. (?=\d)
    where (?-i) switches off the default case insensitivity, and the final parenthesis is a lookahead assertion (it does not become part of the match). Your attempt to use ^ for “start of line” must fail. It means “start of paragraph”.
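As a rough illustration of how the lookahead behaves, here is a minimal Python sketch. It uses Python's `re` module rather than Writer's Find & Replace, so details may differ: Python is case-sensitive by default, which is why the `(?-i)` switch is left out. The sample text is made up for the demonstration.

```python
import re

# Uppercase Roman-numeral letters, a literal dot, one space,
# then a lookahead that requires a digit without consuming it.
# Assumption: Python's re is case-sensitive by default, so no (?-i) here.
PATTERN = re.compile(r"[IVXLCDM]+\. (?=\d)")

# Hypothetical sample text, one "paragraph" per line.
sample = "I. 6\nII. 7\nIII. 10"

# Each match contains the numeral, the dot and the space, but not the digit.
matches = [m.group(0) for m in PATTERN.finditer(sample)]
print(matches)
```

The digits stay outside the match because of the lookahead, but the numeral and the dot are still part of it.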

Going into more detail: Try
(?-i)^(I|II|III|IV|V|IX|X|XX|XXX|XL|L|XC|CD|D|CM|M+)\. (?=\d)
with or without the ^ and with or without the (?-i) depending on your situation.

Sorry. Jumped too short. You would need something more complicated that allows only specific concatenations with regard to the order. It might start like
(?-i)^M+?((CM)|(D(CD)?))? and continue with …
Regular expressions were originally created by mathematicians to describe “regular syntaxes” which had to be “context-free”. …
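For what it's worth, an order-aware pattern of this kind is often written as one group each for thousands, hundreds, tens and units. The following is only a sketch in Python, not a continuation of the exact expression above: it assumes the common upper limit of 3999 (`M{0,3}` instead of `M+`), and the name `is_roman` is made up for the example.

```python
import re

# Strict Roman numerals from 1 to 3999 (assumed limit), one group per decimal place.
ROMAN = re.compile(
    r"(?=[MDCLXVI])"        # lookahead: refuse the empty string
    r"M{0,3}"               # thousands: "", M, MM, MMM
    r"(CM|CD|D?C{0,3})"     # hundreds: 900, 400, or 0-300 with optional 500
    r"(XC|XL|L?X{0,3})"     # tens: 90, 40, or 0-30 with optional 50
    r"(IX|IV|V?I{0,3})"     # units: 9, 4, or 0-3 with optional 5
)

def is_roman(text: str) -> bool:
    """Return True if the whole string is a well-formed Roman numeral."""
    return ROMAN.fullmatch(text) is not None

print(is_roman("XIV"), is_roman("MCMXCIV"), is_roman("IIII"), is_roman("VX"))
```

Even such a pattern cannot tell the numeral "I" from the English word "I", or "MIX" from the word, so the ambiguity mentioned above remains.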


Both with and without (?-i), it does find the space, but it also matches the numeral before it.

I obviously didn’t read your question thoroughly enough.

If you want to keep the “Roman nonsense”, all the advice is in vain. You won’t get anything reasonable.

Anyway: See also the attached example, where you will find a two-step solution described.
disask104966_VerySpecialRegExQuestion.odt (70.8 KB)
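Independently of the attached document (its contents are not reproduced here), one generic way to single out only the space is to put the numeral and the space into separate capture groups. The sketch below is in Python, so it only illustrates the idea and not Writer's dialog. Replacing the space with a non-breaking space is purely a hypothetical use case, and all names are made up.

```python
import re

# Group 1: numeral plus dot; group 2: the single space; lookahead: a digit follows.
# re.MULTILINE makes ^ match at the start of every line of the sample.
SPACE_AFTER_NUMERAL = re.compile(r"^([IVXLCDM]+\.)( )(?=\d)", re.MULTILINE)

# Hypothetical sample text.
sample = "I. 6\nII. 7\nIII. 10"

# Positions of the space alone (group 2), without the numeral before it.
spans = [m.span(2) for m in SPACE_AFTER_NUMERAL.finditer(sample)]
print(spans)

# Hypothetical follow-up: keep group 1 and swap the space for a non-breaking space.
replaced = SPACE_AFTER_NUMERAL.sub("\\1\u00a0", sample)
print(repr(replaced))
```

The whole match still includes the numeral; only the group boundaries isolate the space, which is why the replacement has to put group 1 back.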


Perhaps you would be better off replacing your hand-written “numbering” with an automatic one, using a list style or heading numbering. With automatic numbering, the result is always consistent.

In Writer, lists are always multi-level. Consequently your numbers above are Roman at level 1 and standard Western at level 2 (I intentionally avoid writing “Arabic” because the word is ambiguous: real Arabic digits are ٠١٢٣٤٥٦٧٨٩, while what we usually call “Arabic” digits in fact originate from India). Your mixed numbering is easily configured, and you no longer need to care about it.


Maybe these help:
regex - How do you match only valid roman numerals with a regular expression? - Stack Overflow
6.9. Roman Numerals - Regular Expressions Cookbook [Book]
