Since “Roman numerals” are written with ordinary Latin letters, there is no simple solution and, I’m afraid, no fully reliable one.
I don’t know of any clear and unambiguous syntax for “Roman numerals”, in particular not for the “subtractive notation”. It’s a mess and should finally be trashed after 2000 years.
- Sorry, your attempt is (imo) too unspecific and, mainly, does not look for the period after the numeral.
- Depending on what kinds of “words” may occur in your text, and provided you only look for occurrences at the beginning of paragraphs, you may try
(?-i)[IVXLCDM]+\. (?=\d)
where (?-i) switches off the default case insensitivity, and the final parenthesis is a lookahead assertion (it does not become part of the match). Your attempt to use ^
for “start of line” must fail. It means “start of paragraph”.
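For trying this outside the editor, here is a minimal sketch of the same idea in Python. It rests on assumptions that are not from the original question: Python's `re` module is case-sensitive by default, so (?-i) is dropped, and `re.MULTILINE` stands in for the editor's “start of paragraph” meaning of ^. The sample text is made up.

```python
import re

# Python's re is case-sensitive by default, so no (?-i) is needed here.
# re.MULTILINE makes ^ match at the start of every line of the string.
NUMERAL_PREFIX = re.compile(r"^[IVXLCDM]+\. (?=\d)", re.MULTILINE)

# Hypothetical sample text, one "paragraph" per line.
sample = "IV. 3 apples\nMix. 4 pears\nsee XII. 5 plums\nIIII. 7 figs"

# The lookahead (?=\d) checks for a digit but keeps it out of the match.
matches = [m.group(0) for m in NUMERAL_PREFIX.finditer(sample)]
print(matches)
```

Note that the loose character class also accepts “IIII.”, which is not a well-formed numeral in subtractive notation; it only checks which letters occur, not their order.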
Going into more detail: Try
(?-i)^(I|II|III|IV|V|IX|X|XX|XXX|XL|L|XC|CD|D|CM|M+)\. (?=\d)
with or without the ^
and with or without the (?-i)
depending on your situation.
Sorry, that fell short. You would need something more complicated that allows only specific concatenations, in the right order. It might start like
(?-i)^M+?((CM)|(D(CD)?))?
and continue with …
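For comparison, here is a sketch of one widely used strict pattern for numerals up to 3999. It is not necessarily the continuation intended above, and it is written for Python's `re` module rather than for the editor's regex flavour; the name `is_roman` is purely illustrative.

```python
import re

# Thousands, hundreds, tens and units, each in standard subtractive form.
STRICT_ROMAN = re.compile(
    r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})"
)

def is_roman(text: str) -> bool:
    # Every group may match the empty string, so reject "" explicitly.
    return bool(text) and STRICT_ROMAN.fullmatch(text) is not None

for candidate in ["XIV", "MCMXCIV", "IIII", "IC", "MIX", ""]:
    print(repr(candidate), is_roman(candidate))
```

Even this strict form accepts “MIX”, which is a valid numeral (1009) and also an ordinary word, so the ambiguity with normal text remains.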
Regular expressions were originally created by mathematicians to describe “regular syntaxes” which had to be “context-free”. …