How to select only groups of numbers of no more than n digits

Hi,
I am working on a text in which I replace each abbreviation + number with [abbreviation, non-breaking space, number], i.e. [$1 $3]. But with this regex: ([:alpha:]+)([\x9]+|[\x20]+|[\t]+)([:digit:]+)
places and dates (four digits) are also selected.
Since most of the numbers have at most three digits and are followed by [ , . - U2060 \s non-breaking space ) ] (there might be more than that), I thought there might be a way to search for at most three digits and reject any digit after them:
([:alpha:]+)([\x9]+|[\x20]+|[\t]+)([:digit:]{1,3}+)([^0-9]+?)
This is what I tried, but besides also selecting the ), which I have no explanation for, the replacement string [$1 $3] now takes the ) away.
What would be the best way to do it?

Your regexp ([:alpha:]+)([\x9]+|[\x20]+|[\t]+)([:digit:]{1,3}+)([^0-9]+?) now contains four capture groups:

  • ([:alpha:]+) for the abbreviation
  • ([\x9]+|[\x20]+|[\t]+) for the spacers, which could be written as \s+ so that you make no assumption about how the spacing is encoded (\x9 and \t are the same tab character anyway)
  • ([:digit:]{1,3}+) for 1-3 digits
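  • ([^0-9]+?) for one or more non-digits; +? is the lazy form of +, not an optional marker, so in practice it captures exactly one non-digit

The fourth group swallows whatever it matches, which is why the ) is selected. If you don’t reinsert it with $4 in the replacement, it is lost.

Note that because this group must match something, four digits are in fact rejected: in “abc 1234” the digit group takes “123”, the next character “4” is a digit, and the match fails. The weak point is elsewhere: a number at the very end of a paragraph has nothing after it, so it is never matched.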
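
If you want to check the swallowing effect outside Writer, here is a minimal sketch using Python's re module. This is an assumption on my part, not the ICU engine LibreOffice uses: [:alpha:] and [:digit:] are approximated by [A-Za-z] and [0-9], the possessive + after {1,3} is dropped, and the sample text is made up.

```python
import re

# Python approximation of the four-group pattern (not LibreOffice's ICU syntax).
FOUR_GROUPS = re.compile(r"([A-Za-z]+)([\t ]+)([0-9]{1,3})([^0-9]+?)")

NBSP = "\u00a0"
sample = "(see Nr 12) and p 7, too"  # invented sample text

# Replacement built from groups 1 and 3 only: the character that
# group 4 consumed after the number disappears.
lossy = FOUR_GROUPS.sub(r"\1" + NBSP + r"\3", sample)

# Replacement that also reinserts group 4: nothing is lost.
kept = FOUR_GROUPS.sub(r"\1" + NBSP + r"\3\4", sample)

print(repr(lossy))
print(repr(kept))
```

In the first result the ) and the comma are gone; in the second they are back in place.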

A better regexp is ([:alpha:]+)\s+([:digit:]{1,3}+)([^0-9]|$) with 3 capture groups. Your replacement is now $1 $2$3 (where the space is an NBSP or any other character to your liking).

Note I wrote [^0-9]|$ in the last group without any quantifier, so that exactly one non-digit is required after the number. But if your target is at the end of a paragraph (with nothing after it), [^0-9] does not match and the whole regexp fails. I added $ (paragraph end) as an alternative to cover this case.
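
The three-group version can be tried the same way. Again this is only a sketch in Python's re module, assumed here as a stand-in for Writer's Find & Replace: the character classes are approximated by [A-Za-z] and [0-9], the possessive quantifier is left out, and join_with_nbsp is a hypothetical helper name.

```python
import re

NBSP = "\u00a0"

# Python approximation of ([:alpha:]+)\s+([:digit:]{1,3}+)([^0-9]|$)
THREE_GROUPS = re.compile(r"([A-Za-z]+)\s+([0-9]{1,3})([^0-9]|$)")


def join_with_nbsp(paragraph: str) -> str:
    """Hypothetical helper: glue an abbreviation to a 1-3 digit number.

    Group 3 (one non-digit, or nothing at the end of the text) is put
    back so that the character following the number is preserved.
    """
    return THREE_GROUPS.sub(r"\1" + NBSP + r"\2\3", paragraph)


# The four-digit year is left alone; the number at the very end is handled.
print(repr(join_with_nbsp("see Nr 12) and Rome 1984, then p 7")))
```

The $ alternative is what lets the final “p 7” match even though no character follows it.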

Thank you @ajlittoz for the very detailed explanation! The regex worked very well!