Word characters are [\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation}\u200c\u200d].
Can anyone translate this ^ for me?
What is in the classes \p{Mark} and \p{Connector_Punctuation}?
Reason for asking:
I have this string:
The Old Gymnasium, Vicarage Playing Fields, Llanbadarn Road.
I want to return Llanbadarn Road only.
I’m trying to tell the regex I want “a string starting just after comma-space and ending just before full stop/period, but ignore all commas except the last one”.
I thought I could do that by returning everything between ,_ and . if the only letters found are 0-9A-Za-z (but would be nice to add ' and - too)
"(?<=,\s).*(?=\.)"
returns Vicarage Playing Fields, Llanbadarn Road
"(?<=,\s)\w*(?=\.)"
returns #NA
"(?<=,\s)[0-9A-Za-z]*(?=\.)"
returns #NA
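For anyone who wants to reproduce this, here is a minimal sketch in Python. It assumes Python's `re` module behaves like the engine I'm actually using, which may not hold. `first_match` is just a hypothetical helper for this post. My guess is that the last two attempts fail because a space is not a word character, so `\w*` can never span "Llanbadarn Road". The last two patterns are untested candidates for my tool, not confirmed fixes.

```python
import re

text = "The Old Gymnasium, Vicarage Playing Fields, Llanbadarn Road."

def first_match(pattern, s):
    """Hypothetical helper: return the first match as a string, or None."""
    m = re.search(pattern, s)
    return m.group(0) if m else None

# Attempt 1: .* is greedy and starts after the FIRST comma-space.
greedy = first_match(r"(?<=,\s).*(?=\.)", text)

# Attempt 2: \w does not match the space inside "Llanbadarn Road",
# so no run of word characters ends right before the full stop.
word_only = first_match(r"(?<=,\s)\w*(?=\.)", text)

# Candidate A (assumption): allow a space, ' and - in the character class.
with_space = first_match(r"(?<=,\s)[0-9A-Za-z' -]*(?=\.)", text)

# Candidate B (assumption): match anything except a comma, which
# forces the match to begin after the last comma.
no_comma = first_match(r"(?<=,\s)[^,]*(?=\.)", text)

print(greedy)
print(word_only)
print(with_space)
print(no_comma)
```

Both candidates rely on the fact that the wanted text contains no comma, so a match can only start after the last comma-space.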
Any tips? Thanks.
*(underscore is a space: markdown is unhappy without it)