Extracting a string that fits a specified format of text and digits from a longer string

My understanding of formula in Calc is basic and nil with regex so I am having problems trying to resolve this problem.

I have a LO Calc (CSV file) sheet with cells which contain a British National Grid reference. For example:
[A1] Buster Camp SP 591199, area, notes
[A2] Buster Camp 2 ST 255035
[A3] Highcloud TQ 435605

The reference is not always in the same place and not always preceded by or followed by a particular punctuation mark.The multiple spaces are a feature of the dataset but are not a consistent number of spaces across cells. Sometimes, as in the first example above, the data is followed by another word or words.

The data I need extracted is the two-letter, six-digit string SP 591199 - the two-letter, six-digit format is consistent but may also appear without the space as SP591199.

The variety of in how each cell is written means that Text to Columns will not work (or not in any setting that I can find). Can this be achieved with FIND, SEARCH or another function?

Welcome!
Do you mean something like =REGEX(A1;"[A-Z]{2} *\d{6}")?

1 Like

If case-insensitive:
=REGEX(A1;"[A-Za-z]{2} *\d{6}")
Or
=REGEX(A1;"(?i)[A-Z]{2} *\d{6}")

1 Like

Thanks @JohnSUN and @eeigor , your solutions work a treat - obviously I need to read up on REGEX.

To be OCD about the query, is it possible to format the result with so that the letters-no space-numbers OR the letters-space-numbers returns one version? It could be with the space or without, it does not matter.

Am I correct in thinking that Excel (Office 2013 or 365 online) does not support regex?

@NNorse , please mark the helpful reply as the Solution. Thanks.

1 Like

=SUBSTITUTE(REGEX(A1;"[A-Z]{2} *\d{6}");" ";"")?

Or (thanks @erAck !)
=REGEX(REGEX(A1;"[A-Z]{2} *\d{6}");"([A-Z]{2}) *(\d{6})";"$1 $2") ?

1 Like
=REGEX(A1;"([A-Z]{2}) *(\d{6})";"$1$2")
1 Like

Yes, it doesn’t.

Yes, but it’s easy to implement with an external library VBScript.RegExp. Supports basic operations. No looking ahead or behind.

Public Function RegexExtract(Text As String, Pattern As String, Optional Item As Integer = 1) As String
    On Error GoTo Failed
    Dim re, matches
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = Pattern
    re.Global = True
    If re.Test(Text) Then
        Set matches = re.Execute(Text)
        RegexExtract = matches.Item(Item - 1)  '0-based
        Exit Function
    End If

Failed:
    RegexExtract = CVErr(xlErrValue)
End Function

Thanks everyone for the help - problem solved!

If the goal is to extract only the “ID” (with one space in the result) then rather

=REGEX(A1;".*([A-Z]{2}) *(\d{6}).*";"$1 $2")