How can i return all results from regex textsearch

Unable to find any useful documentation on regex in Libreoffice calc/basic.
I am trying to get all results of a matched regex expression. But it only ever returns the first result.
I’m pretty sure I need to specify the global option, but can’t figure out how…

Sub Test
Dim oTextSearch,oOptions,oFound,J,I
Dim sPhone() As String
oTextSearch = CreateUnoService("com.sun.star.util.TextSearch")
oOptions = CreateUnoStruct("com.sun.star.util.SearchOptions")
oOptions.algorithmType = com.sun.star.util.SearchAlgorithms.REGEXP
oOptions.searchString = "[0-9,\\-]{8}"
oTextSearch.setOptions(oOptions)
oFound = oTextSearch.searchForward("h)999-9999 c)888-8888", 0, Len("h)999-9999 c)888-8888"))
J = -1
ReDim sPhone() As String
For I = 1 to oFound.subRegExpressions
	J = J + 1
	ReDim Preserve sPhone(J)
	sPhone(J) = mid("h)999-9999 c)888-8888", oFound.startOffset(I-1)+1, oFound.endOffset(I-1) - oFound.startOffset(I-1))
Next I
End Sub

Edited by @Lupp
Please use the tool Preformatted text (Ctrl+E) for code.
If you want to reject my changes, edit again.

‘’ oOptions.searchString = “[0-9,\-]{8}”
oTextSearch.setOptions(oOptions)
iEnd = 0
oFound = oTextSearch.searchForward(aRange.sPhone, iEnd, Len(aRange.sPhone))
Do While oFound.subRegExpressions > 0
J = J + 1
ReDim Preserve sPhone(J)
sPhone(J) = mid(aRange.sPhone, oFound.startOffset(0)+1, oFound.endOffset(0) - oFound.startOffset(0))
iEnd = iEnd +8
oFound = oTextSearch.searchForward(aRange.sPhone, iEnd, Len(aRange.sPhone))
Loop
“”
Messy, but Works

Well, if you say so, then we have no doubts about your words: it’s really messy, and - perhaps - it even works … Although it’s completely unclear what problem this code is trying to solve, where it looks for phone numbers and where it sends the search result …
The “Solution” mark may be misleading for future readers of this thread

1 Like

If you have a better solution, I’m all ears…

Willingly provide a solution … which of the tasks? For example, for the task “create a list of all substrings matching a regular expression from the text of one cell and display the result as a string with a line break separator” the code might be:

Option Explicit 

Function allSearchResults(sSourceText As String, sPinch As String, Optional bCaseSensitive As Boolean) As String 
Dim oFunctionAccess As Variant 
Dim i As Long 
Dim sDelimiter As String, sIgnoreCase As String 
Dim aParts As Variant 
	oFunctionAccess = CreateUNOService("com.sun.star.sheet.FunctionAccess")
	oFunctionAccess.setPropertyValue("IsArrayFunction", False)
	If IsMissing(bCaseSensitive) Then bCaseSensitive = False
	If bCaseSensitive Then
		sIgnoreCase = "("
	Else 
		sIgnoreCase = "(?i)("
	EndIf 
	sDelimiter = getUniqueDelimiter(sSourceText)
		
	aParts = Split(oFunctionAccess.callFunction("REGEX", Array(sSourceText, sIgnoreCase & sPinch & ")", sDelimiter & "$1" & sDelimiter, "g")), sDelimiter) 
	If IsArray(aParts) Then
		For i = LBound(aParts) To (UBound(aParts)-1)/2
			aParts(i) = aParts(i*2+1)
		Next i
		ReDim Preserve aParts(i-1)
		allSearchResults = Join(aParts, Chr(10))
	Else 
		allSearchResults = ""
	EndIf 
End Function

Function getUniqueDelimiter(sTestString As String) As String 
Rem Find a character or combination of characters that does not exactly occur in the parsed string:
Dim i As Integer 
	getUniqueDelimiter = ""
	If Len(sTestString) = 0 Then Exit Function 
	For i = 1 To 255
		If InStr(1,sTestString, Chr(i)) = 0 Then
			getUniqueDelimiter = Chr(i)
			Exit Function 
		EndIf 
	Next i
	For i = 1 To 255
		If InStr(sTestString, Chr(i)&Chr(i)) = 0 Then
			getUniqueDelimiter = Chr(i)&Chr(i)
			Exit Function 
		EndIf 
	Next i
	For i = 1 To 255
		If InStr(sTestString, Chr(i)&Chr(i)&Chr(i)) = 0 Then
			getUniqueDelimiter = Chr(i)&Chr(i)&Chr(i)
			Exit Function 
		EndIf 
	Next i
End Function

Demo

This is about the code given in the question:
THERE WERE ERRORS. I tried to correct the code. New version posted below.

===

There is a fundamental misunderstanding concerning the TextSearch service:
.searchForward() finds and returns matches one by one.
.subRegExpressions can only be >1 (imo; I didn’t work yet with this variant) if the SearchString was a regular expression with grouping subexpressions using parentheses.
Nevertheless the .startOffset and .endOffset items of the structure SearchResult are always sequences and should be used with the index part (0) if no subexpressions are to be addressed.
Use .searchForward() with a While - Wend loop (or similar) and use the .endOffset of any complete finding to define the .startOffset for the next .findForward() - where the last argument should be unchanged.
In addition the RegEx was bad.
The code below shows a very raw (errors expected?) example of what I mean. It was created reworking the original example by the OQ. (My personal style may not be fully compatible.)

Sub Test
Const exampleString = "h)999-9999 c)888-8888,;;:0123-4567"
Const testSearchString = "[0-9\-]{8}(?=\D)" REM Lookahead assertion used!"
REM Was "[0-9,\\-]{8}" After removal of one "\" this would still simply
REM CUT OFF longer phone numbers. "0123-4567" would be accepted for "0123-456"
srvcTextSearch = CreateUnoService("com.sun.star.util.TextSearch")
struOptions = CreateUnoStruct("com.sun.star.util.SearchOptions")
struOptions.algorithmType = com.sun.star.util.SearchAlgorithms.REGEXP
struOptions.searchString = testSearchString REM Was "[0-9,\\-]{8}"
srvcTextSearch.setOptions(struOptions)
endOffset = Len(exampleString)
nextStartOffset = 0
uFound = -1 REM =Ubound(sequPhoneNs)
While nextStartOffset<endOffset
 nextFinding = srvcTextSearch.searchForward(exampleString, nextStartOffset, endOffset)
 With nextFinding
  matchFound = (.subRegExpressions>0) REM Condition used in my uncorrected post was bad!
  REM Learned thanks to @LeroyG.
  REM His solution is more clearly structured insofar.
  REM With respect to other comments I post this nevertheless.
  If matchFound Then
   nextStartOffset = .EndOffset(0)
   uFound = uFound + 1
   Redim Preserve sequPhoneNs(uFound)
   lenFinding = .EndOffset(0) - .StartOffset(0)
   sequPhoneNs(uFound) = Mid(exampleString, .startOffset(0) + 1, lenFinding)
  Else
   nextStartOffset = endOffset REM This alternative was missing in my original post.
  EndIf
 End With
Wend
End Sub

Ya, searchForward is what was getting me… P.S. The regex only had the double \ because it removed it when I posted the code.

Test this:

Sub getSubRegex
	dim oSearch as object, oFound as object, a&, b&, s$, sPhone$, oParam as new com.sun.star.util.SearchOptions
	s="h)999-9999 c)888-8888"
	oSearch=CreateUnoService("com.sun.star.util.TextSearch")
	with oParam
	  .algorithmType=com.sun.star.util.SearchAlgorithms.REGEXP
	  .searchString="[0-9,\\-]{8}"
	end with
	oSearch.setOptions(oParam)
	oFound=oSearch.searchForward(s, 0, len(s)) 'search the string from Start
	do while oFound.subRegExpressions>0 'expression is found
		a=oFound.startOffset(0) 'start position in string
		b=oFound.endOffset(0) 'end position in string
		sPhone=mid(s, a+1, b-a)
		msgbox sPhone
		oFound=oSearch.searchForward(s, b, len(s)) 'search string from end position of previous found
	loop
End Sub

Here is what I ended up with. Thanks… :slight_smile: And, I finally figured out how to post structured code. hehe

oTextSearch = CreateUnoService(com.sun.star.util.TextSearch)
oOptions = CreateUnoStruct(com.sun.star.util.SearchOptions)
oOptions.algorithmType = com.sun.star.util.SearchAlgorithms.REGEXP
oOptions.searchString = "[0-9,\-]{8}"
oTextSearch.setOptions(oOptions)
J = -1
aRange.sPhone = "h999-9999 c888-8888"
For iStart = 0 to Len(aRange.sPhone) Step 8
	oFound = oTextSearch.searchForward(aRange.sPhone, iStart, Len(aRange.sPhone))
	If oFound.subRegExpressions > 0 Then
		J = J + 1
		ReDim Preserve sPhone(J)
		sPhone(J) = mid(aRange.sPhone, oFound.startOffset(0)+1, oFound.endOffset(0) - oFound.startOffset(0))
	End If
Next iStart

Be careful, For ... Step 8 isn’t safe :slight_smile: → see 1st example or try aRange.sPhone = "h999-9999 7777 c888-8888".
Next problem could be speed, ReDim Preserve for every item isn’t just fast.

Slow (19 seconds on my computer):

Sub Test
	oTextSearch = CreateUnoService("com.sun.star.util.TextSearch")
	oOptions = CreateUnoStruct("com.sun.star.util.SearchOptions")
	oOptions.algorithmType = com.sun.star.util.SearchAlgorithms.REGEXP
	oOptions.searchString = "[0-9,\-]{8}"
	oTextSearch.setOptions(oOptions)
	J = -1
	t=Now
	s="h999-9999 c888-8888"
	for i=1 to 10000
		sLong=sLong & s
	next i
	aRangesPhone=sLong
	'msgbox sLong
	For iStart = 0 to Len(aRangesPhone) Step 8
		oFound = oTextSearch.searchForward(aRangesPhone, iStart, Len(aRangesPhone))
		If oFound.subRegExpressions > 0 Then
			J = J + 1
			ReDim Preserve sPhone(J) 'slow!!!
			sPhone(J) = mid(aRangesPhone, oFound.startOffset(0)+1, oFound.endOffset(0) - oFound.startOffset(0))
		End If
	Next iStart
	t=Now-t
	msgbox( minute(t) & ":" & second(t) ) 'about 19s on my computer
	'xray sPhone 'see index 5, 6, 11, 12 etc. -> duplicities!!!
End Sub



Fast (2s on my computer):

Sub getSubRegex
	dim oSearch as object, oFound as object, a&, b&, s$, sPhone$, oParam as new com.sun.star.util.SearchOptions, sOut$, arr(), sLong$, t as date
	t=Now
	s="h999-9999 c888-8888"
	for i=1 to 10000
		sLong=sLong & s
	next i
	s=sLong
	oSearch=CreateUnoService("com.sun.star.util.TextSearch")
	with oParam
	  .algorithmType=com.sun.star.util.SearchAlgorithms.REGEXP
	  .searchString="[0-9,\-]{8}"
	end with
	oSearch.setOptions(oParam)
	oFound=oSearch.searchForward(s, 0, len(s)) 'search the string from Start
	do while oFound.subRegExpressions>0 'expression is found
		a=oFound.startOffset(0) 'start position in string
		b=oFound.endOffset(0) 'end position in string
		sOut=sOut & mid(s, a+1, b-a) & chr(12) 'much faster than ReDim Preserve!
		oFound=oSearch.searchForward(s, b, len(s)) 'search string from end position of previous found
	loop
	arr=split(sOut, chr(12))
	if ubound(arr)>-1 then
		a=ubound(arr)
		redim preserve arr(a-1)
	end if
	t=Now-t
	msgbox( minute(t) & ":" & second(t) ) 'about 2s on my computer
	'xray arr
End Sub

Thanks for this…

Just curious, if you also compared with Collection:

    Dim arr As New Collection
    ...
    arr.Add n, mid(s, a+1, b-a)
    ...
    For Each s In arr
      ...
    Next
1 Like

Collection seems the fastest

Sub getSubRegexCollection
	dim oSearch as object, oFound as object, a&, b&, s$, oParam as new com.sun.star.util.SearchOptions, sOut$, arr as new collection, sLong$, t as date, n&
	t=Now
	s="h999-9999 c888-8888"
	for i=1 to 10000
		sLong=sLong & s
	next i
	s=sLong
	oSearch=CreateUnoService("com.sun.star.util.TextSearch")
	with oParam
	  .algorithmType=com.sun.star.util.SearchAlgorithms.REGEXP
	  .searchString="[0-9,\-]{8}"
	end with
	oSearch.setOptions(oParam)
	'on local error resume next 'method .add() for Collection does some error sometimes
	oFound=oSearch.searchForward(s, 0, len(s))
	do while oFound.subRegExpressions>0 'expression is found
		a=oFound.startOffset(0) 'start position in string
		b=oFound.endOffset(0) 'end position in string
		arr.add(n, mid(s, a+1, b-a))
		oFound=oSearch.searchForward(s, b, len(s)) 'search string from end position of previous found
	loop
	t=Now-t
	msgbox( minute(t) & ":" & second(t) ) 'about 2s on my computer
End Sub

Indeed, Collection throws an error, when key already exists - which is n in this case, and which is unchanged in the loop (it should be an incrementing unique number).

It shows error when n is unique. Maybe bug in LibreOffice?

Sub getSubRegexCollection
	on local error goto bug
	dim oSearch as object, oFound as object, a&, b&, s$, oParam as new com.sun.star.util.SearchOptions, arr as new collection, n&, ss$
	s="h999-9999 c888-8888 h999-9999 c888-8888"
	oSearch=CreateUnoService("com.sun.star.util.TextSearch")
	with oParam
	  .algorithmType=com.sun.star.util.SearchAlgorithms.REGEXP
	  .searchString="[0-9,\-]{8}"
	end with
	oSearch.setOptions(oParam)
	'on local error resume next 'method .add() for Collection does some error sometimes
	oFound=oSearch.searchForward(s, 0, len(s))
	do while oFound.subRegExpressions>0 'expression is found
		a=oFound.startOffset(0) 'start position in string
		b=oFound.endOffset(0) 'end position in string
		ss=mid(s, a+1, b-a)
		n=n+1
		arr.add(n, ss) 'sometimes ERROR
		oFound=oSearch.searchForward(s, b, len(s)) 'search string from end position of previous found
	loop
	exit sub
bug:
	msgbox("n: " & n & chr(13) & ss, 1, "Error in arr.add())" )
End Sub

When adding an element to a collection, the second parameter (key) must be a unique key.
In your example, this is not the case.

1 Like

@sokol92 thank you! It was me who confused things, based on wrong memory. @KamilLanda: sorry for the confusion!

Since the key is not required, I should had avoided its mention altogether.

@AllOfTheWiseContributors :
I think the type Collection is an object type only available in Basic.
Am I wrong?

@Lupp you are absolutely correct.

Thanks for this great conversation. :slight_smile:
This is what I ended up with (it does miss some obscure country codes I never see i.e. 44-1534):

Sub FindPhoneNumbers()

	Dim stk,sSearch As String
	Dim sPhone As New Collection
	Dim oFound As Object
	Dim iStart As Integer
	
	oTextSearch = CreateUnoService("com.sun.star.util.TextSearch")
	oOptions = CreateUnoStruct("com.sun.star.util.SearchOptions")
	oOptions.algorithmType = com.sun.star.util.SearchAlgorithms.REGEXP
	
	stk = "h933-0101 c4561234 (306) 213-3333 3061112222 some text +1(306)214-8789 (some more text)(305)222-6666 c)1-403-333-1111 d:1(222)333-4444 306-777-8888"
  
	sSearch = "([0-9,\-]{1,3}[0-9,\-]{4}[0-9,\-]{4}[0-9]{4})" & _
	 "|([0-9,\-]{4}[0-9,\-]{4}[0-9]{4})" & _
	 "|(\([0-9]{3}\)\ {1}[0-9,\-]{4}[0-9]{4})" & _
	 "|(\([0-9]{3}\)[0-9,\-]{4}[0-9]{4})" & _
	 "|([0-9]{3}\-[0-9]{4})" & _
	 "|([0-9]{10})" & _
	 "|([0-9]{7})"
	oOptions.searchString = sSearch
	oTextSearch.setOptions(oOptions)

	iStart = 0
	While iStart < Len(stk)
		oFound = oTextSearch.searchForward(stk, iStart, Len(stk))
		If oFound.subRegExpressions > 0 Then
			sFound = mid(stk, oFound.startOffset(0)+1, oFound.endOffset(0) - oFound.startOffset(0))
			sPhone.Add(sFound)
			iStart = oFound.endOffset(0)
		Else 
			iStart = Len(stk)
		End If
	Wend
	For I = 1 To sPhone.Count
		Msgbox sPhone.Item(I)
	Next
	
End Sub