text search in multi text cells

luber1 · October 17, 2020, 12:24pm

Sample text:

fred, bill Agnes joy sofi andy  <|> wen bill OJ <|> krig huff Bill plwer, Snuffer
sin Trent, fordy summy, oka  <|> jin huter Bill
dred dir
orka bill, lorry
bill dass
erta
pop lanmc

10 cells, find the total number of Bills, case insensitive Untitled 1.ods

Found out and uploaded file. Interesting suggestion from Michel but doesn’t work, however, counta runs but gives wrong answer

201019 attached spreadsheet shows various attempts, the cell with 13 is a COUNTA version but the correct answer is 8. Only bill Bill bill, Bill, should be picked up, not embedded bill like hillbilly [good point Lupp]

I wasn’t aware that LO had differently named commands in different culture languages [I’m Czech, Australian citizen, knowledge of French, German, Swedish, Japanese is seriously bad but entschuldige is a start

(Edited to format and see shared file.)

mgl · October 17, 2020, 12:31pm

Hello @luber

Please edit your question and attach the spreadsheet.

Kind regards, Michel

anon73440385 · October 17, 2020, 12:57pm

Thanks for clearly describing what <|> and , means.

luber1 · October 17, 2020, 1:40pm

Sorry, can’t paste a spreadsheet segment <|> is a cell break , is just a random comma. Probably 10 cells etc is likely to be easiest while I make up a new spreadsheet. Thanks for the great ideas. Lu

Lupp · October 18, 2020, 12:58pm

Should “hillbilly” be found as an occurrence of “Bill”? If not, in what way must the substrings to find be delimited?
Can you assure that the substrings to search for not contain any reserved characters concerning RegEx?
Information of this kind is substantial if you want to get suggested reliable solutions.

mgl · October 18, 2020, 2:02pm

Hello @luber1

Could you be also more specific about what went wrong ? Edit your question and provide the data set, the related result and the expected result.

LeroyG · October 18, 2020, 10:32pm

@luber1, Don’t forget to check the mark () to the left of the answer that solves your question.

mgl · October 18, 2020, 10:37pm

Hello @luber1

In the newly appended file, the formula close to the solution is in F3=COUNTIF(B2:E9;".*(B|b)ill.* " ). Just remove the two white spaces : the first one between final * and " and the second one between " and )

The right formula is =COUNTIF(B2:E9;".*(B|b)ill.*")

Kind regards, Michel

Lupp · October 18, 2020, 11:34pm

The newly appended file actually was made with the “wildcard” setting under
>Tools>Options>LibreOffice Calc>Calculate>>Formulae wildcards
It must be changed to Enable regular expressins in formulae.
(Wildcards are by far not powerful enough to do real work. They are just promoted by MS.)
With sufficiently recent versions you may use the switch (?i) to make the search case insensitive independent of the global setting. As opposed to the already suggested way this will apply, however, not to just one character, but to all characters following, as long as not a (?-i) switches back.

You may also take the opportunity to learn about Regex, and about a few additional techniques the playful way. I will post a respective attachment Imade “just for fun” to an otherwise probably obsolete answer.

PKG · October 25, 2020, 2:59pm

Formula in F2 =SUMPRODUCT(IF(ISNUMBER(SEARCH("bill";B2:E9))))

Untitled 2.ods

EDIT 2020-10-26 08:40:00

This will also count compoundwords
like “hillbilly”.

Then a correction to the formula : =SUMPRODUCT(IF(ISNUMBER(SEARCH("\<bill\>";B2:E9))))

or a new formula: =COUNTIF(B2:E9;".*\<bill\>.*")

igorlius · October 25, 2020, 4:43pm

Nice oneliner, just keep its limitations in mind:

This will only count one accurance of the word bill for each cell.
This will also count compoundwords like “hillbilly”.

Still a nice and crisp oneliner. I’ll give it an upvote.

newbie-02 · October 25, 2020, 6:24pm

like kiss solutions upvoting

igorlius · October 26, 2020, 10:33am

The Statements

=SUMPRODUCT(IF(ISNUMBER(SEARCH("\<bill\>";B2:E9))))
=COUNTIF(B2:E9;".*\<bill\>.*")

both give me Zero results.
Is there a trick to this?

PKG · October 26, 2020, 11:00am

Is the following set under Tools → Options → LibreOffice Calc → Calculate?

image description

igorlius · October 26, 2020, 11:16am

Indeed, that is what i was missing. Thanks.
And other might also be glad for this hint.
That only leaves the one count per cell limitation.

luber1 · October 27, 2020, 11:43am

Great, thank you Gentlefolk. With “enable regular expressions in formulas” both formulas worked [although I got some weird answers but closed and reopened and retoggled “Enable…” both worked and in a different sheet but only 1 bill per cell which was OK for me anyway

mgl · October 17, 2020, 12:40pm

Hello @luber1

I try a proposal : use COUNTIF with a regular expression, like =COUNTIF(A3:A12;"=.*(B|b)ill.*") if the source data are in range A3:A12

.* looks for any number of any kind of characters

(B|b) allows either choice between B or b

Allow for regular expressions to be used in formulas by Alt+F12>LibreOffice Calc> Calculate> Enable regular expressions in formulas

Complement to my answer

Based on my own confusion between COUNTA and COUNTIF as, in French, they are translated respectively NBVAL and NB.SI, which do not look exactly the same but close by, @luber1, if your working language is not English, in order to catch the right function name in your own language, transit temporarily by English. Close all other LO documents. Open a new blank spreadsheet. Activate the option Alt+F12>LibreOffice Calc> Formula>Use English function names>OK. Then type in a set of data and two formulas, one with COUNTA(...) and the other one with COUNTIF(...). Save the spreadsheet. De-activate the option for Use of English function names. Re-open the spreadsheet and compare the names of both functions in you own language.

Hope this helps too.

To show the community that the question has found its answer, please click on the ✓ aside the top of the correct answer and vote by clicking the ^ caret of all usefull answer.

Please do not use Add Answer but edit your original question to enhance the details of your question (answers are reserved for solutions to a problem on this Q&A site). Thanks in advance …

Kind regards, Michel

LeroyG · October 18, 2020, 10:29pm

@mgl, If do you share a sample file with the formula, it will be automatically translated to the user language.

Lupp · October 18, 2020, 11:47pm

RegEx are a very powerful means for searching and manipulating strings. Concerning the usage in Calc this power was significantly improved by the implementation of the REGEX() function since version 6.2.
You may learn about ways to use RegEx for purposes like addressed by the question in a playful way using the attached spreadsheet.
ask271841FindAndCountStringsInCellTexts.ods

The Regex engine used by LibreOffice is the one by ICU.
A very recommendable site concerning RegEx is (imo) Regular Expression Tutorial - Learn How to Use Regular Expressions.

igorlius · October 18, 2020, 7:23pm

Just 4fun,

here is a macro function solution. Just place it into Tools > Macros … > Edit Macros
and you can call it like any other calc function via:

=CMIR("Bill",B2:D10)

1st argument is the string to search for (case insensitiv)
2nd is a selected cell range
Return value is the count of found matches

Here the macro function code:

rem (c)ount (m)atches (i)n  (r)ange
function CMIR(needle as string, data() as array) as integer
	count=0
	for i = lbound(data,1)to ubound(data,1) 
		for j = lbound(data,2)to ubound(data,2)
			parts() = Split(lcase(data(i,j)), lcase(needle))
			blub = UBound(parts()) - LBound(parts())
			if ( blub > 0 ) then
				count=count+blub
			endif
		next 
	next 
	CMIR=count
end function

— update —

As promised, here is the regex version (2v1) had to fix some bugs sorry

usage:

=CMIR2("\b[B|b][Ii][Ll][Ll]\b",B2:D9)

code:

rem (c)ount (m)atches (i)n (r)ange version2v1
function CMIR2(needle as string, data() as array) as integer
  Dim oTextSearch               ' TextSearch service.
  Dim sStrToSearch As String    ' String to search.
  Dim aSearchResult             ' com.sun.star.util.SearchResult
  Dim aSrcOpt As New com.sun.star.util.SearchOptions
  Dim enLocale As New com.sun.star.lang.Locale
  Dim rank As Long
  Dim count as long
  enLocale.Language = "en"
  enLocale.Country = "US"
  oTextSearch = CreateUnoService("com.sun.star.util.TextSearch")
  With aSrcOpt
    'https://api.libreoffice.org/docs/idl/ref/SearchFlags_8idl.html
    .searchFlag = com.sun.star.util.SearchFlags.REG_EXTENDED
    .Locale = enLocale
    'Supports ABSOLUTE, REGEXP, and APPROXIMATE
    .algorithmType = com.sun.star.util.SearchAlgorithms.REGEXP
    .searchString = needle 'rem define word bounds
    '.transliterateFlags = com.sun.star.i18n.TransliterationModulesNew.UPPERCASE_LOWERCASE
  End With
    oTextSearch.setOptions(aSrcOpt)
	count=0
	rank = 0
	for i = lbound(data,1)to ubound(data,1) 
		for j = lbound(data,2)to ubound(data,2)
			  sStrToSearch = data(i,j)
			   aSearchResult = oTextSearch.searchForward(sStrToSearch, 0, Len(sStrToSearch)  )
				rank=0
					Do While aSearchResult.subRegExpressions > 0
						count=count+1
						sStrToSearch = mid(sStrToSearch, aSearchResult.endOffset(0) +1 , Len(sStrToSearch))
						aSearchResult = oTextSearch.searchForward(sStrToSearch, 0, Len(sStrToSearch)  )					    
				  Loop
		next 
	next 
	CMIR2=count
end function

This Function can handel the hillbilly problem very gracefully. By using word boundarys \b

Hope it helps (even more)