Finding word frequency in Calc

Newbie here. I have a column of cells with text phrases in Calc I want to graph. First I want to show the frequency of appearance aka how often do “shoe”, “toaster oven” and “race car” show up. I’d also like to make this information in percentages: what percentage of the cells have “shoe” in them.

There are more ideas I have of using the text phrases, but this would be a start.

The tricky part about using the function Find is that if the text isn’t in the list, then it returns an error code.

Error codes can’t be calculated with for most functions, but the iserror function is just what we need to get around this.

So use something like this (using c2 as the cell where the result of Find is)

=if(iserror(c2),0,1)

This will give you a 1 if the text is there and a 0 if there is an error, but since it only gives an error if the text wasn’t found this will help us count the number of words.

And then sum them up.

If the same word can appear twice in a cell and you need to calculate both, then this is only the beginning.

You can use the FIND function to locate a string within a sentence. E.g. FIND(“left”,A1) if “My left foot” is in A1 will return 4 (position where word left starts)

Convert the result to binary with IF(…>0,1,0) and then just SUM for each word/expression.

Maybe there are more elegant ways to do this is Calc…

(I’m adding a samplefile so that it is archived with the answer: wordcount.ods)

To clarify, I have a column where each cell has two-to three words in it. Is there no way in Calc to search a column for how many times a word shows up? For example I want to look for “shoe” in cells G1 to G118. FIND(“shoe”,G3) doesn’t work even though “shoe” is in G3.

I was giving you pointers instead of doing your work :wink: If FIND is returning error 501 it means that you should use a semi colon ; instead of the comma , FIND only searches one cell so you need to create a second column with IF and FIND or FIND in one column and IF on the third

I think I understand what you’re saying, but I still feel uncertain. Do you need a second column or can it just be a second cell with IF and FIND? I did get FIND(“shoe”,G3) to work this time. Not sure what I did wrong before.

Here is a sample file for you

I see. I know what I’ll be do now. Thanks. :slight_smile:

As user MegaTallDave said “this is only the beginning”. If you seriously want to analyze the sentences then you should use an appropriate software. Just search the web for “text analysis” or “text analysis software” and a new world will open before your eyes :wink:

This is a recurring issue with spreadsheets for me.
It’s nice to see it answered so beautifully.

Information in the LibreOffice help.

COUNTIF

Returns the number of cells that meet with certain criteria within a cell range.
The search supports regular expressions. You can enter “all.*”, for example to find the first location of “all” followed by any characters. If you want to search for a text that is also a regular expression, you must precede every character with a \ character.
You can switch the automatic evaluation of regular expression on and off in Tools - Options - LibreOffice Calc - Calculate.

Syntax
COUNTIF(Range; Criteria)

Range is the range to which the criteria are to be applied.

Criteria indicates the criteria in the form of a number, an expression or a character string. These criteria determine which cells are counted. You may also enter a search text in the form of a regular expression, e.g. b.* for all words that begin with b. You may also indicate a cell range that contains the search criterion. If you search for literal text, enclose the text in double quotes.

Example

A1:A10 is a cell range containing the numbers 2000 to 2009. Cell B1 contains the number 2006. In cell B2, you enter a formula:

=COUNTIF(A1:A10;2006) - this returns 1

=COUNTIF(A1:A10;B1) - this returns 1

=COUNTIF(A1:A10;">=2006") - this returns 4

=COUNTIF(A1:A10;"<"&B1) - when B1 contains 2006, this returns 6

=COUNTIF(A1:A10;C2) where cell C2 contains the text >2006 counts the number of cells
in the range A1:A10 which are >2006

To count only negative numbers: =COUNTIF(A1:A10;"<0")

When I try =COUNTIF(G2:G118;"\s\h\o\e") no answer is given. All text are general expressions. Nothing happens with just =COUNTIF(G2:G118;“shoe”) either.

When I try =COUNTIF(G2:G118;"\s\h\o\e") no answer is given. All text are general expressions. Nothing happens with just =COUNTIF(G2:G118;“shoe”) either.

I think COUNTIF doesn’t work because you are looking for a word in a cell that contains a sentence. It could count cells if each contained a single word…

If that’s the only way I can get it to work I guess I’ll be removing spaces from each cell.

You don’t need to remove spaces :slight_smile: You can separate a sentence in words by using the option Data, Text to Columns, Separated by Space

Was that supposed to remove all but the first word? It would be better to have all the words meshed together.