
How do I count the occurrences of a specific word in a large document?

asked 2016-03-03 19:40:30 +0200

iamlocutus

I am trying to find out how many times an author uses certain specific words in his book, as part of a vocabulary analysis. The "Find" function shows where the words are located, but provides no total count of them.


3 Answers


answered 2016-03-03 22:17:56 +0200

karolus

updated 2016-03-04 00:17:45 +0200

Hello

Edit → Find & Replace... → Find All, followed by Tools → Word Count..., should fit your needs exactly.

Yes, of course, the following lines of code are completely out of scope for the question, but just for fun, here is some Python code that counts every unique word in a given Writer document and stores the result, sorted with the most common words first, in a new Calc document. (Take the challenge, my dear Basic guys ;-) )

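from collections import Counter

def word_stats():
    """Count every word in the current Writer document and write the
    counts, most common first, into a new Calc document."""
    doc = XSCRIPTCONTEXT.getDocument()
    desktop = XSCRIPTCONTEXT.getDesktop()
    load = desktop.loadComponentFromURL
    text = doc.Text.String
    # Split on whitespace only: "Word", "word" and "word," count separately.
    out = Counter(text.split())
    out = sorted(out.items(), key=lambda x: x[1], reverse=True)
    if not out:
        return  # empty document, nothing to write
    # "_blank" opens the new Calc document in a new frame.
    outdoc = load("private:factory/scalc",
                  "_blank",
                  0,
                  (),)

    sheet = outdoc.Sheets.getByIndex(0)
    # Arguments: left column, top row, right column, bottom row.
    # The range must have exactly the shape of the data (2 columns, one row per word).
    outrange = sheet.getCellRangeByPosition(0,
                                            1,
                                            len(out[0])-1,
                                            len(out))
    outrange.setFormulaArray(tuple(out))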
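
Because the macro splits on whitespace only, "Word", "word" and "word," end up as separate entries, which may not be what a vocabulary analysis wants. The following is a minimal sketch of a normalizing count that runs outside LibreOffice, assuming the text is already available as a plain Python string. The function name `count_words` is hypothetical and not part of any LibreOffice API.

```python
import re
from collections import Counter

def count_words(text):
    """Return a Counter of lower-cased words, ignoring punctuation.

    Assumption: a "word" is any run of letters, digits or underscores
    (the regex class \\w), so "don't" is split into "don" and "t".
    """
    words = re.findall(r"\w+", text.lower())
    return Counter(words)

if __name__ == "__main__":
    sample = "The cat. The CAT, the dog!"
    # most_common() lists the words sorted by descending count
    for word, n in count_words(sample).most_common():
        print(word, n)
```

Inside the macro, the same idea would amount to replacing `text.split()` with the `re.findall` call, at the cost of the apostrophe behaviour noted in the docstring.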

answered 2016-03-04 09:06:35 +0200

pierre-yves samyn

Hi

If your goal is vocabulary analysis, you may find this extension interesting.

Regards


answered 2016-03-03 21:59:36 +0200

Lupp

updated 2016-03-03 22:43:55 +0200

Use 'Find & Replace' ('F & R') with regular expressions enabled.
'Search For:' (\bThisword\b), where Thisword is the literal word whose occurrences you want to count, regardless of letter case.
'Replace With:' $1, which will replace the found word with exactly itself.

'Replace All' will run the replacement without actually changing anything, and will report how many occurrences it replaced.

Warning! I just tried this method again and it was broken: the $1, which should stand for the found occurrence of the word, was wrongly inserted as a literal. Do not apply this! At the very least, press Ctrl+Z immediately after the 'F & R'.
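
The same whole-word, case-insensitive count can be done without touching the document at all. This is a minimal sketch in Python, assuming the text has been extracted to a plain string; `count_word` is a hypothetical helper name, and the pattern mirrors the \bThisword\b expression above.

```python
import re

def count_word(text, word):
    """Count whole-word, case-insensitive occurrences of `word` in `text`.

    Assumption: `word` starts and ends with a letter or digit, so that
    the \\b word-boundary anchors behave as expected.
    """
    # re.escape keeps any special characters in the word literal
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return len(pattern.findall(text))

if __name__ == "__main__":
    sample = "The cat sat. Cats scatter; the CAT ran."
    # "Cats" and "scatter" are not counted because of the word boundaries
    print(count_word(sample, "cat"))
```

Since nothing is replaced, there is no risk of altering the text and no need for Ctrl+Z afterwards.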


Comments

@Lupp

What do you think is the purpose of the 'Find All' button?

karolus (2016-03-03 22:25:28 +0200)

I thought it was to find all the occurrences. However, I was not aware that it now shows the number of occurrences in the status bar. I just found it. Were you telling me and iamlocutus? If so: thanks! My 'Replace All' always showed the number in a message box, which I liked.

Lupp (2016-03-03 22:34:48 +0200)