
How can I count the number of occurrences of individual Chinese characters in a document?

asked 2015-04-02 08:33:20 +0200

this post is marked as community wiki


I want to count the number of occurrences of each separate word or Chinese character in a document. The words have already been sorted into groups, and I would like a count of each word within its group. I am using LibreOffice (either Writer or Calc) and would prefer not to do this from the command line. For example: the = 12, with = 5, take = 2.
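For comparison, here is a minimal sketch of the counting itself in Python, outside LibreOffice. It assumes the text of one group has been copied into a string. It counts every non-whitespace character separately, so each Chinese character is one item (full-width punctuation such as ， is counted too), and it does not attempt Chinese word segmentation. The function names are hypothetical.

```python
from collections import Counter


def count_characters(text: str) -> Counter:
    """Count each non-whitespace character in text separately.

    In a Python str each common Chinese character is a single code point,
    so it is tallied as one item.
    """
    return Counter(ch for ch in text if not ch.isspace())


def count_words(text: str) -> Counter:
    """Count whitespace-separated words (suitable for English text)."""
    return Counter(text.split())


if __name__ == "__main__":
    print(count_characters("苹果 苹果 橙子").most_common())
    print(count_words("take the apple with the orange").most_common())
```

`most_common()` lists the items from most to least frequent, which gives a report in the "the = 12, with = 5" style.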


5 Answers


answered 2015-04-02 09:44:30 +0200

ROSt52

Writer counts each Chinese character as a word and shows the count in the status bar at the bottom of the Writer window, either for the entire text or for a text selection. The selection takes priority.

Alternatively, you can use Tools > Word Count. This dialog shows the count for the selected text and for the entire text at the same time.

Be careful, and make sure you understand how Arabic numerals and symbols are counted.


Comments

Thank you; however, I am not looking for the total count but for the individual count of each word. For example, how many times does apple occur, how many times does orange occur, and so on.

rahalver (2015-04-02 10:02:28 +0200)

Then follow @karolus's answer.

ROSt52 (2015-04-04 10:57:55 +0200)

I can't find the answer by @karolus you mentioned on Apr 4. Please tell me where to find it.

rahalver (2015-04-14 07:00:16 +0200)

answered 2016-06-30 00:50:06 +0200

cosmeticdavid

The above answers do not really answer the question. Clearly, this feature is missing and should be added.

I have found a workaround to use in the meantime: use the Search and Replace dialog instead. Enter the phrase in the search field and click Replace All. When this is done, a pop-up box says that the phrase was replaced X times, which gives you the count. Afterwards, undo the replacement with Ctrl+Z (Cmd+Z on macOS).


Comments

If you replace each match of the 'Search For' text with exactly the same text, you won't need to undo anything. With regular expressions enabled, the special character & in 'Replace With' means "whatever was found".

Lupp (2016-06-30 01:57:43 +0200)
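The same "replace every match with itself and read the count" idea can be sketched in Python, purely as an analogue and not as LibreOffice's own mechanism. `re.subn` returns both the new text and the number of substitutions, and `\g<0>` in the replacement plays the role of & ("whatever was found"). Like Replace All, this counts non-overlapping matches. The helper name `count_by_replacing` is hypothetical; `re.escape` is used so the search term is taken literally.

```python
import re


def count_by_replacing(text: str, pattern: str) -> int:
    """Replace every match with itself and return the number of replacements."""
    # r"\g<0>" inserts the whole match, like & in LibreOffice's 'Replace With'.
    new_text, n = re.subn(pattern, r"\g<0>", text)
    # Replacing a match with itself leaves the text unchanged, so no undo is needed.
    assert new_text == text
    return n


if __name__ == "__main__":
    sample = "苹果和橙子，苹果和香蕉"
    print(count_by_replacing(sample, re.escape("苹果")))
```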

answered 2016-06-30 02:15:38 +0200

Lupp

updated 2016-06-30 02:19:25 +0200

This is about counting in Calc.
Assume one of the "groups" mentioned in the original question is in A1, and B1 contains the character, word, or phrase to search for. Then the formula

=SUMPRODUCT(MID(A1;ROW(OFFSET(INDIRECT("$A$1");0;0;LEN(A1);1));LEN(B1))=B1)
returns the number of occurrences. Overlapping occurrences are each counted, and "words" occurring inside longer words are counted as well. Knowing nothing about Chinese, I cannot tell whether this is acceptable. Using a regular expression with SEARCH instead of the direct string comparison gives additional control over the counting.
The construct OFFSET(INDIRECT("$A$1");0; ... is used to avoid errors caused by lost references when, in specific cases, a row or a column is deleted. If you don't need this protection, INDIRECT("$A$1") can be replaced by $A$1.
The content of any cell is limited to a maximum of 65535 characters.
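To make the formula's logic concrete, here is a rough Python sketch of what it computes (an illustration, not a Calc feature): compare the substring starting at every position of the text with the search term and add up the matches. This is why overlapping occurrences are each counted, unlike a plain search. The function name is hypothetical, and the empty-term guard is an addition of this sketch.

```python
def count_overlapping(text: str, target: str) -> int:
    """Count occurrences of target in text, overlapping matches included.

    Mirrors MID(A1; i; LEN(B1)) = B1 evaluated for every i from 1 to LEN(A1),
    then summed by SUMPRODUCT.
    """
    if not target:
        # Guard added in this sketch: an empty term would match at every position.
        return 0
    width = len(target)
    return sum(text[i:i + width] == target for i in range(len(text)))


if __name__ == "__main__":
    print(count_overlapping("aaaa", "aa"))  # overlapping: positions 0, 1, 2
    print("aaaa".count("aa"))               # non-overlapping, like Find All
```

The two calls print 3 and 2, which shows the difference between overlapping and non-overlapping counting.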


Comments

Hi @Lupp, if I'm not mistaken, Chinese is DBCS (double-byte) text, and Calc has special functions for it, such as MIDB(), LEFTB(), RIGHTB(), and LENB().

m.a.riosv (2016-06-30 13:42:43 +0200)

Could someone supply an example file? I think a solution based on my recent suggestion will work anyway, possibly with minor adaptation. Of course I don't object to a more efficient solution based on an extension/add-in or on specialised functions. (If I had this problem, I would certainly consider a solution in a general-purpose programming language.)

Lupp (2016-06-30 14:30:19 +0200)

answered 2016-06-30 07:35:05 +0200

pierre-yves samyn

Hi

I don't know about Chinese characters, but for separate words you can try the Linguist extension:

  • Download the extension, then
  • Tools ▸ Extension Manager ▸ Add ▸ select it from your download folder
  • Quit LibreOffice.

The extension is deployed at the next launch, and a new Linguist menu is added in Writer.

Regards


answered 2015-04-02 10:49:17 +0200

karolus

Hello

What about Edit → Search & Replace → search for ... → Find All?


Comments

I want to find the frequency of many different characters, and this way I would have to type each one individually. Is there any way to count the frequency of different characters all at the same time? For English or Roman letters, this would be like finding the frequency of A, B, C, and so on, up to Z.

rahalver (2015-04-14 06:57:15 +0200)

In the search field, join the Chinese characters with |, similar to apple|orange|A|B|C, then click More Options → [x] Regular expressions.

But IMHO, for a detailed report on every word or symbol you need some kind of macro.

karolus (2015-04-14 07:57:17 +0200)
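The alternation regex finds all the terms in one search, but Find All reports only the total. The per-term report karolus mentions needs a script or macro. Here is a hedged sketch of the counting logic in plain Python (not LibreOffice's macro API): collect every match of the combined pattern and tally them per term. The term list is assumed to be supplied by the user, and `count_terms` is a hypothetical name. Terms are tried longest first, so a longer term wins over a shorter term that starts at the same position. As with a plain regex search, a term inside a longer word (apple in pineapple) is also counted.

```python
import re
from collections import Counter


def count_terms(text: str, terms: list[str]) -> dict[str, int]:
    """Count each term of an apple|orange|A|B|C style search separately."""
    # Drop empty terms (they would match everywhere) and sort longest first:
    # the regex engine takes the first alternative that matches at a position.
    ordered = sorted({t for t in terms if t}, key=len, reverse=True)
    if not ordered:
        return {t: 0 for t in terms}
    # re.escape keeps characters such as | or . literal.
    pattern = re.compile("|".join(re.escape(t) for t in ordered))
    found = Counter(pattern.findall(text))
    # Report every requested term, including those that never occur.
    return {t: found[t] for t in terms}


if __name__ == "__main__":
    sample = "apple orange apple A B pineapple"
    print(count_terms(sample, ["apple", "orange", "A", "B", "C"]))
```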

Stats

Asked: 2015-04-02 08:33:20 +0200

Seen: 4,256 times

Last updated: Jun 30 '16