# Sorting words that need to go into 2 columns in Doc or Calc but aren't separated by a comma between the Greek and the English translation

Hi, I am using Anki to learn a language and have taken a scan of some vocab. It appears like this:

τα αγγλικά English ( language ) ακόμα also , in addition τα γαλλικά French ( language ) τα γερμανικά German ( language ) το γραφείο desk μας ( to ) us το μπαλκόνι balcony το μπάνιο bathroom τα νομικά law ( studies ) η ντουλάπα cupboard γράφω I write δέκα ten δέκατος - η - ο tenth πανω σε on πολλοί - ές - ά many , a lot of η πολυθρόνα arm chair το δωματιο room

A few scan errors don't worry me, but there is no comma between the Greek words and the English translation, only a comma after the English translation.

I can do this manually. I place the cursor in front of each Greek word (some with the Greek article ο, η, το) and press Enter, which gets each Greek and English combination onto its own row in a single column, e.g. for the first pair τα αγγλικά English (language), and so on. I then use the equivalent of a transfer of data, but before doing that I have to manually insert a comma after the Greek word, e.g. after αγγλικά. I thus end up with 2 columns of words, 1 Greek and 1 English. The comma after the Greek word is removed in the transfer to 2 columns, but I then have to remove the comma after the English word.

Is there a suggested formula for doing this, please? I have a lot of vocab from my book to sort like this, and the bit I have pasted above is only a small example.

Regards, Nigel, UK


That's really hard to read. Finally ... and the question is? Otherwise this could be only a report you share with the audience.

( 2019-04-14 12:26:02 +0200 )edit

When you said that you have 'taken an Anki scan', what do you mean by this? What format is the file? I would think that you would be better off using Calc as a spreadsheet and importing the file as a .CSV, if this is possible using the comma as a separator. This would get you started. Have you tried the internet for advice on using Anki data?

( 2019-04-14 13:23:26 +0200 )edit

Finally ... and the question is?

Oh, I have been finally able to find something like a question there :-D :

but is there a suggested formula for doing this please as I have a lot of vocab from my book to sort like this and the bit I have pasted above is a small example

... but it took a hard time, and finding the question didn't actually allow me to understand the issue. The OP obviously has software with the same problem as the OP: neither the OP nor the software the OP uses suspects that commas and other punctuation marks exist.

( 2019-04-14 14:04:04 +0200 )edit

I meant that I had scanned the page with an iPhone photo app that turned the photo of the text in the book into a "text file". I pasted that into a doc/text file and then tried to get it into columns, but ended up manually putting in commas between the Greek and the English and separating the entries by hand. This was taking too long and, with potentially 2500 words, would be a long exercise.

( 2019-04-14 18:22:19 +0200 )edit

I was using the simple data formula, but thought that there might be a better way to sort the jumble of Greek and English letters. For example, from the text file created by the text scan app I get, from the snippet above, "το γραφείο desk", i.e. the Greek word for desk is given first. I then manually inserted a comma between the Greek word and the word "desk" and was then able to sort into 2 columns: the 1st with the Greek combination το γραφείο, and the 2nd, lined up with each Greek word, with the English word.
The text file created by the scan reader accurately shows the commas between some of the English words where the Greek word can have more than 1 meaning, so I then had to manually remove those commas and put in a single comma between the Greek and English words. I ...(more)

( 2019-04-14 18:34:48 +0200 )edit

A formula will only be able to separate and process a data set with a defined structure and defined criteria for field, field separator and end of data set. If your data contains commas, commas are disqualified as field separator (unless you enclose your fields in, let's say, quotation marks), particularly as the count of commas varies from data set to data set. Choose something else, e.g. a semicolon, as field separator. You'll need something to mark the end of the data set, e.g. #, in case there's no CR/LF. I'd assume that this could be done in three steps: first the "export" (really no other way than the photo and OCR process? weird ..), second preparing the ASCII file, third importing the ASCII file as a CSV file and processing the strings with a "formula" in LO Calc, splitting them into separate columns.
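The point about separators can be illustrated outside Calc. What follows is only a minimal sketch in Python, not part of the suggested LibreOffice workflow: it assumes two hypothetical sample rows taken from the question and uses a semicolon as the field separator, so commas inside the English field survive a round trip untouched.

```python
import csv
import io

# Hypothetical sample rows: the English field may itself contain commas.
rows = [
    ("ακόμα", "also , in addition"),
    ("το γραφείο", "desk"),
]

buffer = io.StringIO()
# ';' is the field separator, so commas inside a field are plain data;
# a line break marks the end of each data set (record).
writer = csv.writer(buffer, delimiter=";", lineterminator="\n")
writer.writerows(rows)
text = buffer.getvalue()
print(text)

# Reading the text back with the same separator recovers two fields per row.
parsed = list(csv.reader(io.StringIO(text), delimiter=";"))
```

A file written this way could then be imported into Calc by choosing the semicolon as the separator in the text import dialog.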

But I'm still not able to extract a simple ...(more)

( 2019-04-14 20:57:09 +0200 )edit

In the meantime, note that regular expressions allow the use of Unicode character properties, like "being a Letter" or "being in Greek script", and even allow combining these properties to build complex expressions. Possibly something to look into here.
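To make the idea of a character property concrete, here is a rough sketch in Python. It is an approximation only: Python's standard `re` module has no script property, so the hypothetical helpers below inspect each character's Unicode name instead of using the ICU engine that LibreOffice relies on.

```python
import unicodedata


def is_greek_letter(ch: str) -> bool:
    """Approximate 'letter in Greek script' via the Unicode character name."""
    return ch.isalpha() and unicodedata.name(ch, "").startswith("GREEK")


def is_latin_letter(ch: str) -> bool:
    """Approximate 'letter in Latin script' via the Unicode character name."""
    return ch.isalpha() and unicodedata.name(ch, "").startswith("LATIN")


def classify(word: str) -> str:
    """Return 'Greek', 'Latin' or 'other' depending on the letters in word."""
    letters = [ch for ch in word if ch.isalpha()]
    if letters and all(is_greek_letter(ch) for ch in letters):
        return "Greek"
    if letters and all(is_latin_letter(ch) for ch in letters):
        return "Latin"
    return "other"


print(classify("γραφείο"), classify("desk"), classify("("))
```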

( 2019-04-14 21:08:40 +0200 )edit

thanks for ideas

( 2019-04-15 11:58:43 +0200 )edit

I will look at that and get my wife's input, as she is good at German. Thanks for the updates.

( 2019-04-15 11:59:52 +0200 )edit


Based on what @Mike Kaganski already said.
To start with: see the attachment.
For more sophisticated, extended solutions, bug tdf#76481 may be annoying.

(This is not helpful if the script of both languages is :Latin: . However, there was this related question in the German /de branch where one of the languages was represented in Italic style. This would allow a similar solution.)

===EDIT1 2019-04-16 13:13 UTC===
Attachment announced in my sixth comment below.
The solution therein can only work in LibO V6.2 or higher because it makes use of the newly implemented Calc function REGEX(). (Thanks to erAck.)


I thought that the "find & replace" solution was going to be best. I carefully put the expression ([:script=Greek:])([:script=Latin:]) in the "Find" box, but I cannot read what is in the "Replace" box: is it S1|S2, or is it a dollar sign ($)? It is very faint. I cannot get it to work; I am possibly having an issue with the lettering, although that would be odd, as I carefully used the attached odt doc kindly provided. If you have a moment, could you confirm what I should be doing, as I could not find the particular script term in the ICU lists. I also had a go with the German Cattribs2Html function after I remembered to switch off macro security, and got it to copy the text into the HTML column, but it would not work in the TRIM columns and just ...(more)

( 2019-04-15 14:11:51 +0200 )edit

$1| $2
Dollar1 Pipe(vertical line) Space Dollar2
You also missed the space behind Greek:]
You didn't mention your version of LibO. Newer features of the (ICU) RegEx engine may only work in recent versions of LibO.
Never "switch off" macro security! You may choose medium level and permit macros from a trusted source. But you shouldn't run macros if there isn't even a basic understanding of them.
There was no "German function", only a link to a forum in the German language. My hint leading you to an example containing a macro function of mine wasn't meant to help in your specific case. It was "just for completeness". The script language isn't treated by that function and has no standard HTML representation afaik. The #VALUE! error you got was due to the fact that there wasn't a single Italic (slanted) letter in the text.

( 2019-04-15 18:20:23 +0200 )edit

Hi, I have got macro security set to medium; that was just a loose expression on my part, but thanks. I had some limited results following your "space" correction, which highlights my poor glasses!
My Find expression has been corrected to ([:script=Greek: ])([:script=Latin:]) and in Replace I have $1| $2. I tried both 1 space and 2 spaces after the pipe (vertical line), BUT each time the pipe goes in the wrong place, and I am at a loss to know what to change. My Greek words are typed in Greek letters on Polytonic, and I am using LO 6.2.2.2-64 on MS Windows 10. A limited run gives this result: άλλος other Γενικάg| e nerally δικό τουςt| h eirs μικροβιολογικός microbiological τα έπιπλα furniture το εργαστή laboratory το ισόγειοg| r ound floor μοντέρνος modern ξεχωριστόςs| e parate ...(more)

( 2019-04-15 19:27:44 +0200 )edit

The space needs to be behind the closing square bracket.
If you cannot ensure that there is exactly one space between the last Greek word and the first English word, you need to use " +" (a space followed by +) instead of the single space.
If you don't learn about the basics concerning regular expressions you might better not use them.
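
For readers who want to experiment with the "space followed by +" idea outside LibreOffice, here is a minimal sketch in Python. It is not the ICU expression itself: Python's standard `re` module has no script property, so explicit code point ranges (an assumption covering the Greek and Greek Extended blocks) stand in for [:script=Greek:], and A-Z/a-z stands in for [:script=Latin:]. The name `insert_pipe` is hypothetical.

```python
import re

# Assumed stand-in for [:script=Greek:]: Greek and Coptic plus Greek Extended.
GREEK = "\u0370-\u03FF\u1F00-\u1FFF"

# One Greek character, one or more spaces, then one basic Latin letter.
PATTERN = re.compile(rf"([{GREEK}]) +([A-Za-z])")


def insert_pipe(line: str) -> str:
    """Insert '| ' between the last Greek letter and the first Latin letter."""
    # \1 and \2 play the role of $1 and $2 in the LibreOffice Replace box.
    return PATTERN.sub(r"\1| \2", line)


print(insert_pipe("το γραφείο desk"))
print(insert_pipe("Γενικά   generally"))
```

Like the Find expression discussed here, this sketch only fires when a Latin letter directly follows the spaces, so an entry such as "μας ( to ) us", where a bracket comes first, is left unchanged.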

( 2019-04-15 20:06:38 +0200 )edit

OK, but where is the best page to learn about the spacing? It appears to me that I might be getting the spaces in the wrong place, as follows: ([:script=Greek: ])([:script=Latin:]), as I was advised to leave a space "in front of the square bracket" of the first part of the expression, i.e. ([:script=Greek: ]). You refer to ensuring that there is exactly one space, but surely that means going back to square one and manually going through and checking the spaces on each row of text?

( 2019-04-16 07:22:31 +0200 )edit

Hi, I got up early and "cleaned my glasses". I don't know where it went wrong, but I GOT IT TO WORK and have to thank you guys for your patience! Anyway, I would still like to know where I should start on learning the basics of writing such expressions as ([:script=Greek:] )([:script=Latin:]) (note that I got the space after the ]). I have got another issue, but that one is for me to resolve, as I can hear the sound of a stampede: oh no, not him again! No, seriously, thanks for this support; I have now taken out annual subs to LO.

( 2019-04-16 08:18:37 +0200 )edit

I am not an expert in Regex. If I need to find something I'm not sure about, I mostly start with this link. It leads to a very "complete" guide to Regex, and it sometimes isn't easy to find the right page.
LibreOffice makes use of a free and open third-party Regex engine by ICU. @Mike Kaganski already pointed to one of their root pages on Regex in his second comment on the original question, but it isn't exactly a tutorial.
There are many tutorials about Regex on the web and you may find one that suits you better. Anyway, you need to understand that there are different "dialects" and often more than one way to invoke the same functionality. To restrict a part of a search expression to characters in Greek script, e.g., you may write [:script=Greek:]+ reminding you of the concept of character ...(more)

( 2019-04-16 13:37:32 +0200 )edit

A specific Regex engine may support both ways or just one of them (or possibly neither). In the case under discussion, ICU supports both approaches.
If you want to understand Regex in depth you should be ready to spend a substantial amount of time on learning and experimenting. Have a lot of fun!
(Historically RegEx is an invention by mathematicians / theoreticians on formal languages. It was extended, enhanced, and partly specialized aiming at the usage we have to cope with here.)

( 2019-04-16 13:45:24 +0200 )edit

Thanks for this very helpful link, which I have had a look at. I realise that I cannot expect answers on a plate and, as I confirmed, I managed to get it to work, so I am grateful to all who commented. I was possibly tired, as when I retried early this morning it worked, and I even managed to sort out the clump of text that my OCR scanner produces via my iPhone. In other words, I can auto-insert the pipe delimiter and then press Enter in front of each new Greek word to get it into a single column, which is then easy to split at the pipe. I had previously spent time trying to find an app, program or software that would recognise the Greek letters and preserve the columns in the book of the Greek course I am learning from, but the only one I found was what I ended up with. I ...(more)
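
The remaining manual step described here, pressing Enter in front of each new Greek word, could in principle be automated too. The following is only a hedged sketch in Python and not something proposed in this thread: it assumes the OCR clump is whitespace-separated, treats any token containing a character from the assumed Greek code point ranges as Greek, and uses a hypothetical helper name `split_pairs`.

```python
import re

# Assumed stand-ins for the script properties used in LibreOffice.
GREEK_CHAR = re.compile("[\u0370-\u03FF\u1F00-\u1FFF]")
LATIN_CHAR = re.compile("[A-Za-z]")


def split_pairs(clump: str) -> list[tuple[str, str]]:
    """Split one run of OCR text into (greek, english) pairs."""
    pairs = []
    greek, english, pending = [], [], []
    for token in clump.split():
        if GREEK_CHAR.search(token):
            if english:
                # A Greek token after English text starts a new entry;
                # trailing punctuation stays with the finished translation.
                english += pending
                pairs.append((" ".join(greek), " ".join(english)))
                greek, english = [], []
            else:
                # Punctuation between Greek tokens (e.g. "- η - ο") is Greek.
                greek += pending
            pending = []
            greek.append(token)
        elif LATIN_CHAR.search(token):
            # Punctuation seen just before English text belongs to it.
            english += pending
            pending = []
            english.append(token)
        else:
            # Brackets, commas, hyphens: decide once the next word is known.
            pending.append(token)
    english += pending
    if greek or english:
        pairs.append((" ".join(greek), " ".join(english)))
    return pairs


sample = "τα αγγλικά English ( language ) ακόμα also , in addition μας ( to ) us δέκατος - η - ο tenth"
for greek_part, english_part in split_pairs(sample):
    print(f"{greek_part}|{english_part}")
```

Each printed line has the Greek part and the English part separated by a pipe, so the result could be pasted into Calc and split at that delimiter. OCR errors that mix Greek and Latin letters inside one word would still need checking by hand.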

( 2019-04-16 14:18:24 +0200 )edit

You may send a typical one of your photographs to the email account you should find in my user info. I would probably play with it to find a good way in your sense. However, I cannot spend much time on it at the moment.

( 2019-04-16 14:29:36 +0200 )edit

Sorry, I was waffling away and did not get to the point, but your suggestions and examples of the Greek-language find etc. were excellent. As a surveyor, now retired, I am getting into bad habits: I waffle away and sometimes don't get to the issues in my style of writing, so I can appreciate the frustration. One important issue is for me to make damn sure I make a "£" contribution, as I have only just started using this suite, given the fiasco with subscription-only Microsoft products on a pension. Having reviewed your own website and that of LibreOffice, I find the support and network superb. Many Regards, Nige. PS I will get on to the Unicode or CSV stuff a little later, as I have enough trouble learning Greek at the moment and coping with retirement, when there are sometimes just too many things to address. It's taken me a year or so to get used to being retired.


( 2019-04-15 19:40:00 +0200 )edit
