Sorting words that need to go into 2 columns in doc or calc but aren't separated by comma between greek and english translation

goonhilly · April 15, 2019, 12:11pm

I thought that the “find replace” solution was going to be best but I carefully put the function line of ([:script=Greek:])([:script=Latin:]) in the “Find” box but then I cannot read what is in the Replace - is it S1|S2 or is it a dollar $ sign?
It is very faint but I cannot get it to work as I am possibly having an issue with the lettering albeit that would be odd as I carefully used the attachment odt doc kindly provided so if you have a moment could you confirm what I am doing as I could not find the particular script term in the ICU lists.

I also had a go with the German Cattribs2Html function after I remembered to switch off macro security and got it to copy the text into the HTML column but it would not work in the TRIM columns and just got a “Value!” warning so again thought it maybe something to do with the greek letters I am typing as I use Greek Polytonic Keyboard Switch from ENG UK?

My view is that the Find & R is the best for me as it ignores other char. But some help…

Lupp · April 15, 2019, 4:20pm

$1| $2 (that is: Dollar1, Pipe (vertical line), Space, Dollar2)
You also missed the space behind Greek:]
You didn’t mention your version of LibO. Newer features of the (ICU) RegEx engine may only work in recent versions of LibO.
Never “switch off” macro security! You may choose medium level and permit macros from a trusted source. But you shouldn’t run macros if there isn’t even a basic understanding for them.
There was no “German function”, only a link to a forum in German language.
My hint leading you to an example containing a macro function of mine wasn’t meant to help in your specific case. It was “just for completeness”. The script language isn’t treated by that function and has no standard HTML representation afaik. The #VALUE! error you got was due to the fact that there wasn’t a single Italic (slanted) letter in the text.

goonhilly · April 15, 2019, 5:27pm

Hi I have got macro set for medium and that was just a loose expression on my part but thanks.
I had some limited results following your “space” correction highlighting my poor glasses!
My find: expression has been corrected to this:- ([:script=Greek: ])([:script=Latin:]) and in Replace $1| $2
I tried both 1 space and 2 spaces after the pipe or vertical line entry BUT each time I got this and a limited run the result is as follows;
The pipe line is going in the wrong place and I am at a loss to know what to change as my greek words are typed in Greek letter on Polytonic and I am using LO 6.2.2.2-64 on MS Windows10
άλλος other
Γενικάg| e nerally
δικό τουςt| h eirs
μικροβιολογικός microbiological
τα έπιπλα furniture
το εργαστή laboratory
το ισόγειοg| r ound floor
μοντέρνος modern
ξεχωριστόςs| e parate
όπου wherei| n which
κάθεe| a che| v ery

I get gaps and pipe going in after the first English letter or missing completely?
Reg Exp and Dia Sen both ticked can you see any

Lupp · April 15, 2019, 6:06pm

The space needs to be behind the closing square bracket.
If you cannot be sure that there is exactly one space between the last Greek word and the first English word, you need to use “ +” (space followed by +) instead of the single space. The + means “one or more of the preceding character”.
If you don’t learn about the basics concerning regular expressions you might better not use them.
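For readers who want to see the logic of that search expression outside LibreOffice, here is a minimal sketch in Python. It is not what LibreOffice runs: Python's standard `re` module has no script property, so the sketch assumes that two Unicode blocks (Greek and Coptic, Greek Extended) are a good enough stand-in for `[:script=Greek:]`. The names `GREEK`, `BOUNDARY` and `insert_pipe` are made up for illustration.

```python
import re

# Assumption: these two Unicode blocks stand in for ICU's [:script=Greek:].
# U+0370-U+03FF is "Greek and Coptic", U+1F00-U+1FFF is "Greek Extended"
# (the polytonic letters).
GREEK = "\u0370-\u03FF\u1F00-\u1FFF"

# One Greek letter, one or more spaces, then one Latin letter.
# This mirrors ([:script=Greek:]) +([:script=Latin:]) with the space
# placed after the closing square bracket.
BOUNDARY = re.compile(rf"([{GREEK}]) +([A-Za-z])")


def insert_pipe(line: str) -> str:
    """Mimic Find & Replace with the replacement '$1| $2'."""
    return BOUNDARY.sub(r"\1| \2", line)


print(insert_pipe("άλλος other"))
print(insert_pipe("το ισόγειο   ground floor"))
```

Because the match needs a Greek letter on the left and a Latin letter on the right, the space inside a two-word Greek entry such as “το ισόγειο” is left alone, and a run of several spaces is replaced by a single “| ”.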

goonhilly · April 16, 2019, 5:22am

OK but where is the best page link to learn about the spacing as it appears to me that I might be getting the spaces in the wrong order as follows ([:script=Greek: ])([:script=Latin:]) as I was advised to leave a space “in front of the square bracket” of the first part of the expression ie ([:script=Greek: ])?
You refer to exactly one space and ensuring that there is one space but that means surely back to square 1 and manually going through and altering or checking spaces on each row of text ???

goonhilly · April 16, 2019, 6:18am

Hi I got up early and “cleaned my glasses”- don’t know where it went wrong but I GOT IT TO WORK and have to thank you guys for your patience! Anyway I would still like to know where I should start on learning the basics on making such entries as ([:script=Greek:] )([:script=Latin:]) -note I got the space after ].
As I got another issue but that one is for me to resolve as I can hear the sound of stampede- oh no not him again!
No seriously thanks for this support and I have now taken out annual subs to LO.

Lupp · April 16, 2019, 11:37am

I am not an expert in Regex. If I need to find something I’m not sure about, I mostly start with this link. It leads to a very “complete” guide to Regex, and it sometimes isn’t easy to find the right page.
LibreOffice makes use of a free and open third-party Regex engine by ICU. @mikekaganski already pointed to one of their root pages on Regex in his second comment on the original question, but it isn’t exactly a tutorial.
There are many tutorials about Regex on the web and you may find one suiting you better. Anyway you need to understand that there are different “dialects” and often more than one way to appeal to the same functionality. To restrict a part of a search expression to characters in Greek script, e.g., you may write [:script=Greek:]+ reminding you of the concept of character classes or (synonymous) \p{Greek} emphasizing the concept of Unicode properties.

Lupp · April 16, 2019, 11:45am

A specific Regex engine may support both ways, just one of them, or possibly neither. In the case under discussion ICU supports both approaches.
If you want to understand Regex in depth you should be ready to spend a substantial amount of time on learning and experimenting. Have a lot of fun!
(Historically RegEx is an invention by mathematicians / theoreticians on formal languages. It was extended, enhanced, and partly specialized aiming at the usage we have to cope with here.)

goonhilly · April 16, 2019, 12:18pm

Thanks for this very helpful link that I have had a look at. I realise that I cannot expect answers on a plate and as I pointed out/confirmed I managed to get it to work so am grateful to all that commented. I was possibly tired as when I retried early this morning it worked and I even managed to sort out the clump of text that my OCR scanner produces via my iPhone. In other words I can auto insert the pipe delimiter and then press enter on each new greek word and get it to a single column that is then easy with the pipe line to sort.
I had previously spent time endeavouring to find app/ program/software that would recognise the greek letters and preserve the columns in my book of Greek course I am learning but the only 1 find was what I ended up with. I noted that someone commented weird OCR scan but that was what I had to work with.
It takes a photo of neatly arrange page where there are 4 columns of text i.e 2 groups of Gr-En Gr En and produces a lump of text that is OCR’d but needs arrange

Lupp · April 16, 2019, 12:29pm

You may send a typical one of your photographs to the email account you should find in my user info. I would probably play with it to find a good way in your sense. However, I cannot spend much time on it at the moment.

goonhilly · April 16, 2019, 12:31pm

What I have done is progressed with getting on with my project using what I have learnt thus far. I end up with the pipe line inserted between greek and english pairs and I can plod through to then produce a single column in the ODT doc to sort into a 2 cell column, 1st has greek and 2nd has english, that is then imported into Anki flash cards.
I need to look at seeing if I can get around the manual part of pressing enter before each of the start of the greek words to get into a single column then when copied over to “ods” doc I can sort it with the data text to columns function in “calc”
περασμένος - η - ο | last άλλωστε | besides ανδρ κός - ή - o men 's αποφεύγω 1 avoid γνωστός - ή - o well known γυναιχείος - α - ο | women 's δύσκολος - η - ο | difficult , fussy τα ρούχα | clothes

So there must be a way, once I have got the first pipe inserted, to automate the manual press of the “return key” before each greek word, and so avoid manually getting it into a single column?
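One possible way to automate that “press return before each Greek word” step, sketched in Python rather than in LibreOffice itself: put a line break wherever an English word is followed by a Greek letter. This is only a sketch under assumptions. The Unicode ranges below stand in for a real Greek script test, and the names `GREEK`, `ENTRY_START` and `one_pair_per_line` are invented for illustration. OCR text that mixes Latin look-alike letters into the Greek words would need cleaning first.

```python
import re

# Assumption: these two Unicode blocks approximate "Greek script".
GREEK = "\u0370-\u03FF\u1F00-\u1FFF"

# A Latin letter (or apostrophe) ending the English gloss, then spaces,
# then a look-ahead for the Greek letter that starts the next entry.
ENTRY_START = re.compile(rf"([A-Za-z']) +(?=[{GREEK}])")


def one_pair_per_line(lump: str) -> str:
    """Replace the spaces before each new Greek entry with a line break."""
    return ENTRY_START.sub(r"\1\n", lump)


lump = "άλλωστε | besides τα ρούχα | clothes δύσκολος | difficult , fussy"
print(one_pair_per_line(lump))
```

Each resulting line holds one Greek | English pair, which is the shape that Text to Columns in Calc can then split on the pipe.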

Lupp · April 16, 2019, 1:12pm

If you can get the lines from your textbook containing one or two Greek lemmas with their one or two English explanations, and import them into Calc, there is a much simpler (more direct) way to get all that split correctly into columns with the help of the REGEX() function available in LibO V 6.2.0 and higher.
You find an example attached to my answer under “EDIT”.
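The attached spreadsheet is not reproduced here. As a rough illustration of the idea only (splitting a line that holds one or two Greek/English pairs into separate fields), here is a Python sketch. It is not the formula from the attachment, and the Unicode ranges and the names `GREEK`, `PAIR` and `split_pairs` are assumptions made for the example.

```python
import re

# Assumption: these two Unicode blocks approximate "Greek script".
GREEK = "\u0370-\u03FF\u1F00-\u1FFF"

PAIR = re.compile(
    rf"([{GREEK}][{GREEK} ]*?) +"   # Greek lemma, possibly article + noun
    rf"([A-Za-z][A-Za-z ,']*?)"     # English explanation
    rf"(?= +[{GREEK}]| *$)"         # stop before the next lemma or at line end
)


def split_pairs(line: str) -> list[tuple[str, str]]:
    """Return (greek, english) tuples found in one line of text."""
    return PAIR.findall(line)


for greek, english in split_pairs("άλλος other τα έπιπλα furniture"):
    print(greek, "|", english)
```

Each tuple corresponds to what would end up in a Greek cell and an English cell of the same row.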

goonhilly · April 16, 2019, 2:27pm

Hi the irony was that just before I stopped for a little lunch I started to relook at that model that was posted on the German site (Trim etc) and the fact that it was the obvious yet missed by me italics part of the question and I pasted in some text both Greek/Latin and italicised the text and of course it worked. I then saw that you had kindly already gone down the route of assisting with a couple of alternatives of I will simply call it the trim model spreadsht.
You offered to allow me to send my issue of how it scans as when I paste in these boxes it formats and I cannot attach photo? Anyway your email is not there but mine is on my profile page so if you are able to send me an email assuming u can see mine then I will attach a photo as with your knowledge I am sure you are only a minute or two from solving my dilemma of spending too much time. Good thing about the profile is that you are x years and I am 68 now so I dont feel too humiliated by all this as you are expert.

goonhilly · April 16, 2019, 3:14pm

Hi, I note that you are a busy guy and I had a test with your “trim spread sheet model” that you kindly sent to me. Using your latest example I therefore pasted into one of the cells in column A a typical extract of the text that is created. In other words, a typical page from the Greek book in one of the chapters explaining the use of vocabulary will have (this varies) up to 20-30 words. Therefore using the lower number that makes 20 pairs (a pair comprising the Greek word followed by the English translation).
My scanner puts all that into for example a simple notebook text file as a complete cluster of text. Of course it is very easy but very time-consuming to separate it into primitive columns in all sorts of document applications. I simply put the cursor at the start of the Greek word and pressed enter.
Using your model which I will label “trim” I therefore ended up with nearly 40 words in one of the typical cells (20 pairs) and I could not find a way of separating albeit that you have?

goonhilly · April 16, 2019, 3:17pm

Accordingly therefore if you do have time it would be very much appreciated. I believe that with your efforts we are literally on the cusp.
Using your model which I will label “trim” I therefore ended up with nearly 40 words in one of the typical cells (20 pairs) and I could not find a way of separating albeit that you have implied that it should be fairly easy to do that by going directly to spreadsheet rather than via the document file. In other words hopefully avoiding the necessity of putting in delimiters in one document.
I then decided to try and speed things up by utilising the fact that you have kindly put a couple of pairs per row and therefore I was able to divide it up into 10 rows fairly quickly. In column A therefore I ended up with 10 rows of 2 pairs and everything works okay but I’m still convinced, as you are, that there must be an easier way and given the level of your expertise it must be fairly easy to do that.
In other words if you do have a few minutes to examine?

Lupp · April 16, 2019, 5:41pm

I am about to go out. Will respond when I find the time.

goonhilly · April 15, 2019, 10:13am

Sorry I was waffling away and did not get to the point but your suggestions and examples of the greek language find etc were excellent. As a surveyor now retired I am getting into bad habits, waffling away and I sometimes don’t get to the issues in my style of writing and can appreciate the frustration.
One important issue is for me to make damn sure I make a “£” contribution as I have only just started using this suite given the fiasco with subscription only Microsoft products on a pension and reviewing your own website and that of LibreOffice the support and network is superb.
Many Regards Nige
PS I will get on to the Unicode or csv stuff a little later as I have got enough trouble learning Greek at the moment and coping with retirement when there are sometimes just too many things to address - it’s taken me a year or so to get used to being retired

erAck · April 15, 2019, 5:40pm

Please do not add comments as Answers if it is not an answer to the original question. Use add a comment instead. Or edit your question to modify. Thanks.

goonhilly · May 1, 2019, 6:57am

After all the help I received above I managed to sort out my “flashcard library development” in Anki and would like to thank everyone.
A couple of points: I found this guy on YouTube, https://www.youtube.com/watch?v=7DG3kCDx53c, who produces some REALLY EASY to understand REGEX videos that were invaluable. Diving in to REXEGG.COM was a step too soon for me!
The other aspect is that there is an excellent inbuilt help menu on REGEX on this page: REGEX Function
Trouble is now I started this RegEx I have got a little distracted from my language learning to learning RegEx!

keme · July 5, 2021, 8:40am

Sorry I was waffling away and did not get to the point but your suggestions and examples of the greek language find etc were excellent. As a surveyor now retired I am getting into bad habits, waffling away and I sometimes don’t get to the issues in my style of writing and can appreciate the frustration.

Have no fear of punctuation!

Even if you cannot write with perfect schoolbook punctuation (Secret revealed: nobody is able to do that), too much is usually better than not enough.

The major “punctuation” is the paragraph break. Start a new paragraph when you change subject (like from “The other aspect is …”). Note: Two newlines are required in the editor here to have it displayed with a paragraph break when you post it.

Commas section your clauses, so what belongs together is more easily tied together when reading. Read it aloud. When you feel like taking a breath or otherwise pausing, a comma may be in order.