Macro: Split at line break produces a strange string

If you create the following macro, select two lines of text and run it, you will get two message boxes even though there is only one print command. The first message box is blank and the second one actually shows your second line of text. What is going on?

sub Main
	viewCursor = ThisComponent.CurrentController.getViewCursor
	myString = viewCursor.String
	myLines = Split(myString, chr(13))
	print myLines(1)
end sub

If you print myLines(0), it prints the first line just fine. The fact that the second line is wonky is screwing up code following the split procedure in a macro I am trying to make. I thought that there is a line break character stuck at the beginning of the second array element, so I tried searching and replacing it, but that didn’t help.

Just try inserting the line

 Print Asc(Left(myLines(1),1))

before your print statement, and you will get a message showing the number 10.

That is the code of the “Line Feed” character, U+000A.
So, try changing your split statement to

myLines = Split(Join(Split(myString, chr(10)), chr(13))

It will remove LF (if it is present in the text) and split the text at CR.

Actually, your suggested solution creates a large array with every word as a separate element, and the words from the second line are listed first somehow!

In fact, I did not offer a solution, I showed how to change your test case. :) The first part of my answer finds the reason for the seemingly empty message (a window that displays only an LF). The second part helps your statement print myLines(1) execute correctly. You get a large array of words if you call the Split function without the second parameter (the default delimiter is a space).
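For reference, here is a minimal sketch of the test macro with an explicit delimiter on every call, so that neither Join nor Split falls back to the default space. It assumes the selection string uses Chr(13) & Chr(10) between paragraphs, which may not hold on every platform; the UBound check is an addition for safety and not part of the original test case.

```basic
Sub Main
    viewCursor = ThisComponent.CurrentController.getViewCursor()
    myString = viewCursor.String
    ' Inner Split/Join: remove every LF by joining with an empty delimiter.
    ' Outer Split: cut the remaining text at each CR.
    myLines = Split(Join(Split(myString, Chr(10)), ""), Chr(13))
    ' Guard against a selection that contains only one paragraph
    If UBound(myLines) >= 1 Then Print myLines(1)
End Sub
```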

@lomacar:
What you are talking of as two lines is actually two paragraphs. The distinction between lines and paragraphs, and the representation of “new paragraph” in code, has been a mess in text processing software from the beginning. (It’s still a mess, also in the heads of users.)

There never was a generally accepted single ASCII code for “new paragraph”. Some systems didn’t distinguish it from “new line” at all, others coded it by different combinations of (decimal) ASCII 10 (LF) and ASCII 13 (CR).
That was the time of strictly character-oriented output devices, where “carriage return” and “line feed” were very different things and the first one alone led to overwriting the same line. VT existed, too! And HT is still in use.

In short: The way “new paragraph” is currently represented in the string property of a TextCursor (or ViewCursor) in live LibO documents is Chr(13) & Chr(10), at least on Windows systems. (Hopefully on other systems, too, though they may use the reverse combination or Chr(13) alone on the OS level.) There is NO guarantee of any kind that this representation, whatever it is, will not be changed without notice. If you insist on splitting paragraphs in Basic based on the strings returned by cursors, don’t forget to remove a trailing or leading Chr(10) from the array elements you get.

You may try myLines = Split(myString, Chr(13) & Chr(10)).
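If the separator really does differ between platforms, one option is to normalize it before splitting. The following is only a sketch under that assumption; SplitParagraphs is a hypothetical helper name, not an existing function.

```basic
' Hypothetical helper: split a cursor string into paragraphs regardless of
' whether they are separated by CR LF, CR alone, or LF alone.
Function SplitParagraphs(sText As String) As Variant
    Dim s As String
    ' Turn every CR LF pair into a single LF first ...
    s = Join(Split(sText, Chr(13) & Chr(10)), Chr(10))
    ' ... then turn any remaining lone CR into LF
    s = Join(Split(s, Chr(13)), Chr(10))
    ' Now LF is the only separator left
    SplitParagraphs = Split(s, Chr(10))
End Function
```

Usage would be myLines = SplitParagraphs(viewCursor.String). As far as I know, a manual line break inside a paragraph also shows up as Chr(10) in the string, so this sketch would split there as well.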

(Edit 1 with respect to the comment below:)
TextCursor objects can enumerate the paragraphs they intersect with as objects. See example code:

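Sub enumerateParagraphsForViewCursor()
    ' Create a text cursor spanning the same range as the view cursor
    theVC     = ThisComponent.CurrentController.ViewCursor
    theTC     = theVC.Text.CreateTextCursorByRange(theVC)
    ' Enumerate the paragraphs the text cursor intersects with
    theTCEnum = theTC.CreateEnumeration
    Do While theTCEnum.HasMoreElements()
        theEl = theTCEnum.NextElement
        ' Show the paragraph text together with its paragraph style
        MsgBox(theEl.String & " : " & theEl.ParaStyleName)
    Loop
End Sub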

Is there a better way to split it than “based on the strings returned by cursors”?

Anyway, you need to make sure that the ViewCursor shows a single contiguous selection without frames or the like…
I would suppose the above Basic statement works reliably then for the strings as long as you do not worry about styles and attributes, but I am not completely sure concerning different platforms.
I am sure, however, that you cannot get the paragraphs as objects together with their formatting the way you try.
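A rough sketch that combines both points: it first checks that the current selection is a single text range and then collects the paragraph strings through enumeration instead of splitting one big string. SelectedParagraphStrings is a hypothetical name, and the check is an assumption about what "single contiguous selection" should mean here; I have not tested it with frames or tables selected.

```basic
' Hypothetical helper: returns the text of each paragraph touched by the
' selection, or an empty array if the selection is not one single text range.
Function SelectedParagraphStrings() As Variant
    Dim oSel As Object, oRange As Object, oCursor As Object
    Dim oEnum As Object, oPar As Object
    Dim aResult() As String
    Dim n As Long

    SelectedParagraphStrings = Array()
    oSel = ThisComponent.CurrentSelection
    If IsNull(oSel) Then Exit Function
    ' A text selection is a collection of ranges; frames etc. are not
    If Not oSel.supportsService("com.sun.star.text.TextRanges") Then Exit Function
    ' More than one range means a multi-selection
    If oSel.getCount() <> 1 Then Exit Function

    oRange  = oSel.getByIndex(0)
    oCursor = oRange.getText().createTextCursorByRange(oRange)
    oEnum   = oCursor.createEnumeration()
    n = 0
    Do While oEnum.hasMoreElements()
        oPar = oEnum.nextElement()
        ' Skip anything that is not a paragraph, e.g. a text table
        If oPar.supportsService("com.sun.star.text.Paragraph") Then
            ReDim Preserve aResult(n)
            aResult(n) = oPar.getString()
            n = n + 1
        End If
    Loop
    If n > 0 Then SelectedParagraphStrings = aResult()
End Function
```

Because each oPar is a paragraph object, properties such as ParaStyleName remain available inside the loop if the formatting is needed as well.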