How do I detect end of line in Writer using BASIC?

I’m writing a macro to parse a document in Writer and harvest all words with # in front of them (DIY hashtag keywords). I just look for the # sign and then look for the trailing space. Everything works great except for the last hashtag in a line. For example:

Hashtags: #diy #hash #tag #here

Blah, blah, blah.

parses as:

diy
hash
tag
hereBlah

I cannot figure out how to identify the CRLF/paragraph break after that last hashtag. I even tried nibbling one character at a time and comparing each character's code against CHR() values, but the CRLF/paragraph break is invisible.

Got any ideas on how to detect the CRLF/paragraph break/end of line?

Thanks,

Todd
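For comparison, here is a minimal Python sketch of the "look for #, then look for the trailing space" scan. It assumes the text has already been extracted into a plain string in which paragraph breaks appear as line-break characters (CR, LF or CR LF), as they do in an exported .txt file. The only change from the approach described above is that a hashtag ends at any whitespace, not just a space, which avoids the "hereBlah" result. `scan_hashtags` is a hypothetical helper name. Note that, unlike a `\w+` regex, this keeps trailing punctuation (e.g. "#here." yields "here.").

```python
def scan_hashtags(text):
    """Collect words that start with '#', ending each one at any whitespace."""
    tags = []
    i = 0
    n = len(text)
    while i < n:
        if text[i] == "#":
            j = i + 1
            # A hashtag ends at the first whitespace character: space, tab,
            # or a CR/LF standing in for the paragraph break.
            while j < n and not text[j].isspace():
                j += 1
            if j > i + 1:  # skip a lone "#"
                tags.append(text[i + 1:j])
            i = j
        else:
            i += 1
    return tags


print(scan_hashtags("Hashtags: #diy #hash #tag #here\r\nBlah, blah, blah."))
# ['diy', 'hash', 'tag', 'here']
```

This only helps if the extracted string really contains a break character between paragraphs; if paragraph texts were concatenated without one, there is nothing left to detect.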

A paragraph break is not a character in the usual sense. In Edit > Find & Replace it can only be “addressed” with regular expressions, as $ (end of paragraph), i.e. it is a position, not plain old text. If you look at the ODF XML, you’ll see that a paragraph is a <text:p …>content</text:p> element; here again, the paragraph break is not an ordinary character.
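To illustrate the ODF point, this Python sketch parses a tiny hand-written fragment in the shape of an ODF content.xml and returns one string per <text:p> element. The paragraph boundary is the element boundary itself; no break character appears in any of the strings. The function names and the sample are illustrative assumptions, and the sketch is deliberately simplified: it ignores headings (<text:h>) and the space/tab elements such as <text:s/> and <text:tab/> that real documents contain.

```python
import xml.etree.ElementTree as ET
import zipfile

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"


def paragraphs(content_xml):
    """Return the text of each <text:p> element, in document order."""
    root = ET.fromstring(content_xml)
    return ["".join(p.itertext()) for p in root.iter(f"{{{TEXT_NS}}}p")]


def paragraphs_from_odt(path):
    """An .odt file is a ZIP package; its body text lives in content.xml."""
    with zipfile.ZipFile(path) as package:
        return paragraphs(package.read("content.xml"))


SAMPLE = (
    f'<office:text xmlns:office="{OFFICE_NS}" xmlns:text="{TEXT_NS}">'
    "<text:p>Hashtags: #diy #hash #tag #here</text:p>"
    "<text:p>Blah, blah, blah.</text:p>"
    "</office:text>"
)

print(paragraphs(SAMPLE))
# ['Hashtags: #diy #hash #tag #here', 'Blah, blah, blah.']
```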

I don’t know what you want to do with your hashtags, but if you don’t reinject them into the document, why not export it as .txt and scan it with a script in shell, Perl or another language? All of this can be packed into a single piped command (and so be as handy as starting a macro from within Writer).
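A minimal sketch of the "export, then scan" route in Python, assuming the document has already been saved as UTF-8 plain text (for example with something like `soffice --headless --convert-to txt doc.odt`; check the filter options for your version). In the exported file every paragraph break becomes a line break, so a `#\w+` regex stops at the end of the line on its own. The script name in the usage comment is hypothetical.

```python
import re
import sys

# "#" followed by one or more word characters; \w never matches a line break.
HASHTAG = re.compile(r"#\w+")


def hashtags_in_text(text):
    """Return all '#word' tokens in order of appearance."""
    return HASHTAG.findall(text)


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 hashtags.py doc.txt
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            print(" ".join(hashtags_in_text(handle.read())))
```

Because the script reads file names from its arguments and writes to standard output, it can be chained after the conversion step in one shell command.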

Good point. Thanks! That’s what I should do.

Todd

Try:

Sub lookinside()
CR = Chr(13) : LF = Chr(10)
REM If a SINGLE TextRange is selected and this range includes a paragraph break,
REM the break is represented by CR LF in the range's String.
REM A SearchDescriptor or F&R never includes paragraph breaks inside a finding.
tRg = ThisComponent.CurrentSelection(0)
pos = InStr(tRg.String, CR & LF)
REM pos is 0 if no CR LF was found.
Print pos
End Sub
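The macro above relies on the break showing up as CR LF in the selection's String. As a hedge against other line-end conventions, this Python sketch (hypothetical helper names) treats CR LF, a lone CR or a lone LF as a paragraph break, and mirrors InStr by returning a 1-based position, or 0 when no break is found.

```python
import re

# Try CR LF first so it is consumed as one break rather than two.
PARAGRAPH_BREAK = re.compile(r"\r\n|\r|\n")


def first_break_position(range_string):
    """1-based position of the first break, like Basic's InStr (0 = none)."""
    match = PARAGRAPH_BREAK.search(range_string)
    return match.start() + 1 if match else 0


def split_paragraphs(range_string):
    """Split a multi-paragraph string into its paragraph texts."""
    return PARAGRAPH_BREAK.split(range_string)


print(first_break_position("#here\r\nBlah"))  # 6
print(split_paragraphs("#here\r\nBlah"))      # ['#here', 'Blah']
```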

I tried CHR(13), but no dice. Thanks, though.

How does your code look? What object do you “parse”? What means do you use for “parsing”?
Use a SearchDescriptor with .SearchRegularExpression = True and .SearchString = "#\w+", like in:

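Sub collectHashtagKeywords()
doc = ThisComponent
REM Search the whole document with a regular expression (ICU syntax):
REM "#" followed by one or more word characters.
searcher = doc.createSearchDescriptor()
With searcher
  .SearchRegularExpression = True
  .SearchString = "#\w+"
End With
REM findAll returns an indexed collection of TextRanges, one per match.
findings = doc.findAll(searcher)
u = findings.Count - 1
Dim foundKW(u) As String
For j = 0 To u
  foundKW(j) = findings(j).String
Next j
collectedKeys = Join(foundKW, " ")
REM Append two new paragraphs at the end of the body text.
tx = doc.Text
tc = tx.createTextCursorByRange(tx.End)
tx.insertControlCharacter(tc, com.sun.star.text.ControlCharacter.PARAGRAPH_BREAK, False)
tx.insertString(tc, "Collected Keys: ", False)
tx.insertControlCharacter(tc, com.sun.star.text.ControlCharacter.PARAGRAPH_BREAK, False)
tx.insertString(tc, collectedKeys, False)
End Sub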

See example: ask282556CollectFindingsAtTextEnd_1.odt
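To show why this approach never produces "hereBlah", here is a rough Python model of what the macro does, under the assumption that the document can be represented as a plain list of paragraph strings (`collect_and_append` is a hypothetical name, not part of any API). Each paragraph is searched separately, which mirrors the fact that a finding never spans a paragraph break; the matches are joined with spaces and two new paragraphs are appended, as the two insertControlCharacter/insertString pairs do.

```python
import re

HASHTAG = re.compile(r"#\w+")


def collect_and_append(paragraphs):
    """Search each paragraph on its own, then append the collected keys."""
    found = [m.group() for para in paragraphs for m in HASHTAG.finditer(para)]
    return paragraphs + ["Collected Keys: ", " ".join(found)]


doc = ["Hashtags: #diy #hash #tag #here", "Blah, blah, blah."]
for paragraph in collect_and_append(doc):
    print(paragraph)
```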

Thank you for this. I tried using these objects but couldn’t find adequate documentation, so I gave up and just parsed 100 characters at a time from ThisComponent.Text.String using string functions. Do you know where to find documentation on ThisComponent.createSearchDescriptor and ThisComponent.Text.createTextCursorByRange (or simply ThisComponent, for that matter)?

And your code NAILED it! Copy, paste, F5, and boom. Thanks again!

The general, searchable API reference is “LibreOffice: Main Page” (api.libreoffice.org).
What’s missing in many cases is a safe and simple way to find answers to questions like “Which objects (services, interfaces) give access to (can use) this method (interface, service …)?”, not to speak of questions like “What means are suitable for …?” (the typical forum question).
In addition, you may find objects claiming to support a general service (like “com.sun.star.text.Text”) but doing so with restrictions that depend on the context, in a way not easily understandable for somebody like me who is not well versed in the terminology and concepts used.
Despite the “Basic” in their titles, the well-known texts by Andrew Pitonyak probably give the best introduction not only to the LibreOffice Basic “dialect” but also to practical API usage:
See the long table here: www.pitonyak.org/oo.php.
The SearchDescriptor service isn’t particularly complicated.

Thank you so much!

And your code NAILED it! …

Please note that a relevant detail of the solution is the regular expression used as .SearchString.
If you want to understand why "(?<=^| )#\w+" might be a better solution, you will find little about it in Andrew’s texts and nothing in the API reference.
“Regular Expression Tutorial - Learn How to Use Regular Expressions” would be a good but challenging source for this part. (LibreOffice uses the ICU regex engine.)
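A small Python comparison of the two patterns, run on exported plain text where each paragraph is one line. The plain `#\w+` also matches a "#" in the middle of a token such as "issue#42", while the anchored version only accepts a "#" at the start of a paragraph or after a space. Python's re module requires look-behinds of fixed width, so this sketch spells the same condition as `(?:^|(?<= ))` with re.MULTILINE instead of ICU's `(?<=^| )`; the sample text and function names are made up for illustration.

```python
import re

# Pattern from the macro: "#" plus word characters anywhere.
NAIVE = re.compile(r"#\w+")

# Same idea as "(?<=^| )#\w+": "#" must start a line (paragraph) or
# follow a space. MULTILINE makes ^ match after every line break.
ANCHORED = re.compile(r"(?:^|(?<= ))#\w+", re.MULTILINE)


def naive_tags(text):
    return NAIVE.findall(text)


def anchored_tags(text):
    return ANCHORED.findall(text)


sample = "Bug issue#42 is fixed. #release\n#notes follow"
print(naive_tags(sample))     # ['#42', '#release', '#notes']
print(anchored_tags(sample))  # ['#release', '#notes']
```

Neither pattern accepts a hashtag preceded by a tab; widening the look-behind to other whitespace would be a further judgment call.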