Copy highlighted text to a new document with a macro

lduperval · October 29, 2021, 11:46pm

Hi,

Has anyone ever written a macro that goes through a document, takes all the text that is highlighted, and copies it to a new document?

By “highlighted” I mean using the highlight character format (as opposed to “selected”, where I would simply cut and paste). In the analog world, it would be the equivalent of using a highlighter marker and then collecting all those highlights in a single document.

If so, would you be willing to share?

I’m looking to take a 15 page document with a bunch of unconnected highlights and put them into a new document. One paragraph per highlight is fine… well, even desirable.

Thanks,

L

karolus · October 30, 2021, 4:08am

What exactly does "highlighted" mean here? Is it a number of manually selected pieces of text, or is the text highlighted by some kind of formatting?

If it is manually selected →→ copy and paste already exists!

lduperval · October 30, 2021, 1:39pm

Hi,

I meant with formatting. I clarified.

Thanks,

L

GNK · October 30, 2021, 2:44pm

15 pages × (much) less than 1 min per page to manually copy/paste the selected text into Notepad = < 15 min
vs
create, test and implement a macro = ??? min …

Is this really a useful function for you?

lduperval · October 30, 2021, 3:05pm

Yes, or I wouldn’t ask. I have transcripts of audio that I highlight as I listen to the MP3, then I take all those highlights and create a new document from them. Some transcripts can be 25 to 30 pages, and I get more than one per week.

I’m not asking anyone to do it. I’m asking if someone has done it. After doing the process multiple times, I think it would speed things up.

It’s a pretty niche request, but I thought I’d check anyway before investing in a specialized tool… if one exists.

Thanks,

L

GNK · October 30, 2021, 5:45pm

All good then

I have not run into a macro like this (and I don’t write them). It might be possible, but I guess that most callable objects in LO macros reference the whole ‘text-frame’, or would require some definition in terms of a page grid, or would address the text contents as an SQL string (rather than by formatting).
If you could put some kind of character marker (e.g. @#@) at the beginning and end of each text chunk you want, then more standard substring functions could be used to extract the wanted text.
Best Of Luck
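
A minimal sketch of the marker idea, assuming the document text has been exported as plain text and each wanted chunk is wrapped in the @#@ marker suggested above. The function name `extract_marked` and the sample string are made up for illustration; this is plain Python, not a LibreOffice macro.

```python
import re

def extract_marked(text, marker="@#@"):
    """Return the chunks found between consecutive pairs of `marker`."""
    m = re.escape(marker)
    # Non-greedy match so each pair of markers yields its own chunk;
    # DOTALL lets a chunk span several lines.
    pattern = re.compile(m + r"(.*?)" + m, re.DOTALL)
    return [chunk.strip() for chunk in pattern.findall(text)]

# Hypothetical sample text with two marked chunks.
sample = "intro @#@first point@#@ filler @#@second\npoint@#@ outro"
chunks = extract_marked(sample)

# One paragraph per chunk, separated by a blank line.
print("\n\n".join(chunks))
```

Joining with a blank line gives the "one paragraph per highlight" layout asked for in the first post.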

Villeroy · October 30, 2021, 7:12pm

Edit > Find & Replace (Ctrl+H)
Other Options
[Attributes]
[x] Character background
[Find All]
[Close]
Ctrl+C

GNK · October 30, 2021, 10:29pm

Ah ha! (I never really noticed those options before.)
<seems SOLVED (maybe?)>
…although breaks between chunks (line breaks) are not preserved if all the highlighted text is copied and pasted at once. That might not be ideal given the use case here.
Inserting special characters to mark the beginning and end of each text block would allow some simple string code to insert at least a line break (or some other white space) between each block of highlighted/demarcated text …
… there is no ‘invert selection’ (?) in Writer, so replacing the non-highlighted text with white space might not be an option …
… but maybe replace the highlighted text with a new font, then use Find > Attributes to select all the remaining text in the original font, then replace that with a tab or “_______” …?

KamilLanda · October 31, 2021, 11:32am

You can test this
copyHighlighted.odt (20.4 kB)

But if you want to find formatted text without knowing exactly how it differs from normal text, then you have to search the document character by character and test all possible properties for every character. That will be very slow → maybe a faster way exists, but I don’t know of one.
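
For the specific case of highlighting, one way to avoid walking the document character by character is to read the saved file instead of running a macro. This is a different technique from the attached macro, and only a rough sketch. It assumes the highlight was applied as direct formatting, which Writer normally stores in content.xml as an automatic text style carrying `fo:background-color`, applied through `text:span` elements. The function names and the sample XML are made up for illustration; highlights that come from a named character style, or from paragraph-level backgrounds, would be missed.

```python
import zipfile
import xml.etree.ElementTree as ET

# Standard ODF namespace URIs.
NS = {
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "fo": "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0",
}

def q(prefix, name):
    """Build the '{namespace}name' form that ElementTree expects."""
    return "{%s}%s" % (NS[prefix], name)

def highlighted_chunks(content_xml):
    """Return the text of every span whose text style sets a background colour."""
    root = ET.fromstring(content_xml)
    highlight_styles = set()
    for style in root.iter(q("style", "style")):
        if style.get(q("style", "family")) != "text":
            continue
        props = style.find(q("style", "text-properties"))
        if props is None:
            continue
        colour = props.get(q("fo", "background-color"))
        if colour and colour != "transparent":
            highlight_styles.add(style.get(q("style", "name")))
    chunks = []
    for span in root.iter(q("text", "span")):
        if span.get(q("text", "style-name")) in highlight_styles:
            chunks.append("".join(span.itertext()))
    return chunks

def highlighted_chunks_from_odt(path):
    """Read content.xml out of an .odt file (a zip archive) and scan it."""
    with zipfile.ZipFile(path) as odt:
        return highlighted_chunks(odt.read("content.xml"))

# Hand-written stand-in for content.xml: T1 is a highlight, T2 is bold only.
SAMPLE = """<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0">
  <office:automatic-styles>
    <style:style style:name="T1" style:family="text">
      <style:text-properties fo:background-color="#ffff00"/>
    </style:style>
    <style:style style:name="T2" style:family="text">
      <style:text-properties fo:font-weight="bold"/>
    </style:style>
  </office:automatic-styles>
  <office:body><office:text>
    <text:p>Plain <text:span text:style-name="T1">first highlight</text:span> and
      <text:span text:style-name="T2">bold only</text:span>.</text:p>
    <text:p><text:span text:style-name="T1">second highlight</text:span></text:p>
  </office:text></office:body>
</office:document-content>"""

print("\n\n".join(highlighted_chunks(SAMPLE)))
```

For a real file, `highlighted_chunks_from_odt("transcript.odt")` would be the entry point (the file name is hypothetical). Tabs and repeated spaces inside a span are stored as separate elements in ODF and are dropped by this sketch.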

GNK · October 31, 2021, 1:08pm

Hey - that is a pretty useful macro!!

I looked up the .awt module:
https://www.openoffice.org/api/docs/common/ref/com/sun/star/awt/module-ix.html

Maybe rather than using "com.sun.star.awt.FontWeight" (i.e. bold), use "com.sun.star.awt.FontUnderline" instead,
because the underline types include several rarely used values (e.g. DOUBLEWAVE) as well as NONE
https://www.openoffice.org/api/docs/common/ref/com/sun/star/awt/FontUnderline.html

One thing to improve, perhaps (in terms of the transcription workflow):
With the selections spread over many pages, too much white space might be left over. It could of course just be deleted manually, but maybe there is a way to reduce it by replacing any run of more than 2 line breaks with just 2 line breaks …
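
A small sketch of that clean-up step, assuming the extracted text is available as a plain string outside Writer. The function name `collapse_blank_lines` is made up for illustration.

```python
import re

def collapse_blank_lines(text, keep=2):
    """Replace any run of more than `keep` line breaks with exactly `keep`."""
    # Normalise Windows and old Mac line endings first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return re.sub(r"\n{%d,}" % (keep + 1), "\n" * keep, text)

# Hypothetical extracted text with too much white space between chunks.
messy = "first chunk\n\n\n\n\nsecond chunk\n\nthird chunk\nsame paragraph"
print(collapse_blank_lines(messy))
```

Runs of one or two line breaks are left alone, so single line breaks inside a chunk survive.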

KamilLanda · October 31, 2021, 3:31pm

Inspired by the algorithms from AltSearch.

To match more types of underline, don’t set the .Value. This is shown in the example.

For Enter: a single Enter can be found with a regular expression → find: $
But for several Enters in a row the search has to be repeated. Then you can test whether the next character is also an Enter, or compare the end positions of the findings. The example includes the character test.

moreEnters.odt (22.3 kB)

GNK · October 31, 2021, 3:54pm

@KamilLanda Great stuff - that seems to hit the nail on the head nicely!

Does that work for you, @lduperval?

Villeroy · October 31, 2021, 6:13pm

Side note: You can learn how to use the features of a complex application as a non-programmer or you can ask others to do your work programmatically. In the latter case you are on the road to immaturity.

lduperval · November 26, 2021, 8:46pm

Hi,

Thanks for this! It looks like it has the structure for what I want. I will either adjust it so it picks up the highlights, or change my approach and use bold/underline to mark text when I need to.

Thanks a bunch!

L