Complex searches on a folder full of Word docs

I’ve got a folder full of about 2000 MS Word (DOC and DOCX) files that I need to do complex searches in – for example, finding documents that contain “John” but not “John Doe”. Or “21,000” but not “21,000,000”.

First question – can LibreOffice do that?

If not, can I use LibreOffice to convert the Word docs to RTF files? Then I could use EasyFind or BBEdit to do the searches. The modification dates of the files are important – they form a kind of diary – so I’m hoping to be able to preserve those dates in the RTF files. (The OS X Terminal command textutil will do the conversion, but it stamps the new files with the date of conversion.)

Any thoughts or suggestions would be greatly appreciated –

Steve

  1. LibreOffice cannot search across the files in a directory, unless you write a macro that opens the files one by one and searches each of them.
  2. LibreOffice can convert files to RTF.
  • But it saves the converted files with the current date and time, not those of the original document.

See also Search for text in odt files in a folder

Thanks for the info. Any way to create a macro or script that would do the conversions and keep the dates?

S
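One possible approach, as a minimal sketch rather than a tested solution: drive LibreOffice's command-line converter (soffice --headless --convert-to rtf) from a small Python script, then copy each original's modification time onto the new RTF with os.utime. The function names are made up for illustration, and the script assumes soffice is on your PATH; on a Mac the binary usually sits inside the application bundle at /Applications/LibreOffice.app/Contents/MacOS/soffice, so you may need to pass that full path instead.

```python
import os
import subprocess
from pathlib import Path


def copy_times(src: Path, dst: Path) -> None:
    """Give dst the same access and modification times as src."""
    st = os.stat(src)
    os.utime(dst, ns=(st.st_atime_ns, st.st_mtime_ns))


def convert_keep_dates(src_dir: Path, out_dir: Path, soffice: str = "soffice") -> None:
    """Convert every .doc/.docx in src_dir to RTF, then restore the dates.

    `soffice` is assumed to be the LibreOffice executable (name or full path).
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(src_dir.iterdir()):
        if src.suffix.lower() not in {".doc", ".docx"}:
            continue
        # One headless LibreOffice run per file; raises if the conversion fails.
        subprocess.run(
            [soffice, "--headless", "--convert-to", "rtf",
             "--outdir", str(out_dir), str(src)],
            check=True,
        )
        dst = out_dir / (src.stem + ".rtf")
        if dst.exists():
            copy_times(src, dst)
```

Calling convert_keep_dates(Path("docs"), Path("rtf")) with your own folder names would run the batch. One caveat: foo.doc and foo.docx would both be written to foo.rtf, so rename one of them first if you have such pairs.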

Try Glimpse (http://webglimpse.net) or SWISH-E (http://swish-e.org). Both are ad hoc tools to crawl through directories. They can’t directly read .doc(x) or .odx files and need some technical preparation, but they do a pretty good job with extensive query expressions (including NOT, OR, and wildcards).

Glimpse will find all occurrences in each file and list every matching line.

SWISH-E will only list the names of the files that contain the queried string (or satisfy the query expression). This might be enough if you then systematically open each file.

In any case, you need a wrapper script around the tools. That is, you can’t just install the tools and get your queries working. You must first think your problem through and write “glue” scripts.
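To give an idea of what such glue might look like, here is a rough Python sketch that skips the indexing tools altogether: it pulls the text out of each .docx (which is a zip archive containing word/document.xml) and tests it against a regular expression. It handles only .docx; the older binary .doc files would have to be converted first. The function names and both patterns are illustrative assumptions, and "contains John but not John Doe" is read here as "has at least one John that is not followed by Doe".

```python
import re
import zipfile
from pathlib import Path
from xml.etree import ElementTree

# XML namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# "John" as a whole word, unless it is followed by " Doe".
JOHN_NOT_DOE = re.compile(r"\bJohn\b(?!\s+Doe\b)")
# "21,000" that is not part of a longer number such as "21,000,000" or "121,000".
EXACT_21K = re.compile(r"(?<!\d)(?<!\d,)21,000(?!,?\d)")


def docx_text(path) -> str:
    """Return the body text of a .docx file, one paragraph per line."""
    with zipfile.ZipFile(path) as zf:
        root = ElementTree.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W_NS + "p"):
        # Word may split a phrase across several runs, so join them all.
        paragraphs.append("".join(t.text or "" for t in p.iter(W_NS + "t")))
    return "\n".join(paragraphs)


def matching_files(folder, pattern):
    """Return the .docx files in folder whose text matches pattern."""
    hits = []
    for path in sorted(Path(folder).glob("*.docx")):
        try:
            text = docx_text(path)
        except (zipfile.BadZipFile, KeyError):
            continue  # not a real .docx (e.g. a Word lock file); skip it
        if pattern.search(text):
            hits.append(path)
    return hits
```

Something like matching_files("docs", JOHN_NOT_DOE) would then return the list of hits. If you want the stricter reading (the document mentions John and never mentions John Doe), two plain substring checks on docx_text(path) would do instead of the lookahead.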

There was a public Glimpse example on SourceForge, but the site is quite broken nowadays (you can still download/upload, but there is no longer access to the demos).

For SWISH-E, go to http://lxr.nginx.org/search (first display a file page such as http://lxr.nginx.org/source/LICENSE and then go to the aforementioned page to make a query based on words you read).