Complex searches on a folder full of word docs

asked 2018-08-15 16:39:07 +0100

Steve24

I've got a folder full of about 2000 MS Word (DOC and DOCX) files that I need to do complex searches in -- for example, finding documents that contain "John" but not "John Doe". Or "21,000" but not "21,000,000".
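Once the text of a document is available as a plain string, both example searches can be written as regular expressions with lookarounds. The following is only a rough sketch in Python. It assumes "contains John but not John Doe" means "has at least one John that is not followed by Doe"; the names JOHN_NOT_DOE, EXACT_21000 and contains are made up for the illustration.

```python
import re

# "John" as a whole word that is NOT followed by whitespace and "Doe".
JOHN_NOT_DOE = re.compile(r"\bJohn\b(?!\s+Doe\b)")

# "21,000" that is not part of a longer number such as "21,000,000"
# or "121,000": no digit directly before it, and no digit
# (optionally preceded by a comma) directly after it.
EXACT_21000 = re.compile(r"(?<!\d)21,000(?!,?\d)")


def contains(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return pattern.search(text) is not None
```

If the intended meaning is instead "no John Doe anywhere in the document", two plain substring tests (one for "John", one for "John Doe") combined with `and not` would be closer.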

First question -- can LibreOffice do that?

If not, can I use LibreOffice to convert the Word docs to RTF files? Then I could use EasyFind or BBEdit to do the searches. The modification dates of the files are important (they form a kind of diary), so I'm hoping to be able to preserve those dates in the RTF files. (The OS X Terminal command textutil will do the conversion, but it stamps the new files with the date of conversion.)

Any thoughts or suggestions would be greatly appreciated --

Steve


3 Answers


answered 2018-08-15 17:46:10 +0100

Steve24

Thanks for the info. Any way to create a macro or script that would do the conversions and keep the dates?

S


answered 2018-08-15 16:42:55 +0100

  1. LibreOffice cannot search across the files in a directory, unless you create a macro that opens the files one by one and searches each of them.
  2. LibreOffice can convert files to RTF.
    • But it saves the converted files with the current date and time, not those of the original document.
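The date problem can be worked around outside LibreOffice: convert each file, then copy the original file's timestamps onto the result. Below is a minimal sketch in Python, not a tested solution for this particular folder. It assumes the LibreOffice command-line binary can be started as "soffice" (on a Mac it usually sits inside the application bundle, so the full path may be needed); the function names copy_times and convert_keep_dates are made up for the illustration.

```python
import os
import subprocess
from pathlib import Path


def copy_times(src, dst):
    """Give dst the same access and modification times as src."""
    st = os.stat(src)
    os.utime(dst, ns=(st.st_atime_ns, st.st_mtime_ns))


def convert_keep_dates(src_dir, out_dir, soffice="soffice"):
    """Convert every .doc/.docx in src_dir to RTF in out_dir,
    then restore the original modification date on each RTF."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(src_dir).iterdir()):
        if src.suffix.lower() not in {".doc", ".docx"}:
            continue
        # Headless conversion of one file; slow but simple.
        subprocess.run(
            [soffice, "--headless", "--convert-to", "rtf",
             "--outdir", str(out), str(src)],
            check=True,
        )
        dst = out / (src.stem + ".rtf")
        if dst.exists():
            copy_times(src, dst)
```

Calling convert_keep_dates("diary", "diary_rtf") would process one folder. Note that report.doc and report.docx would both be written to report.rtf, so files sharing a base name need separate handling.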

Comments

Mike Kaganski (2018-08-15 16:59:45 +0100)

answered 2018-08-15 19:49:23 +0100

ajlittoz

Have a try with Glimpse (http://webglimpse.net) or SWISH-E (http://swish-e.org). Both are ad hoc tools to crawl through directories. They can't directly read .doc(x) or .odx files and need some technical preparation, but they do a pretty good job with extensive query expressions (including "not", "or" and wildcards).

Glimpse will find all occurrences in the file, extensively listing all hit lines.

SWISH-E will only list the names of the files containing the queried string (or matching the expression). This might be enough if you then systematically open each file.

In any case, you need a wrapper script around the tools. Meaning, you can't just install the tools and get your queries working. You must first think over your problem and make "glue" scripts.
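To give an idea of what such glue can look like, here is a rough sketch in Python that skips the indexing tools and scans the files directly. It relies on the fact that a .docx file is a zip archive whose main text lives in word/document.xml; it does not handle the old binary .doc format, which would have to be converted first. The function names docx_text and find_files are made up for the illustration, and text inside text boxes may be picked up twice.

```python
import re
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# XML namespace used by WordprocessingML elements in document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(path):
    """Return the body text of a .docx file, one line per paragraph."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W + "p"):
        # A word may be split over several runs, so join them first.
        paragraphs.append("".join(t.text or "" for t in para.iter(W + "t")))
    return "\n".join(paragraphs)


def find_files(folder, pattern):
    """List the names of .docx files in folder whose text matches pattern."""
    rx = re.compile(pattern)
    hits = []
    for path in sorted(Path(folder).glob("*.docx")):
        try:
            text = docx_text(path)
        except (zipfile.BadZipFile, KeyError, ET.ParseError):
            continue  # not a readable .docx, skip it
        if rx.search(text):
            hits.append(path.name)
    return hits
```

Like SWISH-E, this only reports file names. A call such as find_files("diary", r"\bJohn\b(?!\s+Doe\b)") would list the files that mention a John who is not John Doe.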

There was a public Glimpse example on SourceForge, but the site is quite broken nowadays (you can still download/upload, but there is no longer access to the demos).

For a SWISH-E example, go to http://lxr.nginx.org/search (first display a file page such as http://lxr.nginx.org/source/LICENSE, then go to the search page and make a query based on words you have just read).
