Help with construction of database from file names

Hi all,

I am a Musical Director working on a cruise ship and I have a Windows 7 folder with in excess of 19,000 files, which are all music arrangements in PDF format (an arrangment is a collection of parts for different instruments, eg Piano and Bass and Drums and Trumpet and Sax and Trombone. The last three are collectively known as ‘horns’ (short for ‘horn section’) The files are all named according to the following standard:

Name (composer) - description - vocal - Arranger.pdf

the description field is used to identify the numer of parts to the arrangement, eg “13 Horns”, “Piano Only”, “Rhythm section only”, etc.

The ‘vocal’ field signifies whether this is a vocal arrangement; if it’s an instrumental arrangement then the field is blank and the “-” separators are not included.

I want to be able to search all this data and I think I need to create a searchable database to do it. I’m aware that Windows Search will look at all the files and search for whatever my search term is, but I want to go further and have 2 or 3 search parameters, possibly searching on every field. I also want to include an element of fuzzy logic.

Can anyone tell me how to go about creating a database where each record is generated from these filenames?

And what clever things can I do with the database once created?

Thanks for your help.

Hmmm, parsing 19,000 file names into a hsqldb embedded database as provided by the native ODB container in LibreOffice will represent something of a challenge in terms of performance and stability. However, it is in theory doable via several routes :

  • use a windows scripting host script to parse the directory, split the file names into respective fields, save as CSV (not being an expert in WSH, I wouldn’t know how to do that), and then import into a hsqldb ODB - this would be statically linked, i.e. you wouldn’t be able to update the data held in the underlying CSV file;

  • write a LO Basic macro that would parse the file names, create an array, set up a connection to a new or existing ODB db file, and then load the contents of the array into the database - doable, but probably fairly poor performance, since everything would be loaded into memory of the LO process, including all the JVM required to handled the database bit ;

  • write some other script (python/php/perl, etc), to parse the list of files, and write it to an output format that can be imported by LO Base, or preferably, to a db server backend table - the advantage of this route is that the server would do all the heavy db lifting work, and you could just use an ODB file as a frontend for forms and reports.

If you trawl on the internet, there a quite a few scripts out there that deal with parsing file names and chopping them up into something usable. After that, it is up to you how you want to import them (or not) into Base.

Oh, forgot to mention that Base doesn’t provide for “fuzzy logic” searches, it uses regular SQL expressions, you might want to search for a db engine that provides that kind of functionality, not even MS SQL Server had it (but it might have been implemented since then) :

http://consultingblogs.emc.com/jamiethomson/archive/2008/05/21/fuzzyness.aspx

Note that even with other DB engines, this is not always that easy :

http://www.yvoschaap.com/weblog/easy_fuzzy_logic_with_mysql_the_end_of_no_results_found

Maybe programs like CubicExplorer (http://portableapps.com/apps/utilities/cubicexplorer_portable) could be an easier solution.