How would you compare these datasets using LO?

In another question, I was told that the task I have is not well suited to Calc, but needs a database.

Could I ask for people’s views on this problem:

I need to compare each of these datasets (three different lists of mosques in the UK):
Compare places & addresses problem.ods (36.5 KB)

The end result must be that the first two datasets include identical real-world items - that is, all the known mosques in the UK.

Some mosques appear in all datasets, some in only one; for some the address is known, for some not; and the name of a mosque can vary a lot between datasets.

  1. Is Base suitable for this task?
  2. If so, is it realistic for a newbie to attempt it using Base?
  3. Or should I stay with Calc?

I don’t have a lot of technical skill. I can learn query languages like overpass slowly, and I am getting better at more complicated Calc formulas.

Thanks.

Without having inspected your data, I’d start with a general remark: Base is the module of LibreOffice, wich connects to databases. It is not the database itself. You may compare it to a delivery service where you book an Uber-driver to bring you something from Walmart. Base is Uber, the driver obviously the database-driver and Walmart the database.
.
Ther critical part of your task is your incomplete data:

With complete addresses and names one can use a JOIN-command in SQL to find all unique places, but if this data is not available things become complicated quickly. And this is true for Base and Calc.
.
For a project like this I’d start with isolating zip-codes and towns. Then it needs to be checked, if you can check identity of two entries via address/name or if you need manual interaction.
.
IMHO database-JOINS are better if you need to match lists like 5 mosques in list1 in Southampton against 7 from list2. But, if you usually have only one entry per zip-code Calc can use VLOOKUP…

1 Like
1 Like

If he has power queries and Excel, this might be usable.
.
I also had the idea of a fuzzy compare, but the topic of full-text compare in databases is a bit more advanced.
.
I’d try some python libs(like fuzzywuzzy) as mentioned in the following link at stackoverflow. And as he deals with public addresses, he may even use the google-api for geo-locations). But also: Not a good project to start coding…

On full-text compare in databases here MariaDB:

1 Like

@eteb3 Try this:
Compare.odb (6.9 KB)
All tables are imported.
All tables are combined by a view, created in a query.
Value in tbl_filter.filter could be changed.
Finds all content in column name in all tables with qry_all_filter.

You could delete the content of any of the tables and add new content from openstreetmap in openstreetmap table and so on.

If you will get problems with speed of the database: Could also be migrated to a internal Firebird database, which runs much faster on my system with many rows in the tables.

1 Like

Thank you all for your comments.

I think given that

  • this is a pro bono project and
  • it’s an effort for my non-technical brain to understand the advice (I am getting most of it I hope…!)

my solution is to go and recruit a volunteer who is good with databases.

And I do think we’ll want a database eventually: the ultimate aim is to have a web-hosted directory of mosques, with all sorts of attributes (such as whether women are admitted, is there disabled access, etc). And for these to go into OpenStreetMap, too.

First thing I’ll show them is yours, @RobertG : very grateful for your efforts.

Hint: HSQL comes with SOUNDEX function built-in. Soundex - Wikipedia
It is designed to match similar English names in some fuzzy ways. In my opinion, it works with German names fairly well. It can be used to match similar names within the same postal code area.

2 Likes