Removing duplicates from query without using HSQLDB?

mrmister · April 19, 2015, 8:21pm

Hi

Could we have a simple way to remove duplicates in our table without using HSQLDB, just in the Design View Mode? I was wondering about this, because in Access 2003 I remember we had a way to remove duplicates without having to type code.

I was wondering if we could remove duplicates in LibreOffice Base without having to type any code?

Cheers

doug · April 20, 2015, 2:42am

You would use the GROUP function in the query builder. If, for example, you had a query with three fields, Text1, Text2 and Text3, to ensure all Text1 data are unique, you would select GROUP in the function box for the column. Ideally, you also would pick a function for the other two fields to deal with the ambiguity of what what field to display for the values in those rows where Text1 is a duplicate value, primarily AVERAGE, MAX, and MIN.

If you want distinct combinations of Text1 and Text2, set them both to GROUP, etc.

If, on the other hand, this is a question about table design (I’m not completely certain), adding a PRIMARY KEY or aUNIQUE KEY will reject duplicate values on the key when a user tries to enter them. If the key is one column, that one column will be unique. In table design view, right mouse click on the column selector and pick the bottom option - PRIMARY KEY. Alternately, in Table Design view, go to Tools → Index Design Create a new index, e.g. index1, add the desired column, and click the box Unique, save.

image description

(If this answered your question, please accept the answer by clicking the check box () to the left)

mrmister · April 20, 2015, 10:46pm

Hi doug, first thing, let me thank you for your help with those replies so accurate and detailed.

However, I think I did not explain myself well…

Yes, the group function it’s magic, allow us to remove duplicates from our query, so we won’t see the duplicated results…
However, I don’t want that.

What I want is… having one table (TABLE 1), creating a new table (TABLE 2), clean, with all the data from TABLE1 but not getting the duplicates values. So assuming we have TABLE1 with duplicated values, what I want is to get all the data from TABLE1, and not getting the duplicates, and creating a new table, TABLE2, clean, with all the data from TABLE1 except the duplicates…

I think I did not explain clearly. I’m sorry…

So, can we do this without using HSQLDB? If we could do this without HSQLDB… it would be great. If not… could you give me an example code in HSQLDB creating a new table with all the duplicates removed?

I did a very simple database with two tables.

Fruits

Fruits and Recollectors

The first table have only one field, with some fruits duplicated.
The second table have fruit and its recollector name (two fields), some of the registries are duplicated and other don’t.
Of course we are not analyzing the primery key, because each registry would be unique, we should analyze duplicates in all the fields except the primary key.

I have uploaded the database, so you can download it from here:

https://mega.co.nz/#!0EgwWS4D!_bmEtEuVdbK59jBcYDgsNht_XWFTjdGLRiWhF3boGZI

Could you give me a method to create a new table without the values duplicated or if this is impossible, at least, a sample code in HSQLDB with the duplicateds removed in the new table?

Cheers

mariosv · April 21, 2015, 12:25am

Menu/Edit/Query properties.

image description

mrmister · April 21, 2015, 12:47am

Hi m.a.riosv, that is very cool… and a very easy way to create a query that filter the duplicates, but if you read my message I was searching for something different. I repeat my message:
What I want is… having one table (TABLE 1), creating a new table (TABLE 2), clean, with all the data from TABLE1 but not getting the duplicates values.

mrmister · April 21, 2015, 12:47am

So assuming we have TABLE1 with duplicated values, what I want is to get all the data from TABLE1, and not getting the duplicates, and creating a new table, TABLE2, clean, with all the data from TABLE1 except the duplicates…

mrmister · April 21, 2015, 12:48am

Summarizing I WANT TO CREATE A NEW TABLE, WITHOUT THE DUPLICATES… but getting the data from the original table, where we have duplicates.
How may I do that?

doug · April 21, 2015, 1:30am

You would use a CREATE TABLE statement, or copy the table structure, and then do an INSERT statement using as a record source a SELECT DISTINCT subquery like m.a.riosv suggests or a SELECT . . . GROUP BY query as discussed in the other answer. INSERT with a subquery recordset is what you want. Those queries only will work in Tools --. SQL interface.

mrmister · April 21, 2015, 9:42pm

Hi doug… I’m very lost… I’m not an expert in HSQLDB so I’m doing my first steps. It’s clear I need to copy the table structure, because I have Table1 and I want a copy the Table1 in Table2, so taking the example of the of the fruits database fruits.odb I have a table called FRUITS AND RECOLLECTORS, two fields, field1 the fruit’s name and field two the recollector’s name assigned to pick up that fruit. Easy. Okay, to avoid duplicates I do groups, and I get this code.

mrmister · April 21, 2015, 9:44pm

SELECT “Fruit”, “Recollector”

FROM “FRUITS AND RECOLLECTORS”

GROUP BY “Fruit”, “Recollector”

That is the HSQLDB code that avoide the duplicates, showing those records that are not duplicated. However, what I don’t know is how to move those registries, not duplicated to other table, for example: CLEAN_TABLE without the duplicates. You told me about copying the table structure, but I’m lost with that, don’t know what instructions I should use. Could you give me an example please? cheers

doug · April 21, 2015, 10:26pm

Right mouse click on the table in Base and tell it to Copy the table, then click a blank area in the window and select Paste. You need a blank version of the original table. Then, in Tools → SQL run: INSERT INTO "Table2" SELECT "Fruit", "Recollector" FROM "FRUITS AND RECOLLECTORS" GROUP BY "Fruit", "Recollector", assuming those two columns are the entire table. If those are not all of the columns, you will need to add the others, probably with appropriate MIN or MAX functions

doug · April 21, 2015, 10:29pm

Order of columns in the subquery matters, so keep the order of columns consistent. I have tested this syntax,works (with a different setup).

mrmister · April 23, 2015, 11:55am

Hi doug, well I’m a bit lost. You said assuming those two columns are the entire table. Well, we have three columns: ID (the primary key), Fruit, Recollector. Those are the fields. I did what you said, copied the structure of the table, empty (only the table definition), so Table2 have the fields ID (the primary key), Fruit, Recollector (but it is empty). Then typed the code you gave me and it returns the FOLLOWING ERROR:

mrmister · April 23, 2015, 11:56am

1: Column count does not match in statement [INSERT INTO “Table2” SELECT “Fruit”, “Recollector” FROM “FRUITS AND RECOLLECTORS” GROUP BY “Fruit”, “Recollector”]

mrmister · April 23, 2015, 12:10pm

Perhaps could you give me, please, a sample database showing this working properly, so I can see the code?

doug · April 23, 2015, 1:04pm

You will need to have a column in your query for each column in the table. Might be able to get away with fewer if the primary key is AUTONUMBER if not, add MIN("PrimaryKeyCol") to the query, changing the name of the field to match the actual name in the table. Thus, the query would become: INSERT INTO "Table2" SELECT MIN("PrimaryKeyCol"), "Fruit", "Recollector" FROM "FRUITS AND RECOLLECTORS" GROUP BY "Fruit", "Recollector" This assumes that the primary key is first in order.

PYS · April 21, 2015, 2:20pm

Hi - Another way to do:

Copy the table and its data (including duplicates)
Run a query to select the data to be deleted in the new table

SELECT “id”, “Fruit” FROM “Table2” WHERE “id” NOT IN ( SELECT MIN( “id” ) FROM “Table2” GROUP BY “Fruit” )

Explanations:

Two select are nested: the “first”, inside brackets, select only the minimum values of id
The second selects the records that are not in this list.

This query generates a set of editable records and you can therefore directly remove all the selected records.

RemoveDuplicates.odb

mrmister · April 23, 2015, 12:56pm

Hi Pierre, well the code you wrote show the duplicates… but removing the duplicates manually is not an option. If we have 500 records duplicated it would be a royal pain… no… what I need is the duplicated records be removed automatically when I run the query. So the code you wrote what is doing is showing the duplicates, but not removing the duplicates from the table or inserting in a new table those records that are not duplicated. What I need is that,a new table,clean,without duplicates

mrmister · April 23, 2015, 12:57pm

Any idea about how can we achieve this? an example would be highly appreciated

PYS · April 23, 2015, 2:19pm

Hi - removing 1 or 500 records is the same with my query : run the query, click the intersection of headers line & column (select all records), press the Delete key.

If you just want to execute one statement: Tools SQL and execute:

DELETE FROM "Table2" WHERE "id" NOT IN ( SELECT MIN( "id" ) FROM "Table2" GROUP BY "Fruit" )