Removing duplicates from query without using HSQLDB?

mrmister · April 21, 2015, 12:47am

So assuming we have TABLE1 with duplicated values, what I want is to get all the data from TABLE1, and not getting the duplicates, and creating a new table, TABLE2, clean, with all the data from TABLE1 except the duplicates…

mrmister · April 21, 2015, 12:48am

Summarizing I WANT TO CREATE A NEW TABLE, WITHOUT THE DUPLICATES… but getting the data from the original table, where we have duplicates.
How may I do that?

doug · April 21, 2015, 1:30am

You would use a CREATE TABLE statement, or copy the table structure, and then do an INSERT statement using as a record source a SELECT DISTINCT subquery like m.a.riosv suggests or a SELECT . . . GROUP BY query as discussed in the other answer. INSERT with a subquery recordset is what you want. Those queries only will work in Tools --. SQL interface.

mrmister · April 21, 2015, 9:42pm

Hi doug… I’m very lost… I’m not an expert in HSQLDB so I’m doing my first steps. It’s clear I need to copy the table structure, because I have Table1 and I want a copy the Table1 in Table2, so taking the example of the of the fruits database fruits.odb I have a table called FRUITS AND RECOLLECTORS, two fields, field1 the fruit’s name and field two the recollector’s name assigned to pick up that fruit. Easy. Okay, to avoid duplicates I do groups, and I get this code.

mrmister · April 21, 2015, 9:44pm

SELECT “Fruit”, “Recollector”

FROM “FRUITS AND RECOLLECTORS”

GROUP BY “Fruit”, “Recollector”

That is the HSQLDB code that avoide the duplicates, showing those records that are not duplicated. However, what I don’t know is how to move those registries, not duplicated to other table, for example: CLEAN_TABLE without the duplicates. You told me about copying the table structure, but I’m lost with that, don’t know what instructions I should use. Could you give me an example please? cheers

doug · April 21, 2015, 10:26pm

Right mouse click on the table in Base and tell it to Copy the table, then click a blank area in the window and select Paste. You need a blank version of the original table. Then, in Tools → SQL run: INSERT INTO "Table2" SELECT "Fruit", "Recollector" FROM "FRUITS AND RECOLLECTORS" GROUP BY "Fruit", "Recollector", assuming those two columns are the entire table. If those are not all of the columns, you will need to add the others, probably with appropriate MIN or MAX functions

doug · April 21, 2015, 10:29pm

Order of columns in the subquery matters, so keep the order of columns consistent. I have tested this syntax,works (with a different setup).

mrmister · April 23, 2015, 11:55am

Hi doug, well I’m a bit lost. You said assuming those two columns are the entire table. Well, we have three columns: ID (the primary key), Fruit, Recollector. Those are the fields. I did what you said, copied the structure of the table, empty (only the table definition), so Table2 have the fields ID (the primary key), Fruit, Recollector (but it is empty). Then typed the code you gave me and it returns the FOLLOWING ERROR:

mrmister · April 23, 2015, 11:56am

1: Column count does not match in statement [INSERT INTO “Table2” SELECT “Fruit”, “Recollector” FROM “FRUITS AND RECOLLECTORS” GROUP BY “Fruit”, “Recollector”]

mrmister · April 23, 2015, 12:10pm

Perhaps could you give me, please, a sample database showing this working properly, so I can see the code?

doug · April 23, 2015, 1:04pm

You will need to have a column in your query for each column in the table. Might be able to get away with fewer if the primary key is AUTONUMBER if not, add MIN("PrimaryKeyCol") to the query, changing the name of the field to match the actual name in the table. Thus, the query would become: INSERT INTO "Table2" SELECT MIN("PrimaryKeyCol"), "Fruit", "Recollector" FROM "FRUITS AND RECOLLECTORS" GROUP BY "Fruit", "Recollector" This assumes that the primary key is first in order.

PYS · April 21, 2015, 2:20pm

Hi - Another way to do:

Copy the table and its data (including duplicates)
Run a query to select the data to be deleted in the new table

SELECT “id”, “Fruit” FROM “Table2” WHERE “id” NOT IN ( SELECT MIN( “id” ) FROM “Table2” GROUP BY “Fruit” )

Explanations:

Two select are nested: the “first”, inside brackets, select only the minimum values of id
The second selects the records that are not in this list.

This query generates a set of editable records and you can therefore directly remove all the selected records.

RemoveDuplicates.odb

mrmister · April 23, 2015, 12:56pm

Hi Pierre, well the code you wrote show the duplicates… but removing the duplicates manually is not an option. If we have 500 records duplicated it would be a royal pain… no… what I need is the duplicated records be removed automatically when I run the query. So the code you wrote what is doing is showing the duplicates, but not removing the duplicates from the table or inserting in a new table those records that are not duplicated. What I need is that,a new table,clean,without duplicates

mrmister · April 23, 2015, 12:57pm

Any idea about how can we achieve this? an example would be highly appreciated

PYS · April 23, 2015, 2:19pm

Hi - removing 1 or 500 records is the same with my query : run the query, click the intersection of headers line & column (select all records), press the Delete key.

If you just want to execute one statement: Tools SQL and execute:

DELETE FROM "Table2" WHERE "id" NOT IN ( SELECT MIN( "id" ) FROM "Table2" GROUP BY "Fruit" )

mrmister · April 23, 2015, 10:54pm

pierre! amazing it deleted everything. Worked fine.
However, I have a doubt. Imagine I want to make the deletion in a saved query? what should I do?
I tried your code, adding the select statement.

SELECT “ID”, “Fruit”
DELETE FROM “Table2”
WHERE “ID” NOT IN ( SELECT MIN( “ID” ) FROM “Table2” GROUP BY “Fruit” )
It did not worked. Again, show me only the duplicated records, with an “Alias” that says DELETE.
What is wrong?

PYS · April 24, 2015, 11:07am

@mrmister - Without programming it is not possible. You can only run select queries.

mrmister · April 24, 2015, 11:21am

Is that correct? I would need you please, clarify me this. Again, thanks a lot for the deletion code.

PYS · April 25, 2015, 10:05am

@mrmister I do not understand “Is that correct”. Queries run from the interface have to produce a data set from a SELECT statement (the result may possibly not contain any data). Other requests must be launched from the Tools menu. This is the reason I first gave you the query to select that makes it easy to delete.

AlexKemp · August 20, 2020, 2:30pm