Removing duplicates from query without using HSQLDB?

So assuming we have TABLE1 with duplicated values, what I want is to get all the data from TABLE1, and not getting the duplicates, and creating a new table, TABLE2, clean, with all the data from TABLE1 except the duplicates…

Summarizing I WANT TO CREATE A NEW TABLE, WITHOUT THE DUPLICATES… but getting the data from the original table, where we have duplicates.
How may I do that?

You would use a CREATE TABLE statement, or copy the table structure, and then do an INSERT statement using as a record source a SELECT DISTINCT subquery like m.a.riosv suggests or a SELECT . . . GROUP BY query as discussed in the other answer. INSERT with a subquery recordset is what you want. Those queries only will work in Tools --. SQL interface.

Hi doug… I’m very lost… I’m not an expert in HSQLDB so I’m doing my first steps. It’s clear I need to copy the table structure, because I have Table1 and I want a copy the Table1 in Table2, so taking the example of the of the fruits database fruits.odb I have a table called FRUITS AND RECOLLECTORS, two fields, field1 the fruit’s name and field two the recollector’s name assigned to pick up that fruit. Easy. Okay, to avoid duplicates I do groups, and I get this code.

SELECT “Fruit”, “Recollector”

FROM “FRUITS AND RECOLLECTORS”

GROUP BY “Fruit”, “Recollector”

That is the HSQLDB code that avoide the duplicates, showing those records that are not duplicated. However, what I don’t know is how to move those registries, not duplicated to other table, for example: CLEAN_TABLE without the duplicates. You told me about copying the table structure, but I’m lost with that, don’t know what instructions I should use. Could you give me an example please? cheers

Right mouse click on the table in Base and tell it to Copy the table, then click a blank area in the window and select Paste. You need a blank version of the original table. Then, in ToolsSQL run: INSERT INTO "Table2" SELECT "Fruit", "Recollector" FROM "FRUITS AND RECOLLECTORS" GROUP BY "Fruit", "Recollector", assuming those two columns are the entire table. If those are not all of the columns, you will need to add the others, probably with appropriate MIN or MAX functions

Order of columns in the subquery matters, so keep the order of columns consistent. I have tested this syntax,works (with a different setup).

Hi doug, well I’m a bit lost. You said assuming those two columns are the entire table. Well, we have three columns: ID (the primary key), Fruit, Recollector. Those are the fields. I did what you said, copied the structure of the table, empty (only the table definition), so Table2 have the fields ID (the primary key), Fruit, Recollector (but it is empty). Then typed the code you gave me and it returns the FOLLOWING ERROR:

1: Column count does not match in statement [INSERT INTO “Table2” SELECT “Fruit”, “Recollector” FROM “FRUITS AND RECOLLECTORS” GROUP BY “Fruit”, “Recollector”]

Perhaps could you give me, please, a sample database showing this working properly, so I can see the code?

You will need to have a column in your query for each column in the table. Might be able to get away with fewer if the primary key is AUTONUMBER if not, add MIN("PrimaryKeyCol") to the query, changing the name of the field to match the actual name in the table. Thus, the query would become: INSERT INTO "Table2" SELECT MIN("PrimaryKeyCol"), "Fruit", "Recollector" FROM "FRUITS AND RECOLLECTORS" GROUP BY "Fruit", "Recollector" This assumes that the primary key is first in order.

Hi - Another way to do:

  • Copy the table and its data (including duplicates)

  • Run a query to select the data to be deleted in the new table

    SELECT “id”, “Fruit” FROM “Table2” WHERE “id” NOT IN ( SELECT MIN( “id” ) FROM “Table2” GROUP BY “Fruit” )

Explanations:

  • Two select are nested: the “first”, inside brackets, select only the minimum values of id
  • The second selects the records that are not in this list.

This query generates a set of editable records and you can therefore directly remove all the selected records.

RemoveDuplicates.odb

Hi Pierre, well the code you wrote show the duplicates… but removing the duplicates manually is not an option. If we have 500 records duplicated it would be a royal pain… no… what I need is the duplicated records be removed automatically when I run the query. So the code you wrote what is doing is showing the duplicates, but not removing the duplicates from the table or inserting in a new table those records that are not duplicated. What I need is that,a new table,clean,without duplicates

Any idea about how can we achieve this? an example would be highly appreciated

Hi - removing 1 or 500 records is the same with my query : run the query, click the intersection of headers line & column (select all records), press the Delete key.

If you just want to execute one statement: Tools SQL and execute:

DELETE FROM "Table2" WHERE "id" NOT IN ( SELECT MIN( "id" ) FROM "Table2" GROUP BY "Fruit" )

pierre! amazing it deleted everything. Worked fine.
However, I have a doubt. Imagine I want to make the deletion in a saved query? what should I do?
I tried your code, adding the select statement.

SELECT “ID”, “Fruit”
DELETE FROM “Table2”
WHERE “ID” NOT IN ( SELECT MIN( “ID” ) FROM “Table2” GROUP BY “Fruit” )
It did not worked. Again, show me only the duplicated records, with an “Alias” that says DELETE.
What is wrong?

@mrmister - Without programming it is not possible. You can only run select queries.

Is that correct? I would need you please, clarify me this. Again, thanks a lot for the deletion code. :smiley:

@mrmister I do not understand “Is that correct”. Queries run from the interface have to produce a data set from a SELECT statement (the result may possibly not contain any data). Other requests must be launched from the Tools menu. This is the reason I first gave you the query to select that makes it easy to delete.