There are several ways to do what you want.
I. Using JOIN
:
This technique consists of joining the table to itself by JOIN
. And use a combined comparison of the column id
(which is assumed never to be repeated) and the repeated column ( email
in this case).
The query would be this:
DELETE t1 FROM usuarios t1
INNER JOIN usuarios t2
WHERE t1.id > t2.id AND t1.email = t2.email;
Let's see a complete example:
VIEW DEMO IN REXTESTER
-- I. Datos de prueba
CREATE TABLE usuarios (
id INT PRIMARY KEY AUTO_INCREMENT,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
email VARCHAR(255) NOT NULL
);
INSERT INTO usuarios (first_name,last_name,email)
VALUES ('Pedro ','Pérez','[email protected]'),
('Juan','Roa','[email protected]'),
('Santiago','Guerrero','[email protected]'),
('Juan','Roa','[email protected]'),
('Andrés ','Brito','[email protected]'),
('Juan','Roa','[email protected]'),
('Santiago','Guerrero','[email protected]'),
('Felipe','Castro','[email protected]')
;
/* Prueba de datos antes de borrar*/
SELECT
id, email
FROM usuarios
ORDER BY email;
-- II. Borrar mediante JOIN
DELETE t1 FROM usuarios t1
INNER JOIN usuarios t2
WHERE t1.id > t2.id AND t1.email = t2.email;
/* Prueba de datos después de borrar*/
SELECT
id, email
FROM usuarios
ORDER BY id;
-- III. Restringir duplicados
ALTER TABLE usuarios ADD UNIQUE (email);
/*
Prueba de la nueva restricción creada:
al descomentar el código tendremos el siguiente error:
Error(s), warning(s): Duplicate entry '[email protected]' for key 'email'
*/
-- INSERT INTO usuarios (first_name,last_name,email) VALUES ('Pedro ','Pérez','[email protected]');
Results:
Before deleting duplicates:
id email
5 [email protected]
8 [email protected]
3 [email protected]
7 [email protected]
1 [email protected]
2 [email protected]
4 [email protected]
6 [email protected]
After deleting duplicates:
id email
1 [email protected]
2 [email protected]
3 [email protected]
5 [email protected]
8 [email protected]
Note that the values whose id
was lower were kept here. If you want to keep those whose id
is greater, just put the WHERE
of the query as: ... WHERE t1.id < t2.id AND t1.email = t2.email
Note that by applying this technique, there will be breaks in the id
column. If you want a table with a better order in that column, see the method mentioned below.
II. By creating a new table
This technique consists of creating a new table with the same structure as the original table. Insert the records from the table with duplicates by applying a filter that avoids duplicates ( GROUP BY
in this case) and then delete the old table, renaming the new one.
1. We create an empty copy of the table usuarios
CREATE TABLE usuarios_copy LIKE usuarios;
2. Insert all records in the new table, grouping by the repeated column
There are two ways to do this:
-
If you want to have a new table with the column id
ordered, you can explicitly name each column, indicating the value NULL
for the incremental auto column. This will allow that column to obtain its value in a natural way:
INSERT INTO usuarios_copy
SELECT
NULL,
first_name,
last_name,
email
FROM usuarios
GROUP BY email;
-
If you do not mind having so many jumps in the id
column, you can do the insertion like this:
INSERT INTO usuarios_copy
SELECT *
FROM usuarios
GROUP BY email;
3. We delete the old table and rename the new one
DROP TABLE usuarios;
ALTER TABLE usuarios_copy RENAME TO usuarios;
In the data test, you will get this result. Note that here the column id
regained its normal order:
id email
1 [email protected]
2 [email protected]
3 [email protected]
4 [email protected]
5 [email protected]
Let's see a complete example applying this possibility:
SEE DEMO IN REXTESTER
III. Solve the root problem
It will be of little use to do all the previous work, if we do not change the design of the table, making the column email
is UNIQUE
, that is, that does not allow duplicate emails.
This would be achieved with a single line of code, which must be executed once our table is clean of duplicates:
ALTER TABLE usuarios ADD UNIQUE (email);
If now we try to insert a duplicate:
INSERT INTO usuarios (first_name,last_name,email)
VALUES ('Pedro ','Pérez','[email protected]');
The DBMS will stop us, saying the following:
Duplicate entry '[email protected]' for key 'email'
NOTE: Maybe in a table like this, there is a group of columns that are UNIQUE
, in that case, you would have to create a single combined index ... But that's another matter . I just wanted to indicate that this type of problem must be solved at the root.