Delete duplicate rows in MySQL where all columns match, keeping the newest row

Problem description

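This topic has been discussed here several times, but I still find it hard to understand how to delete duplicate rows in a MySQL database. Yes, I have seen many examples, but they are simplified to two columns or so, and most of the time the solution is just a query with no explanation. I want to clean up our database safely, so I have backed up my table, and now I want to delete the duplicate rows (there are a lot of them). Below is a query I tried, and it is fairly simple: for each set of duplicated rows, select the newest row and show its ID.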

SELECT MAX(id) FROM Raw_Validated_backup GROUP BY col1, col2, col3, ... col_n-1 having COUNT(*) > 1;
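A minimal sketch of what this query returns, run against an in-memory SQLite table as a stand-in for MySQL. The table `test` and its columns `col1`, `col2`, `col3` are hypothetical and stand for the `col1, ... col_n-1` placeholder: every column except `id` goes into the `GROUP BY`, and `MAX(id)` picks the newest row of each duplicated group.

```python
import sqlite3

# In-memory SQLite stand-in for the real table; col1..col3 are hypothetical columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, col1 TEXT, col2 INTEGER, col3 INTEGER)")
conn.executemany(
    "INSERT INTO test (id, col1, col2, col3) VALUES (?, ?, ?, ?)",
    [
        (1, "a", 1, 1),
        (2, "a", 1, 1),
        (3, "b", 2, 2),
        (4, "a", 1, 1),
        (5, "b", 2, 2),
        (6, "c", 3, 3),  # unique row: filtered out by HAVING COUNT(*) > 1
    ],
)

# Group on every column except id; MAX(id) is the newest row of each duplicate group.
newest_ids = [
    row[0]
    for row in conn.execute(
        "SELECT MAX(id) FROM test GROUP BY col1, col2, col3 "
        "HAVING COUNT(*) > 1 ORDER BY MAX(id)"
    )
]
print(newest_ids)  # [4, 5]
```

Note that this list cannot be used directly in `DELETE ... WHERE id NOT IN (...)`: rows without duplicates (id 6 here) are missing from it because of the `HAVING` clause, so they would be deleted too. A keep-list for `NOT IN` has to drop the `HAVING` clause so that every group contributes its `MAX(id)`.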

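Now that I can select the newest row of each set of duplicates, I "only" have to delete the other duplicate rows. Easier said than done, if you ask me. Here is a query written by @Jose Rui Santos: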

delete test
  from test
 inner join (
  select max(id) as lastId, email
    from test
   where email in (
              select email 
                from test
               group by email
              having count(*) > 1
       )
   group by email
) duplic on duplic.email = test.email
 where test.id < duplic.lastId;

He posted it in this thread: MySQL delete duplicate records but keep latest

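So my question is: how exactly do I delete the duplicate rows and keep the newest one? I would prefer the answer in a generic, pseudo form so that others can learn from it as well. For example: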

DELETE table FROM database 

rather than something specific like:

DELETE email FROM city

MySQL queries

Example 1:

DELETE t1 FROM Raw_Validated_backup AS t1 INNER JOIN Raw_Validated_backup AS t2 
      ON t1.time_start=t2.time_start 
      AND t1.time_end=t2.time_end 
      AND t1.first_temp_lpn=t2.first_temp_lpn 
      AND t1.first_WL=t2.first_WL 
      AND t1.first_temp_lpn_validated=t2.first_temp_lpn_validated 
      AND t1.second_temp_lpn=t2.second_temp_lpn 
      AND t1.second_WL=t2.second_WL 
      AND t1.second_temp_lpn_validated=t2.second_temp_lpn_validated 
      AND t1.third_temp_lpn=t2.third_temp_lpn 
      AND t1.third_WL=t2.third_WL 
      AND t1.third_temp_lpn_validated=t2.third_temp_lpn_validated 
      AND t1.first_temp_rising=t2.first_temp_rising 
      AND t1.first_WR=t2.first_WR 
      AND t1.first_temp_rising_validated=t2.first_temp_rising_validated 
      AND t1.second_temp_rising=t2.second_temp_rising 
      AND t1.second_WR=t2.second_WR 
      AND t1.second_temp_rising_validated=t2.second_temp_rising_validated 
      AND t1.third_temp_rising=t2.third_temp_rising 
      AND t1.third_WR=t2.third_WR 
      AND t1.third_temp_rising_validated=t2.third_temp_rising_validated 
      AND t1.id<t2.id;
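Because Example 1 repeats the same `t1.col = t2.col` condition for every column, the statement can be generated from a column list, which also gives the generic form asked for in the question. The helper below is a hypothetical sketch, not part of any library; it assumes the table and column names are trusted, plain identifiers, since it pastes them into the SQL text without quoting.

```python
from typing import Sequence


def build_dedup_delete(table: str, key_columns: Sequence[str], id_column: str = "id") -> str:
    """Return a MySQL multi-table DELETE that keeps, for each set of rows whose
    key_columns all match, only the row with the largest id_column value.

    Hypothetical helper: table and column names are inserted without quoting
    or validation, so they must be trusted identifiers.
    """
    if not key_columns:
        # With no key columns the join would pair every row with every newer row
        # and delete everything except the single highest id in the table.
        raise ValueError("key_columns must not be empty")
    conditions = [f"t1.{col} = t2.{col}" for col in key_columns]
    # t1 is deleted whenever a matching row with a larger id exists,
    # so only the largest id of each group survives.
    conditions.append(f"t1.{id_column} < t2.{id_column}")
    on_clause = "\n    AND ".join(conditions)
    return (
        f"DELETE t1 FROM {table} AS t1\n"
        f"INNER JOIN {table} AS t2\n"
        f"    ON {on_clause};"
    )


if __name__ == "__main__":
    print(build_dedup_delete("test", ["email", "col2", "col3", "col4"]))
```

For a table `test` keyed on `email, col2, col3, col4`, the generated statement has the same conditions as a hand-written self-join, one per line; for `Raw_Validated_backup` the column list would be the 20 columns compared above.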

Example 2:

-- MySQL rejects a subquery that reads the table being deleted from (error 1093),
-- so the ids to keep are first collected in a derived table (keep_ids).
DELETE FROM Raw_Validated_backup WHERE id NOT IN (
SELECT id FROM (SELECT max(id) AS id FROM Raw_Validated_backup
GROUP BY time_start, time_end, first_temp_lpn, first_WL, first_temp_lpn_validated, second_temp_lpn, second_WL, second_temp_lpn_validated, third_temp_lpn, third_WL, third_temp_lpn_validated, first_temp_rising, first_WR, first_temp_rising_validated, second_temp_rising, second_WR, second_temp_rising_validated, third_temp_rising, third_WR, third_temp_rising_validated ) AS keep_ids );
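Example 1 and Example 2 are not quite equivalent when a compared column can be NULL. `GROUP BY` puts NULLs into the same group, but `=` in a join condition never matches NULL to NULL, so the self-join leaves such rows alone. The sketch below shows the difference on a hypothetical table `t` with columns `a` and `b`, using SQLite as a stand-in; in MySQL the null-safe comparison `t1.b <=> t2.b` makes the join treat NULLs as equal (SQLite's equivalent is `IS`).

```python
import sqlite3

# Hypothetical sample data: ids 3 and 4 are duplicates that contain a NULL.
ROWS = [(1, "x", 1), (2, "x", 1), (3, "y", None), (4, "y", None), (5, "z", 9)]


def fresh_table() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a TEXT, b INTEGER)")
    conn.executemany("INSERT INTO t (id, a, b) VALUES (?, ?, ?)", ROWS)
    return conn


def remaining_ids(conn: sqlite3.Connection) -> list:
    return [row[0] for row in conn.execute("SELECT id FROM t ORDER BY id")]


# Example 2's pattern: keep MAX(id) of every group (no HAVING, so unique rows survive).
# SQLite allows the direct self-reference; MySQL needs the derived-table wrapper.
group_by = fresh_table()
group_by.execute("DELETE FROM t WHERE id NOT IN (SELECT MAX(id) FROM t GROUP BY a, b)")

# Example 1's self-join logic, written as EXISTS because SQLite has no multi-table DELETE.
# Plain '=' never matches NULL to NULL, so rows 3 and 4 are not treated as duplicates.
self_join = fresh_table()
self_join.execute(
    "DELETE FROM t WHERE EXISTS ("
    " SELECT 1 FROM t AS t2"
    " WHERE t2.a = t.a AND t2.b = t.b AND t.id < t2.id)"
)

print(remaining_ids(group_by))   # [2, 4, 5]
print(remaining_ids(self_join))  # [2, 3, 4, 5]
```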

Tags: mysql

Solution


I would do it like this:

mysql> select * from test;
+----+------------------+------+------+------+
| id | email            | col2 | col3 | col4 |
+----+------------------+------+------+------+
|  1 | user@example.org |    2 |    3 |    4 |
|  2 | user@example.org |    2 |    3 |    4 |
|  3 | user@example.org |    2 |    3 |    4 |
|  4 | user@example.org |    2 |    3 |    4 |
+----+------------------+------+------+------+

mysql> DELETE t1 FROM test AS t1 INNER JOIN test AS t2 
    ON t1.email=t2.email AND t1.col2=t2.col2 AND t1.col3=t2.col3 AND t1.col4=t2.col4 
    AND t1.id<t2.id;

mysql> select * from test;
+----+------------------+------+------+------+
| id | email            | col2 | col3 | col4 |
+----+------------------+------+------+------+
|  4 | user@example.org |    2 |    3 |    4 |
+----+------------------+------+------+------+

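Explanation: the join condition looks for a row t2 that has a larger id and is equal to t1 in every other column. If such a row exists, t1 cannot be the row with the largest id among its duplicates, so t1 is deleted. Only the row with the largest id in each group has no such partner, so exactly one row per group survives, and rows without duplicates are never matched at all.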
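Since the goal is to clean the database safely, it can help to run the same join as a `SELECT` first and inspect which ids would be deleted. The sketch below rebuilds the answer's sample `test` table in an in-memory SQLite database (a stand-in, not MySQL): the preview `SELECT` is also valid MySQL, while the delete step is rewritten as a correlated `EXISTS` because SQLite does not support MySQL's `DELETE t1 FROM ... JOIN` syntax.

```python
import sqlite3

# Rebuild the answer's sample table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (id INTEGER PRIMARY KEY, email TEXT,"
    " col2 INTEGER, col3 INTEGER, col4 INTEGER)"
)
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?, ?, ?)",
    [(i, "user@example.org", 2, 3, 4) for i in range(1, 5)],
)

# Step 1 (dry run): the answer's join, selecting t1.id instead of deleting t1.
# DISTINCT is needed because t1 joins once per newer duplicate
# (id 1 pairs with ids 2, 3 and 4).
preview = [
    row[0]
    for row in conn.execute(
        """
        SELECT DISTINCT t1.id
        FROM test AS t1
        INNER JOIN test AS t2
            ON t1.email = t2.email AND t1.col2 = t2.col2
           AND t1.col3 = t2.col3 AND t1.col4 = t2.col4
           AND t1.id < t2.id
        ORDER BY t1.id
        """
    )
]

# Step 2: delete the same rows. The join condition becomes a correlated EXISTS
# on a second alias t2; unqualified "test" refers to the row being checked.
conn.execute(
    """
    DELETE FROM test
    WHERE EXISTS (
        SELECT 1 FROM test AS t2
        WHERE t2.email = test.email AND t2.col2 = test.col2
          AND t2.col3 = test.col3 AND t2.col4 = test.col4
          AND test.id < t2.id
    )
    """
)
conn.commit()

remaining = conn.execute("SELECT * FROM test").fetchall()
print(preview)    # [1, 2, 3]
print(remaining)  # [(4, 'user@example.org', 2, 3, 4)]
```

The preview lists exactly the ids the answer's `DELETE` removes, and the remaining row matches the answer's output.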

