Removing duplicate rows from a table

Problem description

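I have a table from which I want to remove duplicate rows based on 5 columns: origin_id, destination_id, market_id, cabin, and tripType. For each combination of these 5 columns, I want to delete every record except the one with the largest created_at, so that only one record remains per combination. After removing the duplicates I will create a unique index, but the duplicates currently prevent me from doing so.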

This is the query I have so far, but it does not seem to work:

DELETE FROM fares WHERE id NOT IN(
     SELECT f1.id FROM (SELECT * FROM fares) AS f1
     INNER JOIN (
          SELECT origin_id,destination_id,market_id,cabin,tripType,MAX(created_at) AS maxDate FROM fares
          GROUP BY origin_id,destination_id,market_id,cabin,tripType
     ) AS f2 ON f2.origin_id=f1.origin_id AND f2.destination_id=f1.destination_id AND 
     f2.market_id=f1.market_id AND f2.cabin=f1.cabin AND f2.tripType=f1.tripType
     WHERE f1.created_at=f2.maxDate
     GROUP BY f1.origin_id,f1.destination_id,f1.market_id,f1.cabin,f1.tripType
)

The query above deleted only 500 rows, but I have 8k duplicates, which I counted with the following query:

SELECT SUM(f.numberOfFares) AS duplicateFares FROM (
    SELECT origin_id,destination_id,market_id,cabin,tripType,COUNT(1) AS numberOfFares FROM fares
    GROUP BY origin_id,destination_id,market_id,cabin,tripType
    HAVING count(1)>1
) AS f


I want exactly one record per (origin_id, destination_id, market_id, cabin, tripType) group.

I think the problem is the records that share the same created_at value.


Tags: mysql

Solution


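The pattern you are looking for is called "deduplication". Basically, you join the table to itself, compare the rows within each group, and delete the ones you do not want: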

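-- F2 is the row to delete; F1 is a "better" row in the same group.
-- Ties on created_at are broken by id (assumes fares.id is unique),
-- so exactly one row per group survives.
delete F2
from fares F1
join fares F2 on F1.origin_id = F2.origin_id
    and F1.destination_id = F2.destination_id
    and F1.market_id = F2.market_id
    and F1.cabin = F2.cabin
    and F1.tripType = F2.tripType
where F2.created_at < F1.created_at
   or (F2.created_at = F1.created_at and F2.id < F1.id)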
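
Before running a destructive statement like the delete above, it can help to preview which rows would be removed. The following is a minimal sketch, not a definitive implementation. It assumes that fares.id is a unique key and that a row counts as redundant when another row in the same five-column group has a later created_at, or the same created_at and a higher id. The aliases f, newer, and g and the column name surplus_rows are only illustrative.

```sql
-- Preview: rows for which a "better" row exists in the same group.
SELECT f.*
FROM fares AS f
WHERE EXISTS (
    SELECT 1
    FROM fares AS newer
    WHERE newer.origin_id = f.origin_id
      AND newer.destination_id = f.destination_id
      AND newer.market_id = f.market_id
      AND newer.cabin = f.cabin
      AND newer.tripType = f.tripType
      AND (newer.created_at > f.created_at
           OR (newer.created_at = f.created_at AND newer.id > f.id))
);

-- Number of surplus rows: each duplicated group contributes (size - 1),
-- because one row per group is kept.
SELECT COALESCE(SUM(g.cnt - 1), 0) AS surplus_rows
FROM (
    SELECT COUNT(*) AS cnt
    FROM fares
    GROUP BY origin_id, destination_id, market_id, cabin, tripType
    HAVING COUNT(*) > 1
) AS g;
```

The number of rows returned by the first query should equal surplus_rows from the second. The two can disagree if any of the five columns contains NULL, because GROUP BY places NULLs in the same group while a comparison with = never matches them.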

Indexing the columns used in the join will speed up the delete.
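
As a rough sketch of the indexing advice: a plain (non-unique) index can be created while the duplicates still exist, and it can be swapped for the unique index the question asks for once the table is clean. The index names idx_fares_dedup and uq_fares_group are hypothetical, and whether all six columns fit in a single index depends on their types, which the question does not show.

```sql
-- Step 1: non-unique index covering the join columns plus created_at.
-- This is allowed even though duplicate groups are still present.
CREATE INDEX idx_fares_dedup
    ON fares (origin_id, destination_id, market_id, cabin, tripType, created_at);

-- Step 2: run the deduplicating DELETE here.

-- Step 3: once no duplicates remain, replace the helper index with the
-- unique index. This statement fails if any duplicate group is left.
ALTER TABLE fares
    DROP INDEX idx_fares_dedup,
    ADD UNIQUE INDEX uq_fares_group
        (origin_id, destination_id, market_id, cabin, tripType);
```

If step 3 reports a duplicate-key error, some groups still contain more than one row, and the delete needs another look before retrying.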

