首页 > 解决方案 > 如何从 GROUP BY 查询中删除特定范围的记录?

问题描述

我有一个带有 column 的表MediaID。可以(并且通常有)多个记录具有相同的MediaID. 还有一个AddDate专栏。我如何编写一个查询,将相同的记录分组MediaID在一起,并且对于每个有 10 条以上记录的组,对它们进行排序AddDate并删除除两个最旧和两个最新记录之外的所有记录?

我已经编写了一个查询,该查询将记录分组MediaID,然后使用HAVING COUNT(*) > 10它将给我MediaID要删除的 s,然后将其用作子查询来JOIN支持原始表,这样我就只能看到具有 of 的记录这些MediaIDs,但我一直坚持如何过滤出两个最新和两个最旧的记录MediaID

CREATE TABLE [dbo].[MediaTest_V]
(
    [ID] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
    [MediaID] [int] NOT NULL,
    [AddDate] [DateTime] NOT NULL
) 

INSERT INTO MediaTest_V (MediaID, AddDate)
VALUES (7, '2015-01-05T09:00:00')
INSERT INTO MediaTest_V (MediaID, AddDate)
VALUES (8, '2015-01-05T10:00:00')
INSERT INTO MediaTest_V (MediaID, AddDate)
VALUES (8, '2015-02-05T10:00:00')
INSERT INTO MediaTest_V (MediaID, AddDate)
VALUES (8, '2015-03-05T10:00:00')
INSERT INTO MediaTest_V (MediaID, AddDate)
VALUES (8, '2015-04-05T10:00:00')
INSERT INTO MediaTest_V (MediaID, AddDate)
VALUES (8, '2015-05-05T10:00:00')
INSERT INTO MediaTest_V (MediaID, AddDate)
VALUES (8, '2015-06-05T10:00:00')
INSERT INTO MediaTest_V (MediaID, AddDate)
VALUES (8, '2015-07-05T10:00:00')
INSERT INTO MediaTest_V (MediaID, AddDate)
VALUES (8, '2015-08-05T10:00:00')
INSERT INTO MediaTest_V (MediaID, AddDate)
VALUES (8, '2015-09-05T10:00:00')
INSERT INTO MediaTest_V (MediaID, AddDate)
VALUES (8, '2015-10-05T10:00:00')
INSERT INTO MediaTest_V (MediaID, AddDate)
VALUES (8, '2015-11-05T10:00:00')

SELECT *
FROM MediaTest_V mv
INNER JOIN (
    SELECT MediaID
    FROM MediaTest_V
    GROUP BY MediaID
    HAVING COUNT(*) > 10
) vc ON (vc.MediaID = mv.MediaID)

+----+---------+-------------------------+
| ID | MediaID |         AddDate         |
+----+---------+-------------------------+
|  2 |       8 | 2015-01-05 10:00:00.000 |
|  3 |       8 | 2015-02-05 10:00:00.000 |
|  4 |       8 | 2015-03-05 10:00:00.000 |
|  5 |       8 | 2015-04-05 10:00:00.000 |
|  6 |       8 | 2015-05-05 10:00:00.000 |
|  7 |       8 | 2015-06-05 10:00:00.000 |
|  8 |       8 | 2015-07-05 10:00:00.000 |
|  9 |       8 | 2015-08-05 10:00:00.000 |
| 10 |       8 | 2015-09-05 10:00:00.000 |
| 11 |       8 | 2015-10-05 10:00:00.000 |
| 12 |       8 | 2015-11-05 10:00:00.000 |
+----+---------+-------------------------+

显示一组 12 条记录的最小示例,其中 11 条具有相同的MediaID. 在这种情况下,我希望查询删除 ID 2 到 10(含)的记录,因为超过 10 条记录的 ID 为MediaID1 of 8, and records with1、2、10 和 11 要么是最近的两个记录,要么是两个最旧的记录。实际上,有太多记录无法手动执行此操作。

我只能将 MediaTest_V 表过滤到只有那些具有将要删除的 MediaID 的表,但我不确定如何过滤掉最旧和最近的记录。

标签: sql-server

解决方案


这是我想出的:

查询检查要删除的结果:(col“old”是识别两个最旧的记录,“new”是两个最新的记录,仅用于那些计数> 10的MediaID)

select * from (

select 
    ROW_NUMBER() over (partition by v.mediaid order by v.adddate asc) old, 
    ROW_NUMBER() over (partition by v.mediaid order by v.adddate desc) recent, 
    v.*
from MediaTest_V v
inner join (
    SELECT MediaID
    FROM MediaTest_V
    GROUP BY MediaID
    HAVING COUNT(*) > 10
) vc ON vc.MediaID = v.MediaID

) seq
where
seq.old > 2 and seq.recent > 2 

查询执行删除:

delete del
from MediaTest_V del
inner join (

select ID from (

    select 
        ROW_NUMBER() over (partition by v.mediaid order by v.adddate asc) old,
        ROW_NUMBER() over (partition by v.mediaid order by v.adddate desc) recent, 
        v.*
    from MediaTest_V v
    inner join (
        SELECT MediaID
        FROM MediaTest_V
        GROUP BY MediaID
        HAVING COUNT(*) > 10
    ) vc ON vc.MediaID = v.MediaID

) seq
where
    seq.old > 2 and seq.recent > 2

) seq on del.ID = seq.ID

推荐阅读