sql-server - 在查询中分组相关记录
问题描述
我正在寻找一种对相关记录进行分组的方法。我正在使用的表包含唯一的身份对。相关记录是其中一个身份与另一记录中的相同身份相同的记录。
例如,如果一条记录的 ID1 和 ID2 分别为 384 和 768,那么另一条记录的 ID1 和 ID2 分别为 384 和 512,这将是相关的。此外,256 和 768 的另一条记录也将相关。但是,具有 1024 和 1536 的记录将不相关。
样本表和数据如下。
CREATE TABLE [dbo].[Identities](
[AN] [varchar](30) NOT NULL,
[ID1] [varchar](30) NOT NULL,
[ID2] [varchar](30) NULL,
[LastUpdateDate] [datetimeoffset](7) NOT NULL,
CONSTRAINT [PK_Identities] PRIMARY KEY CLUSTERED
(
[AN] ASC
) ON [PRIMARY],
CONSTRAINT [AK_ID] UNIQUE NONCLUSTERED
(
[ID1] ASC,
[ID2] ASC
) ON [PRIMARY]
) ON [PRIMARY]
GO
INSERT INTO [dbo].[Identities] ([AN], [ID1], [ID2], [LastUpdateDate])
VALUES
('00243', '6000086', '2200720', '2016-05-26 17:16:07.0000000 -04:00')
, ('00267', '6000752', '5700243', '2017-06-21 12:24:55.0000000 -04:00')
, ('00284', '6000086', '5000385', '2020-02-11 11:56:01.0000000 -05:00')
, ('00352', '6000752', '2200720', '2016-05-31 20:08:47.0000000 -04:00')
, ('00360', '6000752', '6000779', '2019-04-17 20:46:15.0000000 -04:00')
, ('00475', '6200598', '5000385', '2017-06-21 18:29:49.0000000 -04:00')
, ('00507', '6200599', '5000385', '2017-06-21 18:37:21.0000000 -04:00')
, ('00558', '6200606', '5000385', '2017-06-21 18:14:51.0000000 -04:00')
, ('00735', '5500045', '2200721', '2018-10-11 15:30:50.0000000 -04:00')
GO
从这个示例数据中,所需的输出如下。
AN ID1 ID2 LastUpdateDate Group
00243 6000086 2200720 2016-05-26 17:16:07.0000000 -04:00 1
00267 6000752 5700243 2017-06-21 12:24:55.0000000 -04:00 1
00284 6000086 5000385 2020-02-11 11:56:01.0000000 -05:00 1
00352 6000752 2200720 2016-05-31 20:08:47.0000000 -04:00 1
00360 6000752 6000779 2019-04-17 20:46:15.0000000 -04:00 1
00475 6200598 5000385 2017-06-21 18:29:49.0000000 -04:00 1
00507 6200599 5000385 2017-06-21 18:37:21.0000000 -04:00 1
00558 6200606 5000385 2017-06-21 18:14:51.0000000 -04:00 1
00735 5500045 2200721 2018-10-11 15:30:50.0000000 -04:00 2
我相信这可以通过递归查询来实现。可能类似于这里的解决方案:Recursively Querying Row Groups,但我的尝试没有成功。
解决方案
我认为您需要 2 个递归 cte,因为链接必须朝 2 个方向移动。像这样的东西:
WITH cte AS--recursive cte id2
(SELECT i1.ID1, i1.ID2
FROM [dbo].[Identities] i1
UNION ALL
SELECT c.ID1,i2.ID1
FROM [dbo].[Identities] i2
JOIN cte c ON c.ID2 = i2.ID2
) ,
cte2 AS--recursive cte id1
(SELECT i1.ID1, i1.ID2
FROM [dbo].[Identities] i1
UNION ALL
SELECT c.ID2, i2.ID2
FROM [dbo].[Identities] i2
JOIN cte c ON c.ID1 = i2.ID1
) ,
cte3 AS--union both recursive cte's
(SELECT *
FROM cte
UNION
SELECT *
FROM cte2
),
cte4 AS--group
(SELECT id1, min(id2) AS id2, dense_rank() OVER (ORDER BY min(id2)) AS grp
FROM cte3
GROUP BY id1)
SELECT i.AN,i.ID1,i.ID2,i.LastUpdateDate,c.grp
FROM cte4 c
JOIN [dbo].[Identities] i ON i.id1 = c.id1