sql - 重叠日期 - 开始日期应该是结束日期的第二天
问题描述
我需要一些帮助来修改我的查询。我观察到我的查询不能识别所有重叠的记录,例如在上面的记录列表中它可以识别重叠的日期,并且根据我的要求,RowEffectiveDate 应该是 RowEndDate 的第二天,但是对于这个特殊的记录来说有点困难。它只会从表中删除 2 条记录 8903 和 8904,并保留所有其他记录。任何帮助,将不胜感激。我正在使用的查询如下所述:
CREATE TABLE #dimP (
pcode char(4),
pkey int,
RowEffectiveDate datetime,
rowenddate datetime
);
INSERT INTO #dimP (pcode, pkey, RowEffectiveDate, rowenddate)
VALUES
('PL56',8855,'1900-01-01 00:00:00.000','2018-02-13 23:59:59.997'),
('PL56',8856,'2018-02-09 15:09:52.000','2018-02-14 23:59:59.997'),
('PL56',8903,'2018-02-09 15:09:52.000','2018-02-15 23:59:59.997'),
('PL56',8904,'2018-02-09 15:09:52.000','2018-02-16 23:59:59.997'),
('PL56',8935,'2018-02-14 00:00:00.000','2018-02-17 23:59:59.997'),
('PL56',8944,'2018-02-15 00:00:00.000','2018-02-18 23:59:59.997'),
('PL56',8955,'2018-02-16 00:00:00.000','2018-03-02 23:59:59.997'),
('PL56',8965,'2018-02-17 00:00:00.000','2018-03-03 23:59:59.997'),
('PL56',8972,'2018-02-18 00:00:00.000','2018-03-04 23:59:59.997'),
('PL56',8973,'2018-02-19 00:00:00.000','2018-03-05 23:59:59.997'),
('PL56',9114,'2018-03-03 00:00:00.000','2018-04-24 23:59:59.997'),
('PL56',9115,'2018-03-04 00:00:00.000','2018-04-25 23:59:59.997'),
('PL56',9136,'2018-03-05 00:00:00.000','2018-04-26 23:59:59.997'),
('PL56',9140,'2018-03-06 00:00:00.000','2018-04-29 23:59:59.997'),
('PL56',11597,'2018-04-25 00:00:00.000','2018-04-25 23:59:59.997'),
('PL56',11820,'2018-04-26 00:00:00.000','2018-04-26 23:59:59.997'),
('PL56',11837,'2018-04-27 00:00:00.000','2018-04-29 23:59:59.997'),
('PL56',11839,'2018-04-30 00:00:00.000','2018-05-22 23:59:59.997'),
('PL56',12372,'2018-05-23 00:00:00.000','9999-12-31 00:00:00.000');
SELECT *, ROW_NUMBER() OVER(PARTITION BY RowEffectiveDate ORDER BY rowenddate, PKey) AS rn
FROM #dimP;
WITH CTE AS (
SELECT pcode,pkey, RowEffectiveDate, rowenddate,
ROW_NUMBER() OVER(PARTITION BY RowEffectiveDate ORDER BY rowenddate, PKey) AS rn
FROM #dimP
WHERE pcode='KO18'
)
SELECT rn, pcode, pkey, RowEffectiveDate, rowenddate
FROM CTE
WHERE rn = 1
ORDER BY rn;
DROP TABLE #dimP;
电流输出:
pcode pkey RowEffectiveDate rowenddate
PL56 8855 1900-01-01 00:00:00.000 2018-02-13 23:59:59.997
PL56 8856 2018-02-09 15:09:52.000 2018-02-14 23:59:59.997
PL56 8903 2018-02-09 15:09:52.000 2018-02-15 23:59:59.997
PL56 8904 2018-02-09 15:09:52.000 2018-02-16 23:59:59.997
PL56 8935 2018-02-14 00:00:00.000 2018-02-17 23:59:59.997
PL56 8944 2018-02-15 00:00:00.000 2018-02-18 23:59:59.997
PL56 8955 2018-02-16 00:00:00.000 2018-03-02 23:59:59.997
PL56 8965 2018-02-17 00:00:00.000 2018-03-03 23:59:59.997
PL56 8972 2018-02-18 00:00:00.000 2018-03-04 23:59:59.997
PL56 8973 2018-02-19 00:00:00.000 2018-03-05 23:59:59.997
PL56 9114 2018-03-03 00:00:00.000 2018-04-24 23:59:59.997
PL56 9115 2018-03-04 00:00:00.000 2018-04-25 23:59:59.997
PL56 9136 2018-03-05 00:00:00.000 2018-04-26 23:59:59.997
PL56 9140 2018-03-06 00:00:00.000 2018-04-29 23:59:59.997
PL56 11597 2018-04-25 00:00:00.000 2018-04-25 23:59:59.997
PL56 11820 2018-04-26 00:00:00.000 2018-04-26 23:59:59.997
PL56 11837 2018-04-27 00:00:00.000 2018-04-29 23:59:59.997
PL56 11839 2018-04-30 00:00:00.000 2018-05-22 23:59:59.997
PL56 12372 2018-05-23 00:00:00.000 9999-12-31 00:00:00.000
预期输出应该是:
pcode pkey RowEffectiveDate rowenddate
PL56 8855 1900-01-01 00:00:00.000 2018-02-13 23:59:59.997
PL56 8935 2018-02-14 00:00:00.000 2018-02-17 23:59:59.997
PL56 8972 2018-02-18 00:00:00.000 2018-03-04 23:59:59.997
PL56 9136 2018-03-05 00:00:00.000 2018-04-26 23:59:59.997
PL56 11837 2018-04-27 00:00:00.000 2018-04-29 23:59:59.997
PL56 11839 2018-04-30 00:00:00.000 2018-05-22 23:59:59.997
PL56 12372 2018-05-23 00:00:00.000 9999-12-31 00:00:00.000
解决方案
这实际上比我想象的要容易。
但是,请注意,我不相信这是“正确”的解决方案(而不是压缩或组合日期范围),并且我偷偷怀疑所需的结果集实际上应该以原始形式提供。
事实证明,我们最需要的只是一个递归查询:
WITH R AS (SELECT pcode, pkey, rowEffectiveDate, rowEndDate
FROM dimP AS a
-- This makes it get the "first" row, with no other rows starting before it
WHERE NOT EXISTS (SELECT 1
FROM dimP AS b
WHERE b.pcode = a.pcode
AND b.rowEffectiveDate < a.rowEffectiveDate)
UNION ALL
SELECT curr.pcode, curr.pkey, curr.rowEffectiveDate, curr.rowEndDate
FROM dimP AS curr
JOIN R
ON R.pcode = curr.pcode
-- This date math is very fragile, and the data should be changed
-- so that then end-value is exclusive, and equal to the next start
AND DATEADD(ms, 3, R.rowEndDate) = curr.rowEffectiveDate)
SELECT *
FROM R
ORDER BY pcode, rowEffectiveDate
SQL Fiddle Example
(请注意,如果有重复的“第二”行,这几乎完全中断)
这产生了预期的:
pcode pkey rowEffectiveDate rowEndDate
----------------------------------------------------------------
PL56 8855 1900-01-01T00:00:00Z 2018-02-13T23:59:59.997Z
PL56 8935 2018-02-14T00:00:00Z 2018-02-17T23:59:59.997Z
PL56 8972 2018-02-18T00:00:00Z 2018-03-04T23:59:59.997Z
PL56 9136 2018-03-05T00:00:00Z 2018-04-26T23:59:59.997Z
PL56 11837 2018-04-27T00:00:00Z 2018-04-29T23:59:59.997Z
PL56 11839 2018-04-30T00:00:00Z 2018-05-22T23:59:59.997Z
PL56 12372 2018-05-23T00:00:00Z 9999-12-31T00:00:00.000Z
推荐阅读
- r - R中的双向重复测量方差分析
- r - 如何使用ggplot将轴元素放入框内
- python - 在python中获取spark提交作业的结果
- javascript - 使用正则表达式按键名提取 javascript 对象值
- sql - 创建一个索引,防止客户重复订单
- ios - 钥匙串项目更新时的NSInternalInconsistencyException
- powershell - PowerShell:对 CPU 使用率进行排序以显示占用 2% 及以上的进程
- python - Python Speech_Recognition - “这个音频源已经在上下文管理器中”
- java - 当类在类路径上时,Surefire 抛出 SurefireReflectionException
- python - 如何根据数据帧(pydub/python)中指定的起始/偏移量分割一系列录音