首页 > 解决方案 > To remove duplication of data if within 7 days

问题描述

Following is my table and sample data

DECLARE @Employee_Log table(ID int,eid int, ecode varchar(100), emp_startdate date)



  INSERT INTO @Employee_Log
SELECT 1, 1, 'aaa','2019-01-01'
UNION ALL
SELECT 2, 1, 'aaa','2019-01-05'
UNION ALL
SELECT 3, 1, 'bbb','2019-01-03'
UNION ALL
SELECT 4, 2, 'aaa','2019-01-03'
UNION ALL
SELECT 5, 1, 'aaa','2019-02-01'
UNION ALL
SELECT 6, 1, 'aaa','2019-02-15'
UNION ALL
SELECT 7, 1, 'aaa','2019-02-19'
UNION ALL
SELECT 8, 1, 'aaa','2019-02-28'

In the above data I want to remove the duplication based on eid and ecode .If the emp_startdate are within 7 days then take the latest data and ignore the rest data.

I tried the following code but how to add the condition check for week range

SELECT 
ROW_NUMBER() OVER(PARTITION BY eid,ecode ORDER BY emp_startdate desc) as rownum,
ID,eid,ecode,emp_startdate
FROM @Employee_Log

I want the result as shown below

  ID    eid ecode   emp_startdate
    2   1   aaa      2019-01-05
    5   1   aaa      2019-02-01
    4   2   aaa      2019-01-03
    7   1   aaa      2019-02-19
    8   1   aaa      2019-02-28
    3   1   bbb      2019-01-03

标签: sql-servertsql

解决方案


如果在同一个 7 天内发生超过 2 个事件,我仍然不确定您想要发生什么。但是此解决方案将获取所有日期系列的最新日期,其中日期之间的差异为 7 天或更短。

select ID,eid,ecode,emp_startdate
from
(
select ID,
eid,
ecode,
emp_startdate,
datediff(day
    ,emp_startdate
    ,lead(emp_startdate) 
        over
        (partition by eid,ecode order by emp_startdate)) l 
from  @Employee_Log
) a
where l is null or l>7

ID     eid     ecode     emp_startdate
--     ---     -----     -------------
3       1       bbb       2019-01-03
2       1       aaa       2019-01-05
5       1       aaa       2019-02-01
7       1       aaa       2019-02-19
8       1       aaa       2019-02-28
4       2       aaa       2019-01-03

推荐阅读