首页 > 解决方案 > 从特定日期回溯 7 天并仅在该日期存在时检索值

问题描述

我有一张表格,它代表特定日期的工作小时数。从该表中,我希望能够在代表前 7 天工作时间的每一行中再添加 7 列。

我一直在使用 LAG() 来实现这一点,并且它工作正常,直到我发现一个问题,即我在几天之间存在差距,现在我选择前 7 天而不是选择前 7 行。

该脚本演示了我要完成的工作。

DECLARE @data TABLE ( 
    [user_id] int,
    [date] DATETIME, 
    [day_hours_worked] VARCHAR(15)
);

INSERT INTO @data
VALUES 
  ( '1', '2019-09-07 00:00:00.000', '07_8' )
, ( '1', '2019-09-08 00:00:00.000', '08_4' )
, ( '1', '2019-09-09 00:00:00.000', '09_6' )
, ( '1', '2019-09-10 00:00:00.000', '10_8' )
, ( '1', '2019-09-11 00:00:00.000', '11_4' )
, ( '1', '2019-09-12 00:00:00.000', '12_6' )
, ( '1', '2019-09-13 00:00:00.000', '13_8' )
, ( '1', '2019-09-14 00:00:00.000', '14_4' )
, ( '1', '2019-09-20 00:00:00.000', '20_8' );

select 
    [user_id],
    [date],
    [day_hours_worked],
    LAG([day_hours_worked], 1) OVER (PARTITION BY [user_id] ORDER BY  [date]) [Hours-1],
    LAG([day_hours_worked], 2) OVER (PARTITION BY [user_id] ORDER BY  [date]) [Hours-2],
    LAG([day_hours_worked], 3) OVER (PARTITION BY [user_id] ORDER BY  [date]) [Hours-3],
    LAG([day_hours_worked], 4) OVER (PARTITION BY [user_id] ORDER BY  [date]) [Hours-4],
    LAG([day_hours_worked], 5) OVER (PARTITION BY [user_id] ORDER BY  [date]) [Hours-5],
    LAG([day_hours_worked], 6) OVER (PARTITION BY [user_id] ORDER BY  [date]) [Hours-6],
    LAG([day_hours_worked], 7) OVER (PARTITION BY [user_id] ORDER BY  [date]) [Hours-7] 
from @data

这就是我得到的。

user_id date    day_hours_worked    Hours-1 Hours-2 Hours-3 Hours-4 Hours-5 Hours-6 Hours-7
1   2019-09-07 00:00:00.000 07_8    NULL    NULL    NULL    NULL    NULL    NULL    NULL
1   2019-09-08 00:00:00.000 08_4    07_8    NULL    NULL    NULL    NULL    NULL    NULL
1   2019-09-09 00:00:00.000 09_6    08_4    07_8    NULL    NULL    NULL    NULL    NULL
1   2019-09-10 00:00:00.000 10_8    09_6    08_4    07_8    NULL    NULL    NULL    NULL
1   2019-09-11 00:00:00.000 11_4    10_8    09_6    08_4    07_8    NULL    NULL    NULL
1   2019-09-12 00:00:00.000 12_6    11_4    10_8    09_6    08_4    07_8    NULL    NULL
1   2019-09-13 00:00:00.000 13_8    12_6    11_4    10_8    09_6    08_4    07_8    NULL
1   2019-09-14 00:00:00.000 14_4    13_8    12_6    11_4    10_8    09_6    08_4    07_8
1   2019-09-20 00:00:00.000 20_8    14_4    13_8    12_6    11_4    10_8    09_6    08_4

最后一行是唯一不正确的行,因为它查看的是前 7 行而不是前 7 天。在我的场景中,预期的结果应该是:

user_id date    day_hours_worked    Hours-1 Hours-2 Hours-3 Hours-4 Hours-5 Hours-6 Hours-7
1   2019-09-07 00:00:00.000 07_8    NULL    NULL    NULL    NULL    NULL    NULL    NULL
1   2019-09-08 00:00:00.000 08_4    07_8    NULL    NULL    NULL    NULL    NULL    NULL
1   2019-09-09 00:00:00.000 09_6    08_4    07_8    NULL    NULL    NULL    NULL    NULL
1   2019-09-10 00:00:00.000 10_8    09_6    08_4    07_8    NULL    NULL    NULL    NULL
1   2019-09-11 00:00:00.000 11_4    10_8    09_6    08_4    07_8    NULL    NULL    NULL
1   2019-09-12 00:00:00.000 12_6    11_4    10_8    09_6    08_4    07_8    NULL    NULL
1   2019-09-13 00:00:00.000 13_8    12_6    11_4    10_8    09_6    08_4    07_8    NULL
1   2019-09-14 00:00:00.000 14_4    13_8    12_6    11_4    10_8    09_6    08_4    07_8
1   2019-09-20 00:00:00.000 20_8    NULL    NULL    NULL    NULL    NULL    14_4    13_8

这甚至可以使用 LAG() 吗?有没有更好的方法来完成我想要的?

提前致谢。

标签: sqlsql-servertsqldatetime

解决方案


我认为在 SQL Server 中没有一种优雅的方法可以做到这一点。最简单的可能是一个case表达式:

(case when lag(date, 1) over (partition by user_id order by date) = dateadd(day, 1, date)
      then lag(day_hours_worked, 1) over (partition by [user_id] order by [date]) 
 end) as [Hours-1],

一些数据库支持range时间间隔,这使得这更简单。

编辑:

你可以做你想做的事apply

select d.*, d2.*, d2.*
from data d outer apply
     (select max(case when dd.diff = 1 then d2.day_hours_worked end) as w1,
             max(case when dd.diff = 2 then d2.day_hours_worked end) as w2,
             max(case when dd.diff = 3 then d2.day_hours_worked end) as w3,
             max(case when dd.diff = 4 then d2.day_hours_worked end) as w4,
             max(case when dd.diff = 5 then d2.day_hours_worked end) as w5,
             max(case when dd.diff = 6 then d2.day_hours_worked end) as w6,
             max(case when dd.diff = 7 then d2.day_hours_worked end) as w7
      from data d2 cross apply
           (values (datediff(day, d2.date, d.date) )) dd(diff)
      where d2.date < d.date and d2.date >= dateadd(day, -7, d.date)
     ) d2;

是一个 db<>fiddle。


推荐阅读