How do I bucket data based on timestamps that fall within a certain period of the previous record?

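Problem description

I have some data that I am trying to bucket. Assume each row has a user and a timestamp. I want to define a session as any run of rows in which each timestamp is within 10 minutes of the same user's previous timestamp.

How would I solve this in SQL?

Example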

+------+---------------------+---------+
| user |      timestamp      | session |
+------+---------------------+---------+
|    1 | 2021-05-09 15:12:52 |       1 |
|    1 | 2021-05-09 15:18:52 |       1 | within 10 min of previous timestamp
|    1 | 2021-05-09 15:32:52 |       2 | over 10 min, new session
|    2 | 2021-05-09 16:00:00 |       1 | different user
|    1 | 2021-05-09 17:00:00 |       3 | new session
|    1 | 2021-05-09 17:02:00 |       3 |
+------+---------------------+---------+

This gives me the records that fall within 10 minutes of the previous one, but how would I bucket them as shown above?

with cte as (
    select user,
        timestamp,
        lag(timestamp) over (partition by user order by timestamp) as last_timestamp
    from table
)
select *
from cte
-- minute, not mm: in SQL Server the abbreviation mm means month
where datediff(minute, last_timestamp, timestamp) <= 10

Tags: sql, sql-server

Solution


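Try this. It is basically an edge-detection problem: flag each row that starts a new session with edge = 1 (every other row gets 0), then take a running sum of edge per user. The running sum is the session number.

Working test case for SQL Server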

SQL:

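with cte as (
    select user1
         , timestamp1
         , session1 AS session_expected
         , lag(timestamp1) over (partition by user1 order by timestamp1) as last_timestamp
         -- edge = 1 marks the first row of a new session.
         -- For a user's first row the lag is NULL, so the comparison is not true and ELSE 1 applies.
         , CASE WHEN datediff(n, lag(timestamp1) over (partition by user1 order by timestamp1), timestamp1) <= 10 THEN 0 ELSE 1 END AS edge
      from table1
    )
-- The running sum of edge within each user is the session number.
select *, SUM(edge) OVER (PARTITION BY user1 ORDER BY timestamp1) AS session_actual
  from cte
 ORDER BY timestamp1
;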

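Another suggestion (thanks to @Charlieface): add ROWS UNBOUNDED PRECEDING to the window. Without an explicit frame, a window with ORDER BY defaults to a RANGE frame, which treats rows with equal timestamp1 as peers and is generally slower in SQL Server: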

with cte as (
    select user1
         , timestamp1
         , session1 AS session_expected
         , lag(timestamp1) over (partition by user1 order by timestamp1) as last_timestamp
         , CASE WHEN datediff(n, lag(timestamp1) over (partition by user1 order by timestamp1), timestamp1) <= 10 THEN 0 ELSE 1 END AS edge
      from table1
    )
select *
     , SUM(edge) OVER (PARTITION BY user1 ORDER BY timestamp1 ROWS UNBOUNDED PRECEDING) AS session_actual
  from cte
 ORDER BY timestamp1
;

Result:

[Image: screenshot of the query output]
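To illustrate what ROWS UNBOUNDED PRECEDING changes relative to the default frame, here is a minimal sketch. The inline rows, the name demo, and the edge values are made up for illustration only; two rows deliberately share the same timestamp.

```sql
-- Hypothetical inline data: the last two rows tie on timestamp1.
-- The edge values are arbitrary; they only make the frame difference visible.
with demo as (
    select *
      from (values
              (1, cast('2021-05-09 15:00:00' as datetime), 1)
            , (1, cast('2021-05-09 15:30:00' as datetime), 1)
            , (1, cast('2021-05-09 15:30:00' as datetime), 1)
           ) as v(user1, timestamp1, edge)
    )
select user1
     , timestamp1
     , edge
       -- Default frame (RANGE): tied rows are peers and are summed together.
     , SUM(edge) OVER (PARTITION BY user1 ORDER BY timestamp1) AS sum_default_range
       -- ROWS frame: the sum advances one physical row at a time.
     , SUM(edge) OVER (PARTITION BY user1 ORDER BY timestamp1 ROWS UNBOUNDED PRECEDING) AS sum_rows
  from demo
;
```

Under the default frame both tied rows should show the same running total, while the ROWS frame counts them one by one (in an arbitrary order among the ties). If timestamp1 is unique per user, the two frames give the same session numbers and the difference is mainly one of performance.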

Setup:

CREATE TABLE table1 (user1 int,   timestamp1 datetime, session1 int);

INSERT INTO table1 VALUES
  (    1 , '2021-05-09 15:12:52' ,       1 )
, (    1 , '2021-05-09 15:18:52' ,       1 ) -- within 10 min of previous timestamp
, (    1 , '2021-05-09 15:32:52' ,       2 ) -- over 10 min, new session
, (    2 , '2021-05-09 16:00:00' ,       1 ) -- different user
, (    1 , '2021-05-09 17:00:00' ,       3 ) -- new session
, (    1 , '2021-05-09 17:02:00' ,       3 )
;
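One caveat worth sketching: DATEDIFF counts the datepart boundaries crossed, not elapsed time, so a minute-based comparison can accept a gap of almost 11 minutes (for example 15:12:01 to 15:22:59 crosses 10 minute boundaries). The variant below is a possible refinement, not part of the original answer; it compares in seconds against table1 from the setup above, and the CTE name flagged is an assumption.

```sql
with cte as (
    select user1
         , timestamp1
         , lag(timestamp1) over (partition by user1 order by timestamp1) as last_timestamp
      from table1
    )
, flagged as (
    select *
           -- 600 seconds = 10 minutes. A NULL last_timestamp (first row per user)
           -- makes the comparison not true, so the row falls to ELSE 1.
         , CASE WHEN datediff(second, last_timestamp, timestamp1) <= 600 THEN 0 ELSE 1 END AS edge
      from cte
    )
select user1
     , timestamp1
     , last_timestamp
     , edge
     , SUM(edge) OVER (PARTITION BY user1 ORDER BY timestamp1 ROWS UNBOUNDED PRECEDING) AS session_actual
  from flagged
 ORDER BY timestamp1
;
```

For the sample rows this should yield the same session numbers as the minute-based query. Note that DATEDIFF returns an int, so a second-based difference overflows for gaps longer than roughly 68 years; DATEDIFF_BIG (SQL Server 2016 and later) avoids that.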
