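sql - How to bucket data based on timestamps within a certain period of the previous record?
Problem description
I have some data that I am trying to bucket. Say each row has a user and a timestamp. I want to define a session as any run of rows where each timestamp is within 10 minutes of that user's previous timestamp.
How would I solve this in SQL?
Example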
+------+---------------------+---------+
| user | timestamp           | session |
+------+---------------------+---------+
|    1 | 2021-05-09 15:12:52 |       1 |
|    1 | 2021-05-09 15:18:52 |       1 | within 10 min of previous timestamp
|    1 | 2021-05-09 15:32:52 |       2 | over 10 min, new session
|    2 | 2021-05-09 16:00:00 |       1 | different user
|    1 | 2021-05-09 17:00:00 |       3 | new session
|    1 | 2021-05-09 17:02:00 |       3 |
+------+---------------------+---------+
This gives me the records that fall within 10 minutes of the previous one, but how would I bucket them into sessions like above?
with cte as (
select user,
timestamp,
lag(timestamp) over (partition by user order by timestamp) as last_timestamp
from table
)
select *
from cte
where datediff(minute, last_timestamp, timestamp) <= 10 -- minute (mi / n), not mm, which means month in T-SQL
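Solution
Try this. It is basically an edge (gaps-and-islands) problem: mark each row that starts a new session with edge = 1, then take a running sum of edge per user to number the sessions.
SQL: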
with cte as (
select user1
, timestamp1
, session1 AS session_expected
, lag(timestamp1) over (partition by user1 order by timestamp1) as last_timestamp
, CASE WHEN datediff(n, lag(timestamp1) over (partition by user1 order by timestamp1), timestamp1) <= 10 THEN 0 ELSE 1 END AS edge
from table1
)
select *, SUM(edge) OVER (PARTITION BY user1 ORDER BY timestamp1) AS session_actual
from cte
ORDER BY timestamp1
;
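To make the flag-then-running-sum idea concrete, here is a minimal plain-Python sketch of the same logic, using the rows from the question. The names assign_sessions and ROWS are made up for this illustration. It compares elapsed time against a 10-minute gap, which is an assumption and is slightly stricter than DATEDIFF in minutes.

```python
from datetime import datetime, timedelta
from itertools import groupby

# Sample rows (user, timestamp) copied from the question's example table.
ROWS = [
    (1, "2021-05-09 15:12:52"),
    (1, "2021-05-09 15:18:52"),
    (1, "2021-05-09 15:32:52"),
    (2, "2021-05-09 16:00:00"),
    (1, "2021-05-09 17:00:00"),
    (1, "2021-05-09 17:02:00"),
]


def assign_sessions(rows, gap=timedelta(minutes=10)):
    """Return (user, timestamp, session) tuples, sessions numbered per user."""
    parsed = sorted(
        (user, datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")) for user, ts in rows
    )
    out = []
    # groupby plays the role of PARTITION BY user; the sort is ORDER BY timestamp.
    for user, group in groupby(parsed, key=lambda r: r[0]):
        previous = None  # LAG(timestamp): nothing precedes the first row
        session = 0      # running SUM(edge)
        for _, ts in group:
            # edge = 1 when the row starts a new session, else 0
            edge = 1 if previous is None or ts - previous > gap else 0
            session += edge
            out.append((user, ts, session))
            previous = ts
    return out


if __name__ == "__main__":
    for user, ts, session in sorted(assign_sessions(ROWS), key=lambda r: r[1]):
        print(user, ts, session)
```

The first row of each user has no previous timestamp, so it always counts as an edge. This mirrors the SQL, where LAG returns NULL and the CASE falls through to ELSE 1.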
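Another suggestion: add ROWS UNBOUNDED PRECEDING to the window frame (thanks to @Charlieface). Without it, a window with ORDER BY defaults to a RANGE frame, which SQL Server generally evaluates more slowly: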
with cte as (
select user1
, timestamp1
, session1 AS session_expected
, lag(timestamp1) over (partition by user1 order by timestamp1) as last_timestamp
, CASE WHEN datediff(n, lag(timestamp1) over (partition by user1 order by timestamp1), timestamp1) <= 10 THEN 0 ELSE 1 END AS edge
from table1
)
select *
, SUM(edge) OVER (PARTITION BY user1 ORDER BY timestamp1 ROWS UNBOUNDED PRECEDING) AS session_actual
from cte
ORDER BY timestamp1
;
Result: session_actual matches session_expected on every row of the setup data.
Setup:
CREATE TABLE table1 (user1 int, timestamp1 datetime, session1 int);
INSERT INTO table1 VALUES
( 1 , '2021-05-09 15:12:52' , 1 )
, ( 1 , '2021-05-09 15:18:52' , 1 ) -- within 10 min of previous timestamp
, ( 1 , '2021-05-09 15:32:52' , 2 ) -- over 10 min, new session
, ( 2 , '2021-05-09 16:00:00' , 1 ) -- different user
, ( 1 , '2021-05-09 17:00:00' , 3 ) -- new session
, ( 1 , '2021-05-09 17:02:00' , 3 )
;
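One caveat about the DATEDIFF(n, ...) <= 10 test used in the queries: T-SQL DATEDIFF counts the minute boundaries crossed, not the elapsed minutes, so two rows up to 10 minutes 59 seconds apart can still land in the same session. The Python sketch below imitates that counting to show the difference. The function name minute_boundaries is invented for this illustration.

```python
from datetime import datetime


def minute_boundaries(start, end):
    """Count minute boundaries crossed between start and end.

    This imitates how T-SQL DATEDIFF(minute, start, end) counts.
    """
    def floor_to_minute(t):
        return t.replace(second=0, microsecond=0)

    delta = floor_to_minute(end) - floor_to_minute(start)
    return int(delta.total_seconds() // 60)


a = datetime(2021, 5, 9, 15, 12, 1)
b = datetime(2021, 5, 9, 15, 22, 59)

boundaries = minute_boundaries(a, b)       # what DATEDIFF(minute, a, b) reports
elapsed_seconds = (b - a).total_seconds()  # the real gap in seconds

print(boundaries, elapsed_seconds)
```

If a strict 10-minute cutoff matters, one option is to compare DATEDIFF(second, ...) <= 600 instead. Whether that is worth it depends on how exact the session boundary needs to be.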