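sql - Postgres - Aggregate user events by session
Problem description
I have a table that contains events with the following columns: ID, USER_ID, CREATED_AT, EVENT_NAME.
I'm trying to get the sequence of events a user typically creates within a session. A new session starts when a user's event comes more than 5 minutes after that user's previous event.
I got as far as creating a view with the following information:
Reading the table in that order, a new session starts every time "TIME_DIFF" is greater than 5 minutes.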
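The session rule above can be expressed as a per-event flag. This is a minimal sketch, assuming PostgreSQL; the temporary view name `session_starts` and the columns `prev_created_at`, `time_diff` and `starts_session` are hypothetical, and the inline `VALUES` list reproduces the question's test data so the query runs without the `test` schema. Partitioning `lag()` by `"USER_ID"` ensures that each user's first event has no predecessor, so no `CASE` workaround is needed.

```sql
-- Hypothetical temp view; the VALUES list mirrors the question's test data.
CREATE TEMP VIEW session_starts AS
WITH events("USER_ID", "CREATED_AT", "EVENT_NAME") AS (
    VALUES
        ('user1', timestamp '2019-01-01 01:00:00', 'EVENT_1'),
        ('user1', timestamp '2019-01-01 01:00:05', 'EVENT_2'),
        ('user1', timestamp '2019-01-01 01:00:10', 'EVENT_3'),
        ('user1', timestamp '2019-01-01 01:10:00', 'EVENT_1'),
        ('user1', timestamp '2019-01-01 01:10:05', 'EVENT_2'),
        ('user1', timestamp '2019-01-01 01:10:10', 'EVENT_3'),
        ('user2', timestamp '2019-01-01 01:00:01', 'EVENT_1'),
        ('user2', timestamp '2019-01-01 01:00:06', 'EVENT_2'),
        ('user2', timestamp '2019-01-01 01:00:11', 'EVENT_3'),
        ('user2', timestamp '2019-01-01 01:10:01', 'EVENT_1'),
        ('user2', timestamp '2019-01-01 01:10:06', 'EVENT_2'),
        ('user2', timestamp '2019-01-01 01:10:11', 'EVENT_3')
), with_prev AS (
    SELECT e.*,
           -- previous event of the SAME user (NULL for that user's first event)
           lag("CREATED_AT") OVER (PARTITION BY "USER_ID" ORDER BY "CREATED_AT") AS prev_created_at
    FROM events e
)
SELECT "USER_ID",
       "CREATED_AT",
       "EVENT_NAME",
       "CREATED_AT" - prev_created_at AS time_diff,
       -- an event opens a session if it is the user's first event
       -- or arrives more than 5 minutes after the previous one
       (prev_created_at IS NULL
        OR "CREATED_AT" - prev_created_at > interval '5 minutes') AS starts_session
FROM with_prev;

SELECT * FROM session_starts ORDER BY "USER_ID", "CREATED_AT";
```

A running count of this flag per user, ordered by time, gives every event a session number, which is the key needed to group events into sessions.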
How can I now aggregate the events by session so that I end up with a result like this?
Here are the table, the views and some test data:
CREATE SCHEMA test;
CREATE TABLE test."TRACKING_EVENTS" (
"ID" serial PRIMARY key,
"USER_ID" text,
"CREATED_AT" TIMESTAMP,
"EVENT_NAME" text
);
CREATE VIEW
test."ORDERED_EVENTS"
AS
SELECT
"ID",
"USER_ID",
"CREATED_AT" AS "EVENT_TIME",
"EVENT_NAME",
-- previous event of the same user; NULL for each user's first event
lag("CREATED_AT", 1) OVER (PARTITION BY "USER_ID" ORDER BY "CREATED_AT") AS "PREVIOUS_EVENT_TIME"
FROM
test."TRACKING_EVENTS";
CREATE VIEW
test."ORDERED_EVENTS_WITH_DIFF"
AS
SELECT
"ID",
"USER_ID",
"EVENT_TIME",
"EVENT_NAME",
"PREVIOUS_EVENT_TIME",
"EVENT_TIME" - "PREVIOUS_EVENT_TIME" AS "TIME_DIFF"
FROM
test."ORDERED_EVENTS";
-- Period 1
INSERT INTO test."TRACKING_EVENTS" ("ID", "USER_ID", "CREATED_AT", "EVENT_NAME")
VALUES (1, 'user1', '2019-1-1 01:00:00'::timestamp, 'EVENT_1');
INSERT INTO test."TRACKING_EVENTS" ("ID", "USER_ID", "CREATED_AT", "EVENT_NAME")
VALUES (3, 'user1', '2019-1-1 01:00:05'::timestamp, 'EVENT_2');
INSERT INTO test."TRACKING_EVENTS" ("ID", "USER_ID", "CREATED_AT", "EVENT_NAME")
VALUES (5, 'user1', '2019-1-1 01:00:10'::timestamp, 'EVENT_3');
INSERT INTO test."TRACKING_EVENTS" ("ID", "USER_ID", "CREATED_AT", "EVENT_NAME")
VALUES (2, 'user2', '2019-1-1 01:00:01'::timestamp, 'EVENT_1');
INSERT INTO test."TRACKING_EVENTS" ("ID", "USER_ID", "CREATED_AT", "EVENT_NAME")
VALUES (4, 'user2', '2019-1-1 01:00:06'::timestamp, 'EVENT_2');
INSERT INTO test."TRACKING_EVENTS" ("ID", "USER_ID", "CREATED_AT", "EVENT_NAME")
VALUES (6, 'user2', '2019-1-1 01:00:11'::timestamp, 'EVENT_3');
-- Period 2
INSERT INTO test."TRACKING_EVENTS" ("ID", "USER_ID", "CREATED_AT", "EVENT_NAME")
VALUES (7, 'user1', '2019-1-1 01:10:00'::timestamp, 'EVENT_1');
INSERT INTO test."TRACKING_EVENTS" ("ID", "USER_ID", "CREATED_AT", "EVENT_NAME")
VALUES (9, 'user1', '2019-1-1 01:10:05'::timestamp, 'EVENT_2');
INSERT INTO test."TRACKING_EVENTS" ("ID", "USER_ID", "CREATED_AT", "EVENT_NAME")
VALUES (11, 'user1', '2019-1-1 01:10:10'::timestamp, 'EVENT_3');
INSERT INTO test."TRACKING_EVENTS" ("ID", "USER_ID", "CREATED_AT", "EVENT_NAME")
VALUES (8, 'user2', '2019-1-1 01:10:01'::timestamp, 'EVENT_1');
INSERT INTO test."TRACKING_EVENTS" ("ID", "USER_ID", "CREATED_AT", "EVENT_NAME")
VALUES (10, 'user2', '2019-1-1 01:10:06'::timestamp, 'EVENT_2');
INSERT INTO test."TRACKING_EVENTS" ("ID", "USER_ID", "CREATED_AT", "EVENT_NAME")
VALUES (12, 'user2', '2019-1-1 01:10:11'::timestamp, 'EVENT_3');
Solution
I think this is what you want:
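-- The columns were created as quoted upper-case identifiers, so they must be quoted here too.
select "USER_ID", session,
       array_agg("EVENT_NAME" order by "CREATED_AT") as events
from (select tt.*,
             -- running count of gaps longer than 5 minutes = session number (0, 1, ...)
             count(*) filter (where prev_ca < "CREATED_AT" - interval '5 minute')
                 over (partition by "USER_ID" order by "CREATED_AT") as session
      from (select tt.*,
                   -- timestamp of the same user's previous event (NULL for the first one)
                   lag("CREATED_AT") over (partition by "USER_ID" order by "CREATED_AT") as prev_ca
            from test."TRACKING_EVENTS" tt
           ) tt
     ) tt
group by "USER_ID", session
order by "USER_ID", session;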
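Note that this uses array_agg() rather than string_agg(). Since you're using Postgres, array_agg() is a convenient way to collect multiple values (here, each session's events in order) into a single column.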
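If a single delimited string per session is more convenient than an array (for example, for display or export), string_agg() can take the place of array_agg(): it takes the delimiter as its second argument and accepts the same ORDER BY inside the call. This is a minimal sketch, assuming PostgreSQL 9.4 or later (needed for FILTER on a window aggregate); the temporary view name `session_paths` and the `' -> '` delimiter are hypothetical choices, and the inline `VALUES` list reproduces the question's test data so the query runs on its own.

```sql
-- Hypothetical temp view; the VALUES list mirrors the question's test data.
CREATE TEMP VIEW session_paths AS
WITH events("USER_ID", "CREATED_AT", "EVENT_NAME") AS (
    VALUES
        ('user1', timestamp '2019-01-01 01:00:00', 'EVENT_1'),
        ('user1', timestamp '2019-01-01 01:00:05', 'EVENT_2'),
        ('user1', timestamp '2019-01-01 01:00:10', 'EVENT_3'),
        ('user1', timestamp '2019-01-01 01:10:00', 'EVENT_1'),
        ('user1', timestamp '2019-01-01 01:10:05', 'EVENT_2'),
        ('user1', timestamp '2019-01-01 01:10:10', 'EVENT_3'),
        ('user2', timestamp '2019-01-01 01:00:01', 'EVENT_1'),
        ('user2', timestamp '2019-01-01 01:00:06', 'EVENT_2'),
        ('user2', timestamp '2019-01-01 01:00:11', 'EVENT_3'),
        ('user2', timestamp '2019-01-01 01:10:01', 'EVENT_1'),
        ('user2', timestamp '2019-01-01 01:10:06', 'EVENT_2'),
        ('user2', timestamp '2019-01-01 01:10:11', 'EVENT_3')
), with_prev AS (
    SELECT e.*,
           -- previous event of the same user (NULL for the first one)
           lag("CREATED_AT") OVER (PARTITION BY "USER_ID" ORDER BY "CREATED_AT") AS prev_ca
    FROM events e
), numbered AS (
    SELECT w.*,
           -- running count of gaps longer than 5 minutes = session number (0, 1, ...)
           count(*) FILTER (WHERE prev_ca < "CREATED_AT" - interval '5 minute')
               OVER (PARTITION BY "USER_ID" ORDER BY "CREATED_AT") AS session
    FROM with_prev w
)
SELECT "USER_ID",
       session,
       -- one ordered, delimited string per session instead of an array
       string_agg("EVENT_NAME", ' -> ' ORDER BY "CREATED_AT") AS events
FROM numbered
GROUP BY "USER_ID", session;

SELECT * FROM session_paths ORDER BY "USER_ID", session;
```

With the test data, each of the four sessions (two per user) yields the string `EVENT_1 -> EVENT_2 -> EVENT_3`. The array form remains easier to process further (for example, with unnest() or array comparisons), while the string form is easier to read.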