sql - 动态连接透视表
问题描述
也许社区可以就这个 Postgres 问题(PostgreSQL 9.5)给我建议
有一个大的(150 万行)飞行日志表,其中包含以下列:动作类型(4 种动作类型)、时间戳和 user_id。users 表有 6K 行。
**flightlog**
user_id, time, action
2301 "2016-10-25 14:13:25.74668" "View"
8 "2016-04-25 15:02:13.916204" "Download"
8 "2016-04-25 15:01:20.553475" "Download"
8 "2016-04-25 14:57:02.430493" "Download"
8 "2016-04-25 14:57:02.160002" "Download"
8 "2016-04-25 14:57:01.397602" "Download"
26 "2016-10-25 16:01:25.005285" "View"
216 "2016-10-24 14:46:16.035242" "View"
2182 "2016-10-24 14:47:43.713" "View"
243 "2016-10-24 12:10:12.187181" "View"
26 "2016-10-24 15:01:26.269981" "View"
26 "2016-10-24 15:01:28.122361" "View"
**users**
user_id, email
8 "ndoe@mysite.com"
26 "jdoe@mysite.com"
2301 "kdoe@mysite.com"
**subscriptions**
user_id, expires
8 "2017-08-30 15:48:06.827258"
26 "2017-08-10 00:00:00"
2301 "2017-09-28 09:09:17.56549"
我需要有一个表,其中包含用户每月 4 次不同操作的计数,因此它将是用户,然后是每月 4 次操作,并且复制 12 次。这些列将如下所示:
user1 period1_action1 period1_action2 period1_action3 period1_action4 period2_action1 etc
更复杂的是,这 12 个月对于每个用户来说应该是动态的,从订阅表中的订阅日期算起 +12 个月(10K)。
到目前为止,我可以使用基于 FILTER 的操作枢轴,并通过 cte 选择用户名和订阅开始来加入它。
with counters ( <doing counts using windowing functions>),
pivot1 ( <pivoting counters using FILTER>
...sum(times) filter (where action = 'action1')...
),
recent_subscription (<picking latest subscription for a user>),
titles (<using previous cte and adding more info from info table>)
select t.user, t.id, t.subscription_starts, t.expires_at, t.title, email,
p."action1", p."action2", p."action3 ", p."action4"
from titles t
join pivot1 p
...
正确地为用户生成 12 个月的活动数据,格式如下:
user1 ... 1st_period_4user1 action1 action2 action3 action4
user1 ... 2nd_period_4user1 action1 action2 action3 action4
user2 ... 1st_period_4user2 action1 action2 action3 action4
user2 ... 2nd_period_4user2 action1 action2 action3 action4
user3 ... 1st_period_4user3 action1 action2 action3 action4
user3 ... 2nd_period_4user3 action1 action2 action3 action4
ETC
但现在的挑战是再次调整它以获得 12periods/4actions 组合。如果按照以下方式使用 json_object_aggr 作为(句点:然后是 4 个动作)完成,可能只有 12 个
--using the piece above as another CTE called merged
--this code does not work :(
select
email, id, ends, subs, info,
json_object_aggr(starts, s1,v1,p1,d1 ORDER BY starts) as P1,
json_object_aggr(starts, s2,v2,p2,d2 ORDER BY starts) as P2,
json_object_aggr(starts, s3,v3,p3,d3 ORDER BY starts) as P3,
json_object_aggr(starts, s4,v4,p4,d4 ORDER BY starts) as P4,
json_object_aggr(starts, s5,v5,p5,d5 ORDER BY starts) as P5,
json_object_aggr(starts, s6,v6,p6,d6 ORDER BY starts) as P6,
json_object_aggr(starts, s7,v7,p7,d7 ORDER BY starts) as P7,
json_object_aggr(starts, s8,v8,p8,d8 ORDER BY starts) as P8,
json_object_aggr(starts, s9,v9,p9,d9 ORDER BY starts) as P9,
json_object_aggr(starts, s10,v10,p10,d10 ORDER BY starts) as P10,
json_object_aggr(starts, s11,v11,p11,d11 ORDER BY starts) as P11,
json_object_aggr(starts, s12,v12,p12,d12 ORDER BY starts) as P12
from
(select email, id, starts, ends, subs, info, starts,
sum("action1") as s1,sum("action2") as v1,sum("action3") as
p1,sum("action4")
as d1
from merged
group by email, id, starts, ends, subs, info, starts
) m
group by email, id, starts, ends, subs, info
order by email, id, starts, ends, subs, info
这可能是 json_object_agg 每个周期 4 个动作吗?我可以就如何解决这个问题获得帮助吗?
谢谢你。
解决方案
您是否有理由不能使用一系列CASE
导致 1 或 0 的条件,然后再使用SUM()
它们?这将使这个过程更简单。
WITH subs AS (
SELECT s.user_id, u.email, MAX(s.sub_date) AS recent_sub_date
FROM subscriptions s
JOIN users u ON s.userid = u.user_id
GROUP BY s.user_id, u.email
)
SELECT s.user_id,
SUM(CASE WHEN f.action = 'action1' AND f.time <= s.recent_sub_date + INTERVAL '1 month' THEN 1 ELSE 0 END) AS period1_action1,
SUM(CASE WHEN f.action = 'action2' AND f.time <= s.recent_sub_date + INTERVAL '1 month' THEN 1 ELSE 0 END) AS period1_action2,
SUM(CASE WHEN f.action = 'action3' AND f.time <= s.recent_sub_date + INTERVAL '1 month' THEN 1 ELSE 0 END) AS period1_action3,
SUM(CASE WHEN f.action = 'action4' AND f.time <= s.recent_sub_date + INTERVAL '1 month' THEN 1 ELSE 0 END) AS period1_action4,
SUM(CASE WHEN f.action = 'action1' AND f.time <= s.recent_sub_date + INTERVAL '2 months' THEN 1 ELSE 0 END) AS period2_action1,
SUM(CASE WHEN f.action = 'action2' AND f.time <= s.recent_sub_date + INTERVAL '2 months' THEN 1 ELSE 0 END) AS period2_action2,
SUM(CASE WHEN f.action = 'action3' AND f.time <= s.recent_sub_date + INTERVAL '2 months' THEN 1 ELSE 0 END) AS period2_action3,
SUM(CASE WHEN f.action = 'action4' AND f.time <= s.recent_sub_date + INTERVAL '2 months' THEN 1 ELSE 0 END) AS period2_action4,
...
FROM flightlog f
JOIN subs s ON s.user_id = f.user_id
WHERE f.time > s.recent_sub_date
AND f.time <= DATE_TRUNC('month', s.recent_sub_date + INTERVAL '13 months') -- end of the 12 months after sub
GROUP BY s.user_id;
注意:如果您的日期没有索引,无论您如何编写查询,这可能会非常慢。
推荐阅读
- python - 写入 txt 文件时删除所有引号
- android - 使用 RxBus 将对象传输到片段
- mysql - SQL 根据不同行中的值更改 id 的另一行
- python - 如何修复命令“python setup.py egg_info”失败?
- python - 填充 Pandas 数据框中两列之间的数字
- android - TextView 文本不显示
- ios - fancybox3 图像按钮 iframe youtube 视频
- visual-studio-code - 使用户能够减小活动栏上的图标大小
- javascript - 如何隐藏属于锚点的 i 元素
- python - 如何在 python 中提高图像的分辨率 - 一种可能的解决方案?