首页 > 解决方案 > 动态连接透视表

问题描述

也许社区可以就这个 Postgres 问题(PostgreSQL 9.5)给我建议

有一个大的(150 万行)飞行日志表,其中包含以下列:动作类型(4 种动作类型)、时间戳和 user_id。users 表有 6K 行。

**flightlog**
user_id, time, action
2301    "2016-10-25 14:13:25.74668" "View"
8   "2016-04-25 15:02:13.916204"    "Download"
8   "2016-04-25 15:01:20.553475"    "Download"
8   "2016-04-25 14:57:02.430493"    "Download"
8   "2016-04-25 14:57:02.160002"    "Download"
8   "2016-04-25 14:57:01.397602"    "Download"
26  "2016-10-25 16:01:25.005285"    "View"
216 "2016-10-24 14:46:16.035242"    "View"
2182    "2016-10-24 14:47:43.713"   "View"
243 "2016-10-24 12:10:12.187181"    "View"
26  "2016-10-24 15:01:26.269981"    "View"
26  "2016-10-24 15:01:28.122361"    "View"

**users**
user_id, email
8 "ndoe@mysite.com"
26  "jdoe@mysite.com"
2301 "kdoe@mysite.com"

**subscriptions**
user_id, expires
8    "2017-08-30 15:48:06.827258"
26   "2017-08-10 00:00:00"
2301 "2017-09-28 09:09:17.56549"

我需要有一个表,其中包含用户每月 4 次不同操作的计数,因此它将是用户,然后是每月 4 次操作,并且复制 12 次。这些列将如下所示:

user1  period1_action1 period1_action2 period1_action3 period1_action4 period2_action1 etc

更复杂的是,这 12 个月对于每个用户来说应该是动态的,从订阅表中的订阅日期算起 +12 个月(10K)。

到目前为止,我可以使用基于 FILTER 的操作枢轴,并通过 cte 选择用户名和订阅开始来加入它。

with counters ( <doing counts using windowing functions>),
     pivot1   ( <pivoting counters using FILTER>
              ...sum(times) filter (where action = 'action1')...
              ),
     recent_subscription (<picking latest subscription for a user>),
     titles   (<using previous cte and adding more info from info table>)

select t.user, t.id, t.subscription_starts, t.expires_at, t.title, email,
                p."action1", p."action2", p."action3 ", p."action4"
      from titles t
      join pivot1 p
...

正确地为用户生成 12 个月的活动数据,格式如下:

user1 ... 1st_period_4user1 action1 action2 action3 action4
user1 ... 2nd_period_4user1 action1 action2 action3 action4
user2 ... 1st_period_4user2 action1 action2 action3 action4
user2 ... 2nd_period_4user2 action1 action2 action3 action4
user3 ... 1st_period_4user3 action1 action2 action3 action4
user3 ... 2nd_period_4user3 action1 action2 action3 action4

ETC

但现在的挑战是再次调整它以获得 12periods/4actions 组合。如果按照以下方式使用 json_object_aggr 作为(句点:然后是 4 个动作)完成,可能只有 12 个

 --using the piece above as another CTE called merged 
 --this code does not work :(
select 
        email, id, ends, subs, info, 
        json_object_aggr(starts, s1,v1,p1,d1 ORDER BY starts) as P1,
        json_object_aggr(starts, s2,v2,p2,d2 ORDER BY starts) as P2,
        json_object_aggr(starts, s3,v3,p3,d3 ORDER BY starts) as P3,
        json_object_aggr(starts, s4,v4,p4,d4 ORDER BY starts) as P4,
        json_object_aggr(starts, s5,v5,p5,d5 ORDER BY starts) as P5,
        json_object_aggr(starts, s6,v6,p6,d6 ORDER BY starts) as P6,
        json_object_aggr(starts, s7,v7,p7,d7 ORDER BY starts) as P7,
        json_object_aggr(starts, s8,v8,p8,d8 ORDER BY starts) as P8,
        json_object_aggr(starts, s9,v9,p9,d9 ORDER BY starts) as P9,
        json_object_aggr(starts, s10,v10,p10,d10 ORDER BY starts) as P10,
        json_object_aggr(starts, s11,v11,p11,d11 ORDER BY starts) as P11,
        json_object_aggr(starts, s12,v12,p12,d12 ORDER BY starts) as P12
    from 
            (select email, id,  starts, ends, subs, info, starts, 
  sum("action1") as s1,sum("action2") as v1,sum("action3") as 
  p1,sum("action4") 
  as d1 
              from merged
              group by email, id,  starts, ends, subs, info, starts

            ) m
    group by email, id,  starts, ends, subs, info
    order by email, id,  starts, ends, subs, info 

这可能是 json_object_agg 每个周期 4 个动作吗?我可以就如何解决这个问题获得帮助吗?

谢谢你。

标签: sqlpostgresql

解决方案


您是否有理由不能使用一系列CASE导致 1 或 0 的条件,然后再使用SUM()它们?这将使这个过程更简单。

WITH subs AS (
      SELECT s.user_id, u.email, MAX(s.sub_date) AS recent_sub_date 
      FROM subscriptions s 
      JOIN users u ON s.userid = u.user_id
      GROUP BY s.user_id, u.email
)
SELECT s.user_id,
       SUM(CASE WHEN f.action = 'action1' AND f.time <= s.recent_sub_date + INTERVAL '1 month' THEN 1 ELSE 0 END) AS period1_action1,
       SUM(CASE WHEN f.action = 'action2' AND f.time <= s.recent_sub_date + INTERVAL '1 month' THEN 1 ELSE 0 END) AS period1_action2,
       SUM(CASE WHEN f.action = 'action3' AND f.time <= s.recent_sub_date + INTERVAL '1 month' THEN 1 ELSE 0 END) AS period1_action3,
       SUM(CASE WHEN f.action = 'action4' AND f.time <= s.recent_sub_date + INTERVAL '1 month' THEN 1 ELSE 0 END) AS period1_action4,
       SUM(CASE WHEN f.action = 'action1' AND f.time <= s.recent_sub_date + INTERVAL '2 months' THEN 1 ELSE 0 END) AS period2_action1,
       SUM(CASE WHEN f.action = 'action2' AND f.time <= s.recent_sub_date + INTERVAL '2 months' THEN 1 ELSE 0 END) AS period2_action2,
       SUM(CASE WHEN f.action = 'action3' AND f.time <= s.recent_sub_date + INTERVAL '2 months' THEN 1 ELSE 0 END) AS period2_action3,
       SUM(CASE WHEN f.action = 'action4' AND f.time <= s.recent_sub_date + INTERVAL '2 months' THEN 1 ELSE 0 END) AS period2_action4,
       ...
FROM flightlog f
JOIN subs s ON s.user_id = f.user_id 
WHERE f.time > s.recent_sub_date
AND f.time <= DATE_TRUNC('month', s.recent_sub_date + INTERVAL '13 months') -- end of the 12 months after sub
GROUP BY s.user_id;

注意:如果您的日期没有索引,无论您如何编写查询,这可能会非常慢。


推荐阅读