sql - 不同 ID 的 30 天滚动计数
问题描述
因此,在查看了似乎是一个常见问题并且无法找到任何适合我的解决方案之后,我决定我应该问自己。
我有一个包含两列的数据集: session_start_time, uid
我正在尝试生成一个滚动 30 天的独特会话计数
查询每天唯一 uid 的数量很简单:
SELECT
COUNT(DISTINCT(uid))
FROM segment_clean.users_sessions
WHERE session_start_time >= CURRENT_DATE - interval '30 days'
计算一个日期范围内的每日唯一 uid 也相对简单。
SELECT
DATE_TRUNC('day',session_start_time) AS "date"
,COUNT(DISTINCT uid) AS "count"
FROM segment_clean.users_sessions
WHERE session_start_time >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY date(session_start_time)
然后我尝试了几种方法在一段时间内进行滚动 30 天唯一计数
SELECT
DATE(session_start_time) AS "running30day"
,COUNT(distinct(
case when date(session_start_time) >= running30day - interval '30 days'
AND date(session_start_time) <= running30day
then uid
end)
) AS "unique_30day"
FROM segment_clean.users_sessions
WHERE session_start_time >= CURRENT_DATE - interval '3 months'
GROUP BY date(session_start_time)
Order BY running30day desc
我真的认为这会起作用,但是在查看结果时,似乎我得到的结果与我在进行每日唯一而不是 30 天以上的唯一时得到的结果相同。
我正在使用 SQL 查询编辑器从 Metabase 编写此查询。基础表处于红移状态。
如果你读到这里,谢谢你,你的时间很有价值,我很感激你花了一些时间来阅读我的问题。
编辑:按照正确的要求,我添加了一个我正在使用的数据集和期望结果的示例。
+-----+-------------------------------+
| UID | SESSION_START_TIME |
+-----+-------------------------------+
| | |
| 10 | 2020-01-13T01:46:07.000-05:00 |
| | |
| 5 | 2020-01-13T01:46:07.000-05:00 |
| | |
| 3 | 2020-01-18T02:49:23.000-05:00 |
| | |
| 9 | 2020-03-06T18:18:28.000-05:00 |
| | |
| 2 | 2020-03-06T18:18:28.000-05:00 |
| | |
| 8 | 2020-03-31T23:13:33.000-04:00 |
| | |
| 3 | 2020-08-28T18:23:15.000-04:00 |
| | |
| 2 | 2020-08-28T18:23:15.000-04:00 |
| | |
| 9 | 2020-08-28T18:23:15.000-04:00 |
| | |
| 3 | 2020-08-28T18:23:15.000-04:00 |
| | |
| 8 | 2020-09-15T16:40:29.000-04:00 |
| | |
| 3 | 2020-09-21T20:49:09.000-04:00 |
| | |
| 1 | 2020-11-05T21:31:48.000-05:00 |
| | |
| 6 | 2020-11-05T21:31:48.000-05:00 |
| | |
| 8 | 2020-12-12T04:42:00.000-05:00 |
| | |
| 8 | 2020-12-12T04:42:00.000-05:00 |
| | |
| 5 | 2020-12-12T04:42:00.000-05:00 |
+-----+-------------------------------+
下面是我想要的结果:
+------------+---------------------+
| DATE | UNIQUE 30 DAY COUNT |
+------------+---------------------+
| | |
| 2020-01-13 | 3 |
| | |
| 2020-01-18 | 1 |
| | |
| 2020-03-06 | 3 |
| | |
| 2020-03-31 | 1 |
| | |
| 2020-08-28 | 4 |
| | |
| 2020-09-15 | 2 |
| | |
| 2020-09-21 | 1 |
| | |
| 2020-11-05 | 2 |
| | |
| 2020-12-12 | 2 |
+------------+---------------------+
谢谢
解决方案
您可以通过保留一个计数器来计算用户何时被计数,然后在 30 天(或者可能是 31 天)之后不计数。然后,确定被计数的“岛屿”,并汇总。这涉及:
- 取消透视数据以对每个会话进行“进入计数”和“离开”计数。
- 每天为您知道他们是否被计算在内的每个用户累积计数。
- 这定义了计数的“孤岛”。确定岛屿的起点和终点——清除中间的所有碎屑。
- 现在,您可以简单地对每个日期进行累计计算,以确定 30 天的会话。
在 SQL 中,这看起来像:
with t as (
select uid, date_trunc('day', session_start_time) as s_day, 1 as inc
from users_sessions
union all
select uid, date_trunc('day', session_start_time) + interval '31 day' as s_day, -1
from users_sessions
),
tt as ( -- increment the ins and outs to determine whether a uid is in or out on a given day
select uid, s_day, sum(inc) as day_inc,
sum(sum(inc)) over (partition by uid order by s_day rows between unbounded preceding and current row) as running_inc
from t
group by uid, s_day
),
ttt as ( -- find the beginning and end of the islands
select tt.uid, tt.s_day,
(case when running_inc > 0 then 1 else -1 end) as in_island
from (select tt.*,
lag(running_inc) over (partition by uid order by s_day) as prev_running_inc,
lead(running_inc) over (partition by uid order by s_day) as next_running_inc
from tt
) tt
where running_inc > 0 and (prev_running_inc = 0 or prev_running_inc is null) or
running_inc = 0 and (next_running_inc > 0 or next_running_inc is null)
)
select s_day,
sum(sum(in_island)) over (order by s_day rows between unbounded preceding and current row) as active_30
from ttt
group by s_day;
这是一个 db<>fiddle。
推荐阅读
- python - 如何在 pyqt5 中显示 geemap.link_map
- powershell - PowerShell中的“grep -Ern”等式是什么?
- visual-studio-code - 如何自动编写html结构
- linux - 如何使用命令行快速浏览目录?
- reactjs - 由 `import("react-markdown")` 引起的“错误:不支持”
- python - Python:从DataFrame中的两列获取日期之间的天数而不使用apply()
- angular - 如何使用formgroup中的对象元素作为formcontrolname
- mysql - 在 django orm 中对多个注释求和
- java - 如何使用 Java 设置 Logitech 的 LED 照明 SDK?
- c++ - 我的程序的输入在没有第一个字符的情况下打印出来