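sql - BigQuery: How to compute a running count of distinct visitors over the last 2 days
Problem description
For each date, I want to count the unique users seen over the last 2 days (that date and the day before).
First query: I tried a CASE statement, but it only gives me the user count for the current day, which is not the expected result. I also tried window functions without success.
I know an alternative solution using a self-join (shown below as the second query), which gives the correct answer, but I would like to do this in a single query.
The reason for wanting a single query is to reduce the amount of data processed: a self-join reads the full table twice, and the original table is multiple terabytes in size.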
SELECT
(CASE WHEN dt BETWEEN DATE_SUB(dt, INTERVAL 1 DAY) AND dt THEN
CONCAT(CAST(DATE_SUB(dt, INTERVAL 1 DAY) AS STRING), '::', CAST(dt AS STRING)) END) AS Date_range,
COUNT(DISTINCT (CASE WHEN dt BETWEEN DATE_SUB(dt, INTERVAL 1 DAY) AND dt THEN Visitor_Name END)) AS Visitor_Count
FROM
(SELECT DATE('2018-01-01') AS dt, 'A' AS Visitor_Name
UNION ALL
SELECT '2018-01-01' AS dt, 'B' AS Visitor_Name
UNION ALL
SELECT '2018-01-01' AS dt, 'C' AS Visitor_Name
UNION ALL
SELECT '2018-01-01' AS dt, 'D' AS Visitor_Name
UNION ALL
SELECT '2018-01-02' AS dt, 'B' AS Visitor_Name
UNION ALL
SELECT '2018-01-02' AS dt, 'C' AS Visitor_Name
UNION ALL
SELECT '2018-01-02' AS dt, 'E' AS Visitor_Name
UNION ALL
SELECT '2018-01-03' AS dt, 'A' AS Visitor_Name
UNION ALL
SELECT '2018-01-03' AS dt, 'P' AS Visitor_Name
UNION ALL
SELECT '2018-01-04' AS dt, 'A' AS Visitor_Name
UNION ALL
SELECT '2018-01-04' AS dt, 'C' AS Visitor_Name
UNION ALL
SELECT '2018-01-05' AS dt, 'D' AS Visitor_Name
UNION ALL
SELECT '2018-01-05' AS dt, 'B' AS Visitor_Name
UNION ALL
SELECT '2018-01-05' AS dt, 'B' AS Visitor_Name
UNION ALL
SELECT '2018-01-06' AS dt, 'P' AS Visitor_Name)
GROUP BY Date_range
ORDER BY Date_range;
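A likely reason this query only returns same-day counts: the CASE condition compares dt with a range built from the same row's dt, so it is true for every row and never looks at the previous day's rows. A minimal sketch that shows this, using a hypothetical two-row sample (the CTE name sample and the column always_true are made up for illustration):

```sql
-- Hypothetical two-row sample, not the real table.
WITH sample AS (
  SELECT DATE '2018-01-01' AS dt, 'A' AS Visitor_Name
  UNION ALL
  SELECT DATE '2018-01-02', 'B'
)
SELECT
  dt,
  Visitor_Name,
  -- The range is derived from this row's own dt, so the test
  -- is TRUE for every row with a non-NULL dt.
  dt BETWEEN DATE_SUB(dt, INTERVAL 1 DAY) AND dt AS always_true
FROM sample
ORDER BY dt;
```

Both rows come back with always_true = true, so grouping on the derived Date_range is effectively the same as grouping on dt alone.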
Second query (self-join):
SELECT
(CASE WHEN a.dt BETWEEN DATE_SUB(b.dt, INTERVAL 1 DAY) AND b.dt THEN
CONCAT(CAST(DATE_SUB(b.dt, INTERVAL 1 DAY) AS STRING), '::', CAST(b.dt AS STRING)) END) AS Date_range,
COUNT(DISTINCT (CASE WHEN a.dt BETWEEN DATE_SUB(b.dt, INTERVAL 1 DAY) AND b.dt THEN a.Visitor_Name END)) AS Visitor_Count
FROM
(SELECT DATE('2018-01-01') AS dt, 'A' AS Visitor_Name
UNION ALL
SELECT '2018-01-01' AS dt, 'B' AS Visitor_Name
UNION ALL
SELECT '2018-01-01' AS dt, 'C' AS Visitor_Name
UNION ALL
SELECT '2018-01-01' AS dt, 'D' AS Visitor_Name
UNION ALL
SELECT '2018-01-02' AS dt, 'B' AS Visitor_Name
UNION ALL
SELECT '2018-01-02' AS dt, 'C' AS Visitor_Name
UNION ALL
SELECT '2018-01-02' AS dt, 'E' AS Visitor_Name
UNION ALL
SELECT '2018-01-03' AS dt, 'A' AS Visitor_Name
UNION ALL
SELECT '2018-01-03' AS dt, 'P' AS Visitor_Name
UNION ALL
SELECT '2018-01-04' AS dt, 'A' AS Visitor_Name
UNION ALL
SELECT '2018-01-04' AS dt, 'C' AS Visitor_Name
UNION ALL
SELECT '2018-01-05' AS dt, 'D' AS Visitor_Name
UNION ALL
SELECT '2018-01-05' AS dt, 'B' AS Visitor_Name
UNION ALL
SELECT '2018-01-05' AS dt, 'B' AS Visitor_Name
UNION ALL
SELECT '2018-01-06' AS dt, 'P' AS Visitor_Name) AS a
INNER JOIN
(SELECT DATE('2018-01-01') AS dt, 'A' AS Visitor_Name
UNION ALL
SELECT '2018-01-01' AS dt, 'B' AS Visitor_Name
UNION ALL
SELECT '2018-01-01' AS dt, 'C' AS Visitor_Name
UNION ALL
SELECT '2018-01-01' AS dt, 'D' AS Visitor_Name
UNION ALL
SELECT '2018-01-02' AS dt, 'B' AS Visitor_Name
UNION ALL
SELECT '2018-01-02' AS dt, 'C' AS Visitor_Name
UNION ALL
SELECT '2018-01-02' AS dt, 'E' AS Visitor_Name
UNION ALL
SELECT '2018-01-03' AS dt, 'A' AS Visitor_Name
UNION ALL
SELECT '2018-01-03' AS dt, 'P' AS Visitor_Name
UNION ALL
SELECT '2018-01-04' AS dt, 'A' AS Visitor_Name
UNION ALL
SELECT '2018-01-04' AS dt, 'C' AS Visitor_Name
UNION ALL
SELECT '2018-01-05' AS dt, 'D' AS Visitor_Name
UNION ALL
SELECT '2018-01-05' AS dt, 'B' AS Visitor_Name
UNION ALL
SELECT '2018-01-05' AS dt, 'B' AS Visitor_Name
UNION ALL
SELECT '2018-01-06' AS dt, 'P' AS Visitor_Name) AS b
ON (a.dt <= b.dt)
GROUP BY Date_range
ORDER BY Date_range;
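Solution
You can do this by "multiplying" the records before aggregating. That is, give each visitor one record for every date on which that visitor should be counted.
Here is an example: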
with t as (
SELECT DATE('2018-01-01') AS dt, 'A' AS Visitor_Name
UNION ALL
SELECT '2018-01-01' AS dt, 'B' AS Visitor_Name
UNION ALL
SELECT '2018-01-01' AS dt, 'C' AS Visitor_Name
UNION ALL
SELECT '2018-01-01' AS dt, 'D' AS Visitor_Name
UNION ALL
SELECT '2018-01-02' AS dt, 'B' AS Visitor_Name
UNION ALL
SELECT '2018-01-02' AS dt, 'C' AS Visitor_Name
UNION ALL
SELECT '2018-01-02' AS dt, 'E' AS Visitor_Name
UNION ALL
SELECT '2018-01-03' AS dt, 'A' AS Visitor_Name
UNION ALL
SELECT '2018-01-03' AS dt, 'P' AS Visitor_Name
UNION ALL
SELECT '2018-01-04' AS dt, 'A' AS Visitor_Name
UNION ALL
SELECT '2018-01-04' AS dt, 'C' AS Visitor_Name
UNION ALL
SELECT '2018-01-05' AS dt, 'D' AS Visitor_Name
UNION ALL
SELECT '2018-01-05' AS dt, 'B' AS Visitor_Name
UNION ALL
SELECT '2018-01-05' AS dt, 'B' AS Visitor_Name
UNION ALL
SELECT '2018-01-06' AS dt, 'P' AS Visitor_Name
)
select dt, count(distinct visitor_name) as num_visitors
from (select distinct date_add(dt, interval inc day) as dt, visitor_name
from t CROSS JOIN
(select 0 as inc UNION ALL
SELECT 1
) x
) t
group by t.dt
order by t.dt;
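Because every row is also copied forward by one day, the query above returns an extra row for 2018-01-07 that contains only the visitors from 2018-01-06. The following is a sketch of a variant, not a definitive implementation, that drops that trailing date and builds the offsets with GENERATE_ARRAY so the window length is easier to change. The CTE name expanded and the hard-coded cutoff date are assumptions made for this sample; on a real table you would supply the last date of your own data.

```sql
-- Same sample data as above, written with DATE literals.
WITH t AS (
  SELECT DATE '2018-01-01' AS dt, 'A' AS Visitor_Name UNION ALL
  SELECT DATE '2018-01-01', 'B' UNION ALL
  SELECT DATE '2018-01-01', 'C' UNION ALL
  SELECT DATE '2018-01-01', 'D' UNION ALL
  SELECT DATE '2018-01-02', 'B' UNION ALL
  SELECT DATE '2018-01-02', 'C' UNION ALL
  SELECT DATE '2018-01-02', 'E' UNION ALL
  SELECT DATE '2018-01-03', 'A' UNION ALL
  SELECT DATE '2018-01-03', 'P' UNION ALL
  SELECT DATE '2018-01-04', 'A' UNION ALL
  SELECT DATE '2018-01-04', 'C' UNION ALL
  SELECT DATE '2018-01-05', 'D' UNION ALL
  SELECT DATE '2018-01-05', 'B' UNION ALL
  SELECT DATE '2018-01-05', 'B' UNION ALL
  SELECT DATE '2018-01-06', 'P'
),
expanded AS (
  -- One copy of each visit for every date whose window contains it.
  -- GENERATE_ARRAY(0, 1) yields the offsets 0 and 1 (a 2-day window);
  -- use GENERATE_ARRAY(0, N - 1) for an N-day window.
  SELECT DISTINCT
    DATE_ADD(dt, INTERVAL inc DAY) AS dt,
    Visitor_Name
  FROM t
  CROSS JOIN UNNEST(GENERATE_ARRAY(0, 1)) AS inc
)
SELECT dt, COUNT(DISTINCT Visitor_Name) AS num_visitors
FROM expanded
-- Assumed cutoff: the last date present in the sample data.
WHERE dt <= DATE '2018-01-06'
GROUP BY dt
ORDER BY dt;
```

On the sample data this returns 4, 5, 5, 3, 4, 3 for 2018-01-01 through 2018-01-06. The source table is still read only once, but the intermediate result grows by roughly the window length, so very wide windows trade scan cost for shuffle cost.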