BigQuery: How to calculate a running count of distinct visitors over the last 2 days

Problem description

For each date, I want to count the unique users seen over the last 2 days (that date and the day before).

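First query: I tried using a CASE statement, but it only gives me the number of users on the same day, which is not the expected result. Window functions did not help either.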

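I know an alternative solution using a self join (shown below as the second query), which gives the correct answer I expect, but I would like to do it in a single pass over the table.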

The reason is to reduce the amount of data processed: the self join reads the full table twice, and the original table is several TB.

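SELECT
-- Note: dt BETWEEN DATE_SUB(dt, INTERVAL 1 DAY) AND dt compares each row's date with itself,
-- so it is always true; each Date_range group therefore holds only a single day's visitors.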
(CASE WHEN dt BETWEEN DATE_SUB(dt, INTERVAL 1 DAY) AND dt THEN 
CONCAT(CAST(DATE_SUB(dt, INTERVAL 1 DAY) AS STRING), '::', CAST(dt AS STRING)) END) AS Date_range,
COUNT(DISTINCT (CASE WHEN dt BETWEEN DATE_SUB(dt, INTERVAL 1 DAY) AND dt THEN Visitor_Name END)) AS Visitor_Count
FROM
(SELECT DATE('2018-01-01') AS dt, 'A' AS Visitor_Name
UNION ALL
SELECT '2018-01-01' AS dt, 'B' AS Visitor_Name
UNION ALL
SELECT '2018-01-01' AS dt, 'C' AS Visitor_Name
UNION ALL
SELECT '2018-01-01' AS dt, 'D' AS Visitor_Name
UNION ALL
SELECT '2018-01-02' AS dt, 'B' AS Visitor_Name
UNION ALL
SELECT '2018-01-02' AS dt, 'C' AS Visitor_Name
UNION ALL
SELECT '2018-01-02' AS dt, 'E' AS Visitor_Name
UNION ALL
SELECT '2018-01-03' AS dt, 'A' AS Visitor_Name
UNION ALL
SELECT '2018-01-03' AS dt, 'P' AS Visitor_Name
UNION ALL
SELECT '2018-01-04' AS dt, 'A' AS Visitor_Name
UNION ALL
SELECT '2018-01-04' AS dt, 'C' AS Visitor_Name
UNION ALL
SELECT '2018-01-05' AS dt, 'D' AS Visitor_Name
UNION ALL
SELECT '2018-01-05' AS dt, 'B' AS Visitor_Name
UNION ALL
SELECT '2018-01-05' AS dt, 'B' AS Visitor_Name
UNION ALL
SELECT '2018-01-06' AS dt, 'P' AS Visitor_Name)
GROUP BY Date_range
ORDER BY Date_range;

Second query (self join), which returns the expected result:

SELECT
    (CASE WHEN a.dt BETWEEN DATE_SUB(b.dt, INTERVAL 1 DAY) AND b.dt THEN 
    CONCAT(CAST(DATE_SUB(b.dt, INTERVAL 1 DAY) AS STRING), '::', CAST(b.dt AS STRING)) END) AS Date_range,
    COUNT(DISTINCT (CASE WHEN a.dt BETWEEN DATE_SUB(b.dt, INTERVAL 1 DAY) AND b.dt THEN a.Visitor_Name END)) AS Visitor_Count
FROM
    (SELECT DATE('2018-01-01') AS dt, 'A' AS Visitor_Name
    UNION ALL
    SELECT '2018-01-01' AS dt, 'B' AS Visitor_Name
    UNION ALL
    SELECT '2018-01-01' AS dt, 'C' AS Visitor_Name
    UNION ALL
    SELECT '2018-01-01' AS dt, 'D' AS Visitor_Name
    UNION ALL
    SELECT '2018-01-02' AS dt, 'B' AS Visitor_Name
    UNION ALL
    SELECT '2018-01-02' AS dt, 'C' AS Visitor_Name
    UNION ALL
    SELECT '2018-01-02' AS dt, 'E' AS Visitor_Name
    UNION ALL
    SELECT '2018-01-03' AS dt, 'A' AS Visitor_Name
    UNION ALL
    SELECT '2018-01-03' AS dt, 'P' AS Visitor_Name
    UNION ALL
    SELECT '2018-01-04' AS dt, 'A' AS Visitor_Name
    UNION ALL
    SELECT '2018-01-04' AS dt, 'C' AS Visitor_Name
    UNION ALL
    SELECT '2018-01-05' AS dt, 'D' AS Visitor_Name
    UNION ALL
    SELECT '2018-01-05' AS dt, 'B' AS Visitor_Name
    UNION ALL
    SELECT '2018-01-05' AS dt, 'B' AS Visitor_Name
    UNION ALL
    SELECT '2018-01-06' AS dt, 'P' AS Visitor_Name) AS a

    INNER JOIN

    (SELECT DATE('2018-01-01') AS dt, 'A' AS Visitor_Name
    UNION ALL
    SELECT '2018-01-01' AS dt, 'B' AS Visitor_Name
    UNION ALL
    SELECT '2018-01-01' AS dt, 'C' AS Visitor_Name
    UNION ALL
    SELECT '2018-01-01' AS dt, 'D' AS Visitor_Name
    UNION ALL
    SELECT '2018-01-02' AS dt, 'B' AS Visitor_Name
    UNION ALL
    SELECT '2018-01-02' AS dt, 'C' AS Visitor_Name
    UNION ALL
    SELECT '2018-01-02' AS dt, 'E' AS Visitor_Name
    UNION ALL
    SELECT '2018-01-03' AS dt, 'A' AS Visitor_Name
    UNION ALL
    SELECT '2018-01-03' AS dt, 'P' AS Visitor_Name
    UNION ALL
    SELECT '2018-01-04' AS dt, 'A' AS Visitor_Name
    UNION ALL
    SELECT '2018-01-04' AS dt, 'C' AS Visitor_Name
    UNION ALL
    SELECT '2018-01-05' AS dt, 'D' AS Visitor_Name
    UNION ALL
    SELECT '2018-01-05' AS dt, 'B' AS Visitor_Name
    UNION ALL
    SELECT '2018-01-05' AS dt, 'B' AS Visitor_Name
    UNION ALL
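    SELECT '2018-01-06' AS dt, 'P' AS Visitor_Name) AS b

    -- Pairs each row of a with every row of b on the same or a later date; pairs more than
    -- one day apart get a NULL Date_range and form an extra group with Visitor_Count = 0.
    ON (a.dt <= b.dt)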
GROUP BY Date_range
ORDER BY Date_range;

Tags: sql, google-bigquery

Answer


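You can do this by "multiplying" the records before aggregating. That is, give each user one record for every date on which that user should be counted.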

Here is an example:

with t as (
      SELECT DATE('2018-01-01') AS dt, 'A' AS Visitor_Name
      UNION ALL
      SELECT '2018-01-01' AS dt, 'B' AS Visitor_Name
      UNION ALL
      SELECT '2018-01-01' AS dt, 'C' AS Visitor_Name
      UNION ALL
      SELECT '2018-01-01' AS dt, 'D' AS Visitor_Name
      UNION ALL
      SELECT '2018-01-02' AS dt, 'B' AS Visitor_Name
      UNION ALL
      SELECT '2018-01-02' AS dt, 'C' AS Visitor_Name
      UNION ALL
      SELECT '2018-01-02' AS dt, 'E' AS Visitor_Name
      UNION ALL
      SELECT '2018-01-03' AS dt, 'A' AS Visitor_Name
      UNION ALL
      SELECT '2018-01-03' AS dt, 'P' AS Visitor_Name
      UNION ALL
      SELECT '2018-01-04' AS dt, 'A' AS Visitor_Name
      UNION ALL
      SELECT '2018-01-04' AS dt, 'C' AS Visitor_Name
      UNION ALL
      SELECT '2018-01-05' AS dt, 'D' AS Visitor_Name
      UNION ALL
      SELECT '2018-01-05' AS dt, 'B' AS Visitor_Name
      UNION ALL
      SELECT '2018-01-05' AS dt, 'B' AS Visitor_Name
      UNION ALL
      SELECT '2018-01-06' AS dt, 'P' AS Visitor_Name
     )
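-- Each row is copied to its own date (inc = 0) and to the following day (inc = 1),
-- so every visit is counted in both 2-day windows that contain it.
select dt, count(distinct visitor_name) as num_visitors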
from (select distinct date_add(dt, interval inc day) as dt, visitor_name
      from t CROSS JOIN
           (select 0 as inc UNION ALL
            SELECT 1
           ) x
     ) t
group by t.dt
order by t.dt;
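The same idea can be generalized. Below is a sketch written as a BigQuery script, under a few assumptions: the window length and the reported date range are script variables (window_days, start_date and end_date are illustrative names, not from the original), and the result is stored in a temporary table, here called rolling_counts, so it can be inspected or reused. GENERATE_ARRAY replaces the hard-coded (0, 1) list, the label follows the question's Date_range format, and the final filter drops the trailing dates the expansion creates; with the answer's query as written, the sample data also yields a row for 2018-01-07 containing only visitor P. The filter on the source rows is meant to limit what is read; if the real table happens to be partitioned on dt, a filter like this may also allow partition pruning, but that is an assumption about the table, not something the question states.

```sql
-- Sketch: rolling distinct-visitor count over a configurable window.
-- window_days, start_date, end_date and rolling_counts are hypothetical names.
DECLARE window_days INT64 DEFAULT 2;  -- 2 = the day itself plus the previous day
DECLARE start_date DATE DEFAULT DATE '2018-01-01';
DECLARE end_date DATE DEFAULT DATE '2018-01-06';

CREATE TEMP TABLE rolling_counts AS
WITH t AS (
  SELECT DATE '2018-01-01' AS dt, 'A' AS visitor_name
  UNION ALL SELECT DATE '2018-01-01', 'B'
  UNION ALL SELECT DATE '2018-01-01', 'C'
  UNION ALL SELECT DATE '2018-01-01', 'D'
  UNION ALL SELECT DATE '2018-01-02', 'B'
  UNION ALL SELECT DATE '2018-01-02', 'C'
  UNION ALL SELECT DATE '2018-01-02', 'E'
  UNION ALL SELECT DATE '2018-01-03', 'A'
  UNION ALL SELECT DATE '2018-01-03', 'P'
  UNION ALL SELECT DATE '2018-01-04', 'A'
  UNION ALL SELECT DATE '2018-01-04', 'C'
  UNION ALL SELECT DATE '2018-01-05', 'D'
  UNION ALL SELECT DATE '2018-01-05', 'B'
  UNION ALL SELECT DATE '2018-01-05', 'B'
  UNION ALL SELECT DATE '2018-01-06', 'P'
),
expanded AS (
  -- Copy each (dt, visitor) row to dt, dt + 1, ..., dt + window_days - 1:
  -- every date whose window contains the original visit.
  SELECT DISTINCT DATE_ADD(t.dt, INTERVAL inc DAY) AS dt, visitor_name
  FROM t
  CROSS JOIN UNNEST(GENERATE_ARRAY(0, window_days - 1)) AS inc
  -- Read only the source rows that can fall into a reported window.
  WHERE t.dt BETWEEN DATE_SUB(start_date, INTERVAL (window_days - 1) DAY) AND end_date
)
SELECT
  dt,
  CONCAT(CAST(DATE_SUB(dt, INTERVAL (window_days - 1) DAY) AS STRING), '::', CAST(dt AS STRING)) AS date_range,
  COUNT(DISTINCT visitor_name) AS visitor_count
FROM expanded
-- Keep only the requested dates (the expansion also produces end_date + 1, ...).
WHERE dt BETWEEN start_date AND end_date
GROUP BY dt;

SELECT date_range, visitor_count
FROM rolling_counts
ORDER BY dt;
```

On the sample data, the final SELECT should return 6 rows, from 2017-12-31::2018-01-01 (4 visitors) through 2018-01-05::2018-01-06 (3 visitors), with counts of 5, 5, 3 and 4 for the windows in between. Changing window_days to 7 would give a rolling 7-day count without any other change to the query.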
