Shortest possible query for a recursive check in BigQuery

Problem description

Suppose I have a table in BigQuery with millions of rows per month. For example:

|---------------------|------------------|
|      date           |     user         |
|---------------------|------------------|
|          01-12-2019 |   xyz            |
|---------------------|------------------|
|          02-12-2019 |   xyz            |
|---------------------|------------------|
|          03-12-2019 |   abc            |
|---------------------|------------------|

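Now I want to retrieve, for each day, the count of repeat users over the next 14 days: that is, the users who came on 01-12-2019 and then visited again within the following 14 days (02-12-2019 to 15-12-2019). I have worked out how to retrieve this, but only for one specific date at a time, using the query below.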

 SELECT '2019-12-01' AS visit_date, COUNT(DISTINCT user) AS visitors_count
 FROM `user_data`
 WHERE date = '2019-12-01'
   AND user IN (SELECT user
                FROM `user_data`
                WHERE date BETWEEN DATE_ADD('2019-12-01', INTERVAL 1 DAY)
                               AND DATE_ADD('2019-12-01', INTERVAL 14 DAY))
 GROUP BY 1

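One approach I could use is UNION ALL with one such query per date, but that is probably not the best solution. That is why I would like to know the best practice I should adopt for this kind of situation.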

Tags: sql, google-cloud-platform, google-bigquery

Solution


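You can solve this with union all and aggregation. The key is to record, for each visit, the date on which it enters the window (+1) and the date on which it leaves the window (-1). So: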

with ud as (
      select user, date, 1 as inc
      from user_data
      union all
      select user, date_add(date, interval 15 day), -1 as inc
      from user_data
     )
select date,
       sum(inc) as change_on_day,
       sum(sum(inc)) over (order by date) as total_on_day
from ud
group by date
order by date;
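
To make the bookkeeping concrete, here is a plain-Python sketch that replays the same +1/-1 logic on the three sample rows from the question. It only illustrates the arithmetic and is not BigQuery code; the function name `running_totals` and the in-memory list of rows are assumptions made for this example.

```python
from collections import defaultdict
from datetime import date, timedelta


def running_totals(visits, window_days=15):
    """Mirror the UNION ALL query: +1 on the visit date, -1 window_days later.

    Returns a list of (day, change_on_day, total_on_day) ordered by day.
    """
    change = defaultdict(int)
    for _user, day in visits:
        change[day] += 1                                # visit enters the window
        change[day + timedelta(days=window_days)] -= 1  # visit leaves the window

    rows, total = [], 0
    for day in sorted(change):
        total += change[day]  # plays the role of sum(sum(inc)) over (order by date)
        rows.append((day, change[day], total))
    return rows


# Sample rows from the question (written there as DD-MM-YYYY).
visits = [
    ("xyz", date(2019, 12, 1)),
    ("xyz", date(2019, 12, 2)),
    ("abc", date(2019, 12, 3)),
]

for day, change_on_day, total_on_day in running_totals(visits):
    print(day, change_on_day, total_on_day)
```

On this sample the running total rises to 3 on 2019-12-03 and falls back to 0 on 2019-12-18. Note that, like the query, this counts visits rather than distinct users: xyz contributes twice to the total on 2019-12-02.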

Edit:

You can modify the above to keep only the first positive (+1) and the last negative (-1) entry for each customer:

with ud as (
      -- +1 on the first visit of a run: no earlier visit within the previous 14 days
      select user, date, 1 as inc
      from (select ud.*,
                   lag(date) over (partition by user order by date) as prev_date
            from user_data ud
           ) ud
      where prev_date is null or prev_date < date_add(date, interval -14 day)
      union all
      -- -1 fifteen days after the last visit of a run: no later visit within the next 14 days
      select user, date_add(date, interval 15 day), -1 as inc
      from (select ud.*,
                   lead(date) over (partition by user order by date) as next_date
            from user_data ud
           ) ud
      where next_date is null or next_date > date_add(date, interval 14 day)
     )
select date,
       sum(inc) as change_on_day,
       sum(sum(inc)) over (order by date) as total_on_day
from ud
group by date
order by date;
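
As a rough illustration of the "first +1, last -1" idea, the plain-Python sketch below groups each user's visits into runs and emits one +1 at the start of a run and one -1 fifteen days after its last visit. It is not BigQuery code. The function name `run_events` is hypothetical, and the sketch assumes that a run ends when the user's next visit is more than 14 days away and that duplicate same-day rows are collapsed.

```python
from collections import defaultdict
from datetime import date, timedelta


def run_events(visits, gap_days=14, window_days=15):
    """Emit (day, user, +1) at the start of each run and (day, user, -1) after it.

    Assumption of this sketch: a run is a sequence of one user's visits in
    which consecutive visits are at most gap_days apart.
    """
    by_user = defaultdict(list)
    for user, day in visits:
        by_user[user].append(day)

    events = []
    for user, days in by_user.items():
        days = sorted(set(days))  # collapse duplicate same-day rows
        for i, day in enumerate(days):
            prev_day = days[i - 1] if i > 0 else None              # like lag(date)
            next_day = days[i + 1] if i + 1 < len(days) else None  # like lead(date)
            if prev_day is None or prev_day < day - timedelta(days=gap_days):
                events.append((day, user, 1))   # first visit of a run
            if next_day is None or next_day > day + timedelta(days=gap_days):
                events.append((day + timedelta(days=window_days), user, -1))
    return sorted(events)


# Sample rows from the question (written there as DD-MM-YYYY).
visits = [
    ("xyz", date(2019, 12, 1)),
    ("xyz", date(2019, 12, 2)),
    ("abc", date(2019, 12, 3)),
]

for event in run_events(visits):
    print(event)
```

With the sample rows, xyz's two visits form a single run, so xyz contributes one +1 on 2019-12-01 and one -1 on 2019-12-17 instead of two of each.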
