Complicated SQL query, nowhere to find an answer :(

Problem description

Suppose I have events in a SQL database with the attributes date and user_id. I have 10 records in the table event:

1.  user_id=1, date=2018.04.10
2.  user_id=1, date=2018.04.11
3.  user_id=1, date=2018.04.13
4.  user_id=1, date=2018.04.17
5.  user_id=1, date=2018.04.18
6.  user_id=2, date=2018.04.12
7.  user_id=2, date=2018.04.12
8.  user_id=2, date=2018.04.13
9.  user_id=2, date=2018.04.15
10. user_id=2, date=2018.04.16

Is it possible to write a query in standard SQL syntax that shows, for each user, only the records whose dates are at least 2 days apart? So:

1.  user_id=1, date=2018.04.10   will be in result
2.  user_id=1, date=2018.04.11   not in result, only 1 day difference
3.  user_id=1, date=2018.04.12   will be in result, 2 days dif from record nbr 1.
4.  user_id=1, date=2018.04.17   will be in result, 5 days dif from record nbr 3.
5.  user_id=1, date=2018.04.18   not in result, only 1 day dif from record nbr 4.
6.  user_id=2, date=2018.04.12   will be in result
7.  user_id=2, date=2018.04.12   not in result, 0 day difference
8.  user_id=2, date=2018.04.13   not in result, only 1 day dif from record nbr. 6.
9.  user_id=2, date=2018.04.15   will be in result, 2 days dif from record nbr 8.
10. user_id=2, date=2018.04.16   not in result, only 1 day dif from record nbr. 9.

Please help, guys. Nobody in my office can help me solve this :( I will use this query in Google BigQuery.

Tags: sql, google-bigquery

Solution


Just use lag():

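select e.*
from (select e.*,
             -- previous event date for the same user (null for the user's first event)
             lag(date) over (partition by user_id order by date) as prev_date
      from events e
     ) e
where prev_date is null or 
      -- "at least 2 days" includes a gap of exactly 2 days, hence >=
      date >= date_add(prev_date, interval 2 day);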
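
As a sanity check, the lag() approach can be tried against the ten sample rows from the question. This is only a sketch for BigQuery standard SQL: the inline events CTE is an assumption that stands in for the real table, and the comparison uses >= because the question asks for a difference of at least 2 days. Note that each row is compared with the immediately preceding row of the same user, not with the last row that was kept.

```sql
-- Assumption: the CTE below stands in for the real events table.
with events as (
  select 1 as user_id, date '2018-04-10' as date union all
  select 1, date '2018-04-11' union all
  select 1, date '2018-04-13' union all
  select 1, date '2018-04-17' union all
  select 1, date '2018-04-18' union all
  select 2, date '2018-04-12' union all
  select 2, date '2018-04-12' union all
  select 2, date '2018-04-13' union all
  select 2, date '2018-04-15' union all
  select 2, date '2018-04-16'
)
select e.user_id, e.date
from (select e.*,
             -- date of the previous event of the same user, null for the first one
             lag(date) over (partition by user_id order by date) as prev_date
      from events e
     ) e
where prev_date is null or
      date >= date_add(prev_date, interval 2 day)
order by user_id, date;

-- Rows returned:
--   1, 2018-04-10
--   1, 2018-04-13
--   1, 2018-04-17
--   2, 2018-04-12
--   2, 2018-04-15
```

Only one of the two identical rows for user 2 on 2018-04-12 is returned, because the second one sees the first as its previous row with a gap of 0 days.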

Note: this can be a bit tricky with tied dates. It is better to add a second column to the order by so that the sort is stable.
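
A stable sort could look like the sketch below. The event_id column is hypothetical (the table in the question has only user_id and date); any unique column would serve the same purpose. With it, the row kept among tied dates is always the one with the lowest event_id.

```sql
-- Assumption: event_id is a hypothetical unique column added for illustration.
with events as (
  select 6 as event_id, 2 as user_id, date '2018-04-12' as date union all
  select 7, 2, date '2018-04-12' union all
  select 8, 2, date '2018-04-13' union all
  select 9, 2, date '2018-04-15' union all
  select 10, 2, date '2018-04-16'
)
select e.event_id, e.user_id, e.date
from (select e.*,
             -- event_id breaks ties between rows that share the same date
             lag(date) over (partition by user_id order by date, event_id) as prev_date
      from events e
     ) e
where prev_date is null or
      date >= date_add(prev_date, interval 2 day)
order by event_id;

-- Rows returned:
--   6, 2, 2018-04-12
--   9, 2, 2018-04-15
```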

If you only have these two columns, you can work around this by doing a distinct at the innermost level:

select e.*
from (select e.*,
             lag(date) over (partition by user_id order by date) as prev_date
      -- distinct collapses identical (user_id, date) rows, so no ties remain
      from (select distinct e.* from events e) e
     ) e
where prev_date is null or 
      date >= date_add(prev_date, interval 2 day);
