sql - 如何在 BigQuery 中连续四个星期循环过去六个月的数据
问题描述
我在大查询中有表,例如如下所示。(按日期分区)。我必须为下面详细提到的这个问题编写标准的 sql 查询。
student_id date duration(in hours)
1 2020-05-10 7
2 2020-05-10 8
3 2020-05-10 8
1 2020-05-11 8
2 2020-05-11 7
3 2020-05-12 6
这是我们几乎每天都在添加数据的表格,因此数据将增长得非常快。我必须找到在过去六个月中连续 4 周出勤时间超过 7 小时的学生 ID(每天检查,最近几个月本周将增加 1 周。)并将学生类型转换为好学生。例如在编程语言中。
for(start week->1 - end_week-> 4 till last six months):
if duration >=7 for date
boolean true
start_week = 2 //start week is incremented by 1 week for next loop
end_week = 5
对于任何学生,如果过去六个月的数据持续时间连续 4 周超过 7 小时,那么他就是好学生。这对我来说看起来非常具有挑战性,因为我在 bigquery 和 mysql 方面处于平均水平。我不知道如何实现这一点。
解决方案
这是您的用例示例
# Only for initiate the test with your data
with sample as (
select 1 as ID, DATE("2020-05-10") as d, 7 as hour
union all
select 2 as ID, DATE("2020-05-10") as d, 8 as hour
union all
select 3 as ID, DATE("2020-05-10") as d, 8 as hour
union all
select 1 as ID, DATE("2020-05-11") as d, 8 as hour
union all
select 2 as ID, DATE("2020-05-11") as d, 7 as hour
union all
select 3 as ID, DATE("2020-05-12") as d, 6 as hour
),
# Create an array of date to take into account the missing days (important for the sum over the 28 previous days)
date_array as (
select dd from UNNEST(GENERATE_DATE_ARRAY('2020-05-10', '2020-05-15', INTERVAL 1 DAY)) dd
),
# Product of existing IDs and possible date on the range
data_grid as (
select distinct ID, dd from sample, date_array
),
# Perform a right outer join to add missing date to the logs that you have in your sample data
merged_data as (
select data_grid.ID,d,hour,dd from sample RIGHT outer join data_grid on sample.d = data_grid.dd and sample.ID = data_grid.ID
)
# Sum per ID the 27 previous day in sliding windows (every day, the day and the last 27 are added)
select ID,dd, SUM(hour)
OVER (
PARTITION BY ID
ORDER BY dd
ROWS BETWEEN 27 PRECEDING AND CURRENT ROW
) AS total_purchases
from merged_data
推荐阅读
- java - 如何避免在使用 getScaledInstance 时键入后获得 Null Pointer Deference?
- azure - 自 2020 年起通过 Azure Web App 对上传到 Azure 块存储的文件进行病毒扫描
- python - 添加列跳过nan
- python - 如何从不和谐机器人的帮助命令中修改或删除“无类别”
- module - 覆盖 Prestashop 1.7 模块的方法
- python-3.x - 没有分隔符的文本文件中的数据通过python进入列表
- python - 使用布尔输出对 df 进行更短的过滤
- object-detection - 如何在 Tensorflow 2.x 中打印准确度和其他指标?
- mysql - 如何在 SQL 的日期字段中插入日期
- php - 将变量从一个函数传递到同一控制器中的另一个函数