sql - 如果缺少某个日期,则结转一行 | 大查询
问题描述
我有这种方式的数据
Column1 Column2 Date
d1 b1 1/1/2020
d1 b2 1/1/2020
d1 b3 1/1/2020
d2 b2 1/2/2020
d1 b3 1/2/2020
d1 b4 1/2/2020
d1 b1 1/3/2020
d2 b3 1/3/2020
d2 b3 1/4/2020
上面数据的grain是Col1, Col2 对于所有这些grain,如果grain级别的数据第二天没有来,我需要从前一天开始重复。这意味着如果我今天运行查询并且今天是 2020 年 1 月 4 日,则预期输出如下
Column1 Column2 Date Status
d1 b1 1/1/2020 FromInput
d1 b1 1/2/2020 Repeated
d1 b1 1/3/2020 FromInput
d1 b1 1/4/2020 FromInput
d1 b2 1/1/2020 FromInput
d1 b2 1/2/2020 Repeated
d1 b2 1/3/2020 Repeated
d1 b2 1/4/2020 Repeated
d1 b3 1/1/2020 FromInput
d1 b3 1/2/2020 FromInput
d1 b3 1/3/2020 Repeated
d1 b3 1/4/2020 Repeated
d1 b4 1/2/2020 FromInput
d1 b4 1/3/2020 Repeated
d1 b4 1/4/2020 Repeated
d2 b2 1/2/2020 FromInput
d2 b2 1/3/2020 Repeated
d2 b2 1/4/2020 Repeated
d2 b3 1/3/2020 FromInput
d2 b3 1/4/2020 FromInput
解决方案
以下是 BigQuery 标准 SQL
#standardSQL
WITH days AS (
SELECT column1, column2, day FROM (
SELECT column1, column2,
MIN(PARSE_DATE('%m/%d/%Y', day)) min_day,
PARSE_DATE('%m/%d/%Y', '1/4/2020') max_day -- or just CURRENT_DATE() in real use case
FROM `project.dataset.table`
GROUP BY column1, column2
),
UNNEST(GENERATE_DATE_ARRAY(min_day, max_day)) day
)
SELECT d.column1, d.column2, d.day,
IFNULL(t.value,
LAST_VALUE(value IGNORE NULLS)
OVER(PARTITION BY d.column1, d.column2 ORDER BY d.day ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
) value,
IF(t.value IS NULL, 'Repeated', 'From Input') Status
FROM days d
LEFT JOIN `project.dataset.table` t
ON d.column1 = t.column1
AND d.column2 = t.column2
AND d.day = PARSE_DATE('%m/%d/%Y', t.day)
-- ORDER BY column1, column2, day
您可以使用您问题中的示例数据进行测试,使用上面的示例数据,如下例所示
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'd1' column1, 'b1' column2, '1/1/2020' day, 1 value UNION ALL
SELECT 'd1', 'b2', '1/1/2020', 3 UNION ALL
SELECT 'd1', 'b3', '1/1/2020', 4 UNION ALL
SELECT 'd2', 'b2', '1/2/2020', 7 UNION ALL
SELECT 'd1', 'b3', '1/2/2020', 5 UNION ALL
SELECT 'd1', 'b4', '1/2/2020', 6 UNION ALL
SELECT 'd1', 'b1', '1/3/2020', 2 UNION ALL
SELECT 'd2', 'b3', '1/3/2020', 8 UNION ALL
SELECT 'd2', 'b3', '1/4/2020', 9
), days AS (
SELECT column1, column2, day FROM (
SELECT column1, column2,
MIN(PARSE_DATE('%m/%d/%Y', day)) min_day,
PARSE_DATE('%m/%d/%Y', '1/4/2020') max_day -- or just CURRENT_DATE() in real use case
FROM `project.dataset.table`
GROUP BY column1, column2
),
UNNEST(GENERATE_DATE_ARRAY(min_day, max_day)) day
)
SELECT d.column1, d.column2, d.day,
IFNULL(t.value,
LAST_VALUE(value IGNORE NULLS)
OVER(PARTITION BY d.column1, d.column2 ORDER BY d.day ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
) value,
IF(t.value IS NULL, 'Repeated', 'From Input') Status
FROM days d
LEFT JOIN `project.dataset.table` t
ON d.column1 = t.column1
AND d.column2 = t.column2
AND d.day = PARSE_DATE('%m/%d/%Y', t.day)
-- ORDER BY column1, column2, day
带输出
Row column1 column2 day value Status
1 d1 b1 2020-01-01 1 From Input
2 d1 b1 2020-01-02 1 Repeated
3 d1 b1 2020-01-03 2 From Input
4 d1 b1 2020-01-04 2 Repeated
5 d1 b2 2020-01-01 3 From Input
6 d1 b2 2020-01-02 3 Repeated
7 d1 b2 2020-01-03 3 Repeated
8 d1 b2 2020-01-04 3 Repeated
9 d1 b3 2020-01-01 4 From Input
10 d1 b3 2020-01-02 5 From Input
11 d1 b3 2020-01-03 5 Repeated
12 d1 b3 2020-01-04 5 Repeated
13 d1 b4 2020-01-02 6 From Input
14 d1 b4 2020-01-03 6 Repeated
15 d1 b4 2020-01-04 6 Repeated
16 d2 b2 2020-01-02 7 From Input
17 d2 b2 2020-01-03 7 Repeated
18 d2 b2 2020-01-04 7 Repeated
19 d2 b3 2020-01-03 8 From Input
20 d2 b3 2020-01-04 9 From Input
推荐阅读
- angular - Angular 材质中的动态 cdkDragBoundary
- r - 设置元素名称 Rcpp 错误堆栈使用
- c# - 右键单击相同 TortoiseSVN 的特定文件夹时如何创建新的上下文菜单?
- sql - 在分析函数中过滤行 - Oracle
- maven - 从 servlet 暴露的本地 m2 工件
- javascript - 将组件作为道具传递的更好方法是什么?
- javascript - 从谷歌地图边界获取纬度/经度值
- android - 如何在没有 SSL 认证的情况下保护 api 调用免受数据包捕获?
- reactjs - 无法使用 Bootstrap 验证 React 中的数字
- c# - 从 ChatBot 的项目文件夹中读取 JSON 文件