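sql - BigQuery - Collapse rows by group
Problem description
In BigQuery, I have a table containing various items from different data sources. Record linkage is performed across the columns, and the results are stored in the table shown below. I now need to collapse the results so that, within each group of linked rows, every column holds only one value. For example, I have the following table:
With the following values:
I am looking for a result like this: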
---------------------------------------
| Row | col_1 | col_2 | col_3 | col_4 |
---------------------------------------
| 1 | 4 | NULL | NULL | NULL |
---------------------------------------
| 2 | 0 | 1 | 2 | NULL |
---------------------------------------
| 3 | 5 | NULL | 6 | 7 |
---------------------------------------
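In the above, rows 2 and 3 are collapsed because they both have the value 1 for col_2. Similarly, rows 4 and 5 are collapsed because they both have a 6 in col_3. I tried the following but could not get it to work. Does anyone have any suggestions?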
SELECT
FIRST_VALUE(c1.col_1) OVER (PARTITION BY c1.col_1 ORDER BY 1 ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS col_1,
FIRST_VALUE(c2.col_2 IGNORE NULLS) OVER (PARTITION BY c1.col_1 ORDER BY 1 ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS col_2,
FIRST_VALUE(c3.col_3 IGNORE NULLS) OVER (PARTITION BY c2.col_2 ORDER BY 1 ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS col_3,
FIRST_VALUE(c4.col_4 IGNORE NULLS) OVER (PARTITION BY c3.col_3 ORDER BY 1 ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS col_4,
FROM
dataset.collapse_test c1
LEFT JOIN
dataset.collapse_test c2
ON
c1.col_2 = c2.col_2
LEFT JOIN
dataset.collapse_test c3
ON
c2.col_2 = c3.col_2
LEFT JOIN
dataset.collapse_test c4
ON
c3.col_3 = c4.col_3
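Solution
There are some problems with this query because, as @Gordon Linoff said in the comments, some rows can be problematic when multiple values match.
So here I try to generate the values you want while also keeping the old ones, since in some cases it is hard to tell when they should be kept, as mentioned above.
First, we create the table: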
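One way to see where multiple matches become a problem is to treat the task as grouping rows that share a non-null value in the same column, then merging each group. The Python sketch below illustrates that idea on the sample rows. It is not BigQuery SQL and not the query used in this answer; the names `ROWS` and `collapse`, and the conflict list it returns, are assumptions made for the example. A conflict is reported when a merged group holds more than one distinct value for a column, which is the case where it is unclear which value to keep.

```python
from collections import defaultdict

# Sample rows from the question; None stands in for SQL NULL.
ROWS = [
    (4, None, None, None),
    (0, 1, None, None),
    (None, 1, 2, None),
    (None, None, 6, 7),
    (5, None, 6, None),
]


def collapse(rows):
    """Merge rows that share a non-null value in the same column (transitively)."""
    parent = list(range(len(rows)))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link each row to the first row seen with the same (column, value) pair.
    first_seen = {}
    for i, row in enumerate(rows):
        for col, value in enumerate(row):
            if value is None:
                continue
            j = first_seen.setdefault((col, value), i)
            parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[find(i)].append(row)

    merged, conflicts = [], []
    for members in groups.values():
        out = []
        for col in range(len(members[0])):
            values = sorted({r[col] for r in members if r[col] is not None})
            if len(values) > 1:
                # More than one candidate value: record the 0-based column index.
                conflicts.append((col, values))
            out.append(values[0] if values else None)
        merged.append(tuple(out))
    return merged, conflicts


merged, conflicts = collapse(ROWS)
print(merged)
print(conflicts)
```

On the sample rows this yields the three rows requested in the question and no conflicts. With rows such as (0, 1, NULL, NULL) and (9, 1, NULL, NULL), column index 0 is reported as a conflict and the smallest value is kept, which is an arbitrary choice in this sketch.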
CREATE TABLE `dataset.collapse` (
  col_1 INT64,
  col_2 INT64,
  col_3 INT64,
  col_4 INT64
) AS
-- Sample rows from the question; NULL marks a missing value.
SELECT 4 AS col_1, NULL AS col_2, NULL AS col_3, NULL AS col_4 UNION ALL
SELECT 0 AS col_1, 1 AS col_2, NULL AS col_3, NULL AS col_4 UNION ALL
SELECT NULL AS col_1, 1 AS col_2, 2 AS col_3, NULL AS col_4 UNION ALL
SELECT NULL AS col_1, NULL AS col_2, 6 AS col_3, 7 AS col_4 UNION ALL
SELECT 5 AS col_1, NULL AS col_2, 6 AS col_3, NULL AS col_4
Then we run our solution for this case. I use the IFNULL function to keep only the non-null values, and an INNER JOIN with equality conditions:
SELECT
  -- Keep t1's value when present; otherwise fall back to the matching row's value.
  IFNULL(t1.col_1, t2.col_1) AS c1,
  IFNULL(t1.col_2, t2.col_2) AS c2,
  IFNULL(t1.col_3, t2.col_3) AS c3,
  IFNULL(t1.col_4, t2.col_4) AS c4
FROM
  `project.dataset.collapse` AS t1
INNER JOIN
  `project.dataset.collapse` AS t2
ON
  t1.col_1 = t2.col_1
  OR t1.col_2 = t2.col_2
  OR t1.col_3 = t2.col_3
  OR t1.col_4 = t2.col_4
-- Grouping by every output column removes duplicate rows.
GROUP BY
  c1,
  c2,
  c3,
  c4
The result we get looks like this:
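As a rough cross-check, the join logic can be emulated in plain Python on the five sample rows. This is a sketch of the query's semantics rather than BigQuery itself; `ROWS`, `joined`, and `self_join` are names made up for the example, and BigQuery does not guarantee the row order shown here.

```python
# Sample rows from the answer's CREATE TABLE; None stands in for SQL NULL.
ROWS = [
    (4, None, None, None),
    (0, 1, None, None),
    (None, 1, 2, None),
    (None, None, 6, 7),
    (5, None, 6, None),
]


def joined(t1, t2):
    """Emulate the ON clause: SQL `=` is never true when either side is NULL."""
    return any(a is not None and a == b for a, b in zip(t1, t2))


def self_join(rows):
    """Emulate the IFNULL(t1.col, t2.col) select list plus the GROUP BY."""
    out = []
    for t1 in rows:
        for t2 in rows:
            if joined(t1, t2):
                row = tuple(a if a is not None else b for a, b in zip(t1, t2))
                if row not in out:  # GROUP BY on all columns acts like DISTINCT
                    out.append(row)
    return out


for row in self_join(ROWS):
    print(row)
```

The emulation produces seven rows: the two merged rows (0, 1, 2, NULL) and (5, NULL, 6, 7), plus the five original rows, because every row also matches itself in the join.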
To remove the old values, a solution would look something like this:
SELECT
  *
FROM (previous_query)  -- placeholder for the join query above
WHERE
  -- Build one string key per row, with -1 standing in for NULL.
  -- Note: without a separator, distinct rows can collide (e.g. 1,12 and 11,2).
  CONCAT(IFNULL(c1, -1), IFNULL(c2, -1), IFNULL(c3, -1), IFNULL(c4, -1))
  NOT IN (
    SELECT
      CONCAT(IFNULL(col_1, -1), IFNULL(col_2, -1), IFNULL(col_3, -1), IFNULL(col_4, -1))
    FROM
      `project.dataset.collapse`
    WHERE
      CONCAT(IFNULL(col_1, -1), IFNULL(col_2, -1), IFNULL(col_3, -1), IFNULL(col_4, -1))
      = CONCAT(IFNULL(c1, -1), IFNULL(c2, -1), IFNULL(c3, -1), IFNULL(c4, -1))
  )
Keep in mind that this solution will also remove the [4 null null null] row.
Result:
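The effect of this filter on the sample data can be reproduced with a small Python emulation. The sketch also includes a hypothetical alternative, `drop_subsumed`, which is not part of the answer's SQL: instead of dropping every row that equals an input row, it drops only rows that a fuller row already covers, so [4 null null null] survives. All function names here are assumptions made for the example.

```python
# Sample rows from the answer's CREATE TABLE; None stands in for SQL NULL.
ORIGINAL = [
    (4, None, None, None),
    (0, 1, None, None),
    (None, 1, 2, None),
    (None, None, 6, 7),
    (5, None, 6, None),
]


def join_step(rows):
    """Emulate the self-join: pair rows sharing a non-null value, fill gaps from t2."""
    out = []
    for t1 in rows:
        for t2 in rows:
            if any(a is not None and a == b for a, b in zip(t1, t2)):
                row = tuple(a if a is not None else b for a, b in zip(t1, t2))
                if row not in out:
                    out.append(row)
    return out


def drop_originals(joined, original):
    """The answer's filter: drop every joined row identical to an input row."""
    return [r for r in joined if r not in original]


def subsumed(row, other):
    """True if `other` agrees with every non-null value of `row` and has more values."""
    return row != other and all(a is None or a == b for a, b in zip(row, other))


def drop_subsumed(joined):
    """Hypothetical alternative: drop only rows that a fuller row already covers."""
    return [r for r in joined if not any(subsumed(r, o) for o in joined)]


joined = join_step(ORIGINAL)
print(drop_originals(joined, ORIGINAL))
print(drop_subsumed(joined))
```

With the answer's filter, only (0, 1, 2, NULL) and (5, NULL, 6, 7) remain. The alternative keeps (4, NULL, NULL, NULL) as well, which matches the three rows asked for in the question. Either way, a single self-join only merges pairs of rows, so chains of three or more linked rows would need the join to be repeated.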