How can I get the list of columns that changed relative to the previous row in BigQuery?

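Problem description

To make the question easier to follow, consider the example shown below:

Assuming that, for a given ID, no column in a BigQuery table should change from Day 1 to Day 2, how can I get the list of columns that do not behave as expected?

In other words, for each ID, I want to list every column whose value changed between Day 1 and Day 2.

The last column of the example shows the desired output: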

| ID | Day | Column1 | Column2 | Column3 | Column4 | Column5 | Column6 | Column7 | Desired outputs
| 1  | 1   |    x    |    x    |    x    |    x    |    x    |    x    |    x    |     Column3
| 1  | 2   |    x    |    x    |    y    |    x    |    x    |    x    |    x    |     Column3
| 2  | 1   |    x    |    x    |    x    |    x    |    x    |    x    |    x    |     Column2
| 2  | 2   |    x    |    y    |    x    |    x    |    x    |    x    |    x    |     Column2
| 3  | 1   |    x    |    x    |    x    |    x    |    x    |    x    |    x    | Column4,Column6
| 3  | 2   |    x    |    x    |    x    |    y    |    x    |    y    |    x    | Column4,Column6
| 4  | 1   |    x    |    x    |    x    |    x    |    x    |    x    |    y    |     Column7
| 4  | 2   |    x    |    x    |    x    |    x    |    x    |    x    |    x    |     Column7

Regards, Bruno

Tags: sql, database, google-bigquery, etl, analytics

Solution

Here is one possible solution (ignore the WITH statement; it is only there to reproduce your table):

WITH sample AS (
SELECT 1 AS ID, 1 AS Day, "x" AS Column1, "x" AS Column2, "x" AS Column3, "x" AS Column4, "x" AS Column5, "x" AS Column6, "x" AS Column7
UNION ALL
SELECT 1, 2, "x", "x", "y", "x", "x", "x", "x"
UNION ALL
SELECT 2, 1, "x", "x", "x", "x", "x", "x", "x"
UNION ALL
SELECT 2, 2, "x", "y", "x", "x", "x", "x", "x"
UNION ALL
SELECT 3, 1, "x", "x", "x", "x", "x", "x", "x"
UNION ALL
SELECT 3, 2, "x", "x", "x", "y", "x", "y", "x"
UNION ALL
SELECT 4, 1, "x", "x", "x", "x", "x", "x", "y"
UNION ALL
SELECT 4, 2, "x", "x", "x", "x", "x", "x", "x"

)

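SELECT ID,
    Day,
    -- Concatenate the names of the columns that changed on this day.
    STRING_AGG(column_name) AS modified_columns
FROM (
    SELECT ID,
        value,
        Day,
        -- Value of the same column on the previous day for the same ID.
        LAG(value) OVER(PARTITION BY ID, column_name ORDER BY Day) AS previous_value,
        column_name
    FROM (
        -- Turn Column1..Column7 into (column_name, value) rows.
        SELECT * FROM sample
        UNPIVOT(value FOR column_name IN (Column1, Column2, Column3, Column4, Column5, Column6, Column7))
    )
)
-- On the first day previous_value is NULL, so the comparison filters that row out.
WHERE value <> previous_value
GROUP BY ID, Day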
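
One caveat, offered as a hedged sketch rather than as part of the original answer: UNPIVOT drops rows whose value is NULL by default, and `value <> previous_value` evaluates to NULL when either side is NULL, so a change to or from NULL would not be reported. The variant below assumes such changes should count. It uses `UNPIVOT INCLUDE NULLS` and `IS DISTINCT FROM`, and it adds a `previous_day` column (a name introduced here for illustration) to skip the first row of each ID, which has no predecessor. The two-column `sample` is a cut-down stand-in for the real table.

```sql
WITH sample AS (
  -- Cut-down stand-in for the table; ID 2 changes Column2 from 'x' to NULL.
  SELECT * FROM UNNEST([
    STRUCT(1 AS ID, 1 AS Day, 'x' AS Column1, 'x' AS Column2),
    STRUCT(1, 2, 'y', 'x'),
    STRUCT(2, 1, 'x', 'x'),
    STRUCT(2, 2, 'x', CAST(NULL AS STRING))
  ])
)
SELECT
  ID,
  Day,
  STRING_AGG(column_name, ',' ORDER BY column_name) AS modified_columns
FROM (
  SELECT
    ID,
    Day,
    column_name,
    value,
    LAG(value) OVER (PARTITION BY ID, column_name ORDER BY Day) AS previous_value,
    -- NULL only on the first row of each (ID, column_name) partition.
    LAG(Day) OVER (PARTITION BY ID, column_name ORDER BY Day) AS previous_day
  FROM sample
  UNPIVOT INCLUDE NULLS (value FOR column_name IN (Column1, Column2))
)
WHERE previous_day IS NOT NULL                 -- skip rows with no previous day
  AND value IS DISTINCT FROM previous_value    -- NULL-safe inequality
GROUP BY ID, Day
```

With this stand-in data the query should flag Column1 for ID 1 and Column2 for ID 2, both on Day 2. The `ORDER BY` inside `STRING_AGG` is optional; it only makes the order of the listed names deterministic.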

Step by step:

  1. Unpivot the table (using the brand-new UNPIVOT operator!):
SELECT * FROM sample
UNPIVOT(value FOR column_name IN (Column1, Column2, Column3, Column4, Column5, Column6, Column7))

  2. Compute the previous day's value for each column and each ID:
SELECT ID, 
    value,
    Day,
    LAG(value) OVER(PARTITION BY ID, column_name ORDER BY Day) AS previous_value,
    column_name
FROM
(SELECT * FROM sample
UNPIVOT(value FOR column_name IN (Column1, Column2, Column3, Column4, Column5, Column6, Column7)))

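  3. Keep only the rows where the value differs from the previous value, then aggregate the names of the modified columns for each ID and Day. This is the outer WHERE ... GROUP BY of the full query.

The full query returns one row per (ID, Day) pair in which at least one column changed, with the names of the changed columns concatenated. For the sample data, that is the Day 2 row of each ID.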
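
The desired output in the question shows the changed columns on both the Day 1 and the Day 2 row of each ID, whereas the query in this answer returns only the rows where a change was detected. If the per-row annotation is what is needed, one possible approach (a sketch, not part of the original answer) is to aggregate the changed columns per ID and join the result back to the table. The `changes` CTE name and the reduced three-column `sample` are assumptions made for illustration.

```sql
WITH sample AS (
  -- Reduced stand-in for the table: ID 1 changes Column3, ID 2 changes Column1 and Column2.
  SELECT * FROM UNNEST([
    STRUCT(1 AS ID, 1 AS Day, 'x' AS Column1, 'x' AS Column2, 'x' AS Column3),
    STRUCT(1, 2, 'x', 'x', 'y'),
    STRUCT(2, 1, 'x', 'x', 'x'),
    STRUCT(2, 2, 'y', 'y', 'x')
  ])
),
changes AS (
  -- One row per ID listing every column that changed on any day.
  SELECT
    ID,
    STRING_AGG(DISTINCT column_name, ',' ORDER BY column_name) AS modified_columns
  FROM (
    SELECT
      ID,
      column_name,
      value,
      LAG(value) OVER (PARTITION BY ID, column_name ORDER BY Day) AS previous_value
    FROM sample
    UNPIVOT(value FOR column_name IN (Column1, Column2, Column3))
  )
  WHERE value <> previous_value
  GROUP BY ID
)
SELECT s.*, c.modified_columns
FROM sample AS s
LEFT JOIN changes AS c USING (ID)  -- LEFT JOIN keeps IDs with no change (NULL list)
ORDER BY ID, Day
```

Grouping by ID alone, rather than by ID and Day, is the design choice that lets every row of an ID carry the same list.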

Note: the unpivot step is easy to generalize, because you can retrieve the list of columns from INFORMATION_SCHEMA:

SELECT COLUMN_NAME 
FROM yourdataset.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = "yourtablename"
AND COLUMN_NAME NOT IN ("ID", "Day")
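
As a rough sketch of how that column list could feed the UNPIVOT clause, the BigQuery script below builds the list into a string and runs the comparison with `EXECUTE IMMEDIATE`. It is not part of the original answer. `yourdataset` and `yourtablename` are the same placeholders as in the lookup query, `column_list` is a variable name introduced here, and the sketch assumes that all compared columns share one data type, which UNPIVOT requires.

```sql
DECLARE column_list STRING;

-- Build a comma-separated, backtick-quoted list of the columns to compare.
SET column_list = (
  SELECT STRING_AGG(FORMAT('`%s`', column_name), ', ' ORDER BY ordinal_position)
  FROM yourdataset.INFORMATION_SCHEMA.COLUMNS
  WHERE table_name = 'yourtablename'
    AND column_name NOT IN ('ID', 'Day')
);

-- Substitute the list into the UNPIVOT clause and run the resulting query.
EXECUTE IMMEDIATE FORMAT("""
  SELECT ID, Day, STRING_AGG(column_name) AS modified_columns
  FROM (
    SELECT ID, Day, column_name, value,
      LAG(value) OVER (PARTITION BY ID, column_name ORDER BY Day) AS previous_value
    FROM yourdataset.yourtablename
    UNPIVOT(value FOR column_name IN (%s))
  )
  WHERE value <> previous_value
  GROUP BY ID, Day
""", column_list);
```

If the columns have different types, one option would be to cast them to STRING in a subquery before unpivoting.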
