UNNEST the last value in a collection

Question

I am having trouble with the following query:

SELECT
   project.id as id,
   (SELECT value FROM UNNEST(project.labels) WHERE key="key1") as key1,
   (SELECT value FROM UNNEST(project.labels) WHERE key="key2") as key2,
   ROUND(SUM(cost), 2) as charges
FROM `cloud.billing.data_123`
WHERE project.id is not null and EXTRACT(MONTH FROM usage_start_time) = 6 and EXTRACT(YEAR FROM usage_start_time) = 2020
GROUP BY id, key1, key2
ORDER by id

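It returns the total spend per project per month (month 6 of 2020 in the example above). The report is based on the billing data exported to BigQuery. The result looks like this: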

Row | id       | key1 | key2 | charges |
1   |project1  | null | null | 32      | 
2   |project1  | x    | y    | 40      |
3   |project2  | null | null | 50      | 
4   |project2  | x    | y    | 10      |

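key1 and key2 are project labels. The null rows appear because the labels were only added to the projects in the middle of the month. So the first record (with nulls in the keys) is the total for the period when the project had no labels, and the second record (with x and y) is the total for the period when the project had labels.

Is there a way to collect everything into one row per project, with the labels filled in and the values summed, like this: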

Row | id       | key1 | key2 | charges |
1   |project1  | x    | y    | 72      |
2   |project2  | x    | y    | 60      |

Thanks in advance.

Tags: google-bigquery, gcloud

Solution


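My understanding is that you want to sum cost for each project and output id, key1, key2 and the summed cost, with key1 and key2 never null.

To achieve this, I will propose two approaches. I am assuming each project has only one unique key1 and one unique key2. In other words, wherever key1 is null for project1, it should be x.

First approach: use FIRST_VALUE() to fill in key1 and key2 when they are null.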
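Both approaches depend on that assumption, so it can be worth checking first. The following is only a minimal sketch: the inline rows are hypothetical stand-ins for the billing table (they mirror the result shown in the question), and the column aliases key1_values and key2_values are made up for illustration. It lists any project that carries more than one distinct non-null value for key1 or key2.

```sql
-- Hypothetical inline rows standing in for the billing export (assumption, not real data)
WITH data1 AS (
  SELECT 'project1' AS id, CAST(NULL AS STRING) AS key1, CAST(NULL AS STRING) AS key2, 32.0 AS cost UNION ALL
  SELECT 'project1', 'x', 'y', 40.0 UNION ALL
  SELECT 'project2', CAST(NULL AS STRING), CAST(NULL AS STRING), 50.0 UNION ALL
  SELECT 'project2', 'x', 'y', 10.0
)
-- COUNT(DISTINCT ...) ignores NULLs, so only real label values are counted
SELECT
  id,
  COUNT(DISTINCT key1) AS key1_values,
  COUNT(DISTINCT key2) AS key2_values
FROM data1
GROUP BY id
HAVING COUNT(DISTINCT key1) > 1 OR COUNT(DISTINCT key2) > 1
```

With these sample rows the query returns no rows, meaning every project has at most one value per label. Any project it does return would need a rule for choosing which label value wins.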

WITH data1 AS (
SELECT
   project.id as id,
   (SELECT value FROM UNNEST(project.labels) WHERE key="key1") as key1,
   (SELECT value FROM UNNEST(project.labels) WHERE key="key2") as key2,
   cost
FROM `cloud.billing.data_123`
WHERE project.id is not null and EXTRACT(MONTH FROM usage_start_time) = 6 and EXTRACT(YEAR FROM usage_start_time) = 2020
),
data2 AS(
-- Fill null labels with the first non-null value found for the same project
SELECT id,
FIRST_VALUE(key1 IGNORE NULLS) OVER (PARTITION BY id ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS key1,
FIRST_VALUE(key2 IGNORE NULLS) OVER (PARTITION BY id ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS key2,
cost
FROM data1
)
SELECT id, key1,key2, ROUND(SUM(cost),2) AS charges FROM data2
GROUP BY id, key1,key2

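Note that FIRST_VALUE() is used with IGNORE NULLS, so within each partition (one per project id) it returns the first non-null value of key1 and of key2. Every row of a project therefore carries the same labels, and the cost can be summed grouped by id, key1 and key2.

Second approach: use SELECT DISTINCT and LEFT JOIN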
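To see the fill-in behaviour in isolation, here is a small self-contained sketch of the same technique. The inline rows are hypothetical and simply reuse the per-project totals from the question in place of the real billing table.

```sql
-- Hypothetical inline rows standing in for the billing export (assumption, not real data)
WITH data1 AS (
  SELECT 'project1' AS id, CAST(NULL AS STRING) AS key1, CAST(NULL AS STRING) AS key2, 32.0 AS cost UNION ALL
  SELECT 'project1', 'x', 'y', 40.0 UNION ALL
  SELECT 'project2', CAST(NULL AS STRING), CAST(NULL AS STRING), 50.0 UNION ALL
  SELECT 'project2', 'x', 'y', 10.0
),
data2 AS (
  -- The frame spans the whole partition, so every row of a project
  -- gets that project's first non-null label value
  SELECT
    id,
    FIRST_VALUE(key1 IGNORE NULLS) OVER (PARTITION BY id ORDER BY id
      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS key1,
    FIRST_VALUE(key2 IGNORE NULLS) OVER (PARTITION BY id ORDER BY id
      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS key2,
    cost
  FROM data1
)
SELECT id, key1, key2, ROUND(SUM(cost), 2) AS charges
FROM data2
GROUP BY id, key1, key2
ORDER BY id
```

For these rows the labels of project1 and project2 are both filled with x and y, and the charges come out as 72 and 60. If a project had several different non-null label values, which one FIRST_VALUE() picks would depend on row order, which is why the single-value assumption matters.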

WITH data1 AS (
    SELECT
       project.id as id,
       (SELECT value FROM UNNEST(project.labels) WHERE key="key1") as key1,
       (SELECT value FROM UNNEST(project.labels) WHERE key="key2") as key2,
       cost
    FROM `cloud.billing.data_123`
    WHERE project.id is not null and EXTRACT(MONTH FROM usage_start_time) = 6 and EXTRACT(YEAR FROM usage_start_time) = 2020
    ),
    data2 AS(
    -- Distinct label pair per project, taken from the rows where both labels are set
    SELECT DISTINCT id, key1,key2 FROM data1
    WHERE key1 IS NOT NULL AND key2 IS NOT NULL
    )
    -- Attach each project's labels to every cost row, then sum per project
    SELECT a.id,b.key1,b.key2,ROUND(SUM(a.cost),2) AS charges FROM data1 a LEFT JOIN data2 b ON a.id = b.id
    GROUP BY 1,2,3

The idea is the same as in the first approach: replace the null values (for key1 and key2) and then sum the cost for each project.

Output of both:
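The join variant can be tried the same way on inline data. This is a sketch under assumptions: the rows are hypothetical, the CTE name labels is illustrative, and project3 is an invented project that never received labels, added only to show what the LEFT JOIN does with it.

```sql
-- Hypothetical inline rows standing in for the billing export (assumption, not real data)
WITH data1 AS (
  SELECT 'project1' AS id, CAST(NULL AS STRING) AS key1, CAST(NULL AS STRING) AS key2, 32.0 AS cost UNION ALL
  SELECT 'project1', 'x', 'y', 40.0 UNION ALL
  SELECT 'project2', CAST(NULL AS STRING), CAST(NULL AS STRING), 50.0 UNION ALL
  SELECT 'project2', 'x', 'y', 10.0 UNION ALL
  SELECT 'project3', CAST(NULL AS STRING), CAST(NULL AS STRING), 5.0
),
labels AS (
  -- One row per project that has both labels set
  SELECT DISTINCT id, key1, key2
  FROM data1
  WHERE key1 IS NOT NULL AND key2 IS NOT NULL
)
SELECT a.id, b.key1, b.key2, ROUND(SUM(a.cost), 2) AS charges
FROM data1 AS a
LEFT JOIN labels AS b ON a.id = b.id
GROUP BY a.id, b.key1, b.key2
ORDER BY a.id
```

Here project1 and project2 get x and y with charges 72 and 60, while project3 is kept with null labels and a charge of 5 because the LEFT JOIN preserves cost rows that have no match. Note that a project with more than one distinct label pair would match several rows in labels, so its cost rows would be duplicated and its total inflated.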

Row | id       | key1 | key2 | charges |
1   |project1  | x    | y    | 72      |
2   |project2  | x    | y    | 60      |
