google-bigquery - UNNEST: last value in a collection
Problem description
I am facing a problem with the following query:
SELECT
project.id as id,
(SELECT value FROM UNNEST(project.labels) WHERE key="key1") as key1,
(SELECT value FROM UNNEST(project.labels) WHERE key="key2") as key2,
ROUND(SUM(cost), 2) as charges
FROM `cloud.billing.data_123`
WHERE project.id is not null and EXTRACT(MONTH FROM usage_start_time) = 6 and EXTRACT(YEAR FROM usage_start_time) = 2020
GROUP BY id, key1, key2
ORDER by id
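It returns the total spend per project per month (in the example above, month 6 of 2020). The report is based on the billing data exported to BigQuery. The result looks like this: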
Row | id | key1 | key2 | charges |
1 |project1 | null | null | 32 |
2 |project1 | x | y | 40 |
3 |project2 | null | null | 50 |
4 |project2 | x | y | 10 |
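The key1 and key2 keys are project labels. The split happens because the labels were only added to the projects in the middle of the month. So the first record (with null keys) is the total for the period when the project had no labels, and the second record (with x and y) is the total for the period when it had labels.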
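To make the split concrete, here is a small Python sketch that mimics GROUP BY id, key1, key2 on a handful of made-up rows. The individual costs are hypothetical, chosen only so that the totals match the result above; None stands in for a missing label.

```python
from collections import defaultdict

# Hypothetical billing rows: (project id, key1 label, key2 label, cost).
# Rows recorded before the labels were added carry None for both labels.
rows = [
    ("project1", None, None, 32.0),
    ("project1", "x", "y", 25.0),
    ("project1", "x", "y", 15.0),
    ("project2", None, None, 50.0),
    ("project2", "x", "y", 10.0),
]

def group_charges(rows):
    """Sum cost per (id, key1, key2), like GROUP BY id, key1, key2."""
    totals = defaultdict(float)
    for pid, key1, key2, cost in rows:
        totals[(pid, key1, key2)] += cost
    return {group: round(total, 2) for group, total in totals.items()}

charges = group_charges(rows)
for group, total in charges.items():
    print(group, total)
```

Because (project1, None, None) and (project1, x, y) are different grouping keys, each project ends up with two rows instead of one.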
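Is there a way to collect everything into a single row per project, with the labels filled in and the values summed, for example: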
Row | id | key1 | key2 | charges |
1 |project1 | x | y | 72 |
2 |project2 | x | y | 60 |
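Thanks in advance.
Solution
As I understand it, you want to sum the cost for each project and output id, key1, key2 and the total cost, with key1 and key2 not null.
To achieve this I will propose two approaches. I am assuming that each project has only one unique key1 and one unique key2. In other words, when key1 is null for project1, it should be x.
First approach: use FIRST_VALUE() to fill in key1 and key2 when they are null.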
WITH data1 AS (
  -- One row per billing line item, with the two labels extracted (may be NULL)
  SELECT
    project.id as id,
    (SELECT value FROM UNNEST(project.labels) WHERE key="key1") as key1,
    (SELECT value FROM UNNEST(project.labels) WHERE key="key2") as key2,
    cost
  FROM `cloud.billing.data_123`
  WHERE project.id is not null and EXTRACT(MONTH FROM usage_start_time) = 6 and EXTRACT(YEAR FROM usage_start_time) = 2020
),
data2 AS (
  -- Fill NULL labels with the first non-NULL value found for the same project
  SELECT id,
    FIRST_VALUE(key1 IGNORE NULLS) OVER (PARTITION BY id ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS key1,
    FIRST_VALUE(key2 IGNORE NULLS) OVER (PARTITION BY id ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS key2,
    cost
  FROM data1
)
SELECT id, key1, key2, ROUND(SUM(cost), 2) AS charges FROM data2
GROUP BY id, key1, key2
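Note that FIRST_VALUE() is used with IGNORE NULLS, so it returns the first non-null value of key1 and key2 within each partition (here, each project id). The costs can then be summed, grouped by id, key1 and key2.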
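As a rough illustration of what the fill step does, the following Python sketch emulates FIRST_VALUE(... IGNORE NULLS) over a partition by id and then sums the cost. It is not BigQuery code: the rows and the helper names first_non_null and fill_and_sum are made up for the example, and it relies on the same assumption as above, namely that each project has at most one distinct value per label.

```python
from collections import defaultdict

# Hypothetical rows: (project id, key1 label, key2 label, cost).
rows = [
    ("project1", None, None, 32.0),
    ("project1", "x", "y", 25.0),
    ("project1", "x", "y", 15.0),
    ("project2", None, None, 50.0),
    ("project2", "x", "y", 10.0),
]

def first_non_null(values):
    """Return the first value that is not None, or None if there is none."""
    return next((v for v in values if v is not None), None)

def fill_and_sum(rows):
    # Partition the rows by project id.
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[0]].append(row)

    totals = {}
    for pid, part in partitions.items():
        # Every row in the partition receives the same filled-in labels,
        # so the whole partition collapses into a single group.
        key1 = first_non_null(r[1] for r in part)
        key2 = first_non_null(r[2] for r in part)
        totals[(pid, key1, key2)] = round(sum(r[3] for r in part), 2)
    return totals

result = fill_and_sum(rows)
print(result)
```

If a project could carry several different values for the same label within the month, "first" would depend on row order, and a different rule (for example the most recent label by usage_start_time) would be needed.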
Second approach: use SELECT DISTINCT and LEFT JOIN
WITH data1 AS (
  -- One row per billing line item, with the two labels extracted (may be NULL)
  SELECT
    project.id as id,
    (SELECT value FROM UNNEST(project.labels) WHERE key="key1") as key1,
    (SELECT value FROM UNNEST(project.labels) WHERE key="key2") as key2,
    cost
  FROM `cloud.billing.data_123`
  WHERE project.id is not null and EXTRACT(MONTH FROM usage_start_time) = 6 and EXTRACT(YEAR FROM usage_start_time) = 2020
),
data2 AS (
  -- Lookup table: the non-NULL label pair of each project
  SELECT DISTINCT id, key1, key2 FROM data1
  WHERE key1 IS NOT NULL AND key2 IS NOT NULL
)
-- Take the cost from every row, but the labels from the lookup table
SELECT a.id, b.key1, b.key2, ROUND(SUM(a.cost), 2) AS charges
FROM data1 a LEFT JOIN data2 b ON a.id = b.id
GROUP BY 1, 2, 3
The idea is the same as in the first approach: replace the null values (for key1 and key2), then sum the cost for each project.
Output of both:
Row | id | key1 | key2 | charges |
1 |project1 | x | y | 72 |
2 |project2 | x | y | 60 |
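The DISTINCT plus LEFT JOIN idea can also be tried locally without BigQuery. The sketch below uses Python's built-in sqlite3 module as a stand-in; SQLite is not BigQuery, so this only checks the join logic. The table data1 here is a hypothetical, already flattened version of the first CTE, and the individual costs are made up to match the totals above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical flattened table standing in for the data1 CTE.
conn.execute("CREATE TABLE data1 (id TEXT, key1 TEXT, key2 TEXT, cost REAL)")
conn.executemany(
    "INSERT INTO data1 VALUES (?, ?, ?, ?)",
    [
        ("project1", None, None, 32.0),
        ("project1", "x", "y", 25.0),
        ("project1", "x", "y", 15.0),
        ("project2", None, None, 50.0),
        ("project2", "x", "y", 10.0),
    ],
)

query = """
WITH data2 AS (
  -- The non-NULL label pair of each project
  SELECT DISTINCT id, key1, key2 FROM data1
  WHERE key1 IS NOT NULL AND key2 IS NOT NULL
)
SELECT a.id, b.key1, b.key2, ROUND(SUM(a.cost), 2) AS charges
FROM data1 a LEFT JOIN data2 b ON a.id = b.id
GROUP BY a.id, b.key1, b.key2
ORDER BY a.id
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
conn.close()
```

Note that if a project had more than one distinct non-null label pair, the join would duplicate its cost rows and inflate the totals, which is why the one-pair-per-project assumption matters for this approach.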