sql - 如何在没有手动硬编码的情况下在 bigquery 标准 SQL 中透视数据?
问题描述
我有一个如下表:
| user_id | product_purchased |
-------------------------------
| 111 | A |
| 111 | B |
| 222 | B |
| 222 | B |
| 333 | C |
| 444 | A |
我想旋转表格以将用户 ID 作为行,并将用户购买的每个产品的计数作为列。所以对于上表,这看起来像:
| user_id | product A | product B | product C |
-----------------------------------------------
| 111 | 1 | 1 | 0 |
| 222 | 0 | 2 | 0 |
| 333 | 0 | 0 | 1 |
| 444 | 1 | 0 | 0 |
我知道这可以使用 countif 语句手动完成:
#standardsql
select user_id,
countif(product_purchased = 'A') as 'A',
countif(product_purchased = 'B') as 'B',
etc,
group by user_id
然而,实际上该表有太多可能的产品,无法手动写出所有选项。有没有办法以更自动化和优雅的方式进行这种旋转?
解决方案
实际上,该表有太多可能的产品,无法手动写出所有选项
以下是 BigQuery 标准 SQL
您可以分两步执行此操作 - 首先通过以下运行准备动态透视查询
#standardSQL
SELECT CONCAT('SELECT user_id, ',
STRING_AGG(
CONCAT('COUNTIF(product_purchased = "', product_purchased, '") AS product_', product_purchased)
),
' FROM `project.dataset.your_table` GROUP BY user_id')
FROM (
SELECT product_purchased
FROM `project.dataset.your_table`
GROUP BY product_purchased
)
结果,您将获得表示您需要运行以获得所需结果的查询的字符串
例如,如果适用于您问题中的虚拟数据
#standardSQL
WITH `project.dataset.your_table` AS (
SELECT 111 user_id, 'A' product_purchased UNION ALL
SELECT 111, 'B' UNION ALL
SELECT 222, 'B' UNION ALL
SELECT 222, 'B' UNION ALL
SELECT 333, 'C' UNION ALL
SELECT 444, 'A'
)
SELECT CONCAT('SELECT user_id, ',
STRING_AGG(
CONCAT('COUNTIF(product_purchased = "', product_purchased, '") AS product_', product_purchased)
),
' FROM `project.dataset.your_table` GROUP BY user_id')
FROM (
SELECT product_purchased
FROM `project.dataset.your_table`
GROUP BY product_purchased
)
你会得到下面的查询(格式化以便在此处更好地查看)
SELECT
user_id,
COUNTIF(product_purchased = "A") AS product_A,
COUNTIF(product_purchased = "B") AS product_B,
COUNTIF(product_purchased = "C") AS product_C
FROM `project.dataset.your_table`
GROUP BY user_id
现在,您只需运行它即可获得所需的结果,而无需手动编码
同样,如果要针对您问题中的虚拟数据运行它
#standardSQL
WITH `project.dataset.your_table` AS (
SELECT 111 user_id, 'A' product_purchased UNION ALL
SELECT 111, 'B' UNION ALL
SELECT 222, 'B' UNION ALL
SELECT 222, 'B' UNION ALL
SELECT 333, 'C' UNION ALL
SELECT 444, 'A'
)
SELECT
user_id,
COUNTIF(product_purchased = "A") AS product_A,
COUNTIF(product_purchased = "B") AS product_B,
COUNTIF(product_purchased = "C") AS product_C
FROM `project.dataset.your_table`
GROUP BY user_id
-- ORDER BY user_id
你得到预期的结果
Row user_id product_A product_B product_C
1 111 1 1 0
2 222 0 2 0
3 333 0 0 1
4 444 1 0 0
有没有办法以更自动化和优雅的方式进行这种旋转?
client
您可以使用您的任何选择 轻松实现上述自动化