首页 > 解决方案 > 如何在没有手动硬编码的情况下在 bigquery 标准 SQL 中透视数据?

问题描述

我有一个如下表:

| user_id | product_purchased |
-------------------------------
|    111  |        A           |
|    111  |        B           |
|    222  |        B           |
|    222  |        B           |
|    333  |        C           |
|    444  |        A           |

我想旋转表格以将用户 ID 作为行,并将用户购买的每个产品的计数作为列。所以对于上表,这看起来像:

| user_id | product A | product B | product C |
-----------------------------------------------
|    111  |     1      |      1    |     0    |
|    222  |     0      |      2    |     0    |
|    333  |     0      |      0    |     1    |
|    444  |     1      |      0    |     0    |

我知道这可以使用 countif 语句手动完成:

#standardsql
select user_id,
       countif(product_purchased = 'A') as 'A',
       countif(product_purchased = 'B') as 'B',
       etc,
group by user_id

然而,实际上该表有太多可能的产品,无法手动写出所有选项。有没有办法以更自动化和优雅的方式进行这种旋转?

标签: sqlgoogle-bigquery

解决方案


实际上,该表有太多可能的产品,无法手动写出所有选项

以下是 BigQuery 标准 SQL

您可以分两步执行此操作 - 首先通过以下运行准备动态透视查询

#standardSQL
SELECT CONCAT('SELECT user_id, ', 
  STRING_AGG(
    CONCAT('COUNTIF(product_purchased = "', product_purchased, '") AS product_', product_purchased)
  ), 
  ' FROM `project.dataset.your_table` GROUP BY user_id')
FROM (
  SELECT product_purchased 
  FROM `project.dataset.your_table`  
  GROUP BY product_purchased
)

结果,您将获得表示您需要运行以获得所需结果的查询的字符串

例如,如果适用于您问题中的虚拟数据

#standardSQL
WITH `project.dataset.your_table` AS (
  SELECT 111 user_id, 'A' product_purchased UNION ALL
  SELECT 111, 'B' UNION ALL
  SELECT 222, 'B' UNION ALL
  SELECT 222, 'B' UNION ALL
  SELECT 333, 'C' UNION ALL
  SELECT 444, 'A' 
)
SELECT CONCAT('SELECT user_id, ', 
  STRING_AGG(
    CONCAT('COUNTIF(product_purchased = "', product_purchased, '") AS product_', product_purchased)
  ), 
  ' FROM `project.dataset.your_table` GROUP BY user_id')
FROM (
  SELECT product_purchased 
  FROM `project.dataset.your_table`  
  GROUP BY product_purchased
)

你会得到下面的查询(格式化以便在此处更好地查看)

SELECT
  user_id,
  COUNTIF(product_purchased = "A") AS product_A,
  COUNTIF(product_purchased = "B") AS product_B,
  COUNTIF(product_purchased = "C") AS product_C
FROM `project.dataset.your_table`
GROUP BY user_id

现在,您只需运行它即可获得所需的结果,而无需手动编码

同样,如果要针对您问题中的虚拟数据运行它

#standardSQL
WITH `project.dataset.your_table` AS (
  SELECT 111 user_id, 'A' product_purchased UNION ALL
  SELECT 111, 'B' UNION ALL
  SELECT 222, 'B' UNION ALL
  SELECT 222, 'B' UNION ALL
  SELECT 333, 'C' UNION ALL
  SELECT 444, 'A' 
)
SELECT
  user_id,
  COUNTIF(product_purchased = "A") AS product_A,
  COUNTIF(product_purchased = "B") AS product_B,
  COUNTIF(product_purchased = "C") AS product_C
FROM `project.dataset.your_table`
GROUP BY user_id   
-- ORDER BY user_id

你得到预期的结果

Row user_id product_A   product_B   product_C    
1   111     1           1           0    
2   222     0           2           0    
3   333     0           0           1    
4   444     1           0           0    

有没有办法以更自动化和优雅的方式进行这种旋转?

client您可以使用您的任何选择 轻松实现上述自动化


推荐阅读