GROUPING function for GROUP BY ROLLUP in BigQuery Standard SQL

Question

I know that BigQuery legacy SQL has a GROUPING() function that can be used with GROUP BY ROLLUP to distinguish NULL values.

https://cloud.google.com/bigquery/docs/reference/legacy-sql#example_7

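Does Standard SQL have a similar function? Or is there another way to distinguish the NULL values without first replacing them with a hard-coded value?

Tags: google-bigquery

Answer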


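According to the documentation, GROUP BY ROLLUP() is available in Standard SQL. It is defined as:

GROUP BY ROLLUP returns the results of GROUP BY for prefixes of the expressions in the ROLLUP list, each of which is known as a grouping set. For the ROLLUP list (a, b, c), the grouping sets are (a, b, c), (a, b), (a), ().

You can use it with syntax like the following: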

SELECT column, COUNT(*) AS row_count
FROM `project_id.dataset.table`
GROUP BY ROLLUP (column)

To explain the behavior more clearly, I will use the example below (taken from the documentation):

WITH Sales AS (
  SELECT 123 AS sku, 1 AS day, 9.99 AS price UNION ALL
  SELECT 123, 1, 8.99 UNION ALL
  SELECT 456, 1, 4.56 UNION ALL
  SELECT 123, 2, 9.99 UNION ALL
  SELECT 789, 3, 1.00 UNION ALL
  SELECT 456, 3, 4.25 UNION ALL
  SELECT 789, 3, 0.99
)
SELECT
  sku,
  day,
  SUM(price) AS total
FROM Sales
GROUP BY ROLLUP(sku, day)
ORDER BY sku, day;

And the output:

+------+------+-------+
| sku  | day  | total |
+------+------+-------+
| NULL | NULL | 39.77 |
|  123 | NULL | 28.97 |
|  123 |    1 | 18.98 |
|  123 |    2 |  9.99 |
|  456 | NULL |  8.81 |
|  456 |    1 |  4.56 |
|  456 |    3 |  4.25 |
|  789 | NULL |  1.99 |
|  789 |    3 |  1.99 |
+------+------+-------+

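The output can be explained in three parts:

  1. When sku is NULL and day is NULL

This row is the sum over all sku and day values. In this example, the sum of all prices across all days and skus is 39.77.

  2. When sku is not NULL and day is NULL

This row is the sum over all days for one specific sku. For example, for sku = 123 the sum over all days is 28.97.

  3. When neither sku nor day is NULL

This behaves like a normal SUM with GROUP BY: it shows the sum for a specific (sku, day) pair.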
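
To make the three cases concrete, the sketch below reproduces the same rows without ROLLUP, using one plain GROUP BY per grouping set and combining them with UNION ALL. The level column and its labels are illustrative additions of mine, not part of the documentation example; they show one way to tag subtotal rows without relying on NULL.

```sql
WITH Sales AS (
  SELECT 123 AS sku, 1 AS day, 9.99 AS price UNION ALL
  SELECT 123, 1, 8.99 UNION ALL
  SELECT 456, 1, 4.56 UNION ALL
  SELECT 123, 2, 9.99 UNION ALL
  SELECT 789, 3, 1.00 UNION ALL
  SELECT 456, 3, 4.25 UNION ALL
  SELECT 789, 3, 0.99
)
-- Grouping set (sku, day): ordinary per-pair sums.
SELECT 'sku_day' AS level, sku, day, ROUND(SUM(price), 2) AS total
FROM Sales
GROUP BY sku, day
UNION ALL
-- Grouping set (sku): subtotal per sku, so day is NULL.
SELECT 'sku_subtotal', sku, NULL, ROUND(SUM(price), 2)
FROM Sales
GROUP BY sku
UNION ALL
-- Grouping set (): grand total, so both sku and day are NULL.
SELECT 'grand_total', NULL, NULL, ROUND(SUM(price), 2)
FROM Sales
ORDER BY sku, day;
```

Because each row carries its own level label, a NULL that was already present in the data would still be distinguishable from a NULL that marks a subtotal.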

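Regarding the second question: if you want to change the NULL values, you can replace each one accordingly or use another approach. Below I use WITH clauses to output the same results in a different way: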

WITH Sales AS (
      SELECT 123 AS sku, 1 AS day, 9.99 AS price UNION ALL
      SELECT 123, 1, 8.99 UNION ALL
      SELECT 456, 1, 4.56 UNION ALL
      SELECT 123, 2, 9.99 UNION ALL
      SELECT 789, 3, 1.00 UNION ALL
      SELECT 456, 3, 4.25 UNION ALL
      SELECT 789, 3, 0.99
),
sum_all AS (
SELECT sum(price) as sum_all_days_sku from Sales
),
sum_per_sku AS (
SELECT sku, ROUND(sum(price),2) AS sum_per_sku, CONCAT("This sum refers to sku =", sku) AS comment FROM Sales GROUP BY sku
),
sum AS (
SELECT sku, day, sum(price) AS sum_per_sku_per_day FROM Sales GROUP BY sku, day
)
SELECT a.sku,a.day,a.sum_per_sku_per_day, b.sum_per_sku, c.sum_all_days_sku 
FROM sum a LEFT JOIN sum_per_sku b USING(sku) CROSS JOIN sum_all c

And the output:

Row sku day sum_per_sku_per_day sum_per_sku sum_all_days_sku
1   123 1   18.98               28.97       39.77
2   456 1   4.56                8.81        39.77
3   123 2   9.99                28.97       39.77
4   789 3   1.99                1.99        39.77
5   456 3   4.25                8.81        39.77

As shown, the output contains the same sums as GROUP BY ROLLUP(); however, they are displayed in columns rather than in extra rows.
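
The same columnar layout can also be sketched with window functions instead of joins. This is an alternative formulation, not the query the answer ran; the CTE name per_day is hypothetical, and the ROUND calls are added only to keep floating-point sums tidy.

```sql
WITH Sales AS (
  SELECT 123 AS sku, 1 AS day, 9.99 AS price UNION ALL
  SELECT 123, 1, 8.99 UNION ALL
  SELECT 456, 1, 4.56 UNION ALL
  SELECT 123, 2, 9.99 UNION ALL
  SELECT 789, 3, 1.00 UNION ALL
  SELECT 456, 3, 4.25 UNION ALL
  SELECT 789, 3, 0.99
),
per_day AS (
  -- One row per (sku, day) pair.
  SELECT sku, day, SUM(price) AS sum_per_sku_per_day
  FROM Sales
  GROUP BY sku, day
)
SELECT
  sku,
  day,
  ROUND(sum_per_sku_per_day, 2) AS sum_per_sku_per_day,
  -- Subtotal for the current sku across all of its days.
  ROUND(SUM(sum_per_sku_per_day) OVER (PARTITION BY sku), 2) AS sum_per_sku,
  -- Grand total across every row.
  ROUND(SUM(sum_per_sku_per_day) OVER (), 2) AS sum_all_days_sku
FROM per_day
ORDER BY day, sku;
```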

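Update:

If some rows have a NULL sku, the result becomes confusing unless the NULL values are handled before ROLLUP() is applied.

Using the sample data and query below: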

WITH Sales AS (
  SELECT 123 AS sku, 1 AS day, 9.99 AS price UNION ALL
  SELECT 123, 1, 8.99 UNION ALL
  SELECT 456, 1, 4.56 UNION ALL
  SELECT 123, 2, 9.99 UNION ALL
  SELECT 789, 3, 1.00 UNION ALL
  SELECT null, 3, 1 UNION ALL
  SELECT null, 3, 1 UNION ALL
  SELECT null, 3, 1 UNION ALL
  SELECT null, 3, 1 UNION ALL
  SELECT null, 3, 1
)
SELECT
  sku,
  day,
  SUM(price) AS total
FROM Sales
GROUP BY ROLLUP(sku, day)
ORDER BY sku, day;

Note that there are 5 rows where sku is NULL, and their prices sum to 5. The output:

Row sku  day  total
1   null null 39.53
2   null null 5
3   null 3    5
4   123  null 28.97
5   123  1    18.98
6   123  2    9.99
7   456  null 4.56
8   456  1    4.56
9   789  null 1
10  789  3    1
 

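As shown above, the second and third output rows come from the NULL values in the data, not from ROLLUP(). To tell them apart, the NULL values must be handled before they reach the final query. Here I assigned 0 wherever sku is NULL. The query then becomes: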

WITH Sales AS (
  SELECT 123 AS sku, 1 AS day, 9.99 AS price UNION ALL
  SELECT 123, 1, 8.99 UNION ALL
  SELECT 456, 1, 4.56 UNION ALL
  SELECT 123, 2, 9.99 UNION ALL
  SELECT 789, 3, 1.00 UNION ALL
  SELECT null, 3, 1 UNION ALL
  SELECT null, 3, 1 UNION ALL
  SELECT null, 3, 1 UNION ALL
  SELECT null, 3, 1 UNION ALL
  SELECT null, 3, 1
), 
data as (
SELECT IFNULL(sku , 0) as sku, day, price from Sales
)
SELECT
  sku,
  day,
  SUM(price) AS total
FROM data
GROUP BY ROLLUP(sku, day)
ORDER BY sku, day;

And the output:

Row sku  day  total
1   null null 39.53
2   0    null 5
3   0    3    5
4   123  null 28.97
5   123  1    18.98
6   123  2    9.99
7   456  null 4.56
8   456  1    4.56
9   789  null 1
10  789  3    1

Note that the result is now easier to read, because we know that 0 stands for a missing sku.
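
One caveat with a sentinel value is that 0 must never occur as a real sku. Newer GoogleSQL documentation also describes a GROUPING() aggregate function that returns 1 when a grouped column has been rolled up in a given row and 0 otherwise, which would be the direct counterpart of the legacy function asked about. The sketch below assumes that function is available in your BigQuery project; the shortened sample data and the aliases sku_rolled_up and day_rolled_up are illustrative.

```sql
WITH Sales AS (
  SELECT 123 AS sku, 1 AS day, 9.99 AS price UNION ALL
  SELECT 123, 2, 9.99 UNION ALL
  SELECT NULL, 3, 1.00 UNION ALL
  SELECT NULL, 3, 1.00
)
SELECT
  sku,
  day,
  ROUND(SUM(price), 2) AS total,
  -- Assumption: GROUPING() is supported; 1 means the column was rolled up.
  GROUPING(sku) AS sku_rolled_up,
  GROUPING(day) AS day_rolled_up
FROM Sales
GROUP BY ROLLUP(sku, day)
ORDER BY sku, day;
```

Under that assumption, a row with a NULL sku and sku_rolled_up = 0 comes from NULL values in the data, while sku_rolled_up = 1 marks the grand total, so no replacement value is needed.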

