BigQuery - Computing 0 - 100 percentiles for multiple columns in multiple groups

Problem description

We have a BigQuery table like the following:

with
  my_data as (
    select 1 as num1, 32 as num2, 43 as num3, 'a' as letter union all
    select 2 as num1, 21 as num2, 45 as num3, 'a' as letter union all
    select 3 as num1, 99 as num2, 47 as num3, 'a' as letter union all
    select 4 as num1, 83 as num2, 48 as num3, 'a' as letter union all
    select 5 as num1, 55 as num2, 49 as num3, 'a' as letter union all
    select 6 as num1, 35 as num2, 51 as num3, 'b' as letter union all
    select 7 as num1, 94 as num2, 52 as num3, 'b' as letter union all
    select 8 as num1, 17 as num2, 55 as num3, 'b' as letter union all
    select 9 as num1, 33 as num2, 56 as num3, 'b' as letter union all
    select 10 as num1, 81 as num2, 37 as num3, 'b' as letter union all
    select 11 as num1, 42 as num2, 38 as num3, 'a' as letter union all
    select 12 as num1, 26 as num2, 39 as num3, 'a' as letter union all
    select 13 as num1, 92 as num2, 41 as num3, 'a' as letter union all
    select 14 as num1, 38 as num2, 43 as num3, 'a' as letter union all
    select 15 as num1, 31 as num2, 46 as num3, 'a' as letter union all
    select 16 as num1, 53 as num2, 48 as num3, 'b' as letter union all
    select 17 as num1, 49 as num2, 49 as num3, 'b' as letter union all
    select 18 as num1, 71 as num2, 51 as num3, 'b' as letter union all
    select 19 as num1, 67 as num2, 52 as num3, 'b' as letter union all
    select 20 as num1, 62 as num2, 54 as num3, 'b' as letter
  )

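letter is the column to group by, and num1, num2, num3 are the 3 columns for which we want to compute the 0 - 100 percentiles. To be clear, we want to return a table with 202 rows and the columns letter, pctile, value1, value2, value3, where letter is a (101 times) and b (101 times), pctile runs 0, 1, 2, 3, ... 100, 0, 1, 2, 3, ... 100, and value1, value2, value3 are the values corresponding to the 0th, 1st, 2nd, 3rd, 4th, etc. percentiles (for each group/letter).

I previously posted a very similar question, "Compute percentiles by group in BigQuery", which received a helpful solution. However, that solution only covers the basic case of computing the 0 - 100 percentile rows for a single column. In our real data we are dealing with multiple columns, and the solution from the previous post does not work when extended to the 3 columns here: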

SELECT letter, pctile, value1, value2, value3
FROM (
  SELECT
    letter,
    APPROX_QUANTILES(num1, 100) AS value1,
    APPROX_QUANTILES(num2, 100) AS value2,
    APPROX_QUANTILES(num3, 100) AS value3,
  FROM my_data
  GROUP BY letter
) as t, 
t.value1 WITH OFFSET AS pctile

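This technically does return 202 rows, but in each row value2 and value3 are not individual values; each appears to be the entire quantile array (101 elements, since APPROX_QUANTILES(x, 100) returns 100 + 1 values). I have tried different approaches to get the desired result (202 rows, each with the correct individual value1, value2, value3), but without success. Is this possible?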

Tags: google-bigquery, percentile

Solution


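Try the query below. Each comma join unnests one quantile array together with its offset, and the WHERE clause keeps only the rows in which all three offsets refer to the same percentile: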

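SELECT letter, pctile, value1, value2, value3
FROM (
  -- One row per letter; each value column is an array of 101 approximate quantiles.
  SELECT
    letter,
    APPROX_QUANTILES(num1, 100) AS value1,
    APPROX_QUANTILES(num2, 100) AS value2,
    APPROX_QUANTILES(num3, 100) AS value3,
  FROM my_data
  GROUP BY letter
) as t
-- Unnest each array alongside its offset (the percentile index).
,t.value1 WITH OFFSET AS pctile
,t.value2 WITH OFFSET AS pctile2
,t.value3 WITH OFFSET AS pctile3
-- Keep only the rows where all three offsets agree.
WHERE pctile = pctile2
AND pctile = pctile3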
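
The query above logically pairs every element of the three arrays within each group before the WHERE clause filters them down. A possible alternative, offered here only as a sketch, is to unnest just one array and look up the other two by position with the array subscript operator. The small my_data CTE below is a hypothetical stand-in generated with GENERATE_ARRAY so that the example runs on its own, and the alias v1 is likewise an assumption; with the question's data, the CTE from the problem description would be used instead.

```sql
WITH my_data AS (
  -- Hypothetical stand-in for the question's table: 20 rows, two groups.
  SELECT
    n AS num1,
    MOD(n * 37, 100) AS num2,
    40 + MOD(n, 17) AS num3,
    IF(MOD(n, 2) = 0, 'a', 'b') AS letter
  FROM UNNEST(GENERATE_ARRAY(1, 20)) AS n
)
SELECT
  letter,
  pctile,
  v1 AS value1,
  -- Index the other two arrays with the same offset instead of joining them.
  t.value2[OFFSET(pctile)] AS value2,
  t.value3[OFFSET(pctile)] AS value3
FROM (
  SELECT
    letter,
    APPROX_QUANTILES(num1, 100) AS value1,
    APPROX_QUANTILES(num2, 100) AS value2,
    APPROX_QUANTILES(num3, 100) AS value3
  FROM my_data
  GROUP BY letter
) AS t,
UNNEST(t.value1) AS v1 WITH OFFSET AS pctile
ORDER BY letter, pctile
```

This relies on all three arrays having the same length, which holds when every column is computed with the same second argument to APPROX_QUANTILES and each group has at least one non-NULL value per column. The ORDER BY is there only to make the output easy to read, since row order is otherwise not guaranteed.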
