google-bigquery - BigQuery - Compute percentiles 0 - 100 for multiple columns across multiple groups
Problem description
We have a BigQuery table that looks like this:
with
my_data as (
select 1 as num1, 32 as num2, 43 as num3, 'a' as letter union all
select 2 as num1, 21 as num2, 45 as num3, 'a' as letter union all
select 3 as num1, 99 as num2, 47 as num3, 'a' as letter union all
select 4 as num1, 83 as num2, 48 as num3, 'a' as letter union all
select 5 as num1, 55 as num2, 49 as num3, 'a' as letter union all
select 6 as num1, 35 as num2, 51 as num3, 'b' as letter union all
select 7 as num1, 94 as num2, 52 as num3, 'b' as letter union all
select 8 as num1, 17 as num2, 55 as num3, 'b' as letter union all
select 9 as num1, 33 as num2, 56 as num3, 'b' as letter union all
select 10 as num1, 81 as num2, 37 as num3, 'b' as letter union all
select 11 as num1, 42 as num2, 38 as num3, 'a' as letter union all
select 12 as num1, 26 as num2, 39 as num3, 'a' as letter union all
select 13 as num1, 92 as num2, 41 as num3, 'a' as letter union all
select 14 as num1, 38 as num2, 43 as num3, 'a' as letter union all
select 15 as num1, 31 as num2, 46 as num3, 'a' as letter union all
select 16 as num1, 53 as num2, 48 as num3, 'b' as letter union all
select 17 as num1, 49 as num2, 49 as num3, 'b' as letter union all
select 18 as num1, 71 as num2, 51 as num3, 'b' as letter union all
select 19 as num1, 67 as num2, 52 as num3, 'b' as letter union all
select 20 as num1, 62 as num2, 54 as num3, 'b' as letter
)
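letter is the column to group by, and num1, num2, num3 are the 3 columns for which we want to compute the 0 - 100 %iles. To be clear, we want to return a table with 202 rows and the columns letter, pctile, value1, value2, value3. letter is a (101 times) and b (101 times), pctile runs 0,1,2,3... 100,0,1,2,3... 100, and value1, value2, value3 are the values corresponding to the 0th, 1st, 2nd, 3rd, 4th, etc. percentiles (for each group/letter).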
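I previously posted a very similar question here - Compute percentiles by group in BigQuery - where a helpful solution was provided. However, that solution covers only the basic case of computing the 0 - 100 %ile rows for a single column. In the real version of our data we are working with multiple columns, and the solution from the previous post does not work when extended to the 3 columns of our new data: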
SELECT letter, pctile, value1, value2, value3
FROM (
SELECT
letter,
APPROX_QUANTILES(num1, 100) AS value1,
APPROX_QUANTILES(num2, 100) AS value2,
APPROX_QUANTILES(num3, 100) AS value3,
FROM my_data
GROUP BY letter
) as t,
t.value1 WITH OFFSET AS pctile
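This does technically return 202 rows, but the values of value2 and value3 in each row are not individual values. Instead they appear to be the entire arrays (APPROX_QUANTILES(x, 100) returns 101 elements), because only value1 is unnested in the FROM clause. I have tried different approaches to get the desired result (202 rows, each with the correct individual values for value1, value2, value3), without success. Is this possible?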
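Solution
Try the query below. It unnests all three arrays together with their offsets and keeps only the rows where the three offsets match: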
SELECT letter, pctile, value1, value2, value3
FROM (
SELECT
letter,
APPROX_QUANTILES(num1, 100) AS value1,
APPROX_QUANTILES(num2, 100) AS value2,
APPROX_QUANTILES(num3, 100) AS value3,
FROM my_data
GROUP BY letter
) as t
,t.value1 WITH OFFSET AS pctile
,t.value2 WITH OFFSET AS pctile2
,t.value3 WITH OFFSET AS pctile3
WHERE pctile = pctile2
AND pctile = pctile3
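The query above conceptually pairs every element of the three arrays (101 × 101 × 101 combinations per group) before the WHERE clause filters them down. A possible alternative, sketched here rather than taken from the original answer, is to unnest only one array and look up the other two by the same offset. The CTE name quantiles, the column names q1, q2, q3, and the shortened my_data sample are illustrative assumptions; the real my_data from the question can be substituted unchanged.

```sql
WITH my_data AS (
  -- shortened stand-in for the table in the question (assumption: same columns)
  SELECT 1 AS num1, 32 AS num2, 43 AS num3, 'a' AS letter UNION ALL
  SELECT 2, 21, 45, 'a' UNION ALL
  SELECT 3, 99, 47, 'a' UNION ALL
  SELECT 6, 35, 51, 'b' UNION ALL
  SELECT 7, 94, 52, 'b' UNION ALL
  SELECT 8, 17, 55, 'b'
),
quantiles AS (
  -- one row per letter, each q* column is an array of 101 approximate quantiles
  SELECT
    letter,
    APPROX_QUANTILES(num1, 100) AS q1,
    APPROX_QUANTILES(num2, 100) AS q2,
    APPROX_QUANTILES(num3, 100) AS q3
  FROM my_data
  GROUP BY letter
)
SELECT
  letter,
  pctile,
  value1,
  -- index the other two arrays with the offset of the unnested one
  q2[OFFSET(pctile)] AS value2,
  q3[OFFSET(pctile)] AS value3
FROM quantiles,
  UNNEST(q1) AS value1 WITH OFFSET AS pctile
ORDER BY letter, pctile
```

Because all three arrays come from APPROX_QUANTILES with the same bucket count, they have the same length, so the offset lookup stays in range. This should also yield 101 rows per letter (202 for the two groups in the question).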