sql - SQL query does not restrict basic statistics to the 5th to 95th percentile range
Problem description
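In the CMPC table, I would like to extract the mean, max, min and standard deviation (sd), computed only within the 5th to 95th percentile range, of the variables B2, B3, B4, B8, NDVI, SAVI, SIPI, SR, RGI, TVI, MSR, PRI, GNDVI, PSRI and GCI, aggregated by the AGE and ESPAC variables.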
My CMPC SQL table ([PROJECT_ID].spectra_calibration.CMPC) was created in BigQuery with this schema:
x FLOAT NULLABLE
y FLOAT NULLABLE
stand STRING NULLABLE
date STRING NULLABLE
B2 FLOAT NULLABLE
B3 FLOAT NULLABLE
B4 FLOAT NULLABLE
B8 FLOAT NULLABLE
NDVI FLOAT NULLABLE
SAVI FLOAT NULLABLE
SIPI FLOAT NULLABLE
SR FLOAT NULLABLE
RGI FLOAT NULLABLE
TVI INTEGER NULLABLE
MSR FLOAT NULLABLE
PRI FLOAT NULLABLE
GNDVI FLOAT NULLABLE
PSRI FLOAT NULLABLE
GCI FLOAT NULLABLE
ID_PROJETO INTEGER NULLABLE
PROJETO STRING NULLABLE
CD_TALHAO STRING NULLABLE
DATA_PLANT STRING NULLABLE
ESPECIE STRING NULLABLE
ESPAC STRING NULLABLE
AGE_1 FLOAT NULLABLE
AGE INTEGER NULLABLE
I tried the query for just one variable, B2:
SELECT DISTINCT AGE, ESPAC
,PERCENTILE_DISC(B2,0.05) OVER(PARTITION BY AGE, ESPAC) AS P05_B2
,PERCENTILE_DISC(B2,0.95) OVER(PARTITION BY AGE, ESPAC) AS P95_B2
,MIN(B2) OVER (PARTITION BY AGE, ESPAC ORDER BY B2) AS MIN_B2
,AVG(B2) OVER (PARTITION BY AGE, ESPAC ORDER BY B2) AS AVG_B2
,MAX(B2) OVER (PARTITION BY AGE, ESPAC ORDER BY B2) AS MAX_B2
,stddev(B2) OVER (PARTITION BY AGE, ESPAC ORDER BY B2) AS StDev_B2
FROM `[PROJECT_ID].spectra_calibration.CMPC`
ORDER BY AGE, ESPAC
My output is not the max, min, avg and sd restricted to the 5th to 95th percentile range; instead, the query returns every value in the table:
Linha AGE ESPAC P05_B2 P95_B2 B2
1 -2 4X1.85 100.0 149.0 134.0
2 -2 4X1.85 100.0 149.0 106.0
3 -2 4X1.85 100.0 149.0 129.0
4 -2 4X1.85 100.0 149.0 121.5
5 -2 4X1.85 100.0 149.0 152.0
6 -2 4X1.85 100.0 149.0 157.0
...
3518 13 3.5x2.14 53.0 167.0 62.0
The basic idea is that the final results (MAX, MIN, SD and AVG) should be computed only from B2 values that are > P05_B2 and < P95_B2. I want something conditional, like B2 > P05_B2 & B2 < P95_B2:
SELECT DISTINCT AGE, ESPAC
,PERCENTILE_DISC(B2,0.05) OVER(PARTITION BY AGE, ESPAC) AS P05_B2
,PERCENTILE_DISC(B2,0.95) OVER(PARTITION BY AGE, ESPAC) AS P95_B2
,MIN(B2 > P05_B2 & B2 < P95_B2) OVER (PARTITION BY AGE, ESPAC ORDER BY B2) AS MIN_B2
,AVG(B2 > P05_B2 & B2 < P95_B2) OVER (PARTITION BY AGE, ESPAC ORDER BY B2) AS AVG_B2
,MAX(B2 > P05_B2 & B2 < P95_B2) OVER (PARTITION BY AGE, ESPAC ORDER BY B2) AS MAX_B2
,stddev(B2 > P05_B2 & B2 < P95_B2) OVER (PARTITION BY AGE, ESPAC ORDER BY B2) AS StDev_B2
FROM `[PROJECT_ID].spectra_calibration.CMPC`
ORDER BY AGE, ESPAC
Please help me build this query.
Solution
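The WITHIN modifier is used with aggregate functions, whereas PERCENTILE_DISC is a navigation function, which is why you get an error. Also note that WITHIN is legacy SQL syntax.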
I found an article about translating the PERCENTILE_DISC function from Teradata to BigQuery.
The PERCENTILE_DISC function in Teradata:
PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY X)
The equivalent PERCENTILE_DISC function in BigQuery:
PERCENTILE_DISC(X,0.5) OVER()
After I removed WITHIN GROUP (ORDER BY B2) from your script, it ran without errors on my end.
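Removing WITHIN GROUP makes the query run, but it does not restrict the statistics to the percentile range. Two things seem to be involved. First, a window with ORDER BY and no explicit frame defaults to a running frame (from the start of the partition up to the current row), so MIN, AVG, MAX and STDDEV come out different on each row and SELECT DISTINCT cannot collapse them. Second, the aliases P05_B2 and P95_B2 cannot be referenced in the same SELECT list that defines them. A minimal sketch of one way around both, assuming the table and column names from the question: compute the bounds in a CTE, then filter and aggregate with GROUP BY. The CTE name with_bounds is made up for illustration.

```sql
-- Sketch only, BigQuery standard SQL. Table and column names come from the
-- question; the CTE name with_bounds is hypothetical.
WITH with_bounds AS (
  SELECT
    AGE,
    ESPAC,
    B2,
    -- Per-group 5th and 95th percentiles, attached to every row of the group.
    PERCENTILE_DISC(B2, 0.05) OVER (PARTITION BY AGE, ESPAC) AS P05_B2,
    PERCENTILE_DISC(B2, 0.95) OVER (PARTITION BY AGE, ESPAC) AS P95_B2
  FROM `[PROJECT_ID].spectra_calibration.CMPC`
)
SELECT
  AGE,
  ESPAC,
  -- The bounds are constant within a group, so any row's value will do.
  ANY_VALUE(P05_B2) AS P05_B2,
  ANY_VALUE(P95_B2) AS P95_B2,
  MIN(B2) AS MIN_B2,
  AVG(B2) AS AVG_B2,
  MAX(B2) AS MAX_B2,
  STDDEV(B2) AS StDev_B2
FROM with_bounds
-- Keep only rows strictly between the two percentile bounds.
WHERE B2 > P05_B2 AND B2 < P95_B2
GROUP BY AGE, ESPAC
ORDER BY AGE, ESPAC;
```

This returns one row per (AGE, ESPAC) group. A group with no B2 value strictly between its bounds drops out of the result. If such groups should remain, with NULL statistics, the WHERE clause could be replaced by conditional aggregation such as MIN(IF(B2 > P05_B2 AND B2 < P95_B2, B2, NULL)). The same pattern would have to be repeated for each of the other variables (B3, B4, and so on).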