SQL query does not work for extracting basic statistics within the 5th-95th percentile range

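Problem description

I would like to extract the mean, maximum, minimum and standard deviation, restricted to values between the 5th and 95th percentiles, for the variables B2, B3, B4, B8, NDVI, SAVI, SIPI, SR, RGI, TVI, MSR, PRI, GNDVI, PSRI and GCI in the CMPC table, aggregated by the variables AGE and ESPAC.

My CMPC SQL table ([PROJECT_ID].spectra_calibration.CMPC) was created in BigQuery with this schema: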

x   FLOAT   NULLABLE    
y   FLOAT   NULLABLE    
stand   STRING  NULLABLE    
date    STRING  NULLABLE    
B2  FLOAT   NULLABLE    
B3  FLOAT   NULLABLE    
B4  FLOAT   NULLABLE    
B8  FLOAT   NULLABLE    
NDVI    FLOAT   NULLABLE    
SAVI    FLOAT   NULLABLE    
SIPI    FLOAT   NULLABLE    
SR  FLOAT   NULLABLE    
RGI FLOAT   NULLABLE    
TVI INTEGER NULLABLE    
MSR FLOAT   NULLABLE    
PRI FLOAT   NULLABLE    
GNDVI   FLOAT   NULLABLE    
PSRI    FLOAT   NULLABLE    
GCI FLOAT   NULLABLE    
ID_PROJETO  INTEGER NULLABLE    
PROJETO STRING  NULLABLE    
CD_TALHAO   STRING  NULLABLE    
DATA_PLANT  STRING  NULLABLE    
ESPECIE STRING  NULLABLE    
ESPAC   STRING  NULLABLE    
AGE_1   FLOAT   NULLABLE    
AGE INTEGER NULLABLE    

See the sample table in the CSV file.

I tried a query for just one variable, B2:

SELECT DISTINCT AGE, ESPAC
,PERCENTILE_DISC(B2,0.05) OVER(PARTITION BY AGE, ESPAC) AS P05_B2
,PERCENTILE_DISC(B2,0.95) OVER(PARTITION BY AGE, ESPAC) AS P95_B2
,MIN(B2) OVER (PARTITION BY AGE, ESPAC ORDER BY B2) AS MIN_B2
,AVG(B2) OVER (PARTITION BY AGE, ESPAC ORDER BY B2) AS AVG_B2
,MAX(B2) OVER (PARTITION BY AGE, ESPAC ORDER BY B2) AS MAX_B2
,stddev(B2) OVER (PARTITION BY AGE, ESPAC ORDER BY B2) AS StDev_B2
FROM `[PROJECT_ID].spectra_calibration.CMPC`
ORDER BY AGE, ESPAC

My output does not contain the max, min, avg and sd restricted to the 5th to 95th percentile interval; instead, the query returns all the table values:

Linha   AGE ESPAC   P05_B2  P95_B2  B2  
1   -2  4X1.85  100.0   149.0   134.0   
2   -2  4X1.85  100.0   149.0   106.0   
3   -2  4X1.85  100.0   149.0   129.0   
4   -2  4X1.85  100.0   149.0   121.5   
5   -2  4X1.85  100.0   149.0   152.0   
6   -2  4X1.85  100.0   149.0   157.0   
...
3518    13  3.5x2.14    53.0    167.0   62.0

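The basic idea is that the final results (MAX, MIN, SD and AVG) should be computed only from B2 values that are > P05_B2 and < P95_B2. I would like something with a B2 > P05_B2 & B2 < P95_B2 condition, like this: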

SELECT DISTINCT AGE, ESPAC
,PERCENTILE_DISC(B2,0.05) OVER(PARTITION BY AGE, ESPAC) AS P05_B2
,PERCENTILE_DISC(B2,0.95) OVER(PARTITION BY AGE, ESPAC) AS P95_B2
,MIN(B2 > P05_B2 & B2 < P95_B2) OVER (PARTITION BY AGE, ESPAC ORDER BY B2) AS MIN_B2
,AVG(B2 > P05_B2 & B2 < P95_B2) OVER (PARTITION BY AGE, ESPAC ORDER BY B2) AS AVG_B2
,MAX(B2 > P05_B2 & B2 < P95_B2) OVER (PARTITION BY AGE, ESPAC ORDER BY B2) AS MAX_B2
,stddev(B2 > P05_B2 & B2 < P95_B2) OVER (PARTITION BY AGE, ESPAC ORDER BY B2) AS StDev_B2
FROM `[PROJECT_ID].spectra_calibration.CMPC`
ORDER BY AGE, ESPAC

Please help me build this query.

Tags: sql, google-bigquery

Solution


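The WITHIN modifier is used with aggregate functions, whereas PERCENTILE_DISC is a navigation function in BigQuery, so combining the two is an error. Also note that WITHIN is legacy SQL syntax.

I found an article on converting the PERCENTILE_DISC function from Teradata to BigQuery.

The PERCENTILE_DISC function in Teradata: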

PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY X)

The equivalent PERCENTILE_DISC function in BigQuery:

PERCENTILE_DISC(X,0.5) OVER()
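
To try the BigQuery form in isolation, a minimal sketch along these lines should run without touching any table. The inline array and the column alias median_x are made up for illustration only; they are not part of the original question.

```sql
-- Minimal sketch: PERCENTILE_DISC used as a window (navigation) function.
-- The UNNEST array is illustrative data, not the CMPC table.
SELECT DISTINCT
  -- Value first, percentile second; OVER () is required, and no ORDER BY is given.
  PERCENTILE_DISC(x, 0.5) OVER () AS median_x
FROM UNNEST([1, 2, 3, 4, 5]) AS x
```

Because the window function returns the same value on every input row, SELECT DISTINCT collapses the result to a single row.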

Remove WITHIN GROUP (ORDER BY B2) from your script; with that removed, it ran without errors on my end.
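
Removing WITHIN GROUP only makes the query run; it does not by itself limit the statistics to the 5th-95th percentile range. One possible way to do that, shown here as a sketch rather than as the answerer's method, is to compute the percentile bounds in a first step and then filter and aggregate in a second step. The CTE name bounds is hypothetical; the table and column names are taken from the question, and the strict comparisons follow the condition the question asks for.

```sql
-- Sketch, assuming BigQuery Standard SQL and the CMPC table from the question.
-- Step 1: attach the per-group 5th and 95th percentile of B2 to every row.
WITH bounds AS (  -- "bounds" is an illustrative name
  SELECT
    AGE,
    ESPAC,
    B2,
    PERCENTILE_DISC(B2, 0.05) OVER (PARTITION BY AGE, ESPAC) AS P05_B2,
    PERCENTILE_DISC(B2, 0.95) OVER (PARTITION BY AGE, ESPAC) AS P95_B2
  FROM `[PROJECT_ID].spectra_calibration.CMPC`
)
-- Step 2: keep only rows strictly inside the bounds, then aggregate per group.
SELECT
  AGE,
  ESPAC,
  ANY_VALUE(P05_B2) AS P05_B2,  -- constant within each (AGE, ESPAC) group
  ANY_VALUE(P95_B2) AS P95_B2,
  MIN(B2) AS MIN_B2,
  AVG(B2) AS AVG_B2,
  MAX(B2) AS MAX_B2,
  STDDEV(B2) AS StDev_B2
FROM bounds
WHERE B2 > P05_B2 AND B2 < P95_B2
GROUP BY AGE, ESPAC
ORDER BY AGE, ESPAC
```

Two points about this design. First, using GROUP BY with plain aggregates instead of OVER (... ORDER BY B2) avoids running aggregates: with ORDER BY inside the window, MIN, AVG, MAX and STDDEV are computed over a frame that grows row by row, so every input row yields a different result and SELECT DISTINCT cannot collapse them to one row per group. Second, groups in which no B2 value lies strictly between the two bounds drop out of the result; switching to >= and <= would keep the boundary values if that is preferred. The same pattern could be repeated for the other variables (B3, B4, NDVI, and so on) by adding their percentile columns to the first step, though each variable needs its own filter, so one query per variable is the simplest arrangement.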

