首页 > 解决方案 > 以数组为参数的雪花函数因不支持的子查询错误而失败

问题描述

我有一个事务表,我需要对相似的记录进行分组,对于可能是唯一的类列,我需要从查找表(类表)中选择相似记录的值中的最高值(循环表根据优先级排序)。

从班级中选择*;

ID  NAME
2   BETA
6   OMEGA
5   SIGMA
1   ALPHA1
3   GAMMA
4   DELTA
CREATE OR REPLACE FUNCTION "MIN_VALUE"(classlist array)
RETURNS VARCHAR(200)
LANGUAGE SQL
AS '
    select NAME from CLASS 
    where ID in ( select min(ID) from CLASS 
                               where NAME in (select value from table(flatten(input=>classlist))))
';

从 T_DATA 中选择 *;

C_ID    P_ID    D_ID    S_ID    CLASS
1101111 1404    564     1404    BETA
1101111 1404    599     1425    ALPHA
1101111 1404    564     1404    OMEGA
1101111 1404    564     1425    ALPHA
1101111 1404    564     1404    GAMMA
1101111 1404    564     1425    GAMMA
1101111 1404    599     1425    GAMMA
1101111 1404    564     1425    OMEGA

当我编写如下查询时,它可以正常工作

选择 MIN_VALUE(array_construct('OMEGA','GAMMA','BETA'));

当我在实际查询中使用它时,它会因SQL 编译错误而失败:无法评估不支持的子查询类型

select C_ID, P_ID, D_ID, S_ID, MIN_VALUE(class_array) from (
    select C_ID, P_ID, D_ID, S_ID, arrayagg(class) class_array
    from t_data 
    group by C_ID,P_ID,D_ID,S_ID
);

OR

select C_ID,P_ID,D_ID,S_ID,MIN_VALUE(ca) from (
    select C_ID,P_ID,D_ID,S_ID,array_construct(class_array) ca from (
        select C_ID,P_ID,D_ID,S_ID,arrayagg(class) class_array
        from t_data 
        group by C_ID,P_ID,D_ID,S_ID
    )
);

我期待上面 8 条记录的输出如下

select C_ID,P_ID,D_ID,S_ID,array_construct(class_array) ca from (
        select C_ID,P_ID,D_ID,S_ID,arrayagg(class) class_array
        from t_data 
        group by C_ID,P_ID,D_ID,S_ID
    );

Output

C_ID    P_ID    D_ID    S_ID    CLASS_ARRAY
1101111 1404    564     1404    ["OMEGA", "GAMMA", "BETA"]
1101111 1404    599     1425    ["ALPHA", "GAMMA"]
1101111 1404    564     1425    ["ALPHA", "GAMMA", "OMEGA"]

When I use the min_value function on the above class_array that will return a single value based on the priority in the lookup table.

C_ID    P_ID    D_ID    S_ID    CLASS_ARRAY
1101111 1404    564     1404    BETA
1101111 1404    599     1425    ALPHA
1101111 1404    564     1425    ALPHA

请提出一些选项来弄清楚为什么函数对于硬编码值运行良好但如果在查询中构造数组并作为参数传递则失败。

标签: sqlsnowflake-cloud-data-platform

解决方案


Snowflake 对在列定义中包含一些 SELECT 模式的 SQL 语句的支持存在一些限制。有几种方法可以重写上面的查询以获得所需的结果:

1)找到最小ID,然后加入类表:

with T as (  
  select C_ID, P_ID, D_ID, S_ID, min(class.id) minclassid
  from t_data join class
     on class.name = t_data.class
  group by C_ID,P_ID,D_ID,S_ID
)
select C_ID, P_ID, D_ID, S_ID, class.name
from T join CLASS on minclassid = class.id;

2) 或者使用 Windowing 函数来获取组内按 ID 排序的第一个类名:

select distinct C_ID, P_ID, D_ID, S_ID, 
   first_value(class.name) over 
     (partition by C_ID, P_ID, D_ID, S_ID order by class.id) name
from t_data join class
on class.name = t_data.class;

推荐阅读