How to ensure deterministic results for queries using mode() in Snowflake



FROM table


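If there is a tie for the most frequent value (two or more values occur equally often, and more often than any other value), MODE returns one of those values.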

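(Source: https://docs.snowflake.net/manuals/sql-reference/functions/mode.html)

I need a workaround that gives the equivalent of mode() but always produces deterministic output. Something like: use mode(), but when there is a tie on some column, pick the first value.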
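The tie that the documentation describes can be reproduced with a tiny inline table. The second query is a minimal sketch of the requested "pick the first value on a tie" rule, assuming that "first" means the smallest value; the table `t` and column `v` are hypothetical and exist only for illustration.

```sql
-- Hypothetical inline table t(v): 1 and 2 each occur twice, so they tie.
select mode(v) as m
from values (1), (2), (2), (1) t(v);
-- Per the MODE documentation, m may be either 1 or 2.

-- Deterministic alternative (assumes "first" = smallest value):
-- count each value, order by count descending, then by the value itself.
select v as m
from values (1), (2), (2), (1) t(v)
group by v
order by count(*) desc, v
limit 1;
-- Always returns m = 1.
```

Ordering by the value after the count is what makes the result repeatable: two rows can only tie on both keys if they hold the same value.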




So mode appears to prefer the first value it sees when breaking a tie:

-- x = 4 has ties in col1 (1/2), col2 (20/2) and col3 (30/3)
with data as (
select x, col1, col2, col3 from values (1, 1, 1, 3), (1, 1, 2,3), (1, 2, 2,3)
    ,(4, 1, 20, 30), (4, 1, 2, 3), (4, 2, 2, 30), (4,2,20,3) v(x,col1,col2,col3)
)
select x
    ,mode(col1)
    ,mode(col2)
    ,mode(col3)
from data
group by 1
order by 1;

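Swapping the first value of the 2/20 or 3/30 pairs (that is, which of the tied values appears first among the x = 4 rows) shows this.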
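A sketch of that experiment, assuming the same data as the query above with only the first two x = 4 rows swapped, so that (4, 1, 2, 3) is listed before (4, 1, 20, 30). Because MODE's tie-break is undocumented (and the listed order of VALUES rows is not guaranteed to be the processing order either), no output is claimed here; compare it against the original query.

```sql
-- Same rows as before; only the order of the first two x = 4 rows changes,
-- so 2 (col2) and 3 (col3) are listed before 20 and 30.
with data as (
select x, col1, col2, col3 from values (1, 1, 1, 3), (1, 1, 2,3), (1, 2, 2,3)
    ,(4, 1, 2, 3), (4, 1, 20, 30), (4, 2, 2, 30), (4,2,20,3) v(x,col1,col2,col3)
)
select x
    ,mode(col1)
    ,mode(col2)
    ,mode(col3)
from data
group by 1
order by 1;
```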


with data as (
select x, col1, col2, col3 from values (1, 1, 1, 3), (1, 1, 2,3), (1, 2, 2,3)
    ,(4, 1, 20, 30), (4, 1, 2, 3), (4, 2, 2, 30), (4,2,20,3) v(x,col1,col2,col3)
)
-- count how often each column's value occurs within its x group
select x, col1, col2, col3
    ,count(col1)over(partition by x,col1) c_col1
    ,count(col2)over(partition by x,col2) c_col2
    ,count(col3)over(partition by x,col3) c_col3
from data;


with data as (
select x, col1, col2, col3 from values (1, 1, 1, 3), (1, 1, 2,3), (1, 2, 2,3)
    ,(4, 1, 20, 30), (4, 1, 2, 3), (4, 2, 2, 30), (4,2,20,3) v(x,col1,col2,col3)
)
-- rank rows per column: highest count first, ties broken by the smaller value
select x, col1, col2, col3
    ,row_number() over (partition by x order by c_col1 desc, col1) as r1
    ,row_number() over (partition by x order by c_col2 desc, col2) as r2
    ,row_number() over (partition by x order by c_col3 desc, col3) as r3
from (
  select x, col1, col2, col3
      ,count(col1)over(partition by x,col1) c_col1
      ,count(col2)over(partition by x,col2) c_col2
      ,count(col3)over(partition by x,col3) c_col3
  from data
)
order by 1;


X   COL1  COL2  COL3  R1  R2  R3
1   1     2     3     2   1   1
1   2     2     3     3   2   2
1   1     1     3     1   3   3
4   1     2     3     2   1   1
4   2     20    3     4   4   2
4   2     2     30    3   2   3
4   1     20    30    1   3   4


It is tempting to keep one row per x by filtering on all three rankings at once:

QUALIFY row_number() over (partition by x order by c_col1 desc, col1) = 1
  AND row_number() over (partition by x order by c_col2 desc, col2) = 1
  AND row_number() over (partition by x order by c_col3 desc, col3) = 1


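That does not work: it only keeps a row that ranks first on every column at once, and nothing guarantees such a row exists (in the output above, no row has R1 = R2 = R3 = 1). This leads to one CTE (or subquery) per column, very similar to the pattern Gordon showed.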

with data as (
select x, col1, col2, col3 from values (1, 1, 1, 3), (1, 1, 2,3), (1, 2, 2,3)
    ,(4, 1, 20, 30), (4, 1, 2, 3), (4, 2, 2, 30), (4,2,20,3) v(x,col1,col2,col3)
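),
-- one CTE per column: keep the value with the highest count,
-- breaking ties by the smallest value
col1_m as (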
    select x, col1, count(*) as c 
    from data 
    group by 1,2
    QUALIFY row_number() over (partition by x order by c desc, col1) = 1
),col2_m as (
    select x, col2, count(*) as c 
    from data 
    group by 1,2
    QUALIFY row_number() over (partition by x order by c desc, col2) = 1
),col3_m as (
    select x, col3, count(*) as c 
    from data 
    group by 1,2
    QUALIFY row_number() over (partition by x order by c desc, col3) = 1
), base as (
    select distinct x from data
)
select b.x, c1.col1, c2.col2, c3.col3
from base as b
left join col1_m as c1 on b.x = c1.x
left join col2_m as c2 on b.x = c2.x
left join col3_m as c3 on b.x = c3.x
order by 1;


X   COL1  COL2  COL3
1   1     2     3
4   1     2     3

But you will need to extend X to the set of keys you actually group by (x, y, ...).
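A minimal sketch of that extension, assuming a hypothetical second key `y` and only two value columns to keep it short (the data is made up for illustration): every PARTITION BY, GROUP BY and join condition gains `y`.

```sql
-- Hypothetical data: the grouping key is (x, y) instead of x alone.
with data as (
  select x, y, col1, col2
  from values (1, 'a', 1, 5), (1, 'a', 2, 5), (1, 'b', 2, 7), (1, 'b', 3, 7)
      v(x, y, col1, col2)
), col1_m as (
  -- highest count per (x, y) wins; ties go to the smallest col1
  select x, y, col1, count(*) as c
  from data
  group by 1, 2, 3
  qualify row_number() over (partition by x, y order by c desc, col1) = 1
), col2_m as (
  select x, y, col2, count(*) as c
  from data
  group by 1, 2, 3
  qualify row_number() over (partition by x, y order by c desc, col2) = 1
), base as (
  select distinct x, y from data
)
select b.x, b.y, c1.col1, c2.col2
from base as b
left join col1_m as c1 on b.x = c1.x and b.y = c1.y
left join col2_m as c2 on b.x = c2.x and b.y = c2.y
order by 1, 2;
-- X  Y  COL1  COL2
-- 1  a  1     5
-- 1  b  2     7
```

Both (1, 'a') and (1, 'b') have a col1 tie, and in each case the smaller value is returned.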
