Group by values less than or equal to the value in a column

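Problem description

I want to find, for each date, the count of unique uids grouped by the maximum score within the group; that is, for each score on a given date, the number of distinct uids with a score less than or equal to it. For example:

Input table: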

date                uid      score
2016-11-01          100      1
2016-11-01          100      1  
2016-11-01          200      1
2016-11-01          300      1   
2016-11-01          100      2
2016-11-01          400      2           
2016-11-02          100      1
2016-11-02          400      2
2016-11-02          500      3
2016-11-02          600      3 
2016-11-02          400      4

Expected query result:

date            unique_uid_count      score_leq_than         
2016-11-01              3                    1 
2016-11-01              4                    2
2016-11-02              1                    1 
2016-11-02              2                    2
2016-11-02              4                    3 
2016-11-02              4                    4

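One approach is to blow up the table so that each distinct score on a date is paired with every uid whose score is less than or equal to it, and then do a COUNT DISTINCT as follows: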

SELECT COUNT(DISTINCT uid), date, score
FROM (SELECT t1.uid, t1.date, t.score FROM (SELECT DISTINCT date, score FROM tbl) t
      INNER JOIN tbl t1 ON t1.date = t.date AND t1.score <= t.score)
GROUP BY date, score

This seems rather inefficient. Is there a better way?

Tags: sql, snowflake-cloud-data-platform

Solution


Hmm . . . I think you can solve this by computing the minimum score for each uid/date and then using a cumulative sum:

select date, min_score as score, count(*) as exact_score,
       sum(count(*)) over (partition by date order by min_score)
from (select date, uid, min(score) as min_score
      from tbl
      group by date, uid
     ) tbl
group by date, min_score;
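To see what this query returns on the sample data from the question, here is a minimal sketch that runs it through Python's built-in sqlite3 module. SQLite is only a stand-in for Snowflake here, on the assumption that both evaluate a windowed sum over an aggregate the same way for this query; the running_uids alias, the ORDER BY, and the helper name run_min_score_query are additions for illustration.

```python
import sqlite3

# Sample rows from the question: (date, uid, score).
ROWS = [
    ("2016-11-01", 100, 1), ("2016-11-01", 100, 1), ("2016-11-01", 200, 1),
    ("2016-11-01", 300, 1), ("2016-11-01", 100, 2), ("2016-11-01", 400, 2),
    ("2016-11-02", 100, 1), ("2016-11-02", 400, 2), ("2016-11-02", 500, 3),
    ("2016-11-02", 600, 3), ("2016-11-02", 400, 4),
]

# The min-score query from the answer. The running_uids alias and the
# ORDER BY are added here so the output is named and deterministic.
MIN_SCORE_SQL = """
select date, min_score as score, count(*) as exact_score,
       sum(count(*)) over (partition by date order by min_score) as running_uids
from (select date, uid, min(score) as min_score
      from tbl
      group by date, uid
     ) tbl
group by date, min_score
order by 1, 2
"""

def run_min_score_query():
    """Load the sample rows into an in-memory SQLite table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table tbl (date text, uid integer, score integer)")
    conn.executemany("insert into tbl values (?, ?, ?)", ROWS)
    result = conn.execute(MIN_SCORE_SQL).fetchall()
    conn.close()
    return result

min_score_rows = run_min_score_query()
for row in min_score_rows:
    print(row)
```

On the sample data this produces five rows: the running counts for 2016-11-01 are 3 and 4, and 2016-11-02 stops at score 3 with a running count of 4.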

Actually, this filters out scores that are not the minimum score for any uid. To keep them, let's use a similar idea, but with row_number():

select date, score, count(*) as exact_score,
       -- count each uid once, at its lowest score (seqnum = 1), then accumulate by score
       sum(sum(case when seqnum = 1 then 1 else 0 end)) over (partition by date order by score) as unique_uid_count
from (select tbl.*,
             row_number() over (partition by date, uid order by score) as seqnum
      from tbl
     ) tbl
group by date, score;

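Basically, row_number() ensures that each user is counted only once per day . . . but the cumulative sum then also counts that user for every score greater than their minimum.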
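As a rough check of the row_number() idea, the sketch below compares it with the self-join query from the question on the sample data. It uses Python's built-in sqlite3 module as a stand-in for Snowflake, which is an assumption rather than something tested on Snowflake itself. In this sketch the outer query groups by date and score, the exact_score column is left out so both results have the same shape, and the names JOIN_SQL, WINDOW_SQL, and compare_queries are made up for illustration.

```python
import sqlite3

# Sample rows from the question: (date, uid, score).
ROWS = [
    ("2016-11-01", 100, 1), ("2016-11-01", 100, 1), ("2016-11-01", 200, 1),
    ("2016-11-01", 300, 1), ("2016-11-01", 100, 2), ("2016-11-01", 400, 2),
    ("2016-11-02", 100, 1), ("2016-11-02", 400, 2), ("2016-11-02", 500, 3),
    ("2016-11-02", 600, 3), ("2016-11-02", 400, 4),
]

# Self-join approach from the question, used as the reference result.
JOIN_SQL = """
select date, score, count(distinct uid) as unique_uid_count
from (select t1.uid as uid, t1.date as date, t.score as score
      from (select distinct date, score from tbl) t
      inner join tbl t1 on t1.date = t.date and t1.score <= t.score)
group by date, score
order by 1, 2
"""

# row_number() approach: each uid contributes 1 at its lowest score per date,
# and the windowed sum accumulates those contributions in score order.
WINDOW_SQL = """
select date, score,
       sum(sum(case when seqnum = 1 then 1 else 0 end))
           over (partition by date order by score) as unique_uid_count
from (select tbl.*,
             row_number() over (partition by date, uid order by score) as seqnum
      from tbl
     ) tbl
group by date, score
order by 1, 2
"""

def compare_queries():
    """Run both queries against an in-memory copy of the sample table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table tbl (date text, uid integer, score integer)")
    conn.executemany("insert into tbl values (?, ?, ?)", ROWS)
    joined = conn.execute(JOIN_SQL).fetchall()
    windowed = conn.execute(WINDOW_SQL).fetchall()
    conn.close()
    return joined, windowed

joined, windowed = compare_queries()
for row in windowed:
    print(row)
print("matches self-join:", joined == windowed)
```

On the sample data both queries give the same six rows as the expected result in the question, including the row for score 4 on 2016-11-02. The window version reads the table once instead of joining it to itself, which is where any efficiency gain would come from; how large that gain is on Snowflake would need to be measured.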

