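sql - Group by column, counting values less than or equal to each value in that column
Question
For each date, I want the count of unique uids grouped by a maximum score: for each score value on that date, count the distinct uids that have a score less than or equal to it. For example:
Input table: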
date uid score
2016-11-01 100 1
2016-11-01 100 1
2016-11-01 200 1
2016-11-01 300 1
2016-11-01 100 2
2016-11-01 400 2
2016-11-02 100 1
2016-11-02 400 2
2016-11-02 500 3
2016-11-02 600 3
2016-11-02 400 4
Expected query result:
date unique_uid_count score_leq_than
2016-11-01 3 1
2016-11-01 4 2
2016-11-02 1 1
2016-11-02 2 2
2016-11-02 4 3
2016-11-02 4 4
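To pin down the semantics, here is a minimal brute-force reference in plain Python. It is an illustration only, not part of the original question, and the names `ROWS` and `leq_counts` are hypothetical. For each date and each distinct score on that date, it counts the distinct uids with at least one score less than or equal to that score:

```python
from collections import defaultdict

# Sample input from the question as (date, uid, score) tuples.
ROWS = [
    ("2016-11-01", 100, 1),
    ("2016-11-01", 100, 1),
    ("2016-11-01", 200, 1),
    ("2016-11-01", 300, 1),
    ("2016-11-01", 100, 2),
    ("2016-11-01", 400, 2),
    ("2016-11-02", 100, 1),
    ("2016-11-02", 400, 2),
    ("2016-11-02", 500, 3),
    ("2016-11-02", 600, 3),
    ("2016-11-02", 400, 4),
]


def leq_counts(rows):
    """Return (date, unique_uid_count, score_leq_than) tuples sorted by date, score.

    Brute force: for every distinct score on a date, rescan that date's rows
    and collect the uids whose score is <= that threshold.
    """
    by_date = defaultdict(list)
    for date, uid, score in rows:
        by_date[date].append((uid, score))

    result = []
    for date in sorted(by_date):
        entries = by_date[date]
        for threshold in sorted({score for _, score in entries}):
            uids = {uid for uid, score in entries if score <= threshold}
            result.append((date, len(uids), threshold))
    return result


if __name__ == "__main__":
    for date, count, threshold in leq_counts(ROWS):
        print(date, count, threshold)
```

Run on the sample rows, this prints the same six rows as the expected result above.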
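One way is to blow up the table, listing for each score every uid with a score at or below it, and then do a COUNT DISTINCT like this:
SELECT COUNT(DISTINCT uid), date, score
FROM (SELECT t1.uid, t1.date, t.score FROM (SELECT DISTINCT date, score FROM tbl) t
INNER JOIN tbl t1 ON t1.date = t.date AND t1.score <= t.score) x  -- many databases require an alias (x) on a derived table
GROUP BY date, score
This seems rather inefficient. Is there a better way?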
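A minimal sketch for checking this approach, assuming SQLite through Python's standard `sqlite3` module as a stand-in for whatever database the question targets. The helper `run_query` and the in-memory table setup are hypothetical. It runs the self-join above (with the derived table aliased as `x` and output columns renamed to match the expected result) against the sample data:

```python
import sqlite3

# Sample input from the question as (date, uid, score) tuples.
ROWS = [
    ("2016-11-01", 100, 1), ("2016-11-01", 100, 1), ("2016-11-01", 200, 1),
    ("2016-11-01", 300, 1), ("2016-11-01", 100, 2), ("2016-11-01", 400, 2),
    ("2016-11-02", 100, 1), ("2016-11-02", 400, 2), ("2016-11-02", 500, 3),
    ("2016-11-02", 600, 3), ("2016-11-02", 400, 4),
]

# The question's self-join: pair every distinct (date, score) threshold with all
# rows on that date whose score is <= the threshold, then count distinct uids.
JOIN_QUERY = """
SELECT date, COUNT(DISTINCT uid) AS unique_uid_count, score AS score_leq_than
FROM (SELECT t1.uid, t1.date, t.score
      FROM (SELECT DISTINCT date, score FROM tbl) t
      INNER JOIN tbl t1 ON t1.date = t.date AND t1.score <= t.score) x
GROUP BY date, score
ORDER BY date, score
"""


def run_query(query, rows):
    """Load rows into an in-memory SQLite table named tbl and run the query."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE tbl (date TEXT, uid INTEGER, score INTEGER)")
        con.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)
        return con.execute(query).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in run_query(JOIN_QUERY, ROWS):
        print(*row)
```

The result is correct. The cost comes from the intermediate join, which holds one row for every pairing of a threshold with a row at or below it on the same date, so it grows with the number of distinct scores per date.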
Answer
Hmmm . . . I think you can do this by calculating the minimum score for each uid/date and then using a cumulative sum:
select date, min_score as score, count(*) as exact_score,
       sum(count(*)) over (partition by date order by min_score) as unique_uid_count
from (select date, uid, min(score) as min_score  -- each uid's lowest score per date
      from tbl
      group by date, uid
     ) tbl
group by date, min_score;
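A sketch that runs this query on the sample data, again assuming SQLite 3.25 or newer (needed for window functions) through Python's `sqlite3`; `run_query` is a hypothetical helper. The cumulative counts are right for the scores that do appear, but there is no row for score 4 on 2016-11-02, because no uid has 4 as its minimum score that day:

```python
import sqlite3

# Sample input from the question as (date, uid, score) tuples.
ROWS = [
    ("2016-11-01", 100, 1), ("2016-11-01", 100, 1), ("2016-11-01", 200, 1),
    ("2016-11-01", 300, 1), ("2016-11-01", 100, 2), ("2016-11-01", 400, 2),
    ("2016-11-02", 100, 1), ("2016-11-02", 400, 2), ("2016-11-02", 500, 3),
    ("2016-11-02", 600, 3), ("2016-11-02", 400, 4),
]

# Count uids by their minimum score per date, then take a running total.
MIN_SCORE_QUERY = """
SELECT date, min_score AS score, COUNT(*) AS exact_score,
       SUM(COUNT(*)) OVER (PARTITION BY date ORDER BY min_score) AS unique_uid_count
FROM (SELECT date, uid, MIN(score) AS min_score
      FROM tbl
      GROUP BY date, uid) m
GROUP BY date, min_score
ORDER BY date, min_score
"""


def run_query(query, rows):
    """Load rows into an in-memory SQLite table named tbl and run the query."""
    if sqlite3.sqlite_version_info < (3, 25, 0):
        raise RuntimeError("window functions need SQLite 3.25 or newer")
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE tbl (date TEXT, uid INTEGER, score INTEGER)")
        con.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)
        return con.execute(query).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    # Columns: date, score, exact_score, unique_uid_count
    for row in run_query(MIN_SCORE_QUERY, ROWS):
        print(*row)
```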
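Actually, this filters out scores that are not the minimum score for any uid (for example, score 4 on 2016-11-02). To keep them, let's use a similar idea, but with row_number():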
select date, score, count(*) as exact_score,
       sum(sum(case when seqnum = 1 then 1 else 0 end)) over (partition by date order by score) as unique_uid_count
from (select tbl.*,
             -- seqnum = 1 marks each uid's lowest-scoring row on that date
             row_number() over (partition by date, uid order by score) as seqnum
      from tbl
     ) tbl
group by date, score;
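Basically, row_number() makes sure that each user is counted only once per day . . . but the cumulative sum then also counts them for every score larger than their minimum score.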
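A sketch that checks the row_number() version against the expected result, assuming SQLite 3.25 or newer through Python's `sqlite3`. The helpers `run_query` and `leq_counts_sql` are hypothetical; the latter just reorders columns to (date, unique_uid_count, score_leq_than):

```python
import sqlite3

# Sample input from the question as (date, uid, score) tuples.
ROWS = [
    ("2016-11-01", 100, 1), ("2016-11-01", 100, 1), ("2016-11-01", 200, 1),
    ("2016-11-01", 300, 1), ("2016-11-01", 100, 2), ("2016-11-01", 400, 2),
    ("2016-11-02", 100, 1), ("2016-11-02", 400, 2), ("2016-11-02", 500, 3),
    ("2016-11-02", 600, 3), ("2016-11-02", 400, 4),
]

# Flag each uid's first (lowest) score per date, then run a total of the flags.
ROW_NUMBER_QUERY = """
SELECT date, score, COUNT(*) AS exact_score,
       SUM(SUM(CASE WHEN seqnum = 1 THEN 1 ELSE 0 END))
           OVER (PARTITION BY date ORDER BY score) AS unique_uid_count
FROM (SELECT tbl.*,
             ROW_NUMBER() OVER (PARTITION BY date, uid ORDER BY score) AS seqnum
      FROM tbl) r
GROUP BY date, score
ORDER BY date, score
"""


def run_query(query, rows):
    """Load rows into an in-memory SQLite table named tbl and run the query."""
    if sqlite3.sqlite_version_info < (3, 25, 0):
        raise RuntimeError("window functions need SQLite 3.25 or newer")
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE tbl (date TEXT, uid INTEGER, score INTEGER)")
        con.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)
        return con.execute(query).fetchall()
    finally:
        con.close()


def leq_counts_sql(rows):
    """Return (date, unique_uid_count, score_leq_than) like the expected result."""
    return [(date, uniq, score)
            for date, score, _exact, uniq in run_query(ROW_NUMBER_QUERY, rows)]


if __name__ == "__main__":
    for row in leq_counts_sql(ROWS):
        print(*row)
```

Note that `exact_score` here counts rows, not uids: it is 4 for score 1 on 2016-11-01 because uid 100 appears twice at that score.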