sql - How to count duplicates based on group by as well as see if values are present in the same field values?
Problem description
I have a table with about 1 million rows that looks like this:
GROUP | ID
A | AED12
A | 6D04K
A | AED12
B | VFR54
B | VFR54
B | 9KMN2
B | AED12
C | 9KMN2
C | 9KMN2
C | VFR54
I would like an output that counts the duplicates within each GROUP while also showing which IDs appear in more than one group (for example, 'AED12' is in both GROUP A and GROUP B).
For the first part (duplicates within each group), I am using this query:
-- GROUP is a reserved word in Snowflake, so the column name is double-quoted
select "GROUP", ID, count(ID)
from TABLE1
group by "GROUP", ID
having count(ID) > 1
;
output of query above
GROUP | ID | COUNT
A | AED12 | 2
B | VFR54 | 2
C | 9KMN2 | 2
I would like to sum the number of duplicates per group (how many dups are in Group A, B, etc.), which would look like this (numbers made up for example purposes):
GROUP | Sum of Dups IDs
A | 2134
B | 23321
C | 11235
I would also like to identify which IDs are present in more than one group and count how many groups each one appears in. The output could be something like a CASE WHEN field that says whether the ID is present in more than one group: 'true' if so, 'false' if not. I am still thinking about how the counted/grouped output should look, since there are over a million rows.
Any help would be much appreciated. I am new to Snowflake.
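Solution
You can use a window function to count how many groups each id appears in, and then aggregate. Here is an example that counts singletons and pairs: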
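-- GROUP is a reserved word in Snowflake, so the column name is double-quoted
select "GROUP",
       count(*) as num_elements,
       count_if(num_groups = 1) as num_onesies,
       count_if(num_groups = 2) as num_twosies
from (select t.*,
             -- rows per id across the whole table; this equals the number
             -- of groups only when each (group, id) pair occurs once
             count(*) over (partition by id) as num_groups
      from table1 t
     ) t
group by "GROUP";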
Edit:
If you have duplicates, it is best to remove them before applying the logic above:
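-- GROUP is a reserved word in Snowflake, so the column name is double-quoted
select "GROUP",
       count(*) as num_elements,
       count_if(num_groups = 1) as num_onesies,
       count_if(num_groups = 2) as num_twosies
from (select t.*,
             -- after deduplication, rows per id = number of groups per id
             count(*) over (partition by id) as num_groups
      from (select distinct "GROUP", id
            from table1 t
           ) t
     ) t
group by "GROUP";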
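The queries above count singletons and pairs, but the question also asks for a per-group duplicate total and a true/false flag per ID. The following is only a sketch of one way to get both, assuming the column is literally named GROUP (hence the quotes) and that "sum of dups" means the total number of rows belonging to IDs that repeat within a group. The aliases n_rows, dup_rows, dup_ids, num_groups and in_multiple_groups are made up for illustration.

```sql
-- Per group: how many rows belong to IDs that repeat inside that group,
-- and how many distinct IDs are repeated.
select "GROUP",
       sum(n_rows) as dup_rows,
       count(*)    as dup_ids
from (select "GROUP", id, count(*) as n_rows
      from table1
      group by "GROUP", id
      having count(*) > 1
     ) d
group by "GROUP";

-- Per ID: number of distinct groups it appears in, plus a boolean flag.
select id,
       count(distinct "GROUP")     as num_groups,
       count(distinct "GROUP") > 1 as in_multiple_groups
from table1
group by id;
```

On the sample rows from the question, the second query would flag AED12, VFR54 and 9KMN2 as present in two groups each, and 6D04K as present in only one. If "sum of dups" should count only the extra copies rather than every row of a repeated ID, sum(n_rows - 1) could be used in place of sum(n_rows).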