sql - Flag duplicates based on either of two columns
Problem description
Suppose my dataset looks like this:
email name
a f
b g
a g
o k
The output I want is:
email name group
a f 1
b g 1
a g 1
o k 2
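This is because the first three rows are the same person: each of them shares either an email or a name with another row. I'm struggling to figure out how to write a query that produces the group column.
Solution
This requires a recursive CTE. You can assign groups by creating edges between emails (or names) that appear together and then traversing the resulting graph: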
with edges as (
      -- two emails are connected when they share a name
      select t1.email as email1, t2.email as email2
      from t t1 join
           t t2
           on t1.name = t2.name
     ),
     cte as (
      select email1, email2, least(email1, email2) as min_email,
             array_construct(email1, email2) as visited
      from edges e
      union all
      select cte.email1, e.email2, least(cte.min_email, e.email2),
             array_append(cte.visited, e.email2)
      from cte join
           edges e
           on cte.email2 = e.email1
      -- assuming Snowflake, where array_contains takes (value, array)
      where not array_contains(e.email2::variant, cte.visited)
     )
select email1, min(min_email) as min_email,
       dense_rank() over (order by min(min_email)) as grp
from cte
group by email1;
Adapting this to assign grp to the original data:
with edges as (
      -- two emails are connected when they share a name
      select t1.email as email1, t2.email as email2
      from t t1 join
           t t2
           on t1.name = t2.name
     ),
     cte as (
      select email1, email2, least(email1, email2) as min_email,
             array_construct(email1, email2) as visited
      from edges e
      union all
      select cte.email1, e.email2, least(cte.min_email, e.email2),
             array_append(cte.visited, e.email2)
      from cte join
           edges e
           on cte.email2 = e.email1
      -- assuming Snowflake, where array_contains takes (value, array)
      where not array_contains(e.email2::variant, cte.visited)
     )
select t.*, e.grp
from t join
     (select email1, min(min_email) as min_email,
             dense_rank() over (order by min(min_email)) as grp
      from cte
      group by email1
     ) e
     on t.email = e.email1;
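To try the idea on the sample data without a warehouse, here is a minimal sketch that runs a similar query in SQLite through Python's sqlite3 module. It is not the query above: SQLite has no array functions, so this version relies on UNION (rather than UNION ALL) to discard repeated pairs and stop the recursion, and it assumes an SQLite build with window-function support (3.25 or later). The names reach, QUERY, and assign_groups are made up for illustration.

```python
import sqlite3

# Sample rows from the question; the table name t matches the queries above.
ROWS = [("a", "f"), ("b", "g"), ("a", "g"), ("o", "k")]

QUERY = """
with recursive edges as (
    -- two emails are connected when they appear with the same name
    select distinct t1.email as email1, t2.email as email2
    from t t1 join
         t t2
         on t1.name = t2.name
),
reach(email1, email2) as (
    select email1, email2 from edges
    union  -- UNION drops repeated pairs, so the walk terminates
    select r.email1, e.email2
    from reach r join
         edges e
         on r.email2 = e.email1
)
select t.email, t.name,
       dense_rank() over (order by m.min_email) as grp
from t join
     (select email1, min(email2) as min_email
      from reach
      group by email1
     ) m
     on t.email = m.email1
order by t.rowid
"""


def assign_groups(rows):
    """Load rows into an in-memory table t and return (email, name, grp) tuples."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table t (email text, name text)")
        conn.executemany("insert into t values (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in assign_groups(ROWS):
        print(row)
```

Each email is labelled with the smallest email reachable from it, and dense_rank turns those labels into consecutive group numbers. On the sample data the first three rows get group 1 and the last row gets group 2.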