sql-server - TSQL: group by Substring (Name) and retrieve ID in SELECT
Problem description
We have company data stored in a table. To de-duplicate the rows, we need to identify duplicate company records using the following criterion: if the first five letters of the CompanyName, the City, and the postal code match the same fields in another record, the record is a duplicate. We will remove the duplicates later. The problem I am running into is that I can't retrieve the IDs of these records, since I am not grouping the records on ID. I am using the following SQL:
Select count(ID) as DupCount
, SUBSTRING(Name,1,5) as Name
, City
, PostalCode
from tblCompany
group by SUBSTRING(Name,1,5)
, City
, PostalCode
Having count(ID) > 1
order by count(ID) desc
How do I retrieve the ID of these records?
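Solution
Use group_concat() to get the ids as a comma-separated list. Note that group_concat() is a MySQL function and SQL Server does not provide it; on SQL Server 2017 and later, STRING_AGG() is the equivalent aggregate.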
select
SUBSTRING(Name,1,5) as Name,
City,
PostalCode,
count(ID) as counter,
group_concat(id order by id) as ids
from tblCompany
group by SUBSTRING(Name,1,5), City, PostalCode
having count(ID) > 1
order by count(ID) desc
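Since the question is tagged sql-server, here is a sketch of the same query in T-SQL. It assumes SQL Server 2017 or later (for STRING_AGG) and that ID is a non-null numeric key; the column aliases NamePrefix and Ids are my own and not from the original post.

```sql
-- One row per duplicate group, with the member IDs as a comma-separated list.
-- Assumes SQL Server 2017+; the CAST makes the ID-to-text conversion explicit.
SELECT
    SUBSTRING(Name, 1, 5) AS NamePrefix,
    City,
    PostalCode,
    COUNT(ID) AS DupCount,
    STRING_AGG(CAST(ID AS varchar(20)), ',') WITHIN GROUP (ORDER BY ID) AS Ids
FROM tblCompany
GROUP BY SUBSTRING(Name, 1, 5), City, PostalCode
HAVING COUNT(ID) > 1
ORDER BY COUNT(ID) DESC;
```

If one row per ID is more convenient for the later delete step, a windowed count avoids the GROUP BY altogether. This is an alternative approach, not the one in the answer above, and it should also work on versions older than 2017:

```sql
-- One row per company that belongs to a duplicate group.
-- COUNT(*) counts rows, which matches COUNT(ID) as long as ID is never NULL.
WITH Flagged AS (
    SELECT
        ID,
        Name,
        City,
        PostalCode,
        COUNT(*) OVER (
            PARTITION BY SUBSTRING(Name, 1, 5), City, PostalCode
        ) AS DupCount
    FROM tblCompany
)
SELECT ID, Name, City, PostalCode, DupCount
FROM Flagged
WHERE DupCount > 1
ORDER BY DupCount DESC, City, PostalCode, Name, ID;
```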