TSQL: group by Substring (Name) and retrieve ID in SELECT

Problem description

We have company data stored in a table. To de-duplicate the rows, we need to identify duplicate company records using the following criterion: if the first five letters of the company name (the Name column), the City, and the PostalCode match the same fields of another record, the record is a duplicate. We will remove the duplicates later. The problem I am running into is that I can't retrieve the IDs of these records, because I am not grouping on ID. I am using the following SQL:

Select count(ID) as DupCount
       , SUBSTRING(Name,1,5) as Name
       , City
       , PostalCode 
from tblCompany 
group by SUBSTRING(Name,1,5)
         , City
         , PostalCode 
Having count(ID) > 1 
order by count(ID) desc 

How do I retrieve the ID of these records?

Tags: sql-server, tsql, substring

Solution


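group_concat() is a MySQL function and does not exist in SQL Server. In SQL Server 2017 and later, the equivalent is STRING_AGG(), which returns the IDs of each group as a comma-separated list (on older versions, the usual workaround is FOR XML PATH('') combined with STUFF()):

select 
  SUBSTRING(Name,1,5) as Name,
  City,
  PostalCode,
  count(ID) as counter, 
  string_agg(ID, ',') within group (order by ID) as ids
from tblCompany 
group by SUBSTRING(Name,1,5), City, PostalCode 
having count(ID) > 1 
order by count(ID) desc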
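If you need each duplicate's ID on its own row rather than in a list (for example, to drive the later clean-up), a windowed COUNT(*) OVER (PARTITION BY ...) counts each group while keeping every row, so ID can be selected directly. This is a sketch of an alternative technique, not part of the answer above: the table variable, its column types, and the sample rows are hypothetical stand-ins for tblCompany, and the optional delete assumes you want to keep the row with the lowest ID in each group.

```sql
-- Hypothetical stand-in for tblCompany; the real column types are not shown
-- in the question, so these are assumptions.
DECLARE @tblCompany TABLE (
    ID         int           NOT NULL PRIMARY KEY,
    Name       nvarchar(100) NOT NULL,
    City       nvarchar(50)  NOT NULL,
    PostalCode nvarchar(10)  NOT NULL
);

-- Hypothetical sample rows: IDs 1 and 2 share 'Acme ', Springfield, 12345.
INSERT INTO @tblCompany (ID, Name, City, PostalCode) VALUES
    (1, N'Acme Corporation', N'Springfield', N'12345'),
    (2, N'Acme Corp.',       N'Springfield', N'12345'),
    (3, N'Acme Ltd',         N'Shelbyville', N'12345'),
    (4, N'Globex Inc',       N'Springfield', N'12345');

-- Count the rows in each (first five letters of Name, City, PostalCode) group
-- with a window function, so every row keeps its own ID.
WITH Grouped AS (
    SELECT ID,
           Name,
           City,
           PostalCode,
           COUNT(*) OVER (PARTITION BY SUBSTRING(Name, 1, 5), City, PostalCode) AS DupCount
    FROM @tblCompany
)
SELECT ID, Name, City, PostalCode, DupCount
FROM Grouped
WHERE DupCount > 1          -- keep only rows that have at least one duplicate
ORDER BY DupCount DESC, SUBSTRING(Name, 1, 5), City, PostalCode, ID;

-- Optional clean-up (assumption: keep the lowest ID in each group).
-- ROW_NUMBER() numbers the rows of each group by ID; every row after the
-- first is deleted through the CTE.
WITH Ranked AS (
    SELECT ID,
           ROW_NUMBER() OVER (PARTITION BY SUBSTRING(Name, 1, 5), City, PostalCode
                              ORDER BY ID) AS rn
    FROM @tblCompany
)
DELETE FROM Ranked
WHERE rn > 1;

-- Rows that remain after the clean-up.
SELECT ID, Name, City, PostalCode
FROM @tblCompany
ORDER BY ID;
```

With the sample rows, the first query returns IDs 1 and 2 (both "Acme " in Springfield, 12345) with DupCount = 2, and the delete removes ID 2, leaving IDs 1, 3 and 4. Because the window function uses the same expressions as the GROUP BY in the question, both approaches agree on which rows count as duplicates.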
