Need help with a SQL transaction to condense data in a database

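Question

I am trying to condense data in a database table that contains multiple instances of each unique record, with varying data in the columns.

For each unique record, I want to select the most frequently occurring value in each column,

but my SQL transaction is not working.

[dataBase1].[dbo].[table1] has a few hundred thousand records and several columns (Name, Place, etc.).

[dataBase1].[dbo].[table2] has the list of unique names from [table1], plus headers for the remaining columns (Place, etc.), which are empty.

I tried the following code.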

DECLARE @name varchar(max);
DECLARE @place varchar(max);

DECLARE db_cursor SCROLL CURSOR FOR 
     SELECT [Name] 
     FROM [dataBase1].[dbo].[table2];

OPEN HostName_cursor

FETCH NEXT FROM db_cursor INTO @name;

WHILE @@FETCH_STATUS = 0
BEGIN
     SELECT DISTINCT TOP(1) @place = [Place] 
     FROM [dataBase1].[dbo].[table1] 
     WHERE [Name] = @name 
       AND [Place] IS NOT NULL AND [Place] <> '' 
       AND (EXISTS  (SELECT [Place], COUNT (*) AS TOTAL 
                     FROM [dataBase1].[dbo].[table1] 
                     GROUP BY [Place])) 
     GROUP BY [Place];

     UPDATE [dataBase1].[dbo].[table2] 
     SET [Place] = @place 
     WHERE [Name] = @name;      

     SET @place = '';

     FETCH NEXT FROM db_cursor INTO @name
END

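For one particular unique [Name], the [Place] column has 53 values, and the highest repeat count for any single value is 3. Essentially, I want to run the following for each unique [Name].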

SELECT DISTINCT TOP 1 
    [Place], COUNT (*) TOTAL 
FROM 
    [dataBase1].[dbo].[table1] 
WHERE 
    [Name] = 'xxxxxx' 
    AND [Place] IS NOT NULL AND [Place] <> '' 
GROUP BY [Place] 
ORDER BY TOTAL DESC;

Tags: sql, sql-server, sqltransaction

Answer


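This can be done in several steps, each building on the previous one. You want to process all names and places at once, as a single set-based statement, rather than one name at a time with a cursor.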

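First, you want to count how often each name and place combination occurs, so group by name and place and count the places. Your query will look like this: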

SELECT name, place, COUNT(place) as placecount
FROM table1
GROUP BY name, place

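Now you need to find the place with the highest count for each name and, in the event of a tie, the one that comes first alphabetically. You can do this by applying ROW_NUMBER to the result above, restarting the numbering for each name (PARTITION BY name) and ordering by the place count in descending order, then by place to break ties. Using a CTE (you could also write it as a subquery), this looks like: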

WITH places as (
  SELECT name, place, COUNT(place) as placecount
  FROM table1
  GROUP BY name, place
)
SELECT name, place, ROW_NUMBER() OVER (PARTITION BY name ORDER BY placecount DESC, place) as RN
FROM places
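
The subquery form mentioned above could look something like the following sketch. It assumes the same table1 with name and place columns; the derived-table alias places is only illustrative. The count is sorted in descending order so that the most frequent place receives RN = 1.

```sql
-- Same ranking as the CTE version, written with a derived table instead.
SELECT name,
       place,
       ROW_NUMBER() OVER (PARTITION BY name
                          ORDER BY placecount DESC, place) AS RN
FROM (
    SELECT name, place, COUNT(place) AS placecount
    FROM table1
    GROUP BY name, place
) AS places;
```

The two forms should be equivalent; the CTE mostly helps readability once more steps are stacked on top.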

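If you look at that data, the place you want for any given name should be on the row where RN is 1. So you can get the final data you need with a query like this: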

WITH places as (
  SELECT name, place, COUNT(place) as placecount
  FROM table1
  GROUP BY name, place
), orderplaces as (
  SELECT name, place, ROW_NUMBER() OVER (PARTITION BY name ORDER BY placecount DESC, place) as RN
  FROM places
)
Select name, place
FROM orderplaces
WHERE RN = 1
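
To try the ranking without touching the real tables, a small self-contained sketch like the one below may help. The table variable @table1 and its five rows are made-up sample data, not taken from the question; the query itself is the one above with the count sorted in descending order.

```sql
-- Hypothetical sample data in a table variable, standing in for table1.
DECLARE @table1 TABLE (name varchar(50), place varchar(50));

INSERT INTO @table1 (name, place)
VALUES ('alpha', 'Paris'),
       ('alpha', 'Paris'),
       ('alpha', 'Oslo'),
       ('beta',  'Rome'),
       ('beta',  'Lima');

WITH places AS (
    -- One row per (name, place) pair with its number of occurrences.
    SELECT name, place, COUNT(place) AS placecount
    FROM @table1
    GROUP BY name, place
), orderplaces AS (
    -- Rank places within each name: highest count first, ties alphabetical.
    SELECT name, place,
           ROW_NUMBER() OVER (PARTITION BY name
                              ORDER BY placecount DESC, place) AS RN
    FROM places
)
SELECT name, place
FROM orderplaces
WHERE RN = 1
ORDER BY name;
```

With this sample data, alpha should come back with Paris (two occurrences against one), and beta with Lima, because Rome and Lima are tied and Lima sorts first.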

Since you want to update table2 with this place data rather than just look at it, join table2 to the final query and update it, like this:

WITH places as (
  SELECT name, place, COUNT(place) as placecount
  FROM table1
  GROUP BY name, place
), orderplaces as (
  SELECT name, place, ROW_NUMBER() OVER (PARTITION BY name ORDER BY placecount DESC, place) as RN
  FROM places
)
UPDATE T2 set place = OP.place
FROM orderplaces OP
   INNER JOIN table2 T2 on T2.name = OP.name
WHERE RN = 1;
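
The question also excludes NULL and empty places and uses fully qualified table names. A variant along those lines might look like the sketch below. The WHERE filter is copied from the question's own query; wrapping the update in a transaction that rolls back by default is an added assumption, meant only to allow a dry run before committing.

```sql
BEGIN TRANSACTION;

WITH places AS (
    -- Count occurrences per (Name, Place), ignoring NULL and empty places
    -- (filter taken from the question's query).
    SELECT [Name], [Place], COUNT(*) AS placecount
    FROM [dataBase1].[dbo].[table1]
    WHERE [Place] IS NOT NULL AND [Place] <> ''
    GROUP BY [Name], [Place]
), orderplaces AS (
    -- Highest count first; ties broken alphabetically by Place.
    SELECT [Name], [Place],
           ROW_NUMBER() OVER (PARTITION BY [Name]
                              ORDER BY placecount DESC, [Place]) AS RN
    FROM places
)
UPDATE T2
SET T2.[Place] = OP.[Place]
FROM [dataBase1].[dbo].[table2] AS T2
    INNER JOIN orderplaces AS OP ON OP.[Name] = T2.[Name]
WHERE OP.RN = 1;

-- Inspect the updated rows before making the change permanent.
SELECT [Name], [Place]
FROM [dataBase1].[dbo].[table2];

ROLLBACK TRANSACTION;  -- swap for COMMIT TRANSACTION once the result looks right
```

Names in table2 that have no non-empty place in table1 are left unchanged by the inner join. The same pattern could be repeated for the other columns that need condensing.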
