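sql - Need T-SQL help condensing data in a database
Problem description
I am trying to condense data in a database table that contains multiple instances of each unique record, with varying column data.
For each unique record, I want to select the most frequently occurring value in each column,
but my T-SQL is not working.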
[dataBase1].[dbo].[table1] has several hundred thousand records and several columns (Name, Place, etc.).
[dataBase1].[dbo].[table2] holds the list of unique names from [table1], plus headers for the remaining columns (Place, etc.), which are still empty.
I tried the following code.
DECLARE @name varchar(max);
DECLARE @place varchar(max);
DECLARE db_cursor SCROLL CURSOR FOR
SELECT [Name]
FROM [dataBase1].[dbo].[table2];
OPEN HostName_cursor
FETCH NEXT FROM db_cursor INTO @name;
WHILE @@FETCH_STATUS = 0
BEGIN
SELECT DISTINCT TOP(1) @place = [Place]
FROM [dataBase1].[dbo].[table1]
WHERE [Name] = @name
AND [Place] IS NOT NULL AND [Place] <> ''
AND (EXISTS (SELECT [Place], COUNT (*) AS TOTAL
FROM [dataBase1].[dbo].[table1]
GROUP BY [Place]))
GROUP BY [Place];
UPDATE [dataBase1].[dbo].[table2]
SET [Place] = @place
WHERE [Name] = @name;
SET @place = '';
FETCH NEXT FROM db_cursor INTO @name
END
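For one particular unique [Name], the [Place] column has 53 values, and the most frequently repeated value occurs 3 times. Essentially, I want to do the following for every unique [Name]: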
SELECT DISTINCT TOP 1
[Place], COUNT (*) TOTAL
FROM
[dataBase1].[dbo].[table1]
WHERE
[Name] = 'xxxxxx'
AND [Place] IS NOT NULL AND [Place] <> ''
GROUP BY [Place]
ORDER BY TOTAL DESC;
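Solution
This can be done in several steps, each building on the previous one. You want to process all names and places at once, in a single set-based statement, rather than one name at a time.
First, you want to count each name/place combination, so group by name and place and then count the places. Your query will look like this: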
SELECT name, place, COUNT(place) as placecount
FROM table1
GROUP BY name, place
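Now you need to find the combination with the highest count and, in case of a tie, the one that comes first alphabetically. You can do this by applying ROW_NUMBER to the result above, restarting the numbering for each name (the partition) and ordering by place count, highest first, then by place to break ties. Using a CTE (you could also do this as a subquery), it looks like this: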
WITH places as (
SELECT name, place, COUNT(place) as placecount
FROM table1
GROUP BY name, place
)
SELECT name, place, ROW_NUMBER() OVER (PARTITION BY name ORDER BY placecount DESC, place) as RN
FROM places
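To see what the ranking produces, here is a small sketch you can run on its own. The table variable @table1 and its rows are made-up sample data standing in for the real table1, not anything from the question; the query itself is the ranking step described above, with the count sorted highest first.

```sql
-- Hypothetical sample data standing in for table1.
DECLARE @table1 TABLE (name varchar(50), place varchar(50));

INSERT INTO @table1 (name, place) VALUES
    ('alice', 'Paris'), ('alice', 'Paris'), ('alice', 'Oslo'),
    ('bob',   'Rome'),  ('bob',   'Lima');

WITH places AS (
    -- One row per name/place pair with its number of occurrences.
    SELECT name, place, COUNT(place) AS placecount
    FROM @table1
    GROUP BY name, place
)
SELECT name, place, placecount,
       -- Numbering restarts at 1 for each name; the highest count comes
       -- first, and ties are broken alphabetically by place.
       ROW_NUMBER() OVER (PARTITION BY name
                          ORDER BY placecount DESC, place) AS RN
FROM places
ORDER BY name, RN;
-- alice: Paris (count 2) gets RN 1, Oslo (count 1) gets RN 2
-- bob:   Lima and Rome tie at 1, so Lima (alphabetically first) gets RN 1
```

If the count were sorted ascending instead, RN 1 would land on the least frequent place, which is the opposite of what the question asks for.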
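If you look at that data, the place you want for any given name should be on the row where RN is 1. So you can get the final data you need with a query like this: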
WITH places as (
SELECT name, place, COUNT(place) as placecount
FROM table1
GROUP BY name, place
), orderplaces as (
SELECT name, place, ROW_NUMBER() OVER (PARTITION BY name ORDER BY placecount DESC, place) as RN
FROM places
)
Select name, place
FROM orderplaces
WHERE RN = 1
Since you want to update table2 with this place data rather than just view it, you join table2 in the final query and update, like this:
WITH places as (
SELECT name, place, COUNT(place) as placecount
FROM table1
GROUP BY name, place
), orderplaces as (
SELECT name, place, ROW_NUMBER() OVER (PARTITION BY name ORDER BY placecount DESC, place) as RN
FROM places
)
UPDATE T2 set place = OP.place
FROM orderplaces OP
INNER JOIN table2 T2 on T2.name = OP.name
WHERE RN = 1;
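The query in the question also skips NULL and empty places, which the steps above do not. A minimal end-to-end sketch with that filter added might look like the following. The table variables @table1 and @table2 and their rows are hypothetical sample data, and the WHERE clause is carried over from the question's own query rather than from this answer.

```sql
-- Hypothetical sample data standing in for table1 and table2.
DECLARE @table1 TABLE (name varchar(50), place varchar(50));
DECLARE @table2 TABLE (name varchar(50), place varchar(50));

INSERT INTO @table1 (name, place) VALUES
    ('alice', 'Paris'), ('alice', 'Paris'), ('alice', 'Oslo'),
    ('alice', ''),      ('alice', ''),      ('alice', ''),
    ('bob',   NULL),    ('bob',   'Rome');

-- table2 starts with the unique names and an empty place column.
INSERT INTO @table2 (name) VALUES ('alice'), ('bob');

WITH places AS (
    SELECT name, place, COUNT(*) AS placecount
    FROM @table1
    -- Filter taken from the question: ignore NULL and empty places.
    WHERE place IS NOT NULL AND place <> ''
    GROUP BY name, place
), orderplaces AS (
    SELECT name, place,
           ROW_NUMBER() OVER (PARTITION BY name
                              ORDER BY placecount DESC, place) AS RN
    FROM places
)
UPDATE T2
SET place = OP.place
FROM @table2 T2
INNER JOIN orderplaces OP ON OP.name = T2.name
WHERE OP.RN = 1;

SELECT name, place FROM @table2 ORDER BY name;
-- alice -> Paris, bob -> Rome
```

Without the filter, the three empty strings for alice would outnumber Paris and the empty string would be written to table2, so whether to keep the filter depends on whether blank values should count as real places.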