sql - 针对包含 7500 万条记录的表优化存储过程查询
问题描述
我有一个AFW_Coverage
包含 7500 万行的表。还有另一个AFW_BasicPolInfo
包含大约 300 万行的表。
我编写了以下存储过程来从表中获取记录:
CREATE PROCEDURE [ams360].[GetPolicyCoverages]
@PageStart INT = 0,
@PageSize INT = 50000,
@RowVersion TIMESTAMP = NULL
AS
SET NOCOUNT ON;
;WITH LatestCoverage AS
(
SELECT
PolId,
MAX(EffDate) AS CoverageEffectiveDate
FROM
ams360.AFW_Coverage
GROUP BY
PolId
),
Coverages AS
(
SELECT
cov.PolId,
cov.LobId,
cov.CoverageId,
cov.EffDate,
cov.CoverageCode,
cov.isCoverage,
cov.FullTermPrem,
cov.Limit1,
cov.Limit2,
cov.Limit3,
cov.Deduct1,
cov.Deduct2,
cov.Deduct3,
cov.ChangedDate,
cov.RowVersion,
FROM
ams360.AFW_Coverage cov
INNER JOIN
LatestCoverage mcov ON cov.PolId = mcov.PolId
AND cov.EffDate = mcov.CoverageEffectiveDate
WHERE
cov.Status IN ('A', 'C')
)
SELECT
BPI.PolId,
BPI.PolEffDate,
BPI.PolExpDate,
BPI.PolTypeLOB,
cov.LobId,
cov.CoverageId,
cov.EffDate,
cov.CoverageCode,
cov.isCoverage,
cov.FullTermPrem,
cov.Limit1,
cov.Limit2,
cov.Limit3,
cov.Deduct1,
cov.Deduct2,
cov.Deduct3,
cov.ChangedDate,
cov.RowVersion,
FROM
ams360.AFW_BasicPolInfo BPI
INNER JOIN
Coverages cov ON bpi.PolId = cov.PolId
WHERE
BPI.Status IN ('A','C')
AND BPI.PolTypeLOB IN ('Homeowners', 'Dwelling Fire')
AND BPI.PolSubType = 'P'
AND BPI.RenewalRptFlag IN ('A', 'R', 'I', 'N')
AND GETDATE() BETWEEN BPI.PolEffDate AND BPI.PolExpDate
AND (@RowVersion IS NULL OR cov.RowVersion > @RowVersion)
GROUP BY
BPI.PolId,
BPI.PolEffDate,
BPI.PolExpDate,
BPI.PolTypeLOB,
cov.LobId,
cov.CoverageId,
cov.EffDate,
cov.CoverageCode,
cov.isCoverage,
cov.FullTermPrem,
cov.Limit1, cov.Limit2, cov.Limit3,
cov.Deduct1, cov.Deduct2, cov.Deduct3,
cov.ChangedDate,
cov.RowVersion,
ORDER BY
cov.RowVersion
OFFSET
@PageStart ROWS
FETCH NEXT
@PageSize ROWS ONLY
GO
但是,我发现上述存储过程以 100% 的速度与数据库挂钩,尽管我添加了以下索引,我看到它们在执行计划中使用:
CREATE NONCLUSTERED INDEX [IX_AFW_Coverage_PolId_EffDate]
ON [ams360].[AFW_Coverage] ([PolId] ASC, [EffDate] ASC)
WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_AFW_Coverage_PolId_EffDate_Status_LobId_CoverageId]
ON [ams360].[AFW_Coverage] ([PolId] ASC, [EffDate] ASC, [Status] ASC, [LobId] ASC, [CoverageId] ASC)
INCLUDE ([CoverageCode], [IsCoverage], [FullTermPrem], [Limit1], [Limit2],[Limit3], [Deduct1], [Deduct2], [Deduct3], [ChangedDate], [RowVersion])
WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]
GO
存储过程的执行时间在 6 分钟到 20 分钟或 50 分钟之间变化(取决于服务器流量和使用情况)
我的问题:如何在存储过程中优化此查询,同时记住覆盖表包含 7500 万条记录这一事实?我不是 dba,也没有优化运行缓慢的查询的经验。有关如何解决此问题的任何见解都会有所帮助。提前致谢。
解决方案
首先,链接公用表表达式可能会导致复杂的执行计划。我们希望计划简单易行,便于引擎优化。
所以,让我们从删除第一个开始:
DROP TABLE IF EXISTS #LatestCoverage;
CREATE TABLE #LatestCoverage
(
PolId BIGINT PRIMARY KEY
,CoverageEffectiveDate DATETIME2(0)
);
INSERT INTO #LatestCoverage
SELECT
PolId,
MAX(EffDate) AS CoverageEffectiveDate
FROM
ams360.AFW_Coverage
GROUP BY
PolId;
如果表中有很多列,ams360.AFW_Coverage
则查询列上的索引可能会提高性能:
CREATE INDEX IX_AFW_Coverage_EffDate ON ams360.AFW_Coverage
(
polID
,EffDate
)
然后,您正在阅读大量最近被剪切的数据。您可以尝试高级过滤数据,然后读取行详细信息。像这样的东西:
DROP TABLE if exists #CoveragesFiltered
CREATE TABLE #CoveragesFiltered
(
PolId BIGINT PRIMARY KEY
,RowVersion ??
);
INSERT INTO #CoveragesFiltered
SELECT
cov.PolId,
cov.RowVersion,
FROM ams360.AFW_Coverage cov
INNER JOIN #LatestCoverage mcov
ON cov.PolId = mcov.PolId
AND cov.EffDate = mcov.CoverageEffectiveDate
WHERE
cov.Status IN ('A', 'C')
AND BPI.Status IN ('A','C')
AND BPI.PolTypeLOB IN ('Homeowners', 'Dwelling Fire')
AND BPI.PolSubType = 'P'
AND BPI.RenewalRptFlag IN ('A', 'R', 'I', 'N')
AND GETDATE() BETWEEN BPI.PolEffDate AND BPI.PolExpDate
AND (@RowVersion IS NULL OR cov.RowVersion > @RowVersion)
ORDER BY
cov.RowVersion
OFFSET
@PageStart ROWS
FETCH NEXT
@PageSize ROWS ONLY;
在这里,您可以调试和优化过滤器查询本身,只为您需要的列创建索引。
然后,有需要返回的行,提取它们的详细信息 - 因为我们使用分页,我相信它会表现良好并且花费更少的 IO。
推荐阅读
- c# - 为 ICollection 设置默认值
C#中DTO的属性 - r - “预测”函数中的参数“s”是什么意思?
- r - 如何计算 R 中已滚动的股票投资组合的百分比?
- javascript - 使用 Firebase 中的多个数据附加列表 - JavaScript
- c# - 从 OpenAPI 规范生成客户端代码失败
- python - 我无法在 python 中拆分输入和输出我的数据集
- php - 如何从文件中的一行获取键值对
- nestjs - Nestjs - 可以获得正确实例化的正文有效负载吗?
- python - 套接字接收到不完整的数据
- mysql - 抛出 sqlalchemy.exc.OperationalError