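sql-server - Slow CTE performance on a LEFT JOIN
Problem description
I need to produce a report that shows every user in a table along with their score. Not every user in that table will have a score, so in my solution I first calculate the scores using a few CTEs, and then in a final CTE I pull the full roster and assign a default score to users who have no actual score.
The CTEs are not overly complex, but they are not simple either. When I run only the calculation part of the CTEs, for users who have an actual score, it finishes in under a second. When I join in the final CTE, which grabs the full roster and assigns a default score wherever the score is NULL (no actual score), the wheels fall off completely and the query never finishes.
I have tried toggling the indexes and refreshing them, to no avail. I did notice that the join to agent_effectiveness runs in about a second when I switch it to INNER, but I need it to be a LEFT join so that the whole roster is pulled in even when there is no score.
Edit: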
WITH agent_split_stats AS (
Select
racf,
agent_stats.SkillGroupSkillTargetID,
aht_target.EnterpriseName,
aht_target.target,
Sum(agent_stats.CallsHandled) as n_calls_handled,
CASE WHEN (Sum(agent_stats.TalkInTime) + Sum(agent_stats.IncomingCallsOnHoldTime) + Sum(agent_stats.WorkReadyTime)) = 0 THEN 1 ELSE
(Sum(agent_stats.TalkInTime) + Sum(agent_stats.IncomingCallsOnHoldTime) + Sum(agent_stats.WorkReadyTime)) END
AS total_handle_time
from tblAceyusAgntSklGrp as agent_stats
-- GET TARGETS
INNER JOIN tblCrosswalkWghtPhnEffTarget as aht_target
ON aht_target.SgId = agent_stats.SkillGroupSkillTargetID
AND agent_stats.DateTime BETWEEN aht_target.StartDt and aht_target.EndDt
-- GET RACF
INNER JOIN tblAgentMetricCrosswalk as xwalk
ON xwalk.SkillTargetID = agent_stats.SkillTargetID
--GET TAU DATA LIKE START DATE AND GRADUATED FLAG
INNER JOIN tblTauClassList AS T
ON T.SaRacf = racf
WHERE
--FILTERS BY A ROLLING 15 BUSINESS DAYS UNLESS THE DAYS BETWEEN CURRENT DATE AND TAU START DATE ARE <15
agent_stats.DateTime >=
CASE WHEN dbo.fn_WorkDaysAge(TauStart, GETDATE()) <15 THEN TauStart ELSE
dbo.fn_WorkDate15(TauStart)
END
And Graduated = 'No'
--WPE FILTERS TO ENSURE ACCURATE DATA
AND CallsHandled <> 0
AND Target is not null
Group By
racf, agent_stats.SkillGroupSkillTargetID, aht_target.EnterpriseName, aht_target.target
),
agent_split_stats_with_weight AS (
-- calculate weights
-- one row = one advocate + split
SELECT
agent_split_stats.*,
agent_split_stats.n_calls_handled/SUM(agent_split_stats.n_calls_handled) OVER(PARTITION BY agent_split_stats.racf) AS [weight]
FROM agent_split_stats
),
agent_split_effectiveness AS (
-- calculate the raw Effectiveness score for each eligible advocate/split
-- one row = one agent + split, with their raw Effectiveness score and the components of that
SELECT
agent_split_stats_with_weight.*,
-- these are the components of the Effectiveness score
(((agent_split_stats_with_weight.target * agent_split_stats_with_weight.n_calls_handled) / agent_split_stats_with_weight.total_handle_time)*100)*agent_split_stats_with_weight.weight AS effectiveness_sum
FROM agent_split_stats_with_weight
), -- this is where we show effectiveness per split select * from agent_split_effectiveness
agent_effectiveness AS (
-- sum all of the individual effectiveness raw scores for each agent to get each agent's raw score
SELECT
racf AS SaRacf,
ROUND(SUM(effectiveness_sum),2) AS WpeScore
FROM agent_split_effectiveness
GROUP BY racf
),
--GET FULL CLASS LIST, TAU DATES, GOALS FOR WHOLE CLASS
tau AS (
Select L.SaRacf, TauStart, Goal as WpeGoal
,CASE WHEN agent_effectiveness.WpeScore IS NULL THEN 1 ELSE WpeScore END as WpeScore
FROM tblTauClassList AS L
LEFT JOIN agent_effectiveness
ON agent_effectiveness.SaRacf = L.SaRacf
LEFT JOIN tblCrosswalkTauGoal AS G
ON G.Year = TauYear
AND G.Bucket = 'Wpe'
WHERE TermDate IS NULL
AND Graduated = 'No'
)
SELECT tau.*,
CASE WHEN dbo.fn_WorkDaysAge(TauStart, GETDATE()) > 14 --MUST BE AT LEAST 15 DAYS TO PASS
AND WpeScore >= WpeGoal THEN 'Pass'
ELSE 'Fail' END
from tau
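This query style works fine for 3 other calculation types (different score types), so I am not sure why it fails so badly here. The result should be a list of people with their dates, scores, and goals. When no score exists, a default score is supplied. There is also a pass/fail indicator based on score versus goal.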
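Solution
As @Habo mentioned, we need the actual execution plan (that is, run the query with "Include Actual Execution Plan" turned on). I looked over what you posted and nothing in it explains the problem. The difference between the actual plan and the estimated plan is that the actual plan records the number of rows that were really retrieved; this is vital for troubleshooting poorly performing queries.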
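If you prefer to capture the plan from T-SQL rather than from the SSMS toolbar, something like the following should work. It is only a sketch: the SELECT against sys.objects is a stand-in (my assumption) for the real query.

```sql
-- Return the actual execution plan as an XML column next to the results,
-- and print I/O and timing figures to the Messages tab.
SET STATISTICS XML ON;
SET STATISTICS IO, TIME ON;

-- Stand-in query (assumption): replace with the slow CTE query.
SELECT COUNT(*) AS object_count
FROM sys.objects;

SET STATISTICS IO, TIME OFF;
SET STATISTICS XML OFF;
```

The plan XML comes back as an extra result set; opening it in SSMS shows actual versus estimated row counts per operator.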
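That said, I do see a huge problem with both queries. It is a problem that, once fixed, should bring both queries down to under a second. Your query uses two scalar user-defined functions (UDFs): dbo.fn_WorkDaysAge and dbo.fn_WorkDate15. Scalar UDFs ruin everything. Not only are they slow, they also force a serial execution plan, which makes any query that uses them much slower.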
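To see how many scalar UDFs a database contains, a catalog query along these lines can help. It is a minimal sketch that relies only on the sys.objects and sys.schemas catalog views; the column aliases are my own. Note that on SQL Server 2019 and later, scalar UDF inlining may let the optimizer expand some of these functions automatically, so treat the list as candidates to review rather than as confirmed culprits.

```sql
-- List every T-SQL scalar function in the current database.
SELECT s.name AS schema_name,
       o.name AS function_name
FROM sys.objects AS o
JOIN sys.schemas AS s
  ON s.schema_id = o.schema_id
WHERE o.type = 'FN'   -- FN = SQL scalar function
ORDER BY s.name, o.name;
```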
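I don't have the code for dbo.fn_WorkDaysAge or dbo.fn_WorkDate15, but I do have my own inline "WorkDays" function (code below). The syntax is a little different, but the performance benefit is worth the effort. Here is the syntax difference: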
-- Scalar
SELECT d.*, workDays = dbo.countWorkDays_scalar(d.StartDate,d.EndDate)
FROM <sometable> AS d;
-- Inline version
SELECT d.*, f.workDays
FROM <sometable> AS d
CROSS APPLY dbo.countWorkDays(d.StartDate,d.EndDate) AS f;
Here is a performance test I put together to show the difference between the inline and scalar versions:
-- SAMPLE DATA
IF OBJECT_ID('tempdb..#dates') IS NOT NULL DROP TABLE #dates;
WITH E1(x) AS (SELECT 1 FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS x(x)),
E3(x) AS (SELECT 1 FROM E1 a, E1 b, E1 c),
iTally AS (SELECT N=ROW_NUMBER() OVER (ORDER BY (SELECT 1)) FROM E3 a, E3 b)
SELECT TOP (100000)
StartDate = CAST(DATEADD(DAY,-ABS(CHECKSUM(NEWID())%1000),GETDATE()) AS DATE),
EndDate = CAST(DATEADD(DAY,+ABS(CHECKSUM(NEWID())%1000),GETDATE()) AS DATE)
INTO #dates
FROM iTally;
-- PERFORMANCE TESTS
PRINT CHAR(10)+'Scalar Version (always serial):'+CHAR(10)+REPLICATE('-',60);
GO
DECLARE @st DATETIME = GETDATE(), @workdays INT;
SELECT @workdays = dbo.countWorkDays_scalar(d.StartDate,d.EndDate)
FROM #dates AS d;
PRINT DATEDIFF(MS,@st,GETDATE());
GO 3
PRINT CHAR(10)+'Inline Version:'+CHAR(10)+REPLICATE('-',60);
GO
DECLARE @st DATETIME = GETDATE(), @workdays INT;
SELECT @workdays = f.workDays
FROM #dates AS d
CROSS APPLY dbo.countWorkDays(d.StartDate,d.EndDate) AS f
PRINT DATEDIFF(MS,@st,GETDATE());
GO 3
Results:
Scalar Version (always serial):
------------------------------------------------------------
Beginning execution loop
380
363
350
Batch execution completed 3 times.
Inline Version:
------------------------------------------------------------
Beginning execution loop
47
47
46
Batch execution completed 3 times.
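As you can see, the inline version is about 8 times faster than the scalar version. Replacing those scalar UDFs with inline versions will almost certainly speed up this query regardless of the join type.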
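Other problems I see include:
- I see a lot of index scans, which suggests you need more filtering and/or better indexes.
- dbo.tblCrosswalkWghtPhnEffTarget doesn't have any indexes, which means it will always be scanned.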
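For the second point, an index shaped around the join in the posted query might look like the sketch below. The index name is hypothetical, and the column choices are assumptions read off the query (SgId, StartDt and EndDt in the join, EnterpriseName and Target in the select list); check the real data types and the actual plan before creating anything.

```sql
-- Hypothetical covering index for the join on SgId plus the date-range check.
CREATE NONCLUSTERED INDEX IX_tblCrosswalkWghtPhnEffTarget_SgId_Dates
ON dbo.tblCrosswalkWghtPhnEffTarget (SgId, StartDt, EndDt)
INCLUDE (EnterpriseName, Target);
```

If the table is small, a scan may be harmless; the index matters most when the table is joined once per row of a large outer input.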
Functions used for the performance test:
-- INLINE VERSION
----------------------------------------------------------------------------------------------
IF OBJECT_ID('dbo.countWorkDays') IS NOT NULL DROP FUNCTION dbo.countWorkDays;
GO
CREATE FUNCTION dbo.countWorkDays (@startDate DATETIME, @endDate DATETIME)
/*****************************************************************************************
[Purpose]:
Calculates the number of business days (Mon-Fri) between two dates, excluding weekends.
dbo.countWorkDays does not take holidays into consideration; for that you would need a
separate "holiday table" to perform an anti-join against.
The idea is based on the solution in this article:
https://www.sqlservercentral.com/Forums/Topic153606.aspx?PageIndex=16
[Author]:
Alan Burstein
[Compatibility]:
SQL Server 2005+
[Syntax]:
--===== Autonomous
SELECT f.workDays
FROM dbo.countWorkDays(@startdate, @enddate) AS f;
--===== Against a table using APPLY
SELECT t.col1, t.col2, f.workDays
FROM dbo.someTable t
CROSS APPLY dbo.countWorkDays(t.col1, t.col2) AS f;
[Parameters]:
@startDate = datetime; first date to compare
@endDate = datetime; date to compare @startDate to
[Returns]:
Inline Table Valued Function returns:
workDays = int; number of work days between @startdate and @enddate
[Dependencies]:
N/A
[Developer Notes]:
1. Returns NULL when either input parameter is NULL.
2. This function is what is referred to as an "inline" scalar UDF. Technically it's an
inline table valued function (iTVF) but performs the same task as a scalar valued user
defined function (UDF); the difference is that it requires the APPLY table operator
to accept column values as a parameter. For more about "inline" scalar UDFs see this
article by SQL MVP Jeff Moden: http://www.sqlservercentral.com/articles/T-SQL/91724/
and for more about how to use APPLY see this article by SQL MVP Paul White:
http://www.sqlservercentral.com/articles/APPLY/69953/.
Note the above syntax example and usage examples below to better understand how to
use the function. Although the function is slightly more complicated to use than a
scalar UDF it will yield notably better performance for many reasons. For example,
unlike scalar UDFs or multi-statement table valued functions, the inline scalar UDF does
not restrict the query optimizer's ability to generate a parallel query execution plan.
3. dbo.countWorkDays requires that @endDate be equal to or later than @startDate. Otherwise
a NULL is returned.
4. dbo.countWorkDays is NOT deterministic. For more about deterministic functions see:
https://msdn.microsoft.com/en-us/library/ms178091.aspx
[Examples]:
--===== 1. Basic Use
SELECT f.workDays
FROM dbo.countWorkDays('20180608', '20180611') AS f;
---------------------------------------------------------------------------------------
[Revision History]:
Rev 00 - 20180625 - Initial Creation - Alan Burstein
*****************************************************************************************/
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT workDays =
-- Returns NULL when @startDate or @endDate is NULL, or when @endDate is earlier than @startDate
CASE WHEN SIGN(DATEDIFF(dd, @startDate, @endDate)) > -1 THEN
(DATEDIFF(dd, @startDate, @endDate) + 1) --total days including weekends
-(DATEDIFF(wk, @startDate, @endDate) * 2) --Subtract 2 days for each full weekend
-- Subtract 1 when startDate is a Sunday and subtract 1 when endDate is a Saturday
-- (DATENAME returns localized names, so this assumes an English session language):
-(CASE WHEN DATENAME(dw, @startDate) = 'Sunday' THEN 1 ELSE 0 END)
-(CASE WHEN DATENAME(dw, @endDate) = 'Saturday' THEN 1 ELSE 0 END)
END;
GO
-- SCALAR VERSION
----------------------------------------------------------------------------------------------
IF OBJECT_ID('dbo.countWorkDays_scalar') IS NOT NULL DROP FUNCTION dbo.countWorkDays_scalar;
GO
CREATE FUNCTION dbo.countWorkDays_scalar (@startDate DATETIME, @endDate DATETIME)
RETURNS INT WITH SCHEMABINDING AS
BEGIN
RETURN
(
SELECT workDays =
-- Returns NULL when @startDate or @endDate is NULL, or when @endDate is earlier than @startDate
CASE WHEN SIGN(DATEDIFF(dd, @startDate, @endDate)) > -1 THEN
(DATEDIFF(dd, @startDate, @endDate) + 1) --total days including weekends
-(DATEDIFF(wk, @startDate, @endDate) * 2) --Subtract 2 days for each full weekend
-- Subtract 1 when startDate is a Sunday and subtract 1 when endDate is a Saturday
-- (DATENAME returns localized names, so this assumes an English session language):
-(CASE WHEN DATENAME(dw, @startDate) = 'Sunday' THEN 1 ELSE 0 END)
-(CASE WHEN DATENAME(dw, @endDate) = 'Saturday' THEN 1 ELSE 0 END)
END
);
END
GO
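Before swapping one function for the other, it may be worth confirming that they agree on your data. The check below is a sketch that assumes the #dates temp table from the performance test and both functions above already exist in the session; the mismatches alias is my own.

```sql
-- Count rows where the inline and scalar versions disagree.
-- Any non-zero count means the two functions are not interchangeable on this data.
SELECT mismatches = COUNT(*)
FROM #dates AS d
CROSS APPLY dbo.countWorkDays(d.StartDate, d.EndDate) AS f
WHERE f.workDays <> dbo.countWorkDays_scalar(d.StartDate, d.EndDate);
```

The same pattern can be pointed at dbo.fn_WorkDaysAge and its inline replacement, using the real TauStart values instead of random dates.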
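Update based on the OP's question in the comments:
First, here are inline table-valued function versions of each of your functions. Note that I am using my own tables and did not have time to make the names match your environment, but I did my best to include comments in the code. Also note that if workingday = '1' in your function simply selects weekdays, then you will find my function above to be a much faster alternative to your dbo.fn_WorkDaysAge function. If workingday = '1' also filters out holidays, then my function will not give the correct result.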
CREATE FUNCTION dbo.fn_WorkDaysAge_itvf
(
@first_date DATETIME,
@second_date DATETIME
)
RETURNS TABLE AS RETURN
SELECT WorkDays = COUNT(*)
FROM dbo.dimdate -- DateDimension
WHERE DateValue -- [date]
BETWEEN @first_date AND @second_date
AND IsWeekend = 0 --workingday = '1'
GO
CREATE FUNCTION dbo.fn_WorkDate15_itvf
(
@TauStartDate DATETIME
)
RETURNS TABLE AS RETURN
WITH DATES AS
(
SELECT
ROW_NUMBER() OVER(Order By DateValue Desc) as RowNum, DateValue
FROM dbo.dimdate -- DateDimension
WHERE DateValue BETWEEN @TauStartDate AND --GETDATE() testing below
CASE WHEN GETDATE() < @TauStartDate + 200 THEN GETDATE() ELSE @TauStartDate + 200 END
AND IsWeekend = 0 --workingday = '1'
)
--Get the 15th businessday from the current date
SELECT DateValue
FROM DATES
WHERE RowNum = 16;
GO
Now, to replace the scalar UDFs with the inline table-valued functions, you would do something like this (note my comments):
WITH agent_split_stats AS (
Select
racf,
agent_stats.SkillGroupSkillTargetID,
aht_target.EnterpriseName,
aht_target.target,
Sum(agent_stats.CallsHandled) as n_calls_handled,
CASE WHEN (Sum(agent_stats.TalkInTime) + Sum(agent_stats.IncomingCallsOnHoldTime) + Sum(agent_stats.WorkReadyTime)) = 0 THEN 1 ELSE
(Sum(agent_stats.TalkInTime) + Sum(agent_stats.IncomingCallsOnHoldTime) + Sum(agent_stats.WorkReadyTime)) END
AS total_handle_time
from tblAceyusAgntSklGrp as agent_stats
INNER JOIN tblCrosswalkWghtPhnEffTarget as aht_target
ON aht_target.SgId = agent_stats.SkillGroupSkillTargetID
AND agent_stats.DateTime BETWEEN aht_target.StartDt and aht_target.EndDt
INNER JOIN tblAgentMetricCrosswalk as xwalk
ON xwalk.SkillTargetID = agent_stats.SkillTargetID
INNER JOIN tblTauClassList AS T
ON T.SaRacf = racf
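-- INLINE FUNCTIONS HERE:
CROSS APPLY dbo.fn_WorkDaysAge_itvf(TauStart, GETDATE()) AS wd
-- OUTER APPLY: fn_WorkDate15_itvf returns no row when its range holds fewer than 16 weekdays,
-- and a CROSS APPLY would then drop exactly the agents the "< 15" branch is meant to keep
OUTER APPLY dbo.fn_WorkDate15_itvf(TauStart) AS w15
-- NEW WHERE CLAUSE (fn_WorkDate15_itvf returns a DateValue column):
WHERE agent_stats.DateTime >=
CASE WHEN wd.WorkDays < 15 THEN TauStart ELSE w15.DateValue END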
And Graduated = 'No'
AND CallsHandled <> 0
AND Target is not null
Group By
racf, agent_stats.SkillGroupSkillTargetID, aht_target.EnterpriseName, aht_target.target
),
agent_split_stats_with_weight AS (
SELECT
agent_split_stats.*,
agent_split_stats.n_calls_handled/SUM(agent_split_stats.n_calls_handled) OVER(PARTITION BY agent_split_stats.racf) AS [weight]
FROM agent_split_stats
),
agent_split_effectiveness AS
(
SELECT
agent_split_stats_with_weight.*,
(((agent_split_stats_with_weight.target * agent_split_stats_with_weight.n_calls_handled) /
agent_split_stats_with_weight.total_handle_time)*100)*
agent_split_stats_with_weight.weight AS effectiveness_sum
FROM agent_split_stats_with_weight
),
agent_effectiveness AS
(
SELECT
racf AS SaRacf,
ROUND(SUM(effectiveness_sum),2) AS WpeScore
FROM agent_split_effectiveness
GROUP BY racf
),
tau AS
(
SELECT L.SaRacf, TauStart, Goal as WpeGoal
,CASE WHEN agent_effectiveness.WpeScore IS NULL THEN 1 ELSE WpeScore END as WpeScore
FROM tblTauClassList AS L
LEFT JOIN agent_effectiveness
ON agent_effectiveness.SaRacf = L.SaRacf
LEFT JOIN tblCrosswalkTauGoal AS G
ON G.Year = TauYear
AND G.Bucket = 'Wpe'
WHERE TermDate IS NULL
AND Graduated = 'No'
)
SELECT tau.*,
-- NEW CASE EXPRESSION HERE:
CASE WHEN wd.WorkDays > 14 AND WpeScore >= WpeGoal THEN 'Pass' ELSE 'Fail' END
from tau
-- INLINE FUNCTION HERE (fn_WorkDate15_itvf is not needed in the outer query):
CROSS APPLY dbo.fn_WorkDaysAge_itvf(TauStart, GETDATE()) AS wd;
Note that I can't test this right now, but it should be correct (or close).
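One thing worth checking when you test: an APPLY against a function that can return zero rows behaves like a join. fn_WorkDate15_itvf returns no row when its date range holds fewer than 16 weekdays, so the choice between CROSS APPLY and OUTER APPLY decides whether those agents survive. The sketch below shows the difference on a made-up two-row roster; the table variable and its columns are purely illustrative.

```sql
-- Hypothetical roster: agent B has nothing for the applied query to return.
DECLARE @agents TABLE (agent CHAR(1), n INT);
INSERT INTO @agents (agent, n) VALUES ('A', 3), ('B', 0);

-- CROSS APPLY: only agent A comes back.
SELECT a.agent, x.val
FROM @agents AS a
CROSS APPLY (SELECT 1 AS val WHERE a.n > 0) AS x;

-- OUTER APPLY: both agents come back; val is NULL for agent B.
SELECT a.agent, x.val
FROM @agents AS a
OUTER APPLY (SELECT 1 AS val WHERE a.n > 0) AS x;
```

Since the report has to include the whole roster, any APPLY that can come back empty should probably be an OUTER APPLY.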