sql - DB2 中 WITH 查询的 SQL 查询性能改进
问题描述
我在示例中给出的查询运行速度非常慢。我已经在my_task
表中关闭了 400 万条记录。
我们可以对此进行任何形式的性能改进吗?
以下表为例,
在这里,我放置了数字start_dt
而end_dt
不是放置timestamp
格式。
有空的附加说明end_dt
意味着它是一个活动记录并且正在由工人处理。
T_ID |start_dt |end_dt |code |p_id
-----|---------|-------|-----------|---
1 |8 |4 |INPROGRESS |110
1 |4 | |ASSIGNED |110
4 |10 |4 |INPROGRESS |110
4 |4 | |ASSIGNED |110
5 |4 |4 |INPROGRESS |110
6 |12 |12 |INPROGRESS |110
6 |8 |8 |ASSIGNED |110
6 |8 | |DONE |110
2 |12 |12 |INPROGRESS |210
2 |8 |8 |ASSIGNED |210
2 |8 | |DONE |210
3 |12 |12 |INPROGRESS |111
输出看起来像,
P_ID |avg_bgn_diff |assigned |in_progress |completed | comp_diff
-----|-------------|---------|------------|----------|----------
110 | 4 | 2 | 1 | 1 | 10
210 | null | 0 | 0 | 1 | 8
111 | null | 0 | 1 | 0 | null
输出解释:我已经用虚构的名称掩盖了原始查询表 ref 可以被破坏我为此提前道歉。
- MY_TASK 表有唯一的 T_ID
- MY_PEOPLE 表是员工表
- MY_TASK_REF 表包含有关谁有什么任务的详细信息
- TASK 具有状态,因为每个状态更改操作都会导致在任务表中创建记录。雕像,例如 ASSIGNED、INPROGRESS 和 DONE
- 现在,END_DT 不存在的地方代表活动记录
avg_bgn_diff
我们只想找到所有(平均 END_DT 为空)“分配”任务的平均时间的第一个输出字段- 此输出字段
assigned |in_progress |completed
表示每个员工在每个类别中有多少活动任务。 - 找出每个员工的平均
comp_diff
完成时间。当记录进入 INPROGRESS 时,员工开始工作。我们今天完成了状态为 DONE 的任务的平均值。我们得到 INPROGRESS 的开始日期和 DONE 的开始日期。
我有以下查询,
WITH a AS (
SELECT
t1.t_id AS t_id,
t1.start_dt AS start_dt,
t1.end_dt AS end_dt,
t1.code AS code,
t2.p_id AS p_id
FROM
my_task t2
INNER JOIN my_task_ref t1 ON t1.t_id = t2.t_id
INNER JOIN my_people p1 ON t2.p_id = p1.p_id
WHERE
-- ignore DONE tasks
t1.t_id NOT IN (
SELECT t.t_id
FROM my_task t
WHERE t.code = 'DONE' AND trunc(t.execution_dt) < trunc(current_timestamp)
)
and p1.department_id = '1234'
ORDER BY p_id DESC
) SELECT
d.p_id,
d.avg_bgn_diff
,e.assigned
,e.in_progress
,e.completed
,g.comp_diff
FROM
`-- find average time for persons for diff ASSIGNMENT
(
SELECT c.p_id,AVG(c.bgn_diff) AS avg_bgn_diff
FROM(
SELECT b.p_id,timestampdiff(4,current_timestamp - a.start_dt) AS bgn_diff
FROM ( SELECT p_id,t_id,start_dt FROM a WHERE end_dt IS NULL ) b
LEFT OUTER JOIN ( SELECT p_id, t_id,start_dt FROM a WHERE
code = 'ASSIGNED' AND end_dt IS NULL ) x ON x.p_id = b.p_id
) c GROUP BY C.p_id
) d
-- find count of each codes person has
INNER JOIN (
SELECT
p_id,
SUM( CASE WHEN code = 'ASSIGNED' THEN 1 ELSE 0 END ) AS assigned,
SUM( CASE WHEN code = 'INPROGRESS' THEN 1 ELSE 0 END ) AS in_progress,
SUM( CASE WHEN code = 'DONE' AND trunc(start_dt) = trunc(current_timestamp)
THEN 1 ELSE 0 END ) AS completed
FROM
a where end_dt IS NULL
GROUP BY p_id
) e on D.p_id=E.p_id
-- find total avg diff of entire task took to compelete.
LEFT OUTER JOIN (
SELECT F.p_id,AVG(f.bgn_diff) AS comp_diff
FROM
(
SELECT a.p_id, timestampdiff(4,b.start_dt - a.start_dt) AS bgn_diff
FROM (
SELECT p_id, t_id, start_dt FROM a WHERE code = 'INPROGRESS'
) a
INNER JOIN (
SELECT p_id, t_id, start_dt FROM a
WHERE code = 'DONE' AND trunc(start_dt) = trunc(current_timestamp)
) b ON a.t_id = b.t_id
) f GROUP BY F.p_id
) g ON D.p_id=G.p_id
WITH
ur;
我们可以用不同的方式编写它来提高性能吗?
注意:索引存在于所有必要的列中。
提前致谢。
解决方案
如果您提供一个查询EXPLAIN
计划、一个索引列表,或许可以更好地解释您正在尝试做什么(并更正表引用的语法错误c
),我们当然可以做得更好,但是这个版本的查询可能能够加快速度。
请注意整个评论!
WITH Incomplete_Task AS (SELECT My_Task_Ref.t_id,
My_Task_Ref.start_dt, My_Task_Ref.end_dt,
My_Task_Ref.code,
Task_A.p_id
FROM My_Task AS Task_A
JOIN My_Task_Ref
ON My_Task_Ref.t_id = Task_A.t_id
JOIN My_People
ON My_People.p_id = My_Task_Ref.p_id
AND My_People.department_id = '1234'
-- NOT IN should be fine, I just default to NOT EXISTS
WHERE NOT EXISTS (SELECT 1
FROM My_Task AS Task_B
WHERE Task_B.t_id = Task_A.t_id
AND Task_B.code = 'DONE'
-- Calling a function on a column can
-- cause indices to be ignored
AND Task_B.execution_dt < TIMESTAMP(CURRENT_DATE)))
SELECT Average_Time_And_Code_Count.p_id,
Average_Time_And_Code_Count.average_begin_difference,
COALESCE(Average_Time_And_Code_Count.assigned, 0),
COALESCE(Average_Time_And_Code_Count.in_progress, 0),
COALESCE(Average_Time_And_Code_Count.completed, 0),
Average_Complete_Time.average_complete_difference
FROM (SELECT p_id,
-- The join you had previously was almost certainly duplicating
-- some rows, distorting the results.
AVG(CASE WHEN code = 'ASSIGNED'
-- TIMESTAMPDIFF works off an estimate, and will be wrong
-- if a task takes more than a month.
THEN TIMESTAMPDIFF(4, CURRENT_TIMESTAMP - A.start_dt) END) AS average_begin_difference,
SUM(CASE WHEN code = 'ASSIGNED'
THEN 1 END) AS assigned,
SUM(CASE WHEN code = 'INPROGRESS'
THEN 1 END) AS in_progress,
SUM(CASE WHEN code = 'DONE'
AND start_dt >= TIMESTAMP(CURRENT_DATE)
THEN 1 END) AS completed
FROM Filtered_Task
WHERE end_dt IS NULL
GROUP BY p_id) AS Average_Time_And_Code_Count
-- I'm not convinced this measures what you think it does,
-- but I'm not sure what it is you think you _are_ measuring....
LEFT JOIN (SELECT p_id, TIMESTAMPDIFF(4, Done.start_dt - InProgress.start_dt) AS average_complete_difference
FROM Filtered_Task AS InProgress
JOIN Filtered_Task AS Done
ON InProgress.t_id = Done.t_id
AND Done.code = 'DONE'
AND Done.start_dt >= TIMESTAMP(CURRENT_DATE)
WHERE InProgress.code = 'INPROGRESS') AS Average_Complete_Time
ON Average_Complete_Time.p_id = Averate_Time_And_Code_Count.p_id
推荐阅读
- javascript - 如何将放置在reactjs中的json对象内的图像发布到spring boot后端rest api
- python - 使用 scikit learn 在 python 中进行管道和交叉验证
- .net - 如何自动启动使用 Msix 打包的应用程序?
- key-value - etcd 是否将其 key-value 数据直接存储在 Raft 日志中?
- c# - c#脚本Roslyn会话不接受参考
- javascript - 尝试在生产中的反应应用程序中添加自定义 HTML 和 JS 代码
- jpa - 是否需要停止 Guice Persist Service?
- c++ - 坏数组新长度错误未处理异常
- wpf - 实体模型分离和构造器逻辑
- c++ - 从 C 定义的接口编写 C++ DLL