mysql - MariaDB 中选择额外列时 SQL 响应缓慢
问题描述
注意以下两个查询,它们之间的区别被突出显示
查询一:
SELECT tasks.TaskID, tasks.CardID, tasks.CritPath, tasks.ReworkCount,
cs.WorkflowID,
cs.StageCode, -- This is added
tasks.CurrentEscalationLevel,
tasks.Title, tasks.Description, tasks.EscalationDelay,
tasks.StartDate, tasks.EndDate, tasks.OriginalStartDate,
tasks.OriginalEndDate, tasks.Priority, tasks.Duration,
tasks.Status
FROM Tasks tasks
INNER JOIN CardsSettings cs ON cs.CardID = tasks.TaskID
INNER JOIN
(
SELECT t1.WorkflowID
FROM
(
SELECT WorkflowID
from Workflow
Where IsWFActive = "YES"
and LastUpdatedDateTime BETWEEN "2018-11-21 23:59:59" AND "2019-11-21 23:59:59"
AND WorkflowTypeID = 9
) t1
INNER JOIN
(
SELECT formSubCardSettings.WorkflowID, cfsm.Value as
Value
FROM CardsSettings AS formSubCardSettings
INNER JOIN custom_form_submissions cfs ON cfs.FormSubmissionID = formSubCardSettings.CardID
AND cfs.IsHistory = 'NO'
INNER JOIN custom_form cf ON cf.FormID = cfs.FormID
INNER JOIN custom_form_metadata cfm ON cfm.FormID = cf.FormID
INNER JOIN custom_form_submissions_metadata cfsm ON cfsm.FormSubmissionID = cfs.FormSubmissionID
AND cfsm.FormMetaID = cfm.FormMetaID
INNER JOIN Workflow cfwf ON cfwf.WorkflowID = formSubCardSettings.WorkflowID
WHERE cf.FormTitle = "Project Initiation"
AND cfm.FieldLabel = "wid"
AND cfwf.WorkflowTypeID = 9
AND (cfsm.Value IN("413"))) t2 ON t1.WorkflowID = t2.WorkflowID
) a ON a.WorkflowID = cs.WorkflowID;
查询 B:
选择 tasks.TaskID、tasks.CardID、tasks.CritPath、tasks.ReworkCount、cs.WorkflowID、tasks.CurrentEscalationLevel、tasks.Title、tasks.Description、tasks.EscalationDelay、tasks.StartDate、tasks.EndDate、tasks.OriginalStartDate、tasks .OriginalEndDate,tasks.Priority,tasks.Duration,tasks.Status FROM Tasks tasks INNER JOIN CardsSettings cs ON cs.CardID = tasks.TaskID INNER JOIN ( SELECT t1.WorkflowID FROM (SELECT WorkflowID from Workflow Where IsWFActive = "YES" and LastUpdatedDateTime在“2018-11-21 23:59:59”和“2019-11-21 23:59:59”之间且 WorkflowTypeID = 9) t1 INNER JOIN (SELECT formSubCardSettings.WorkflowID, cfsm.Value as Value FROM CardsSettings AS formSubCardSettings INNER加入 custom_form_submissions cfs ON cfs.FormSubmissionID = formSubCardSettings.CardID AND cfs。IsHistory = 'NO' INNER JOIN custom_form cf ON cf.FormID = cfs.FormID INNER JOIN custom_form_metadata cfm ON cfm.FormID = cf.FormID INNER JOIN custom_form_submissions_metadata cfsm ON cfsm.FormSubmissionID = cfs.FormSubmissionID AND cfsm.FormMetaID = cfm.FormMetaID INNER加入工作流 cfwf ON cfwf.WorkflowID = formSubCardSettings.WorkflowID WHERE cf.FormTitle = "Project Initiation" AND cfm.FieldLabel = "wid" AND cfwf.WorkflowTypeID = 9 AND (cfsm.Value IN("413"))) t2 ON t1 .WorkflowID = t2.WorkflowID) a ON a.WorkflowID = cs.WorkflowID;FormMetaID = cfm.FormMetaID INNER JOIN 工作流 cfwf ON cfwf.WorkflowID = formSubCardSettings.WorkflowID WHERE cf.FormTitle = "项目启动" AND cfm.FieldLabel = "wid" AND cfwf.WorkflowTypeID = 9 AND (cfsm.Value IN("413" ))) t2 ON t1.WorkflowID = t2.WorkflowID) a ON a.WorkflowID = cs.WorkflowID;FormMetaID = cfm.FormMetaID INNER JOIN 工作流 cfwf ON cfwf.WorkflowID = formSubCardSettings.WorkflowID WHERE cf.FormTitle = "项目启动" AND cfm.FieldLabel = "wid" AND cfwf.WorkflowTypeID = 9 AND (cfsm.Value IN("413" ))) t2 ON t1.WorkflowID = t2.WorkflowID) a ON a.WorkflowID = cs.WorkflowID;
StageCode列是 varchar(1024)。
我在两个独立的数据库引擎上有完全相同的数据库和架构,一个是 mysql (5.6),另一个是 mariadb (10.4)
在 mariadb 上运行查询会导致非常不同的响应时间:
查询 A:大约 5 秒
查询 B:约 0.2 秒
只需从 cs 添加一个额外的字段,响应时间就会增加很多倍。这不是一次性行为,因为它每次都会发生。另请注意,两个查询都选择了 cs.WorkflowID。
在 mysql 上运行上述查询会产生快速响应,两者都在不到 0.1 秒内返回。
两个数据库引擎上有问题的查询 A 的解释如下:
mysql:
+----+-------------+---------------------+--------+----------------------------+------------------+---------+------------------------------------------------------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------------+--------+----------------------------+------------------+---------+------------------------------------------------------+-------+-------------+
| 1 | PRIMARY | tasks | ALL | PRIMARY | NULL | NULL | NULL | 3091 | NULL |
| 1 | PRIMARY | cs | eq_ref | PRIMARY,FK_CS_WFID_WF_WFID | PRIMARY | 4 | zestlTitan_3000037868.tasks.TaskID | 1 | Using where |
| 1 | PRIMARY | <derived2> | ref | <auto_key0> | <auto_key0> | 4 | zestlTitan_3000037868.cs.WorkflowID | 10 | Using index |
| 2 | DERIVED | <derived3> | ALL | NULL | NULL | NULL | NULL | 81 | NULL |
| 2 | DERIVED | <derived4> | ref | <auto_key0> | <auto_key0> | 5 | t1.WorkflowID | 715 | NULL |
| 4 | DERIVED | cfs | ALL | PRIMARY,FormID | NULL | NULL | NULL | 17888 | Using where |
| 4 | DERIVED | cf | eq_ref | PRIMARY | PRIMARY | 4 | zestlTitan_3000037868.cfs.FormID | 1 | Using where |
| 4 | DERIVED | formSubCardSettings | eq_ref | PRIMARY,FK_CS_WFID_WF_WFID | PRIMARY | 4 | zestlTitan_3000037868.cfs.FormSubmissionID | 1 | Using where |
| 4 | DERIVED | cfwf | eq_ref | PRIMARY,FK_WTID_WFT_WTID | PRIMARY | 4 | zestlTitan_3000037868.formSubCardSettings.WorkflowID | 1 | Using where |
| 4 | DERIVED | cfsm | ref | FormSubmissionID | FormSubmissionID | 4 | zestlTitan_3000037868.cfs.FormSubmissionID | 4 | Using where |
| 4 | DERIVED | cfm | eq_ref | PRIMARY,FormID | PRIMARY | 4 | zestlTitan_3000037868.cfsm.FormMetaID | 1 | Using where |
| 3 | DERIVED | Workflow | ref | FK_WTID_WFT_WTID | FK_WTID_WFT_WTID | 4 | const | 81 | Using where |
+----+-------------+---------------------+--------+----------------------------+------------------+---------+------------------------------------------------------+-------+-------------+
12 rows in set (0.00 sec)
在 mariadb 上:
+------+-------------+---------------------+--------+----------------------------+--------------------+---------+--------------------------------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+---------------------+--------+----------------------------+--------------------+---------+--------------------------------------------------+------+-------------+
| 1 | SIMPLE | tasks | ALL | PRIMARY | NULL | NULL | NULL | 3072 | |
| 1 | SIMPLE | cs | eq_ref | PRIMARY,FK_CS_WFID_WF_WFID | PRIMARY | 4 | zestlTitan_3000037868.tasks.TaskID | 1 | Using where |
| 1 | SIMPLE | Workflow | eq_ref | PRIMARY,FK_WTID_WFT_WTID | PRIMARY | 4 | zestlTitan_3000037868.cs.WorkflowID | 1 | Using where |
| 1 | SIMPLE | cfwf | eq_ref | PRIMARY,FK_WTID_WFT_WTID | PRIMARY | 4 | zestlTitan_3000037868.cs.WorkflowID | 1 | Using where |
| 1 | SIMPLE | formSubCardSettings | ref | PRIMARY,FK_CS_WFID_WF_WFID | FK_CS_WFID_WF_WFID | 5 | zestlTitan_3000037868.cs.WorkflowID | 219 | Using index |
| 1 | SIMPLE | cfs | eq_ref | PRIMARY,FormID | PRIMARY | 4 | zestlTitan_3000037868.formSubCardSettings.CardID | 1 | Using where |
| 1 | SIMPLE | cf | eq_ref | PRIMARY | PRIMARY | 4 | zestlTitan_3000037868.cfs.FormID | 1 | Using where |
| 1 | SIMPLE | cfsm | ref | FormSubmissionID | FormSubmissionID | 4 | zestlTitan_3000037868.formSubCardSettings.CardID | 4 | Using where |
| 1 | SIMPLE | cfm | eq_ref | PRIMARY,FormID | PRIMARY | 4 | zestlTitan_3000037868.cfsm.FormMetaID | 1 | Using where |
+------+-------------+---------------------+--------+----------------------------+--------------------+---------+--------------------------------------------------+------+-------------+
9 rows in set (0.002 sec)
有人可以确定出了什么问题吗?两台机器都有相似的资源(RAM/磁盘)
编辑1:
如果我使用另一个 CardsSettings(cs) 字段而不是 StageCode(Varchar 1024),则会出现相同的奇怪行为,该字段是 tinyint 或 int(10)。所以我怀疑这与长列有关。但是,CardsSettings 表中有几个 Text 列,还有几个 varchar(1024) 和一个 varchar(4096)。
编辑2:
单独查询 t2 的子查询仅需 0.05 秒:
SELECT formSubCardSettings.WorkflowID FROM CardsSettings AS formSubCardSettings INNER JOIN custom_form_submissions cfs ON cfs.FormSubmissionID = formSubCardSettings.CardID AND cfs.IsHistory = 'NO' INNER JOIN custom_form cf ON cf.FormID = cfs.FormID INNER JOIN custom_form_metadata cfm ON cfm.FormID = cf.FormID INNER JOIN custom_form_submissions_metadata cfsm ON cfsm.FormSubmissionID = cfs.FormSubmissionID AND cfsm.FormMetaID = cfm.FormMetaID INNER JOIN Workflow cfwf ON cfwf.WorkflowID = formSubCardSettings.WorkflowID WHERE cf.FormTitle = "Project Initiation" AND cfm.FieldLabel = "wid" AND cfwf.WorkflowTypeID = 9 AND (cfsm.Value IN("413"));
+------------+
| WorkflowID |
+------------+
| 413 |
+------------+
1 row in set (0.056 sec)
如果我只是在原始查询“A”中替换整个子查询,那么查询 A 变为:
MariaDB [zestlTitan_3000037868]> 选择 tasks.TaskID,tasks.CardID,tasks.CritPath,tasks.ReworkCount,cs.WorkflowID,cs.StageCode,tasks.CurrentEscalationLevel,tasks.Title,tasks.Description,tasks.EscalationDelay,tasks.StartDate, tasks.EndDate,tasks.OriginalStartDate,tasks.OriginalEndDate,tasks.Priority,tasks.Duration,tasks.Status FROM Tasks 任务 INNER JOIN CardsSettings cs ON cs.CardID = tasks.TaskID INNER JOIN ( SELECT t1.WorkflowID FROM (SELECT WorkflowID from IsWFActive = "YES" and LastUpdatedDateTime BETWEEN "2018-11-21 23:59:59" and "2019-11-21 23:59:59" AND WorkflowTypeID = 9) t1 INNER JOIN ( SELECT "413" as WorkflowID ) t2 ON t1.WorkflowID = t2.WorkflowID) a ON a.WorkflowID = cs.WorkflowID;
这现在在 mariadb 上运行得非常快(~0.1 秒)。
编辑 3:
编辑 2 给了我一些关于为什么会发生这种情况的线索,显然子查询 t2 没有被先验计算(我可能错了)。所以我修改了查询以使用 WHERE t1.WorkflowID IN (...)t2 而不是 INNER JOIN,如下所示:
选择tasks.TaskID,tasks.CardID,tasks.CritPath,tasks.ReworkCount,cs.WorkflowID,cs.StageCode,tasks.CurrentEscalationLevel,tasks.Title,tasks.Description,tasks.EscalationDelay,tasks.StartDate,tasks.EndDate,tasks .OriginalStartDate,tasks.OriginalEndDate,tasks.Priority,tasks.Duration,tasks.Status FROM Tasks 任务 INNER JOIN CardsSettings cs ON cs.CardID = tasks.TaskID INNER JOIN ( SELECT t1.WorkflowID FROM (SELECT WorkflowID from Workflow Where IsWFActive = " YES" 和 LastUpdatedDateTime BETWEEN "2018-11-21 23:59:59" 和 "2019-11-21 23:59:59" AND WorkflowTypeID = 9) t1 WHERE t1.WorkflowID IN (从 CardsSettings 中选择 formSubCardSettings.WorkflowID 作为 formSubCardSettings INNER JOIN custom_form_submissions cfs ON cfs.FormSubmissionID = formSubCardSettings.CardID AND cfs.IsHistory = 'NO' INNER JOIN custom_form cf ON cf.FormID = cfs.FormID INNER JOIN custom_form_metadata cfm ON cfm.FormID = cf .FormID INNER JOIN custom_form_submissions_metadata cfsm ON cfsm.FormSubmissionID = cfs.FormSubmissionID AND cfsm.FormMetaID = cfm.FormMetaID INNER JOIN Workflow cfwf ON cfwf.WorkflowID = formSubCardSettings.WorkflowID WHERE cf.FormTitle = "Project Initiation" AND cfm.FieldLabel = "wid " AND cfwf.WorkflowTypeID = 9 AND (cfsm.Value IN("413")) ) a ON a.WorkflowID = cs.WorkflowID;
现在,mariadb 查询也可以快速运行,大约 0.2 秒。
虽然通过这次尝试和尝试,我已经设法使查询快速响应,但有人可以尝试从整件事中弄清楚吗?
解决方案
当一行“太大”时,一些列的值被写入“非记录”存储。重新读取行 ( SELECTing
) 需要从其他地方获取。如果数据没有缓存在 RAM 中,那么这是另一个磁盘命中。磁盘命中需要时间。
听起来表格有很多列?很多TEXT
还是很大VARCHARs
?这可以解释第一个查询的迟缓。
MySQL 和 MariaDB 的优化器存在分歧的地方之一是处理子查询。
额外的列还有两个可能的问题......
- 它作为查询的第二个(?)表被提取,并且需要在查询的其余部分被拖拉。
- 如果需要一个中间临时表(不能从 中看出
EXPLAIN
),它可能会被强制为 MyISAM,而不是 MEMORY。这是因为大VARCHAR(1024)
,即使实际数据很小。MyISAM 比 MEMORY 慢。(如果其他一些列是“大”,那么 MyISAM 已经是必需的;所以这个答案不适用。)
这个结构特别棘手,并且处理方式不同:
SELECT ...
FROM ( SELECT ... )
JOIN ( SELECT ... )
MySQL 经常具体化派生表并动态决定它需要什么索引并构建该索引。见<auto-key>
中EXPLAIN
。
MariaDB 以不同的方式处理子查询。
可能有一些索引会有所帮助。
Where IsWFActive = "YES"
and LastUpdatedDateTime BETWEEN "2018-11-21 23:59:59" AND "2019-11-21 23:59:59"
AND WorkflowTypeID = 9
需要
INDEX(IsWFActive, WorkflowTypeID, -- in either order
LastUpdatedDateTime) -- after the others
和:
ON a.WorkflowID = cs.WorkflowID
ON cs.CardID = tasks.TaskID
需要这个,以便它可以从第二部分开始,然后回到第一个表:
CardsSettings: INDEX(WorkflowID)
Tasks: INDEX(TaskID) -- unless that is the PRIMARY KEY
最后一个建议可能会消除我之前提到的“大柱子的拖拉”。
更多的
这应该使查询在所有(?)情况下运行得更快:
INDEX(WorkflowID, -- first
CardID, StageCode) -- to make it covering (with or without stage
请注意,它会鼓励优化器首先放弃派生表(这通常是最佳的),然后移至cs
via WorkflowID
。最后,包括所需的其余列(“覆盖”)。
如需进一步讨论,请提供SHOW CREATE TABLE
.
推荐阅读
- php - 是否有 RC4 的版本或修改会生成可安全用作 cookie 值的加密字符串
- java - 生产者 - 使用信号量的消费者实现
- hive - Hive UDF 仅适用于独立的 select 语句,而不适用于“将表创建为 select...”或“insert into .. select..”
- python - 如何将 KerasClassifier、Hyperopt 和 Sklearn 交叉验证放在一起
- javascript - 错误:ENOENT,当我使用 nodejs 下载 .jpg 时没有这样的文件或目录
- regex - Lua字符串模式检查字符是否不在单词中
- laravel-5 - 如何在导入所有现有表名时排除表名
- keras - 非常好的验证准确性,但预测不好
- reactjs - 如何从实时数据库 Firebase 中获取非字符串值的值
- python - 如何从外部脚本更新 haystack 的索引?