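mysql - MySQL: slow index path in a BTREE (table size: 723704015 rows)
Question
I have a large database in which one table in particular is consistently slow, despite having a proper index on the relevant column (it is actually a foreign key). To be fair, the column has low cardinality (only 5 possible values, unevenly distributed), but a LIMIT 1 query still takes 9+ seconds, and only when selecting along one particular path of the B-tree.
Here is the table: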
CREATE TABLE `locking_scripts` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`script_type_id` int(10) unsigned NOT NULL DEFAULT 1,
`transaction_output_id` int(10) unsigned NOT NULL,
`script` blob NOT NULL,
`address_id` int(10) unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `locking_scripts_uq` (`transaction_output_id`),
KEY `locking_scripts_address_id_fk` (`address_id`),
KEY `locking_scripts_type_id_fk` (`script_type_id`),
CONSTRAINT `locking_scripts_address_id_fk` FOREIGN KEY (`address_id`) REFERENCES `addresses` (`id`),
CONSTRAINT `locking_scripts_output_id_fk` FOREIGN KEY (`transaction_output_id`) REFERENCES `transaction_outputs` (`id`),
CONSTRAINT `locking_scripts_type_id_fk` FOREIGN KEY (`script_type_id`) REFERENCES `script_types` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=748705501 DEFAULT CHARSET=utf8mb4;
The column in question is script_type_id. The contents of the table it references are:
+----+------------------------+
| id | type |
+----+------------------------+
| 2 | CUSTOM_SCRIPT |
| 3 | PAY_TO_PUBLIC_KEY |
| 4 | PAY_TO_PUBLIC_KEY_HASH |
| 5 | PAY_TO_SCRIPT_HASH |
| 1 | UNKNOWN |
+----+------------------------+
The table status for both tables is:
+-------------------+--------+---------+------------+-----------+----------------+--------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+--------------------+----------+----------------+---------+
| Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation | Checksum | Create_options | Comment |
+-------------------+--------+---------+------------+-----------+----------------+--------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+--------------------+----------+----------------+---------+
| locking_scripts | InnoDB | 10 | Dynamic | 726718877 | 70 | 51335135232 | 0 | 34351300608 | 7340032 | 748705501 | 2018-10-25 01:31:20 | 2018-11-18 15:30:40 | NULL | utf8mb4_general_ci | NULL | | |
| script_types | InnoDB | 10 | Dynamic | 5 | 3276 | 16384 | 0 | 16384 | 0 | 6 | 2018-10-24 22:22:43 | NULL | NULL | utf8mb4_general_ci | NULL | | |
+-------------------+--------+---------+------------+-----------+----------------+--------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+--------------------+----------+----------------+---------+
The query I am running is:
SELECT id FROM locking_scripts WHERE script_type_id = 1 LIMIT 1;
...which takes 9.5s to execute. The EXPLAIN output for this query is:
+------+-------------+-----------------+------+----------------------------+----------------------------+---------+-------+-----------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+-------------+-----------------+------+----------------------------+----------------------------+---------+-------+-----------+----------+-------------+
| 1 | SIMPLE | locking_scripts | ref | locking_scripts_type_id_fk | locking_scripts_type_id_fk | 4 | const | 363359438 | 100.00 | Using index |
+------+-------------+-----------------+------+----------------------------+----------------------------+---------+-------+-----------+----------+-------------+
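The analyzer says it is using the index. To be fair, it estimates 363,359,438 rows to examine, but with LIMIT 1 it should be fast, because it can exit on the first match. Interestingly, however, this query is twice as fast: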
SELECT id FROM locking_scripts WHERE script_type_id NOT IN (2, 3, 4, 5) LIMIT 1
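...which takes 4.5s to execute. (That is still a very long time.) I have other tables of similar size, and selecting with LIMIT 1 from similarly indexed columns on those is nearly instantaneous.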
For completeness, here is the EXPLAIN output for the "faster" version:
+------+-------------+-----------------+-------+----------------------------+----------------------------+---------+------+-----------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+-------------+-----------------+-------+----------------------------+----------------------------+---------+------+-----------+----------+--------------------------+
| 1 | SIMPLE | locking_scripts | range | locking_scripts_type_id_fk | locking_scripts_type_id_fk | 4 | NULL | 363359442 | 100.00 | Using where; Using index |
+------+-------------+-----------------+-------+----------------------------+----------------------------+---------+------+-----------+----------+--------------------------+
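I have run out of ideas to explain why I am seeing this result. If anyone has any insight, I would greatly appreciate it. Thanks.
Additional information, per the comments:
Execution plan for SELECT *: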
EXPLAIN FORMAT=JSON SELECT * FROM locking_scripts WHERE script_type_id = 1 LIMIT 1;
{
"query_block": {
"select_id": 1,
"table": {
"table_name": "locking_scripts",
"access_type": "ref",
"possible_keys": ["locking_scripts_type_id_fk"],
"key": "locking_scripts_type_id_fk",
"key_length": "4",
"used_key_parts": ["script_type_id"],
"ref": ["const"],
"rows": 363359438,
"filtered": 100
}
}
}
Execution plan for SELECT * ... NOT IN ():
EXPLAIN FORMAT=JSON SELECT * FROM locking_scripts WHERE script_type_id NOT IN (2, 3, 4, 5) LIMIT 1;
{
"query_block": {
"select_id": 1,
"table": {
"table_name": "locking_scripts",
"access_type": "range",
"possible_keys": ["locking_scripts_type_id_fk"],
"key": "locking_scripts_type_id_fk",
"key_length": "4",
"used_key_parts": ["script_type_id"],
"rows": 363359442,
"filtered": 100,
"index_condition": "locking_scripts.script_type_id not in (2,3,4,5)"
}
}
}
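Answer
(This is too big for a comment.)
So far, I am stumped. The two JSON plans differ, but they do not tell me enough. Here is another thing to experiment with: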
FLUSH STATUS;
SELECT ...;
SHOW SESSION STATUS LIKE 'Handler%';
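This is a general technique for getting actual counts of certain operations. It may help determine whether the rows with script_type_id = 1 show up early or late in the scan.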
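As a sketch of how that might be applied to the two queries from the question (narrowing the filter to 'Handler_read%' and the reading of the counters below are my assumptions, not something confirmed on this server):

```sql
-- Reset the session status counters before each measurement.
FLUSH STATUS;
SELECT id FROM locking_scripts WHERE script_type_id = 1 LIMIT 1;
-- Handler_read_key counts index lookups by key value;
-- Handler_read_next counts steps to the next entry in index order.
SHOW SESSION STATUS LIKE 'Handler_read%';

-- Repeat for the NOT IN variant so the two sets of counters can be compared.
FLUSH STATUS;
SELECT id FROM locking_scripts WHERE script_type_id NOT IN (2, 3, 4, 5) LIMIT 1;
SHOW SESSION STATUS LIKE 'Handler_read%';
```

If Handler_read_next stays near zero while the query still takes seconds, the time is probably being spent somewhere these counters do not see. A large value would suggest that many index entries are being stepped over before the first match is returned.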