mysql - MariaDB 逻辑更改具有非常大的 IN (...) 集
问题描述
我有一个特定的 MariaDB 查询,它利用了一些连接和一些 IN ('...') 条件。通常,它在大型数据集(约 50M 记录)上在 < 2 秒内返回结果,但是当 IN 条件中出现大量选项时(例如 1000+ IN 选项),查询需要 5+ 小时并且逻辑完全对查询运行 ANALYZE 时会发生变化。
希望了解为什么会这样,并就如何解决瓶颈提出建议。目前认为最简单的选择可能是完全删除 IN 条件并在 PHP 而不是 SQL 中过滤结果,就好像 IN 条件被删除一样,结果在同一张表上以 <1s 返回。
分析使用小 IN 集的查询的结果。
+------+-------------+-------+-------+------------------------+----------+---------+--------------------+-------+---------+----------+------------+---------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra |
+------+-------------+-------+-------+------------------------+----------+---------+--------------------+-------+---------+----------+------------+---------------------------------------------------------------------+
| 1 | SIMPLE | t1 | range | calldate,call_id | calldate | 7 | NULL | 13400 | 7162.00 | 100.00 | 33.01 | Using index condition; Using where; Using temporary; Using filesort |
| 1 | SIMPLE | t2 | ref | PRIMARY,called,call_id | call_id | 4 | crimson.t1.call_id | 1 | 3.21 | 100.00 | 35.24 | Using where |
| 1 | SIMPLE | d1 | ref | digits,leg_id | leg_id | 4 | crimson.t2.xid | 1 | 1.94 | 100.00 | 0.58 | Using index condition; Using where |
| 1 | SIMPLE | g1 | ref | call_id | call_id | 4 | crimson.t1.call_id | 1 | 3.00 | 100.00 | 0.00 | Using where |
+------+-------------+-------+-------+------------------------+----------+---------+--------------------+-------+---------+----------+------------+---------------------------------------------------------------------+
4 rows in set (1.154 sec)
ANALYZE 来自使用了 1200 IN 集的相同查询和条件的结果。
+------+--------------+-------------+--------+------------------------+---------+---------+--------------------+------+---------+----------+------------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra |
+------+--------------+-------------+--------+------------------------+---------+---------+--------------------+------+---------+----------+------------+---------------------------------+
| 1 | PRIMARY | <subquery2> | ALL | distinct_key | NULL | NULL | NULL | 1222 | 1222.00 | 100.00 | 100.00 | Using temporary; Using filesort |
| 1 | PRIMARY | d1 | ref | digits,leg_id | digits | 29 | tvc_0._col_1 | 5 | 6192.72 | 100.00 | 100.00 | Using index condition |
| 1 | PRIMARY | t2 | eq_ref | PRIMARY,called,call_id | PRIMARY | 8 | crimson.d1.leg_id | 1 | 1.00 | 100.00 | 36.73 | Using where |
| 1 | PRIMARY | g1 | ref | call_id | call_id | 4 | crimson.t2.call_id | 1 | 3.32 | 100.00 | 0.05 | Using where |
| 1 | PRIMARY | t1 | ref | calldate,call_id | call_id | 4 | crimson.t2.call_id | 1 | 5.19 | 100.00 | 0.03 | Using where |
| 2 | MATERIALIZED | <derived3> | ALL | NULL | NULL | NULL | NULL | 1222 | 1222.00 | 100.00 | 100.00 | |
| 3 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
+------+--------------+-------------+--------+------------------------+---------+---------+--------------------+------+---------+----------+------------+---------------------------------+
7 rows in set (5 hours 16 min 16.738 sec)
ANALYZE 完全没有任何 IN 条件。
+------+-------------+-------+-------+------------------------+----------+---------+--------------------+-------+---------+----------+------------+---------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra |
+------+-------------+-------+-------+------------------------+----------+---------+--------------------+-------+---------+----------+------------+---------------------------------------------------------------------+
| 1 | SIMPLE | t1 | range | calldate,call_id | calldate | 7 | NULL | 13400 | 7162.00 | 100.00 | 33.01 | Using index condition; Using where; Using temporary; Using filesort |
| 1 | SIMPLE | t2 | ref | PRIMARY,called,call_id | call_id | 4 | crimson.t1.call_id | 1 | 3.21 | 100.00 | 35.24 | Using where |
| 1 | SIMPLE | g1 | ref | call_id | call_id | 4 | crimson.t1.call_id | 1 | 3.57 | 100.00 | 0.06 | Using where |
| 1 | SIMPLE | d1 | ref | leg_id | leg_id | 4 | crimson.t2.xid | 1 | 1.33 | 100.00 | 100.00 | Using index condition |
+------+-------------+-------+-------+------------------------+----------+---------+--------------------+-------+---------+----------+------------+---------------------------------------------------------------------+
4 rows in set (0.093 sec)
示例表:
CREATE TABLE `digit_dial_map_x` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`leg_id` int(11) NOT NULL,
`sequence` int(2) NOT NULL,
`digits` varchar(26) DEFAULT NULL,
`category` varchar(2) NOT NULL DEFAULT 'M',
INDEX `leg_id` (`leg_id`),
INDEX `digits` (`digits`),
INDEX `category` (`category`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `call_legs_x` (
`xid` bigint(20) NOT NULL AUTO_INCREMENT,
`call_id` int(11) NOT NULL,
`calldate` date NOT NULL,
`start_time` time DEFAULT NULL,
`duration_hr` varchar(6) DEFAULT NULL,
`duration_min` varchar(2) DEFAULT NULL,
`duration_sec` varchar(2) DEFAULT NULL,
`calling` varchar(7) DEFAULT NULL,
`called` varchar(7) DEFAULT NULL,
`ans` varchar(2) DEFAULT NULL,
`ans_time` varchar(4) DEFAULT NULL,
`digits_dialed` varchar(26) DEFAULT NULL,
`digits_actual` varchar(26) DEFAULT NULL,
`ani` varchar(20) DEFAULT NULL,
`dnis` varchar(10) DEFAULT NULL,
`extn` varchar(10) DEFAULT NULL,
`trans_conf` varchar(2) DEFAULT NULL,
`third_party` varchar(7) DEFAULT NULL,
`sysid` varchar(3) DEFAULT NULL,
`call_log_id` varchar(12) DEFAULT NULL,
`assoc_log_id` varchar(12) DEFAULT NULL,
`raw_id` int(11) NOT NULL,
`leg` varchar(2) DEFAULT NULL,
`call_start` datetime DEFAULT NULL,
`call_end` datetime DEFAULT NULL,
`call_start_utc` datetime DEFAULT NULL,
`call_end_utc` datetime DEFAULT NULL,
INDEX `calldate` (`calldate`, `start_time`),
INDEX `called` (`called`),
INDEX `call_id` (`call_id`),
INDEX `digits_dialed` (`digits_dialed`),
INDEX `raw_id` (`raw_id`),
INDEX `call_start` (`call_start`),
INDEX `call_end` (`call_end`),
INDEX `call_start_utc` (`call_start`),
INDEX `call_end_utc` (`call_end`),
INDEX `calling` (`calling`),
INDEX `ans_time` (`ans_time`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
示例查询(省略 1200 IN 选项):
SELECT t1.call_id, t2.ans_time, t2.ans, ((t2.duration_hr * 3600) + (t2.duration_min *60) + t2.duration_sec) as duration, t2.digits_dialed, t2.digits_actual, t2.dnis, t2.trans_conf, t1.ani, t1.calling, t2.called, d1.digits, g1.extn
FROM call_legs_55 as t1
JOIN call_legs_55 as t2 ON t1.call_id=t2.call_id
JOIN digit_dial_map_55 as d1 ON t2.xid=d1.leg_id
JOIN call_legs_55 as g1 ON t1.call_id=g1.call_id
WHERE (t1.calldate BETWEEN '2019-11-25' AND '2019-11-25') AND NOT ((t1.calldate = '2019-11-25') AND (t1.start_time < '00:00:00')) AND NOT((t1.calldate = '2019-11-25') AND (t1.start_time > '24:00:00'))
AND (t1.calling IN ('T6001','T6002') )
AND d1.digits IN ('...')
AND t2.called !='X9999'
AND t1.calling != 'X9999'
AND t1.calling != ''
AND t2.ans_time != ''
AND (g1.extn IN ('52043','52042','52132','52116') AND g1.extn != t1.calling)
GROUP BY CONCAT(t1.call_id, g1.extn);
解决方案
digit_dial_map_x 是一个多:多映射表,对吗?它的索引非常低效。
推荐:
PRIMARY KEY(leg_id, digits),
INDEX(digits, leg_id)
更多讨论:http: //mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table
(这张桌子碰巧不是这样的。)
推荐阅读
- java - 命中一个端点,该端点在表中有可处理的数据
- javascript - React - 如何获取我点击的按钮元素?
- cypress - 让赛普拉斯听随机的 DOM 变化?
- dialogflow-es - 如何覆盖 Dialogflow 中的取消字词?
- typescript - 如何在 Typescript 中以不同的名称重新导出枚举?
- reactjs - react-native 数组映射不迭代
- vue.js - 无法使用 npm 链接访问 Vue 指令
- c# - 为什么我的命名空间引用在 Visual Studio 中不断中断?
- excel - 计算唯一值并在另一列中返回结果
- inno-setup - 根据 Innosetup 中的操作系统位数有条件地将任务显示为复选框而不是单选