首页 > 解决方案 > MariaDB 逻辑更改具有非常大的 IN (...) 集

问题描述

我有一个特定的 MariaDB 查询,它利用了一些连接和一些 IN ('...') 条件。通常,它在大型数据集(约 50M 记录)上在 < 2 秒内返回结果,但是当 IN 条件中出现大量选项时(例如 1000+ IN 选项),查询需要 5+ 小时并且逻辑完全对查询运行 ANALYZE 时会发生变化。

希望了解为什么会这样,并就如何解决瓶颈提出建议。目前认为最简单的选择可能是完全删除 IN 条件并在 PHP 而不是 SQL 中过滤结果,就好像 IN 条件被删除一样,结果在同一张表上以 <1s 返回。

分析使用小 IN 集的查询的结果。

+------+-------------+-------+-------+------------------------+----------+---------+--------------------+-------+---------+----------+------------+---------------------------------------------------------------------+
| id   | select_type | table | type  | possible_keys          | key      | key_len | ref                | rows  | r_rows  | filtered | r_filtered | Extra                                                               |
+------+-------------+-------+-------+------------------------+----------+---------+--------------------+-------+---------+----------+------------+---------------------------------------------------------------------+
|    1 | SIMPLE      | t1    | range | calldate,call_id       | calldate | 7       | NULL               | 13400 | 7162.00 |   100.00 |      33.01 | Using index condition; Using where; Using temporary; Using filesort |
|    1 | SIMPLE      | t2    | ref   | PRIMARY,called,call_id | call_id  | 4       | crimson.t1.call_id |     1 |    3.21 |   100.00 |      35.24 | Using where                                                         |
|    1 | SIMPLE      | d1    | ref   | digits,leg_id          | leg_id   | 4       | crimson.t2.xid     |     1 |    1.94 |   100.00 |       0.58 | Using index condition; Using where                                  |
|    1 | SIMPLE      | g1    | ref   | call_id                | call_id  | 4       | crimson.t1.call_id |     1 |    3.00 |   100.00 |       0.00 | Using where                                                         |
+------+-------------+-------+-------+------------------------+----------+---------+--------------------+-------+---------+----------+------------+---------------------------------------------------------------------+
4 rows in set (1.154 sec)

ANALYZE 来自使用了 1200 IN 集的相同查询和条件的结果。

+------+--------------+-------------+--------+------------------------+---------+---------+--------------------+------+---------+----------+------------+---------------------------------+
| id   | select_type  | table       | type   | possible_keys          | key     | key_len | ref                | rows | r_rows  | filtered | r_filtered | Extra                           |
+------+--------------+-------------+--------+------------------------+---------+---------+--------------------+------+---------+----------+------------+---------------------------------+
|    1 | PRIMARY      | <subquery2> | ALL    | distinct_key           | NULL    | NULL    | NULL               | 1222 | 1222.00 |   100.00 |     100.00 | Using temporary; Using filesort |
|    1 | PRIMARY      | d1          | ref    | digits,leg_id          | digits  | 29      | tvc_0._col_1       |    5 | 6192.72 |   100.00 |     100.00 | Using index condition           |
|    1 | PRIMARY      | t2          | eq_ref | PRIMARY,called,call_id | PRIMARY | 8       | crimson.d1.leg_id  |    1 |    1.00 |   100.00 |      36.73 | Using where                     |
|    1 | PRIMARY      | g1          | ref    | call_id                | call_id | 4       | crimson.t2.call_id |    1 |    3.32 |   100.00 |       0.05 | Using where                     |
|    1 | PRIMARY      | t1          | ref    | calldate,call_id       | call_id | 4       | crimson.t2.call_id |    1 |    5.19 |   100.00 |       0.03 | Using where                     |
|    2 | MATERIALIZED | <derived3>  | ALL    | NULL                   | NULL    | NULL    | NULL               | 1222 | 1222.00 |   100.00 |     100.00 |                                 |
|    3 | DERIVED      | NULL        | NULL   | NULL                   | NULL    | NULL    | NULL               | NULL |    NULL |     NULL |       NULL | No tables used                  |
+------+--------------+-------------+--------+------------------------+---------+---------+--------------------+------+---------+----------+------------+---------------------------------+
7 rows in set (5 hours 16 min 16.738 sec)

ANALYZE 完全没有任何 IN 条件。

+------+-------------+-------+-------+------------------------+----------+---------+--------------------+-------+---------+----------+------------+---------------------------------------------------------------------+
| id   | select_type | table | type  | possible_keys          | key      | key_len | ref                | rows  | r_rows  | filtered | r_filtered | Extra                                                               |
+------+-------------+-------+-------+------------------------+----------+---------+--------------------+-------+---------+----------+------------+---------------------------------------------------------------------+
|    1 | SIMPLE      | t1    | range | calldate,call_id       | calldate | 7       | NULL               | 13400 | 7162.00 |   100.00 |      33.01 | Using index condition; Using where; Using temporary; Using filesort |
|    1 | SIMPLE      | t2    | ref   | PRIMARY,called,call_id | call_id  | 4       | crimson.t1.call_id |     1 |    3.21 |   100.00 |      35.24 | Using where                                                         |
|    1 | SIMPLE      | g1    | ref   | call_id                | call_id  | 4       | crimson.t1.call_id |     1 |    3.57 |   100.00 |       0.06 | Using where                                                         |
|    1 | SIMPLE      | d1    | ref   | leg_id                 | leg_id   | 4       | crimson.t2.xid     |     1 |    1.33 |   100.00 |     100.00 | Using index condition                                               |
+------+-------------+-------+-------+------------------------+----------+---------+--------------------+-------+---------+----------+------------+---------------------------------------------------------------------+
4 rows in set (0.093 sec)

示例表:

CREATE TABLE `digit_dial_map_x` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `leg_id` int(11) NOT NULL,
  `sequence` int(2) NOT NULL,
  `digits` varchar(26) DEFAULT NULL,
  `category` varchar(2) NOT NULL DEFAULT 'M',  
  INDEX `leg_id` (`leg_id`),
  INDEX `digits` (`digits`),
  INDEX `category` (`category`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

CREATE TABLE `call_legs_x` (
  `xid` bigint(20) NOT NULL AUTO_INCREMENT,
  `call_id` int(11) NOT NULL,
  `calldate` date NOT NULL,
  `start_time` time DEFAULT NULL,
  `duration_hr` varchar(6) DEFAULT NULL,
  `duration_min` varchar(2) DEFAULT NULL,
  `duration_sec` varchar(2) DEFAULT NULL,
  `calling` varchar(7) DEFAULT NULL,
  `called` varchar(7) DEFAULT NULL,
  `ans` varchar(2) DEFAULT NULL,
  `ans_time` varchar(4) DEFAULT NULL,
  `digits_dialed` varchar(26) DEFAULT NULL,
  `digits_actual` varchar(26) DEFAULT NULL,
  `ani` varchar(20) DEFAULT NULL,
  `dnis` varchar(10) DEFAULT NULL,
  `extn` varchar(10) DEFAULT NULL,
  `trans_conf` varchar(2) DEFAULT NULL,
  `third_party` varchar(7) DEFAULT NULL,
  `sysid` varchar(3) DEFAULT NULL,
  `call_log_id` varchar(12) DEFAULT NULL,
  `assoc_log_id` varchar(12) DEFAULT NULL,
  `raw_id` int(11) NOT NULL,
  `leg` varchar(2) DEFAULT NULL,
  `call_start` datetime DEFAULT NULL,
  `call_end` datetime DEFAULT NULL,
  `call_start_utc` datetime DEFAULT NULL,
  `call_end_utc` datetime DEFAULT NULL,
  INDEX `calldate` (`calldate`, `start_time`),
  INDEX `called` (`called`),
  INDEX `call_id` (`call_id`),
  INDEX `digits_dialed` (`digits_dialed`),
  INDEX `raw_id` (`raw_id`),
  INDEX `call_start` (`call_start`),
  INDEX `call_end` (`call_end`),
  INDEX `call_start_utc` (`call_start`),
  INDEX `call_end_utc` (`call_end`),
  INDEX `calling` (`calling`),
  INDEX `ans_time` (`ans_time`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

示例查询(省略 1200 IN 选项):

SELECT t1.call_id, t2.ans_time, t2.ans, ((t2.duration_hr * 3600) + (t2.duration_min *60) + t2.duration_sec) as duration, t2.digits_dialed, t2.digits_actual, t2.dnis, t2.trans_conf, t1.ani, t1.calling, t2.called, d1.digits, g1.extn
  FROM call_legs_55 as t1
  JOIN call_legs_55 as t2 ON t1.call_id=t2.call_id
  JOIN digit_dial_map_55 as d1 ON t2.xid=d1.leg_id
  JOIN call_legs_55 as g1 ON t1.call_id=g1.call_id
  WHERE (t1.calldate BETWEEN '2019-11-25' AND '2019-11-25') AND NOT ((t1.calldate = '2019-11-25') AND (t1.start_time < '00:00:00')) AND NOT((t1.calldate = '2019-11-25') AND (t1.start_time > '24:00:00'))
  AND (t1.calling IN ('T6001','T6002') )
  AND d1.digits IN ('...')
  AND t2.called !='X9999'
  AND t1.calling != 'X9999'
  AND t1.calling != ''
  AND t2.ans_time != ''
  AND (g1.extn IN ('52043','52042','52132','52116') AND g1.extn != t1.calling)
  GROUP BY CONCAT(t1.call_id, g1.extn);

标签: mysqlmariadb

解决方案


digit_dial_map_x 是一个多:多映射表,对吗?它的索引非常低效。

推荐:

PRIMARY KEY(leg_id, digits),
INDEX(digits, leg_id)

更多讨论:http: //mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table

(这张桌子碰巧不是这样的。)


推荐阅读