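mysql - MySQL GROUP BY optimization - avoiding tmp table and/or filesort
Problem description
My query is slow. Without the GROUP BY it is fast (0.1-0.3 seconds), but with the (required) GROUP BY it takes about 10-15 seconds.
The query joins two tables: events (nearly 50 million rows) and events_locations (5 million rows).
Query: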
SELECT `e`.`id` AS `event_id`,`e`.`time_stamp` AS `time_stamp`,`el`.`latitude` AS `latitude`,`el`.`longitude` AS `longitude`,
`el`.`time_span` AS `extra`,`e`.`entity_id` AS `asset_name`, `el`.`other_id` AS `geozone_id`,
`el`.`group_alias` AS `group_alias`,`e`.`event_type_id` AS `event_type_id`,
`e`.`entity_type_id` AS `entity_type_id`, el.some_id
FROM events e
INNER JOIN events_locations el ON el.event_id = e.id
WHERE 1=1
AND el.other_id = '1'
AND time_stamp >= '2018-01-01'
AND time_stamp <= '2019-06-02'
GROUP BY `e`.`event_type_id` , `el`.`some_id` , `el`.`group_alias`;
Table events:
CREATE TABLE `events` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`event_type_id` int(11) NOT NULL,
`entity_type_id` int(11) NOT NULL,
`entity_id` varchar(64) NOT NULL,
`alias` varchar(64) NOT NULL,
`time_stamp` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `entity_id` (`entity_id`),
KEY `event_type_idx` (`event_type_id`),
KEY `idx_events_time_stamp` (`time_stamp`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Table events_locations:
CREATE TABLE `events_locations` (
`event_id` bigint(20) NOT NULL,
`latitude` double NOT NULL,
`longitude` double NOT NULL,
`some_id` bigint(20) DEFAULT NULL,
`other_id` bigint(20) DEFAULT NULL,
`time_span` bigint(20) DEFAULT NULL,
`group_alias` varchar(64) NOT NULL,
KEY `some_id_idx` (`some_id`),
KEY `idx_events_group_alias` (`group_alias`),
KEY `idx_event_id` (`event_id`),
CONSTRAINT `fk_event_id` FOREIGN KEY (`event_id`) REFERENCES `events` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
EXPLAIN:
+----+-------------+-------+--------+---------------------------------+---------+---------+--------------------+---------+------------------------------------------------+
| id | select_type | table | type   | possible_keys                   | key     | key_len | ref                | rows    | Extra                                          |
+----+-------------+-------+--------+---------------------------------+---------+---------+--------------------+---------+------------------------------------------------+
| 1  | SIMPLE      | ea    | ALL    | 'idx_event_id'                  | NULL    | NULL    | NULL               | 5152834 | 'Using where; Using temporary; Using filesort' |
| 1  | SIMPLE      | e     | eq_ref | 'PRIMARY,idx_events_time_stamp' | PRIMARY | '8'     | 'name.ea.event_id' | 1       |                                                |
+----+-------------+-------+--------+---------------------------------+---------+---------+--------------------+---------+------------------------------------------------+
2 rows in set (0.08 sec)
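From the documentation:
Temporary tables can be created under conditions such as these:
- If there is an ORDER BY clause and a different GROUP BY clause, or if the ORDER BY or GROUP BY contains columns from tables other than the first table in the join queue, a temporary table is created.
- DISTINCT combined with ORDER BY may require a temporary table.
- If the SQL_SMALL_RESULT option is used, MySQL uses an in-memory temporary table, unless the query also contains elements (described later) that require on-disk storage.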
What I have already tried:
- Creating an index on el.some_id, el.group_alias
- Reducing the varchar size to 20
- Increasing sort_buffer_size and read_rnd_buffer_size
Any suggestions for performance tuning would be greatly appreciated!
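For reference, the first and third attempts in the list above might look like the statements below. This is a sketch under assumptions: it reads the first item as a single composite index over both columns, the index name idx_some_id_group_alias is hypothetical, and the 4 MB buffer sizes are placeholders, not the values actually tried.

```sql
-- Composite index over the two GROUP BY columns that live in events_locations
-- (assumed interpretation of the first item; idx_some_id_group_alias is a hypothetical name).
ALTER TABLE events_locations
  ADD INDEX idx_some_id_group_alias (some_id, group_alias);

-- Larger per-session buffers (placeholder values: 4 MB = 4194304 bytes each).
SET SESSION sort_buffer_size = 4194304;
SET SESSION read_rnd_buffer_size = 4194304;
```

Because the GROUP BY also contains e.event_type_id from the events table, this index alone cannot supply the full grouping order. That matches the documented condition above about GROUP BY columns from tables other than the first table in the join queue.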
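Solution
In your case, the events table has an index on time_stamp. So before joining the two tables, first select the records for the required date range, with only the columns you need, from the events table. Then join events_locations on the column that relates the two tables.
Run your query with MySQL's EXPLAIN keyword to check how the table records are processed. It tells you how many rows are scanned before the required records are selected.
The number of rows scanned also drives query execution time. Use the logic below to reduce the number of rows scanned.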
SELECT
`e`.`id` AS `event_id`,
`e`.`time_stamp` AS `time_stamp`,
`el`.`latitude` AS `latitude`,
`el`.`longitude` AS `longitude`,
`el`.`time_span` AS `extra`,
`e`.`entity_id` AS `asset_name`,
`el`.`other_id` AS `geozone_id`,
`el`.`group_alias` AS `group_alias`,
`e`.`event_type_id` AS `event_type_id`,
`e`.`entity_type_id` AS `entity_type_id`,
`el`.`some_id` as `some_id`
FROM
(select
`id` AS `event_id`,
`time_stamp` AS `time_stamp`,
`entity_id` AS `asset_name`,
`event_type_id` AS `event_type_id`,
`entity_type_id` AS `entity_type_id`
from
`events`
WHERE
time_stamp >= '2018-01-01'
AND time_stamp <= '2019-06-02'
) AS `e`
JOIN `events_locations` `el` ON `e`.`event_id` = `el`.`event_id`
WHERE
`el`.`other_id` = '1'
GROUP BY
`e`.`event_type_id` ,
`el`.`some_id` ,
`el`.`group_alias`;
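To follow the advice to verify the plan with EXPLAIN, the sketch below compares the original join with the rewrite. Both queries are trimmed to the grouped columns for brevity. This trimming is an illustrative assumption and not part of the answer. One caveat: MySQL 5.7 and later may merge a derived table like e into the outer query (the derived_merge flag of optimizer_switch). In that case the rewrite can end up with the same plan as the original, so compare the rows and Extra columns of both outputs rather than assuming an improvement.

```sql
-- Plan of the original join (trimmed to the grouped columns).
EXPLAIN
SELECT e.event_type_id, el.some_id, el.group_alias
FROM events e
INNER JOIN events_locations el ON el.event_id = e.id
WHERE el.other_id = '1'
  AND e.time_stamp >= '2018-01-01'
  AND e.time_stamp <= '2019-06-02'
GROUP BY e.event_type_id, el.some_id, el.group_alias;

-- Plan of the rewrite, which filters events by date in a derived table first.
-- Optional (MySQL 5.7+): disable derived-table merging for this session so the
-- derived table shows up as a separate step in the plan:
-- SET SESSION optimizer_switch = 'derived_merge=off';
EXPLAIN
SELECT e.event_type_id, el.some_id, el.group_alias
FROM (SELECT id AS event_id, event_type_id
      FROM events
      WHERE time_stamp >= '2018-01-01'
        AND time_stamp <= '2019-06-02') AS e
JOIN events_locations el ON e.event_id = el.event_id
WHERE el.other_id = '1'
GROUP BY e.event_type_id, el.some_id, el.group_alias;
```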