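mysql - Can this subquery be made to use an index?
Problem description
First, apologies in advance for the wall of text. I have read through every similar question and answer I could find, but either the answers do not apply to my query or I need a clearer understanding of the underlying problem and solution.
I have a table of file sizes with the associated file dates and observation timestamps. All dates are integers holding UNIX epoch time in seconds: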
mysql> describe name_servers;
+-----------------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------------+------------------+------+-----+---------+----------------+
| server_name | varchar(255) | YES | | NULL | |
| file_date | int(10) unsigned | YES | | NULL | |
| file_size | int(10) unsigned | YES | | NULL | |
| time | int(10) unsigned | YES | MUL | NULL | |
| poll_id | int(11) | NO | PRI | NULL | auto_increment |
+-----------------------+------------------+------+-----+---------+----------------+
5 rows in set (0.01 sec)
mysql> show index from name_servers;
+--------------+------------+--------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------------+------------+--------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| name_servers | 0 | PRIMARY | 1 | poll_id | A | 3523218 | NULL | NULL | | BTREE | | |
| name_servers | 0 | index_time_servername | 1 | time | A | 503316 | NULL | NULL | YES | BTREE | | |
| name_servers | 0 | index_time_servername | 2 | server_name | A | 3523218 | NULL | NULL | YES | BTREE | | |
+--------------+------------+--------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.00 sec)
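I need to track changes in file size to detect whether a file has shrunk by more than 20% within any 48-hour period. Normally I would use MySQL window functions for this, but the MySQL version on my server does not support them (5.6.37; this is outside my control because the server is not managed by my team). Currently I get the current size and the maximum size over the previous 48 hours with an outer query that reads the file size of the current row and an inner subquery that finds the maximum file size in the preceding 48 hours (172,800 seconds):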
mysql> select name_servers_outside.server_name,
-> name_servers_outside.file_size,
-> name_servers_outside.file_date,
-> name_servers_outside.time,
-> (select max(file_size) from name_servers where time > (name_servers_outside.time - 172800) and server_name = 'example_server') as max_file_size
-> from name_servers as name_servers_outside
-> where name_servers_outside.server_name = 'example_server'
-> and name_servers_outside.time > (UNIX_TIMESTAMP() - 172800)
-> limit 10;
+-------------------+-------------------+-------------------+------------+-----------------------+
| server_name | file_size | file_date | time | max_file_size |
+-------------------+-------------------+-------------------+------------+-----------------------+
| example_server | 1159544 | 1550382945 | 1550382985 | 1159580 |
| example_server | 1159544 | 1550382945 | 1550383195 | 1159580 |
| example_server | 1159544 | 1550382945 | 1550383255 | 1159580 |
| example_server | 1159544 | 1550382945 | 1550383316 | 1159580 |
| example_server | 1159544 | 1550382945 | 1550383376 | 1159580 |
| example_server | 1159544 | 1550382945 | 1550383435 | 1159580 |
| example_server | 1159544 | 1550382945 | 1550383496 | 1159580 |
| example_server | 1159544 | 1550382945 | 1550383555 | 1159580 |
| example_server | 1159544 | 1550382945 | 1550383616 | 1159580 |
| example_server | 1159544 | 1550382945 | 1550383676 | 1159580 |
+-------------------+-------------------+-------------------+------------+-----------------------+
10 rows in set (16.11 sec)
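Retrieving just these 10 rows takes 16 seconds, and in production this query has to retrieve more than 150 rows. The inner query is doing a full scan of all 3 million-plus table rows and reports "Range checked for each record (index map: 0x2)":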
mysql> explain
-> select name_servers_outside.server_name,
-> name_servers_outside.file_size,
-> name_servers_outside.file_date,
-> name_servers_outside.time,
-> (select max(file_size) from name_servers where time > (name_servers_outside.time - 172800) and server_name = 'example_server') as max_file_size
-> from name_servers as name_servers_outside
-> where name_servers_outside.server_name = 'example_server'
-> and name_servers_outside.time > (UNIX_TIMESTAMP() - 172800);
+----+--------------------+----------------------+-------+--------------------------+--------------------------+---------+------+---------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+----------------------+-------+--------------------------+--------------------------+---------+------+---------+------------------------------------------------+
| 1 | PRIMARY | name_servers_outside | range | index_time_servername | index_time_servername | 5 | NULL | 47302 | Using index condition; Using MRR |
| 2 | DEPENDENT SUBQUERY | name_servers | ALL | index_time_servername | NULL | NULL | NULL | 3533883 | Range checked for each record (index map: 0x2) |
+----+--------------------+----------------------+-------+--------------------------+--------------------------+---------+------+---------+------------------------------------------------+
2 rows in set (0.01 sec)
The problematic part appears to be this:
time > (name_servers_outside.time - 172800)
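If I run a similar query in which the subquery uses a static integer value instead of the reference to the name_servers_outside.time column, the index is used as expected and the query is fast: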
time > (UNIX_TIMESTAMP() - 172800)
The modified query:
mysql> select name_servers_outside.server_name,
-> name_servers_outside.file_size,
-> name_servers_outside.file_date,
-> name_servers_outside.time,
-> (select max(file_size) from name_servers where time > (UNIX_TIMESTAMP() - 172800) and server_name = 'example_server') as max_file_size
-> from name_servers as name_servers_outside
-> where name_servers_outside.server_name = 'example_server'
-> and name_servers_outside.time > (UNIX_TIMESTAMP() - 172800)
-> limit 10;
+--------------------+-------------------+-------------------+------------+-----------------------+
| server_name | file_size | file_date | time | max_file_size |
+--------------------+-------------------+-------------------+------------+-----------------------+
| example_server | 1159544 | 1550382945 | 1550382985 | 1159580 |
| example_server | 1159544 | 1550382945 | 1550383195 | 1159580 |
| example_server | 1159544 | 1550382945 | 1550383255 | 1159580 |
| example_server | 1159544 | 1550382945 | 1550383316 | 1159580 |
| example_server | 1159544 | 1550382945 | 1550383376 | 1159580 |
| example_server | 1159544 | 1550382945 | 1550383435 | 1159580 |
| example_server | 1159544 | 1550382945 | 1550383496 | 1159580 |
| example_server | 1159544 | 1550382945 | 1550383555 | 1159580 |
| example_server | 1159544 | 1550382945 | 1550383616 | 1159580 |
| example_server | 1159544 | 1550382945 | 1550383676 | 1159580 |
+--------------------+-------------------+-------------------+------------+-----------------------+
10 rows in set (0.01 sec)
mysql> explain
-> select name_servers_outside.server_name,
-> name_servers_outside.file_size,
-> name_servers_outside.file_date,
-> name_servers_outside.time,
-> (select max(file_size) from name_servers where time > (UNIX_TIMESTAMP() - 172800) and server_name = 'example_server') as max_file_size
-> from name_servers as name_servers_outside
-> where name_servers_outside.server_name = 'example_server'
-> and name_servers_outside.time > (UNIX_TIMESTAMP() - 172800)
-> limit 10;
+----+-------------+----------------------+-------+--------------------------+--------------------------+---------+------+-------+----------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------------+-------+--------------------------+--------------------------+---------+------+-------+----------------------------------+
| 1 | PRIMARY | name_servers_outside | range | index_time_servername | index_time_servername | 5 | NULL | 49042 | Using index condition; Using MRR |
| 2 | SUBQUERY | name_servers | range | index_time_servername | index_time_servername | 5 | NULL | 49042 | Using index condition; Using MRR |
+----+-------------+----------------------+-------+--------------------------+--------------------------+---------+------+-------+----------------------------------+
2 rows in set (0.00 sec)
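Thank you for reading this far. Again, I apologize for the huge wall of text, but I wanted to include enough explanatory detail to define the problem clearly.
What I am trying to solve is this: for each row, I need to retrieve the maximum file_size in the 48 hours preceding that row. Each row therefore has its own time range for the max(file_size) calculation. The result will be used to compute the percentage change in file size. As noted above, I would normally use a window function for this, but my MySQL version (5.6.37) does not support them, and since I do not own the server I cannot upgrade to 8.0.
As always, any suggestions would be greatly appreciated. Thanks for reading!
Solution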
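I would first try adding file_size to your index_time_servername index, but I suspect the real issue is the reference to name_servers_outside.time inside your subquery. Because that value comes from the outer row, the range on time cannot be worked out once up front, and "Range checked for each record" in your EXPLAIN output means MySQL is re-deciding how to access the table for every outer row.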
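A minimal sketch of that index change, under stated assumptions: the index name index_servername_time_size is hypothetical, and placing server_name (the equality column) ahead of time (the range column) is a general composite-index guideline, not something verified against this table. With file_size appended, the index contains every column the subquery reads, so MySQL may be able to answer it from the index alone. Whether the 5.6 optimizer actually picks it is something to confirm with EXPLAIN.

```sql
-- Hypothetical index name. Column order: equality column, range column,
-- then the column being aggregated so the index covers the subquery.
ALTER TABLE name_servers
    ADD INDEX index_servername_time_size (server_name, time, file_size);

-- Re-check the plan for the correlated subquery after adding the index.
EXPLAIN
SELECT ns.time,
       (SELECT MAX(p.file_size)
          FROM name_servers AS p
         WHERE p.server_name = 'example_server'
           AND p.time > (ns.time - 172800)) AS max_file_size
  FROM name_servers AS ns
 WHERE ns.server_name = 'example_server'
   AND ns.time > (UNIX_TIMESTAMP() - 172800);
```

If the plan still shows type ALL for the inner table, the index is not helping and can be dropped again.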
Another option is to drop the subquery and join the table to itself, matching each row with the rows for the same server whose time falls in the 48 hours before it.
Something like:
SELECT
    ns.server_name,
    ns.file_size,
    ns.file_date,
    ns.time,
    MAX(previous.file_size) AS max_file_size
FROM
    name_servers AS ns
INNER JOIN name_servers AS previous
    -- compare only against observations of the same server
    ON previous.server_name = ns.server_name
    -- the 48 hours (172800 seconds) before this row, excluding the row itself
    AND previous.time BETWEEN (ns.time - 172800) AND (ns.time - 1)
WHERE
    ns.server_name = 'example_server'
    AND ns.time > (UNIX_TIMESTAMP() - 172800)
GROUP BY
    ns.server_name,
    ns.file_size,
    ns.file_date,
    ns.time
LIMIT 10;
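To get from the self-join to the check described in the question (a file shrinking by more than 20% against its 48-hour maximum), one possible sketch is to filter the grouped rows in a HAVING clause. The pct_change alias and the 0.8 factor are illustrative assumptions; table and column names are taken from the question. This has not been run against the real data.

```sql
SELECT
    ns.server_name,
    ns.time,
    ns.file_size,
    MAX(previous.file_size) AS max_file_size,
    -- Negative values mean the file is smaller than the 48-hour maximum.
    -- Written as a ratio so no subtraction happens between the two
    -- unsigned integer columns.
    100 * ns.file_size / MAX(previous.file_size) - 100 AS pct_change
FROM name_servers AS ns
INNER JOIN name_servers AS previous
    ON previous.server_name = ns.server_name
   AND previous.time BETWEEN (ns.time - 172800) AND (ns.time - 1)
WHERE ns.server_name = 'example_server'
  AND ns.time > (UNIX_TIMESTAMP() - 172800)
GROUP BY ns.server_name, ns.time, ns.file_size
-- Keep only rows that are more than 20% below the window maximum.
HAVING ns.file_size < 0.8 * MAX(previous.file_size);
```

Because this uses an INNER JOIN, a row with no earlier observation in its 48-hour window produces no output row, which seems acceptable for an alert but would hide such rows in a full report.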