mysql - 如何优化使用依赖子查询的查询
问题描述
我正在使用此 NOT IN 查询从单个表中返回非活动用户。
SELECT *
FROM
( SELECT DISTINCT name
FROM userlog
WHERE created >= '2019-07-07 00:00:00' - INTERVAL 30 DAY
AND created <= '2019-07-13 23:59:59' - INTERVAL 30 DAY
AND isSample = 0
) inactive
WHERE inactive.name NOT IN
(
SELECT name AS name
FROM userlog
WHERE created >= '2019-07-13 23:59:59' - INTERVAL 30 DAY
AND created <= '2019-07-13 23:59:59' AND isSample = 0
)
此查询的描述:
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ID | select_type | table | partitions | type | possiblekeys | Keys | key_len | ref | rows | filtered | extra |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1 | primary | <derived2>| (null)OK | ALL | NULL | null | NULL | NULL| 50000 | 100.00 | using where |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 3 | subquery | userlog | (null)OK | range| *list of indexes | nameindex | 774 | NULL| 1000000| 10.00 | using index condition |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 2 | derived | userlog | (null)OK | range| *list of indexes | nameindex | 774 | NULL| 500000 | 10.00 | using index condition; using temporary |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+
我不想根据名称查询,因为名称可能会改变,但他们的 ID 永远不会改变,所以我使用 ID 进行查询。我使用相同的查询,我只是更改字段
SELECT *
FROM
(SELECT DISTINCT(id) AS id
FROM userlog
WHERE created >= '2019-07-07 00:00:00' - INTERVAL 30 DAY
AND created <= '2019-07-13 23:59:59' - INTERVAL 30 DAY
AND isSample = '0'
) inactive
WHERE inactive.id NOT IN
(SELECT id AS id
FROM userlog
WHERE created >= '2019-07-13 23:59:59' - INTERVAL 30 DAY
AND created <= '2019-07-13 23:59:59'
AND isSample = '0')
现在这个查询的描述与上面不同:
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ID | select_type | table | partitions | type | possiblekeys | Keys | key_len | ref | rows | filtered | extra |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1 | primary | <derived2>| (null)OK | ALL | NULL | null | NULL | NULL| 50000 | 100.00 | using where |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 3 |dependent subquery| userlog | (null)OK |index_subquery| *list of indexes | countindex | 768 | func| 892 | 0.61 | using where; full scan on null key |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 2 | derived | userlog | (null)OK | range | *list of indexes | idindex | 774 | NULL| 500000 | 10.00 | using index condition; using temporary |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
该查询现在使用一个依赖子查询,它正在执行全表扫描,这在我的表上非常慢(20+ 百万条记录)。我注意到 ID 查询没有使用 idindex,但它使用了我的计数索引。如果我将每个查询单独分开,它们都使用 ID 索引,但是当它们组合为 NOT IN 时,使用计数索引。
这是我的索引:
+--------------------------------------------------------------------------------------------------------------------------------+
| TABLE | NON_UNIQUE | KEY NAME | SEQ_IN_INDEX | COLUMN_NAME | COLLATION | CARDINALITY | SUB_PART | PACKED | NULL | INDEX_TYPE |
+--------------------------------------------------------------------------------------------------------------------------------+
| userlog | 1 |countindex| 1 | id | A | 75000 | 255 | NULL | YES | BTREE |
+--------------------------------------------------------------------------------------------------------------------------------+
| userlog | 1 |countindex| 2 | pk | A | 11500000 | null | NULL | YES | BTREE |
+--------------------------------------------------------------------------------------------------------------------------------+
| userlog | 1 |nameindex | 1 | created | A | 6800000 | null | NULL | YES | BTREE |
+--------------------------------------------------------------------------------------------------------------------------------+
| userlog | 1 |nameindex | 2 | sample | A | 13500000 | null | NULL | YES | BTREE |
+--------------------------------------------------------------------------------------------------------------------------------+
| userlog | 1 |nameindex | 3 | name | A | 24000000 | null | NULL | YES | BTREE |
+--------------------------------------------------------------------------------------------------------------------------------+
| userlog | 1 | idindex | 1 | id | A | 75000 | 512 | NULL | YES | BTREE |
+--------------------------------------------------------------------------------------------------------------------------------+
| userlog | 1 | idindex | 2 | created | A | 22000000 | null | NULL | YES | BTREE |
+--------------------------------------------------------------------------------------------------------------------------------+
| userlog | 1 | idindex | 3 | sample | A | 20500000 | null | NULL | YES | BTREE |
+--------------------------------------------------------------------------------------------------------------------------------+
有谁知道它为什么要使用其他索引?
此外,有没有一种方法可以优化 ID 查询,以免这不是问题?
如果我缺少任何信息,我可以更新问题。
编辑:
这是以下答案的更新说明:
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ID | select_type | table | partitions | type | possiblekeys | Keys | key_len | ref | rows | filtered | extra |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1 | primary | t1 | (null)OK | range | *list of indexes | nameindex | 774 | NULL | 500000 | 10.00 | using index condition; using where; using temporary|
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 2 |dependent subquery| t2 | (null)OK | ref | *list of indexes | idonlyindex | 768 | db.t1.id| 892 | 0.61 | using where; |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
注意:idonlyindex 是仅在 id 字段上的索引
解决方案
可能是这样的吗?
SELECT DISTINCT id
FROM userlog
WHERE
( created >= '2019-07-07 00:00:00' - INTERVAL 30 DAY
AND created <= '2019-07-13 23:59:59' - INTERVAL 30 DAY
AND isSample = 0
)
AND name NOT IN
(
SELECT u1.name
FROM userlog as u1
WHERE u1created >= '2019-07-13 23:59:59' - INTERVAL 30 DAY
AND u1created <= '2019-07-13 23:59:59' AND u1.isSample = 0
)
如果您使用name
列来过滤,添加索引会很好。添加括号是覆盖逻辑以独立于第二个逻辑进行处理。
推荐阅读
- react-native - 如何删除 VS 代码调试器?
- python - __doc__ 是 None 与方法但不是与功能
- netlogo - 使用 GIS 扩展在 Netlogo 中计算距离和确定测量单位
- python - 如何选择 Urllib 保存图片的位置?
- c - 使用分叉和线程跟踪程序的技术
- r - R 中对两个数据集的列执行多项操作的函数
- java - java StringBuffer 发送电子邮件某些部分 html 某些部分 java 格式已更新
- scala - 在scalatest中使用什么代替符号?
- sql - 从子查询聚合时如何更正错误
- verilog - 带有“数字符号”的模块实例化