首页 > 解决方案 > 如何优化使用依赖子查询的查询

问题描述

我正在使用此 NOT IN 查询从单个表中返回非活动用户。

SELECT * 
  FROM 
     ( SELECT DISTINCT name
                  FROM userlog 
                 WHERE created >= '2019-07-07 00:00:00' - INTERVAL 30 DAY 
                   AND created <= '2019-07-13 23:59:59' - INTERVAL 30 DAY 
                   AND isSample = 0
     ) inactive 
 WHERE inactive.name NOT IN 
        ( 
     SELECT name AS name 
       FROM userlog 
      WHERE created >= '2019-07-13 23:59:59' - INTERVAL 30 DAY 
        AND created <= '2019-07-13 23:59:59' AND isSample = 0
        )

此查询的描述:

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ID | select_type |   table   | partitions | type |   possiblekeys   |     Keys     | key_len | ref |  rows  | filtered | extra                                  | 
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+
|  1 |   primary   | <derived2>|  (null)OK  |  ALL |       NULL       |      null    |   NULL  | NULL| 50000  |  100.00  | using where                            |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+
|  3 |   subquery  |  userlog  |  (null)OK  | range| *list of indexes |   nameindex  |   774   | NULL| 1000000|  10.00   | using index condition                  |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+
|  2 |   derived   |  userlog  |  (null)OK  | range| *list of indexes |   nameindex  |   774   | NULL| 500000 |  10.00   | using index condition; using temporary |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+

我不想根据名称查询,因为名称可能会改变,但他们的 ID 永远不会改变,所以我使用 ID 进行查询。我使用相同的查询,我只是更改字段

SELECT * 
  FROM 
     (SELECT DISTINCT(id) AS id
                 FROM userlog 
                 WHERE created >= '2019-07-07 00:00:00' - INTERVAL 30 DAY 
                 AND created <= '2019-07-13 23:59:59' - INTERVAL 30 DAY 
                 AND isSample = '0'
     ) inactive 
  WHERE inactive.id NOT IN 
    (SELECT id AS id
       FROM userlog 
       WHERE created >= '2019-07-13 23:59:59' - INTERVAL 30 DAY 
       AND created <= '2019-07-13 23:59:59' 
       AND isSample = '0')

现在这个查询的描述与上面不同:

+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ID |   select_type    |   table   | partitions |     type     |   possiblekeys   |     Keys     | key_len | ref |  rows  | filtered | extra                                  | 
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|  1 |     primary      | <derived2>|  (null)OK  |      ALL     |       NULL       |      null    |   NULL  | NULL| 50000  |  100.00  | using where                            |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|  3 |dependent subquery|  userlog  |  (null)OK  |index_subquery| *list of indexes |   countindex |   768   | func|   892  |   0.61   | using where; full scan on null key     |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|  2 |     derived      |  userlog  |  (null)OK  |     range    | *list of indexes |    idindex   |   774   | NULL| 500000 |  10.00   | using index condition; using temporary |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

该查询现在使用一个依赖子查询,它正在执行全表扫描,这在我的表上非常慢(20+ 百万条记录)。我注意到 ID 查询没有使用 idindex,但它使用了我的计数索引。如果我将每个查询单独分开,它们都使用 ID 索引,但是当它们组合为 NOT IN 时,使用计数索引。

这是我的索引:

+--------------------------------------------------------------------------------------------------------------------------------+
|  TABLE  | NON_UNIQUE | KEY NAME | SEQ_IN_INDEX | COLUMN_NAME | COLLATION | CARDINALITY | SUB_PART | PACKED | NULL | INDEX_TYPE |
+--------------------------------------------------------------------------------------------------------------------------------+
| userlog |      1     |countindex|       1      |      id     |     A     |    75000    |   255    |  NULL  |  YES |   BTREE    |
+--------------------------------------------------------------------------------------------------------------------------------+
| userlog |      1     |countindex|       2      |      pk     |     A     |  11500000   |   null   |  NULL  |  YES |   BTREE    |
+--------------------------------------------------------------------------------------------------------------------------------+
| userlog |      1     |nameindex |       1      |   created   |     A     |   6800000   |   null   |  NULL  |  YES |   BTREE    |
+--------------------------------------------------------------------------------------------------------------------------------+
| userlog |      1     |nameindex |       2      |    sample   |     A     |  13500000   |   null   |  NULL  |  YES |   BTREE    |
+--------------------------------------------------------------------------------------------------------------------------------+
| userlog |      1     |nameindex |       3      |    name     |     A     |   24000000  |   null   |  NULL  |  YES |   BTREE    |
+--------------------------------------------------------------------------------------------------------------------------------+
| userlog |      1     | idindex  |       1      |      id     |     A     |    75000    |    512   |  NULL  |  YES |   BTREE    |
+--------------------------------------------------------------------------------------------------------------------------------+
| userlog |      1     | idindex  |       2      |   created   |     A     |   22000000  |   null   |  NULL  |  YES |   BTREE    |
+--------------------------------------------------------------------------------------------------------------------------------+
| userlog |      1     | idindex  |       3      |   sample    |     A     |   20500000  |   null   |  NULL  |  YES |   BTREE    |
+--------------------------------------------------------------------------------------------------------------------------------+

有谁知道它为什么要使用其他索引?

此外,有没有一种方法可以优化 ID 查询,以免这不是问题?

如果我缺少任何信息,我可以更新问题。

编辑:

这是以下答案的更新说明:

+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ID |   select_type    |   table   | partitions |     type     |   possiblekeys   |     Keys     | key_len |   ref   |  rows  | filtered | extra                                              | 
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|  1 |     primary      |     t1    |  (null)OK  |     range    | *list of indexes |  nameindex   |   774   |   NULL  | 500000 |   10.00  | using index condition; using where; using temporary|
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|  2 |dependent subquery|     t2    |  (null)OK  |     ref      | *list of indexes |  idonlyindex |   768   | db.t1.id|   892  |   0.61   | using where;                                       |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

注意:idonlyindex 是仅在 id 字段上的索引

标签: mysql

解决方案


可能是这样的吗?

SELECT DISTINCT  id
 FROM userlog 
    WHERE 
        (  created >= '2019-07-07 00:00:00' - INTERVAL 30 DAY 
                 AND created <= '2019-07-13 23:59:59' - INTERVAL 30 DAY 
                 AND isSample = 0
         )

    AND    name NOT IN 
         ( 
             SELECT u1.name  
              FROM userlog as u1
             WHERE u1created >= '2019-07-13 23:59:59' - INTERVAL 30 DAY 
                AND u1created <= '2019-07-13 23:59:59' AND u1.isSample = 0
         )


如果您使用name列来过滤,添加索引会很好。添加括号是覆盖逻辑以独立于第二个逻辑进行处理。


推荐阅读