首页 > 解决方案 > MYSQL:使用 group by 和 inner join 优化 MySQL 查询

问题描述

我有一张有 19489578 个帖子的表格。当我使用 select 语句查询表时,按张贴表的 lockroomId 字段分组,它的查询时间非常慢(> 3 秒)。这是故事表的架构:

CREATE TABLE `Story` (
  `storyId` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `storyType` int(11) DEFAULT '0',
  `parentStoryId` bigint(20) unsigned DEFAULT NULL,
  `ownerId` bigint(20) unsigned NOT NULL,
  `lockroomId` bigint(20) unsigned DEFAULT NULL,
  `isHost` tinyint(1) DEFAULT '0',
  `updatedAt` timestamp(6) NULL DEFAULT CURRENT_TIMESTAMP(6),
  `createdAt` timestamp(6) NULL DEFAULT CURRENT_TIMESTAMP(6),
  `message` text CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
  `imageUrl` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `amount` int(11) DEFAULT NULL,
  `location` varchar(45) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `privacy` tinyint(1) NOT NULL DEFAULT '0',
  `minutes` int(11) DEFAULT '0',
  `hasCoinBadge` tinyint(1) DEFAULT '0',
  `hasFriendBadge` tinyint(1) DEFAULT '0',
  `localDBId` text CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
  `tagName` varchar(63) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT '',
  `tagColor` varchar(15) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT '',
  `tagId` bigint(20) DEFAULT NULL,
  `numLikes` int(11) DEFAULT '0',
  `numComments` int(11) DEFAULT '0',
  `lastCommenterId` bigint(20) unsigned DEFAULT NULL,
  `lastCommentTimestamp` timestamp(6) NULL DEFAULT CURRENT_TIMESTAMP(6),
  `lastComment` varchar(256) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `secondLastCommenterId` bigint(20) unsigned DEFAULT NULL,
  `secondLastCommentTimestamp` timestamp(6) NULL DEFAULT CURRENT_TIMESTAMP(6),
  `secondLastComment` varchar(256) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `lastLikerId` bigint(20) unsigned DEFAULT NULL,
  PRIMARY KEY (`storyId`),
  UNIQUE KEY `StoryId_UNIQUE` (`storyId`),
  KEY `ownerId` (`ownerId`),
  KEY `parentStoryId` (`parentStoryId`),
  KEY `updatedAt` (`updatedAt`),
  KEY `ownerId_updatedAt` (`ownerId`,`updatedAt`),
  KEY `updatedAt_ownerId_descending_index` (`updatedAt` DESC,`ownerId`),
  KEY `lockroomId` (`lockroomId`),
  CONSTRAINT `ownerId` FOREIGN KEY (`ownerId`) REFERENCES `User` (`userId`),
  CONSTRAINT `parentStoryId` FOREIGN KEY (`parentStoryId`) REFERENCES `Story` (`storyId`)
) ENGINE=InnoDB AUTO_INCREMENT=19503337 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
+----------------------------+---------------------+------+-----+----------------------+-------------------+
| Field                      | Type                | Null | Key | Default              | Extra             |
+----------------------------+---------------------+------+-----+----------------------+-------------------+
| storyId                    | bigint(20) unsigned | NO   | PRI | NULL                 | auto_increment    |
| storyType                  | int(11)             | YES  |     | 0                    |                   |
| parentStoryId              | bigint(20) unsigned | YES  | MUL | NULL                 |                   |
| ownerId                    | bigint(20) unsigned | NO   | MUL | NULL                 |                   |
| lockroomId                 | bigint(20) unsigned | YES  | MUL | NULL                 |                   |
| isHost                     | tinyint(1)          | YES  |     | 0                    |                   |
| updatedAt                  | timestamp(6)        | YES  | MUL | CURRENT_TIMESTAMP(6) | DEFAULT_GENERATED |
| createdAt                  | timestamp(6)        | YES  |     | CURRENT_TIMESTAMP(6) | DEFAULT_GENERATED |
| message                    | text                | YES  |     | NULL                 |                   |
| imageUrl                   | varchar(255)        | YES  |     | NULL                 |                   |
| amount                     | int(11)             | YES  |     | NULL                 |                   |
| location                   | varchar(45)         | YES  |     | NULL                 |                   |
| privacy                    | tinyint(1)          | NO   |     | 0                    |                   |
| minutes                    | int(11)             | YES  |     | 0                    |                   |
| hasCoinBadge               | tinyint(1)          | YES  |     | 0                    |                   |
| hasFriendBadge             | tinyint(1)          | YES  |     | 0                    |                   |
| localDBId                  | text                | YES  |     | NULL                 |                   |
| tagName                    | varchar(63)         | YES  |     |                      |                   |
| tagColor                   | varchar(15)         | YES  |     |                      |                   |
| tagId                      | bigint(20)          | YES  |     | NULL                 |                   |
| numLikes                   | int(11)             | YES  |     | 0                    |                   |
| numComments                | int(11)             | YES  |     | 0                    |                   |
| lastCommenterId            | bigint(20) unsigned | YES  |     | NULL                 |                   |
| lastCommentTimestamp       | timestamp(6)        | YES  |     | CURRENT_TIMESTAMP(6) | DEFAULT_GENERATED |
| lastComment                | varchar(256)        | YES  |     | NULL                 |                   |
| secondLastCommenterId      | bigint(20) unsigned | YES  |     | NULL                 |                   |
| secondLastCommentTimestamp | timestamp(6)        | YES  |     | CURRENT_TIMESTAMP(6) | DEFAULT_GENERATED |
| secondLastComment          | varchar(256)        | YES  |     | NULL                 |                   |
| lastLikerId                | bigint(20) unsigned | YES  |     | NULL                 |                   |
+----------------------------+---------------------+------+-----+----------------------+-------------------+

我还将 lockroomId 添加为索引。这是故事表的索引:

+-------+------------+------------------------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name                           | Seq_in_index | Column_name   | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+-------+------------+------------------------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Story |          0 | PRIMARY                            |            1 | storyId       | A         |    18951234 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| Story |          0 | StoryId_UNIQUE                     |            1 | storyId       | A         |    19116608 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| Story |          1 | ownerId                            |            1 | ownerId       | A         |     1212686 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| Story |          1 | parentStoryId                      |            1 | parentStoryId | A         |           1 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
| Story |          1 | updatedAt                          |            1 | updatedAt     | A         |    19022172 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
| Story |          1 | ownerId_updatedAt                  |            1 | ownerId       | A         |      945172 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| Story |          1 | ownerId_updatedAt                  |            2 | updatedAt     | A         |    19112336 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
| Story |          1 | updatedAt_ownerId_descending_index |            1 | updatedAt     | D         |    19091560 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
| Story |          1 | updatedAt_ownerId_descending_index |            2 | ownerId       | A         |    18649716 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| Story |          1 | lockroomId                         |            1 | lockroomId    | A         |           1 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
+-------+------------+------------------------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+

这是执行时间超过 3 秒的查询。该查询的作用是获取用户创建的jointStory 与相同的lockroomId 但不同的storyId。因此,我按 lockroomId 对故事进行分组。

SELECT 
    *
FROM
    ((SELECT 
        Story.*,
            jointStory.guestIds,
            jointStory.guestStoryIds,
            jointStory.guestTrees,
            jointStory.guestDurations,
            jointStory.guestTagIds
    FROM
        Story
    INNER JOIN (SELECT 
        lockroomId,
            GROUP_CONCAT(Story.ownerId) AS guestIds,
            GROUP_CONCAT(Story.storyId) AS guestStoryIds,
            GROUP_CONCAT(Story.imageUrl) AS guestTrees,
            GROUP_CONCAT(Story.minutes) AS guestDurations,
            GROUP_CONCAT(Story.tagId) AS guestTagIds
    FROM
        Story
    WHERE
        Story.storyType = 1
            AND lockroomId IS NOT NULL
            AND createdAt < FROM_UNIXTIME(1600444809)
    GROUP BY Story.lockroomId
    HAVING FIND_IN_SET(1349147, guestIds) ORDER BY createdAt DESC
    LIMIT 10) jointStory ON jointStory.lockroomId = Story.lockroomId
    WHERE
        Story.storyType = 2
            AND Story.lockroomId IS NOT NULL
            AND Story.createdAt < FROM_UNIXTIME(1600444809)
    ORDER BY createdAt DESC
    LIMIT 10) UNION ALL SELECT 
        Story.*, NULL, NULL, NULL, NULL, NULL
    FROM
        Story
    WHERE
        Story.storyType = 0
            AND ownerId = 1349147
            AND createdAt < FROM_UNIXTIME(1600444809)
    ORDER BY createdAt DESC
    LIMIT 10) stories
        LEFT JOIN
    StoryEngagement ON StoryEngagement.storyId = stories.storyId
        AND StoryEngagement.userId = 1349147
ORDER BY createdAt DESC
LIMIT 10;

对该查询的解释扩展命令的结果表明 MySQL 正在使用文件排序和使用索引条件:

+----+-------------+------------+------------+-------+---------------+-------------+---------+-----------------------------+--------+----------+---------------------------------------------------------------+
| id | select_type | table      | partitions | type  | possible_keys | key         | key_len | ref                         | rows   | filtered | Extra                                                         |
+----+-------------+------------+------------+-------+---------------+-------------+---------+-----------------------------+--------+----------+---------------------------------------------------------------+
|  1 | PRIMARY     | Story      | NULL       | range | lockroomId    | lockroomId  | 9       | NULL                        | 287540 |     3.33 | Using index condition; Using where; Using MRR; Using filesort |
|  1 | PRIMARY     | <derived2> | NULL       | ref   | <auto_key0>   | <auto_key0> | 9       | flora_test.Story.lockroomId |     10 |   100.00 | NULL                                                          |
|  2 | DERIVED     | Story      | NULL       | range | lockroomId    | lockroomId  | 9       | NULL                        | 287540 |     3.33 | Using index condition; Using where                            |
+----+-------------+------------+------------+-------+---------------+-------------+---------+-----------------------------+--------+----------+---------------------------------------------------------------+

如果您对如何优化此查询有任何想法,将不胜感激。如果有一种方法不需要创建索引,让这个查询运行得更快,那就太好了,因为这是一个巨大的生产表,添加索引需要很长时间,而且添加一些新的索引可能会导致其他查询运行更慢。我已经尝试了一些事情,例如在按字段分组(index(createdAt,lockroomId))上创建组合索引并FORCE INDEX(createdAt,lockroomId)按照一些博客帖子的建议使用,但运行速度更慢......

更新

应用建议后,结果如下EXPLAIN SELECE

+----+--------------+-----------------+------------+--------+------------------------------------------------------------------------+--------------------------+---------+-----------------------------+-------+----------+---------------------------------------------------------------------+
| id | select_type  | table           | partitions | type   | possible_keys                                                          | key                      | key_len | ref                         | rows  | filtered | Extra                                                               |
+----+--------------+-----------------+------------+--------+------------------------------------------------------------------------+--------------------------+---------+-----------------------------+-------+----------+---------------------------------------------------------------------+
|  1 | PRIMARY      | <derived2>      | NULL       | ALL    | NULL                                                                   | NULL                     | NULL    | NULL                        |    12 |   100.00 | Using filesort                                                      |
|  1 | PRIMARY      | StoryEngagement | NULL       | eq_ref | PRIMARY                                                                | PRIMARY                  | 16      | const,stories.storyId       |     1 |   100.00 | Using where                                                         |
|  2 | DERIVED      | <derived3>      | NULL       | ALL    | NULL                                                                   | NULL                     | NULL    | NULL                        |    10 |   100.00 | Using where; Using temporary; Using filesort                        |
|  2 | DERIVED      | Story           | NULL       | ref    | lockroomId,combine_lockroomId_Index,combine_story_index                | combine_story_index      | 14      | const,jointStory.lockroomId |    29 |    33.33 | Using index condition                                               |
|  3 | DERIVED      | Story           | NULL       | range  | lockroomId,combine_lockroomId_Index,combine_story_index                | combine_lockroomId_Index | 14      | NULL                        | 36332 |    33.33 | Using index condition; Using where; Using temporary; Using filesort |
|  4 | UNION        | Story           | NULL       | ref    | ownerId,ownerId_updatedAt,combine_lockroomId_Index,combine_story_index | ownerId                  | 8       | const                       |    15 |    16.66 | Using where                                                         |
| NULL | UNION RESULT | <union2,4>      | NULL       | ALL    | NULL                                                                   | NULL                     | NULL    | NULL                        |  NULL |     NULL | Using temporary; Using filesort                                     |
+----+--------------+-----------------+------------+--------+------------------------------------------------------------------------+--------------------------+---------+-----------------------------+-------+----------+---------------------------------------------------------------------+

这是结果SHOW TABLE STATUS

+-------+--------+---------+------------+----------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+--------------------+----------+----------------+---------+
| Name  | Engine | Version | Row_format | Rows     | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time | Check_time | Collation          | Checksum | Create_options | Comment |
+-------+--------+---------+------------+----------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+--------------------+----------+----------------+---------+
| Story | InnoDB |      10 | Dynamic    | 19350090 |            201 |  3895443456 |               0 |   4337303552 |   4194304 |       20043428 | 2020-09-18 17:03:14 | NULL        | NULL       | utf8mb4_unicode_ci |     NULL |                |         |
+-------+--------+---------+------------+----------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+--------------------+----------+----------------+---------+

RAM 的大小为 2 GB。这里是innodb_buffer_pool_size

+------------------------------------------+
| @@innodb_buffer_pool_size/1024/1024/1024 |
+------------------------------------------+
|                           1.000000000000 |
+------------------------------------------+

更改列的类型可能是个好主意,但是据我所知,更改列的类型时表将被锁定。因此,我可能需要等到下一次数据库更新。

这是当前的索引集:

+-------+------------+------------------------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name                           | Seq_in_index | Column_name   | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+-------+------------+------------------------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Story |          0 | PRIMARY                            |            1 | storyId       | A         |    19350090 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| Story |          0 | StoryId_UNIQUE                     |            1 | storyId       | A         |    19350090 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| Story |          1 | ownerId                            |            1 | ownerId       | A         |     1255716 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| Story |          1 | parentStoryId                      |            1 | parentStoryId | A         |           1 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
| Story |          1 | updatedAt                          |            1 | updatedAt     | A         |    19350090 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
| Story |          1 | ownerId_updatedAt                  |            1 | ownerId       | A         |     1743714 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| Story |          1 | ownerId_updatedAt                  |            2 | updatedAt     | A         |    19350090 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
| Story |          1 | updatedAt_ownerId_descending_index |            1 | updatedAt     | D         |    19139564 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
| Story |          1 | updatedAt_ownerId_descending_index |            2 | ownerId       | A         |    18636190 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| Story |          1 | lockroomId                         |            1 | lockroomId    | A         |      328860 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
| Story |          1 | combine_lockroomId_Index           |            1 | storyType     | A         |         685 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
| Story |          1 | combine_lockroomId_Index           |            2 | lockroomId    | A         |      221470 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
| Story |          1 | combine_story_index                |            1 | storyType     | A         |         429 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
| Story |          1 | combine_story_index                |            2 | lockroomId    | A         |      665208 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
| Story |          1 | combine_story_index                |            3 | createdAt     | A         |    19350090 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
| Story |          1 | combine_story_index                |            4 | ownerId       | A         |    19350090 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
+-------+------------+------------------------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+

Like 的数量只是存储在列中的数字(Int)。以及只存储lastComment 和secondLastComment 的comment 栏。我们创建了另一个表Story来存储所有评论。

updatedAt并且createdAt是检查用户何时发布故事。

非常感谢您的回复。

标签: mysqlsqlquery-optimization

解决方案


我认为这lockroomId IS NOT NULL是不必要的,因为lockroomId > 0

索引:

(storyType, lockroomId)
(storyType, createdAt)

冗余(因为PRIMARY KEY是 a UNIQUE KEY):

UNIQUE KEY `StoryId_UNIQUE` (`storyId`),

每个lockroomId都有 a storyType= 1 和 = 2 吗?如果是这样,我想分 3 步(而不是 2 步)执行此查询:

  1. 做最少的工作来找到 10 个lookroomID。
  2. JOIN获得的故事Story.*
  3. JOIN再次得到其余的GROUP_CONCATs

这是理由。

首先,您要获取整个表 building GROUP_CONCATs,但最终将丢弃除 10 之外的所有表。相反,我希望只获得 guestIds 并过滤 . 其次,将其减少到 10 行。

然后我会回去拿剩下的东西。

这是否获得了正确的 lookroomIds 列表?

SELECT  lockroomId, GROUP_CONCAT(Story.ownerId) AS guestIds
    FROM  Story
    WHERE  storyType = 1
      AND  lockroomId > 0
       AND  createdAt < FROM_UNIXTIME(1598882406)
    GROUP BY  lockroomId
    HAVING  FIND_IN_SET(<userId>, guestIds)
    ORDER BY  createdAt DESC
    LIMIT  10;

有了这个覆盖指数:

INDEX(storyType, lookroomId, createdAt, ownerId)

在我继续之前,请查看它的运行速度以及它是否获得了有效的 lockroomId 列表。

改进

你说这有帮助。请EXPLAIN SELECT在索引更改后提供。

次要建议

你的桌子很大。请提供SHOW TABLE STATUS、 RAM 大小和innodb_buffer_pool_size. 如果可能,请检查 MySQL 是否正在交换。这将提供您是否受 I/O 限制的线索,从而导致焦点转移。

我看到很多BIGINTsBIGINT占用 8 个字节;INT需要 4。如果涉及 I/O,缩小数据大小有助于提高性能。

你总是需要显示“评论”吗?也许这 6 列可以移动到单独的表中?

将“numlikes”移动到另一个表——这可能是有益的,特别是如果您在进行有问题的大选择时经常更新该列。

考虑较小的数据类型,例如minutes. SMALLINT UNSIGNED能坚持2个月吗?(只有 2 个字节。)

你真的使用created_atandupdated_at吗?(`TIMESTAMP(6) 每个占用 8 个字节。)

当前的索引集是什么?让我们看看第二个最差的查询。(我不想在改进当前查询的同时破坏它。)

更多的

innodb_buffer_pool_size = 1G只有 2GB 的内存——听起来你可能正在交换。交换对性能非常不利。我建议将该设置降低到 500M。

60-80% 的建议适用于大型服务器。操作系统需要一些空间。MySQL 代码需要一些空间。其他数据结构占用一些空间。和网络服务器。以及您的应用程序代码。这些可能加起来超过 1GB。(我不确切知道。)这为缓冲池留下了不到 1GB 的空间。设置为500M,运行几天,如果有空闲RAM,可以增加buffer_pool一些。

与较小的 buffer_pool 相比,交换性能更差。

由于表大约 8GB,因此缩小表很重要。请参阅上面的建议。

由于combine_lockroomId_Indexcombine_story_indexDROP的第一部分combine_lockroomId_Index


推荐阅读