mysql - MYSQL:使用 group by 和 inner join 优化 MySQL 查询
问题描述
我有一张有 19489578 个帖子的表格。当我使用 select 语句查询表时,按张贴表的 lockroomId 字段分组,它的查询时间非常慢(> 3 秒)。这是故事表的架构:
CREATE TABLE `Story` (
`storyId` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`storyType` int(11) DEFAULT '0',
`parentStoryId` bigint(20) unsigned DEFAULT NULL,
`ownerId` bigint(20) unsigned NOT NULL,
`lockroomId` bigint(20) unsigned DEFAULT NULL,
`isHost` tinyint(1) DEFAULT '0',
`updatedAt` timestamp(6) NULL DEFAULT CURRENT_TIMESTAMP(6),
`createdAt` timestamp(6) NULL DEFAULT CURRENT_TIMESTAMP(6),
`message` text CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
`imageUrl` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`amount` int(11) DEFAULT NULL,
`location` varchar(45) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`privacy` tinyint(1) NOT NULL DEFAULT '0',
`minutes` int(11) DEFAULT '0',
`hasCoinBadge` tinyint(1) DEFAULT '0',
`hasFriendBadge` tinyint(1) DEFAULT '0',
`localDBId` text CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
`tagName` varchar(63) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT '',
`tagColor` varchar(15) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT '',
`tagId` bigint(20) DEFAULT NULL,
`numLikes` int(11) DEFAULT '0',
`numComments` int(11) DEFAULT '0',
`lastCommenterId` bigint(20) unsigned DEFAULT NULL,
`lastCommentTimestamp` timestamp(6) NULL DEFAULT CURRENT_TIMESTAMP(6),
`lastComment` varchar(256) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`secondLastCommenterId` bigint(20) unsigned DEFAULT NULL,
`secondLastCommentTimestamp` timestamp(6) NULL DEFAULT CURRENT_TIMESTAMP(6),
`secondLastComment` varchar(256) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`lastLikerId` bigint(20) unsigned DEFAULT NULL,
PRIMARY KEY (`storyId`),
UNIQUE KEY `StoryId_UNIQUE` (`storyId`),
KEY `ownerId` (`ownerId`),
KEY `parentStoryId` (`parentStoryId`),
KEY `updatedAt` (`updatedAt`),
KEY `ownerId_updatedAt` (`ownerId`,`updatedAt`),
KEY `updatedAt_ownerId_descending_index` (`updatedAt` DESC,`ownerId`),
KEY `lockroomId` (`lockroomId`),
CONSTRAINT `ownerId` FOREIGN KEY (`ownerId`) REFERENCES `User` (`userId`),
CONSTRAINT `parentStoryId` FOREIGN KEY (`parentStoryId`) REFERENCES `Story` (`storyId`)
) ENGINE=InnoDB AUTO_INCREMENT=19503337 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
+----------------------------+---------------------+------+-----+----------------------+-------------------+
| Field | Type | Null | Key | Default | Extra |
+----------------------------+---------------------+------+-----+----------------------+-------------------+
| storyId | bigint(20) unsigned | NO | PRI | NULL | auto_increment |
| storyType | int(11) | YES | | 0 | |
| parentStoryId | bigint(20) unsigned | YES | MUL | NULL | |
| ownerId | bigint(20) unsigned | NO | MUL | NULL | |
| lockroomId | bigint(20) unsigned | YES | MUL | NULL | |
| isHost | tinyint(1) | YES | | 0 | |
| updatedAt | timestamp(6) | YES | MUL | CURRENT_TIMESTAMP(6) | DEFAULT_GENERATED |
| createdAt | timestamp(6) | YES | | CURRENT_TIMESTAMP(6) | DEFAULT_GENERATED |
| message | text | YES | | NULL | |
| imageUrl | varchar(255) | YES | | NULL | |
| amount | int(11) | YES | | NULL | |
| location | varchar(45) | YES | | NULL | |
| privacy | tinyint(1) | NO | | 0 | |
| minutes | int(11) | YES | | 0 | |
| hasCoinBadge | tinyint(1) | YES | | 0 | |
| hasFriendBadge | tinyint(1) | YES | | 0 | |
| localDBId | text | YES | | NULL | |
| tagName | varchar(63) | YES | | | |
| tagColor | varchar(15) | YES | | | |
| tagId | bigint(20) | YES | | NULL | |
| numLikes | int(11) | YES | | 0 | |
| numComments | int(11) | YES | | 0 | |
| lastCommenterId | bigint(20) unsigned | YES | | NULL | |
| lastCommentTimestamp | timestamp(6) | YES | | CURRENT_TIMESTAMP(6) | DEFAULT_GENERATED |
| lastComment | varchar(256) | YES | | NULL | |
| secondLastCommenterId | bigint(20) unsigned | YES | | NULL | |
| secondLastCommentTimestamp | timestamp(6) | YES | | CURRENT_TIMESTAMP(6) | DEFAULT_GENERATED |
| secondLastComment | varchar(256) | YES | | NULL | |
| lastLikerId | bigint(20) unsigned | YES | | NULL | |
+----------------------------+---------------------+------+-----+----------------------+-------------------+
我还将 lockroomId 添加为索引。这是故事表的索引:
+-------+------------+------------------------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+-------+------------+------------------------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Story | 0 | PRIMARY | 1 | storyId | A | 18951234 | NULL | NULL | | BTREE | | | YES | NULL |
| Story | 0 | StoryId_UNIQUE | 1 | storyId | A | 19116608 | NULL | NULL | | BTREE | | | YES | NULL |
| Story | 1 | ownerId | 1 | ownerId | A | 1212686 | NULL | NULL | | BTREE | | | YES | NULL |
| Story | 1 | parentStoryId | 1 | parentStoryId | A | 1 | NULL | NULL | YES | BTREE | | | YES | NULL |
| Story | 1 | updatedAt | 1 | updatedAt | A | 19022172 | NULL | NULL | YES | BTREE | | | YES | NULL |
| Story | 1 | ownerId_updatedAt | 1 | ownerId | A | 945172 | NULL | NULL | | BTREE | | | YES | NULL |
| Story | 1 | ownerId_updatedAt | 2 | updatedAt | A | 19112336 | NULL | NULL | YES | BTREE | | | YES | NULL |
| Story | 1 | updatedAt_ownerId_descending_index | 1 | updatedAt | D | 19091560 | NULL | NULL | YES | BTREE | | | YES | NULL |
| Story | 1 | updatedAt_ownerId_descending_index | 2 | ownerId | A | 18649716 | NULL | NULL | | BTREE | | | YES | NULL |
| Story | 1 | lockroomId | 1 | lockroomId | A | 1 | NULL | NULL | YES | BTREE | | | YES | NULL |
+-------+------------+------------------------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
这是执行时间超过 3 秒的查询。该查询的作用是获取用户创建的jointStory 与相同的lockroomId 但不同的storyId。因此,我按 lockroomId 对故事进行分组。
SELECT
*
FROM
((SELECT
Story.*,
jointStory.guestIds,
jointStory.guestStoryIds,
jointStory.guestTrees,
jointStory.guestDurations,
jointStory.guestTagIds
FROM
Story
INNER JOIN (SELECT
lockroomId,
GROUP_CONCAT(Story.ownerId) AS guestIds,
GROUP_CONCAT(Story.storyId) AS guestStoryIds,
GROUP_CONCAT(Story.imageUrl) AS guestTrees,
GROUP_CONCAT(Story.minutes) AS guestDurations,
GROUP_CONCAT(Story.tagId) AS guestTagIds
FROM
Story
WHERE
Story.storyType = 1
AND lockroomId IS NOT NULL
AND createdAt < FROM_UNIXTIME(1600444809)
GROUP BY Story.lockroomId
HAVING FIND_IN_SET(1349147, guestIds) ORDER BY createdAt DESC
LIMIT 10) jointStory ON jointStory.lockroomId = Story.lockroomId
WHERE
Story.storyType = 2
AND Story.lockroomId IS NOT NULL
AND Story.createdAt < FROM_UNIXTIME(1600444809)
ORDER BY createdAt DESC
LIMIT 10) UNION ALL SELECT
Story.*, NULL, NULL, NULL, NULL, NULL
FROM
Story
WHERE
Story.storyType = 0
AND ownerId = 1349147
AND createdAt < FROM_UNIXTIME(1600444809)
ORDER BY createdAt DESC
LIMIT 10) stories
LEFT JOIN
StoryEngagement ON StoryEngagement.storyId = stories.storyId
AND StoryEngagement.userId = 1349147
ORDER BY createdAt DESC
LIMIT 10;
对该查询的解释扩展命令的结果表明 MySQL 正在使用文件排序和使用索引条件:
+----+-------------+------------+------------+-------+---------------+-------------+---------+-----------------------------+--------+----------+---------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+-------+---------------+-------------+---------+-----------------------------+--------+----------+---------------------------------------------------------------+
| 1 | PRIMARY | Story | NULL | range | lockroomId | lockroomId | 9 | NULL | 287540 | 3.33 | Using index condition; Using where; Using MRR; Using filesort |
| 1 | PRIMARY | <derived2> | NULL | ref | <auto_key0> | <auto_key0> | 9 | flora_test.Story.lockroomId | 10 | 100.00 | NULL |
| 2 | DERIVED | Story | NULL | range | lockroomId | lockroomId | 9 | NULL | 287540 | 3.33 | Using index condition; Using where |
+----+-------------+------------+------------+-------+---------------+-------------+---------+-----------------------------+--------+----------+---------------------------------------------------------------+
如果您对如何优化此查询有任何想法,将不胜感激。如果有一种方法不需要创建索引,让这个查询运行得更快,那就太好了,因为这是一个巨大的生产表,添加索引需要很长时间,而且添加一些新的索引可能会导致其他查询运行更慢。我已经尝试了一些事情,例如在按字段分组(index(createdAt,lockroomId)
)上创建组合索引并FORCE INDEX(createdAt,lockroomId)
按照一些博客帖子的建议使用,但运行速度更慢......
更新
应用建议后,结果如下EXPLAIN SELECE
+----+--------------+-----------------+------------+--------+------------------------------------------------------------------------+--------------------------+---------+-----------------------------+-------+----------+---------------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+-----------------+------------+--------+------------------------------------------------------------------------+--------------------------+---------+-----------------------------+-------+----------+---------------------------------------------------------------------+
| 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 12 | 100.00 | Using filesort |
| 1 | PRIMARY | StoryEngagement | NULL | eq_ref | PRIMARY | PRIMARY | 16 | const,stories.storyId | 1 | 100.00 | Using where |
| 2 | DERIVED | <derived3> | NULL | ALL | NULL | NULL | NULL | NULL | 10 | 100.00 | Using where; Using temporary; Using filesort |
| 2 | DERIVED | Story | NULL | ref | lockroomId,combine_lockroomId_Index,combine_story_index | combine_story_index | 14 | const,jointStory.lockroomId | 29 | 33.33 | Using index condition |
| 3 | DERIVED | Story | NULL | range | lockroomId,combine_lockroomId_Index,combine_story_index | combine_lockroomId_Index | 14 | NULL | 36332 | 33.33 | Using index condition; Using where; Using temporary; Using filesort |
| 4 | UNION | Story | NULL | ref | ownerId,ownerId_updatedAt,combine_lockroomId_Index,combine_story_index | ownerId | 8 | const | 15 | 16.66 | Using where |
| NULL | UNION RESULT | <union2,4> | NULL | ALL | NULL | NULL | NULL | NULL | NULL | NULL | Using temporary; Using filesort |
+----+--------------+-----------------+------------+--------+------------------------------------------------------------------------+--------------------------+---------+-----------------------------+-------+----------+---------------------------------------------------------------------+
这是结果SHOW TABLE STATUS
+-------+--------+---------+------------+----------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+--------------------+----------+----------------+---------+
| Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation | Checksum | Create_options | Comment |
+-------+--------+---------+------------+----------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+--------------------+----------+----------------+---------+
| Story | InnoDB | 10 | Dynamic | 19350090 | 201 | 3895443456 | 0 | 4337303552 | 4194304 | 20043428 | 2020-09-18 17:03:14 | NULL | NULL | utf8mb4_unicode_ci | NULL | | |
+-------+--------+---------+------------+----------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+--------------------+----------+----------------+---------+
RAM 的大小为 2 GB。这里是innodb_buffer_pool_size
+------------------------------------------+
| @@innodb_buffer_pool_size/1024/1024/1024 |
+------------------------------------------+
| 1.000000000000 |
+------------------------------------------+
更改列的类型可能是个好主意,但是据我所知,更改列的类型时表将被锁定。因此,我可能需要等到下一次数据库更新。
这是当前的索引集:
+-------+------------+------------------------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+-------+------------+------------------------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Story | 0 | PRIMARY | 1 | storyId | A | 19350090 | NULL | NULL | | BTREE | | | YES | NULL |
| Story | 0 | StoryId_UNIQUE | 1 | storyId | A | 19350090 | NULL | NULL | | BTREE | | | YES | NULL |
| Story | 1 | ownerId | 1 | ownerId | A | 1255716 | NULL | NULL | | BTREE | | | YES | NULL |
| Story | 1 | parentStoryId | 1 | parentStoryId | A | 1 | NULL | NULL | YES | BTREE | | | YES | NULL |
| Story | 1 | updatedAt | 1 | updatedAt | A | 19350090 | NULL | NULL | YES | BTREE | | | YES | NULL |
| Story | 1 | ownerId_updatedAt | 1 | ownerId | A | 1743714 | NULL | NULL | | BTREE | | | YES | NULL |
| Story | 1 | ownerId_updatedAt | 2 | updatedAt | A | 19350090 | NULL | NULL | YES | BTREE | | | YES | NULL |
| Story | 1 | updatedAt_ownerId_descending_index | 1 | updatedAt | D | 19139564 | NULL | NULL | YES | BTREE | | | YES | NULL |
| Story | 1 | updatedAt_ownerId_descending_index | 2 | ownerId | A | 18636190 | NULL | NULL | | BTREE | | | YES | NULL |
| Story | 1 | lockroomId | 1 | lockroomId | A | 328860 | NULL | NULL | YES | BTREE | | | YES | NULL |
| Story | 1 | combine_lockroomId_Index | 1 | storyType | A | 685 | NULL | NULL | YES | BTREE | | | YES | NULL |
| Story | 1 | combine_lockroomId_Index | 2 | lockroomId | A | 221470 | NULL | NULL | YES | BTREE | | | YES | NULL |
| Story | 1 | combine_story_index | 1 | storyType | A | 429 | NULL | NULL | YES | BTREE | | | YES | NULL |
| Story | 1 | combine_story_index | 2 | lockroomId | A | 665208 | NULL | NULL | YES | BTREE | | | YES | NULL |
| Story | 1 | combine_story_index | 3 | createdAt | A | 19350090 | NULL | NULL | YES | BTREE | | | YES | NULL |
| Story | 1 | combine_story_index | 4 | ownerId | A | 19350090 | NULL | NULL | | BTREE | | | YES | NULL |
+-------+------------+------------------------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
Like 的数量只是存储在列中的数字(Int)。以及只存储lastComment 和secondLastComment 的comment 栏。我们创建了另一个表Story
来存储所有评论。
updatedAt
并且createdAt
是检查用户何时发布故事。
非常感谢您的回复。
解决方案
我认为这lockroomId IS NOT NULL
是不必要的,因为lockroomId > 0
索引:
(storyType, lockroomId)
(storyType, createdAt)
冗余(因为PRIMARY KEY
是 a UNIQUE KEY
):
UNIQUE KEY `StoryId_UNIQUE` (`storyId`),
每个lockroomId
都有 a storyType
= 1 和 = 2 吗?如果是这样,我想分 3 步(而不是 2 步)执行此查询:
- 做最少的工作来找到 10 个lookroomID。
JOIN
获得的故事Story.*
JOIN
再次得到其余的GROUP_CONCATs
这是理由。
首先,您要获取整个表 building GROUP_CONCATs
,但最终将丢弃除 10 之外的所有表。相反,我希望只获得 guestIds 并过滤 . 其次,将其减少到 10 行。
然后我会回去拿剩下的东西。
这是否获得了正确的 lookroomIds 列表?
SELECT lockroomId, GROUP_CONCAT(Story.ownerId) AS guestIds
FROM Story
WHERE storyType = 1
AND lockroomId > 0
AND createdAt < FROM_UNIXTIME(1598882406)
GROUP BY lockroomId
HAVING FIND_IN_SET(<userId>, guestIds)
ORDER BY createdAt DESC
LIMIT 10;
有了这个覆盖指数:
INDEX(storyType, lookroomId, createdAt, ownerId)
在我继续之前,请查看它的运行速度以及它是否获得了有效的 lockroomId 列表。
改进
你说这有帮助。请EXPLAIN SELECT
在索引更改后提供。
次要建议
你的桌子很大。请提供SHOW TABLE STATUS
、 RAM 大小和innodb_buffer_pool_size
. 如果可能,请检查 MySQL 是否正在交换。这将提供您是否受 I/O 限制的线索,从而导致焦点转移。
我看到很多BIGINTs
。 BIGINT
占用 8 个字节;INT
需要 4。如果涉及 I/O,缩小数据大小有助于提高性能。
你总是需要显示“评论”吗?也许这 6 列可以移动到单独的表中?
将“numlikes”移动到另一个表——这可能是有益的,特别是如果您在进行有问题的大选择时经常更新该列。
考虑较小的数据类型,例如minutes
. SMALLINT UNSIGNED
能坚持2个月吗?(只有 2 个字节。)
你真的使用created_at
andupdated_at
吗?(`TIMESTAMP(6) 每个占用 8 个字节。)
当前的索引集是什么?让我们看看第二个最差的查询。(我不想在改进当前查询的同时破坏它。)
更多的
innodb_buffer_pool_size = 1G
只有 2GB 的内存——听起来你可能正在交换。交换对性能非常不利。我建议将该设置降低到 500M。
60-80% 的建议适用于大型服务器。操作系统需要一些空间。MySQL 代码需要一些空间。其他数据结构占用一些空间。和网络服务器。以及您的应用程序代码。这些可能加起来超过 1GB。(我不确切知道。)这为缓冲池留下了不到 1GB 的空间。设置为500M,运行几天,如果有空闲RAM,可以增加buffer_pool一些。
与较小的 buffer_pool 相比,交换性能更差。
由于表大约 8GB,因此缩小表很重要。请参阅上面的建议。
由于combine_lockroomId_Index
是combine_story_index
DROP的第一部分combine_lockroomId_Index
。
推荐阅读
- button - 如何配置 Jetbrain phpstorm 将操作按钮添加到主菜单/工具栏,例如重做和撤消按钮,而不是 CTRL+Y 和 CTRL+SHIFT+Z?
- javascript - 如何使用 NPM 脚本打开终端选项卡?
- javascript - JSHINT 为 If 语句方程引发语法错误
- android - setDynamicLinkDomain 在代码中已被弃用,但在文档中似乎并未被弃用
- c# - 项目过期 - 不再显示此对话框 - 构建还是不构建?
- javascript - 如果没有 ios 中的特定按钮,则无法通过触摸关闭弹出窗口
- perforce - 在 P4 客户端上设置默认描述模板
- powershell - 如何在 Where-Object Powershell 中使用 If Else
- c# - 图像颜色在 Unity 中没有变化
- mediawiki - 模板打印空段落