Optimizing a "repost" MySQL query

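Problem description

I have a table of posts, a table of reposts, and a table that records which users follow which.

I want to do something like Twitter, where I show every post or repost made by the users the viewer follows.

I want a post to show up at its first occurrence, so that if several followed users repost the same post, it appears only once, at the time of the earliest repost.

To speed up this query, whenever a post is created I also insert a row into the Repost table, so that a corresponding repost (by the author) always exists.

My schema looks like this: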

Table Post
id: INT
userId: INT
time: INT

Table Repost
id: INT
postId: INT
userId: INT
time: INT

Table users_following
userId: INT
followerId: INT

My query looks like this:

SELECT sr.* FROM Repost sr
INNER JOIN (
    SELECT MIN(ir.time) min_time, ir.postId FROM Repost ir
    WHERE ir.userId IN (
        SELECT uf.userId FROM users_following uf WHERE
        ir.userId = uf.userId AND uf.followerId = 1
    )
    OR ir.userId = 1
    GROUP BY ir.postId
) rr ON rr.postId = sr.postId AND sr.time = rr.min_time

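The idea is as follows:

  1. SELECT FROM users_following uf: select the IDs of all users the viewer follows.
  2. SELECT FROM Repost ir: select the minimum repost time for each post, where the reposter is either a followed user or the viewer.
  3. SELECT FROM Repost sr: use the inner join to select, for each post, the repost with that minimum time.

This works, but stage 3 is slow. I believe that is because once we have a large list of min_time values, no index can be used to select from that subquery, which means everything has to be scanned. Is there a way to structure this query for better performance?

Here are the full EXPLAIN and SHOW CREATE TABLE outputs for hardcore readers.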

EXPLAIN

+----+--------------------+------------+------------+--------+-------------------------------------------------------------+----------------------+---------+---------------------------------+--------+----------+--------------------------+
| id | select_type        | table      | partitions | type   | possible_keys                                               | key                  | key_len | ref                             | rows   | filtered | Extra                    |
+----+--------------------+------------+------------+--------+-------------------------------------------------------------+----------------------+---------+---------------------------------+--------+----------+--------------------------+
|  1 | PRIMARY            | <derived2> | NULL       | ALL    | NULL                                                        | NULL                 | NULL    | NULL                            | 797455 |   100.00 | Using where              |
|  1 | PRIMARY            | sr         | NULL       | ref    | IDX_DA9843F3E094D20D,repost_time_idx,repost_stream_idx      | repost_time_idx      | 4       | rr.min_time                     |      1 |     4.92 | Using where              |
|  2 | DERIVED            | ir         | NULL       | index  | IDX_DA9843F364B64DCC,IDX_DA9843F3E094D20D,repost_stream_idx | IDX_DA9843F3E094D20D | 4       | NULL                            | 797456 |   100.00 | Using where              |
|  3 | DEPENDENT SUBQUERY | uf         | NULL       | eq_ref | PRIMARY,IDX_17C2F70264B64DCC,IDX_17C2F702F542AA03           | PRIMARY              | 8       | prose_2_24_2021.ir.userId,const |      1 |   100.00 | Using where; Using index |
+----+--------------------+------------+------------+--------+-------------------------------------------------------------+----------------------+---------+---------------------------------+--------+----------+--------------------------+

SHOW CREATE TABLE Repost

CREATE TABLE `Repost` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `postId` int(11) NOT NULL,
  `userId` int(11) NOT NULL,
  `time` int(11) NOT NULL,
  `isRepost` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `IDX_DA9843F364B64DCC` (`userId`),
  KEY `IDX_DA9843F3E094D20D` (`postId`),
  KEY `repost_time_idx` (`time`),
  KEY `repost_stream_idx` (`time`,`userId`,`postId`),
  CONSTRAINT `FK_DA9843F364B64DCC` FOREIGN KEY (`userId`) REFERENCES `ProseUser` (`id`),
  CONSTRAINT `FK_DA9843F3E094D20D` FOREIGN KEY (`postId`) REFERENCES `Post` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=809018 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci 

SHOW CREATE TABLE users_following

CREATE TABLE `users_following` (
  `userId` int(11) NOT NULL,
  `followerId` int(11) NOT NULL,
  PRIMARY KEY (`userId`,`followerId`),
  KEY `IDX_17C2F70264B64DCC` (`userId`),
  KEY `IDX_17C2F702F542AA03` (`followerId`),
  CONSTRAINT `FK_17C2F70264B64DCC` FOREIGN KEY (`userId`) REFERENCES `ProseUser` (`id`),
  CONSTRAINT `FK_17C2F702F542AA03` FOREIGN KEY (`followerId`) REFERENCES `ProseUser` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci 

Edit

Adjusting the query like this produces much faster results, although adding ORDER BY slows it down. Without the ORDER BY, this query is great.

SELECT sr.* FROM Repost sr
INNER JOIN (
    SELECT MIN(ir.time) min_time, ir.postId FROM Repost ir
    INNER JOIN users_following uf ON ir.userId = uf.userId AND uf.followerId = 1
    GROUP BY ir.postId
) rr ON rr.postId = sr.postId AND sr.time = rr.min_time
ORDER BY sr.time desc
LIMIT 10

Here is the EXPLAIN for this query:

+----+-------------+------------+------------+--------+--------------------------------------------------------------------------------+----------------------+---------+---------------------------+------+----------+----------------------------------------------+
| id | select_type | table      | partitions | type   | possible_keys                                                                  | key                  | key_len | ref                       | rows | filtered | Extra                                        |
+----+-------------+------------+------------+--------+--------------------------------------------------------------------------------+----------------------+---------+---------------------------+------+----------+----------------------------------------------+
|  1 | PRIMARY     | <derived2> | NULL       | ALL    | NULL                                                                           | NULL                 | NULL    | NULL                      |  691 |   100.00 | Using where; Using temporary; Using filesort |
|  1 | PRIMARY     | sr         | NULL       | ref    | IDX_DA9843F3E094D20D,repost_time_idx,repost_stream_idx,repost_stream2_idx      | repost_stream2_idx   | 8       | rr.min_time,rr.postId     |    1 |   100.00 | NULL                                         |
|  2 | DERIVED     | uf         | NULL       | ref    | PRIMARY,IDX_17C2F70264B64DCC,IDX_17C2F702F542AA03                              | IDX_17C2F702F542AA03 | 4       | const                     |  145 |   100.00 | Using index; Using temporary; Using filesort |
|  2 | DERIVED     | ir         | NULL       | ref    | IDX_DA9843F364B64DCC,IDX_DA9843F3E094D20D,repost_stream_idx,repost_stream2_idx | IDX_DA9843F364B64DCC | 4       | prose_2_24_2021.uf.userId |    9 |   100.00 | NULL                                         |
|  2 | DERIVED     | rp         | NULL       | eq_ref | PRIMARY,post_spotlight_idx,post_time_idx,post_trending_idx                     | PRIMARY              | 4       | prose_2_24_2021.ir.postId |    1 |    50.00 | Using where                                  |
+----+-------------+------------+------------+--------+--------------------------------------------------------------------------------+----------------------+---------+---------------------------+------+----------+----------------------------------------------+

Tags: mysql, query-optimization

Solution


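Repost needs an overhaul of its indexes. The current keys (first block) should be replaced by the ones in the second block: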

  PRIMARY KEY (`id`),
  KEY `IDX_DA9843F364B64DCC` (`userId`),
  KEY `IDX_DA9843F3E094D20D` (`postId`),
  KEY `repost_time_idx` (`time`),
  KEY `repost_stream_idx` (`time`,`userId`,`postId`),

  PRIMARY KEY(postId, userId, time, id),   -- `id` is for uniqueness
  INDEX(id)  -- to keep AUTO_INCREMENT happy

(I don't know whether the other indexes are of any use.)
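As a rough sketch, the key change could be applied in one statement. The index name repost_id_idx is only a placeholder, and I am assuming the existing secondary indexes stay in place for now: InnoDB needs an index that starts with userId to support the foreign key on that column, so that one cannot simply be dropped.

```sql
-- Sketch only: swap the primary key and keep `id` indexed in the same statement,
-- because an AUTO_INCREMENT column must be the leading column of some index.
-- `repost_id_idx` is a hypothetical name; the other secondary indexes are left untouched.
ALTER TABLE Repost
    DROP PRIMARY KEY,
    ADD PRIMARY KEY (postId, userId, time, id),  -- `id` is for uniqueness
    ADD INDEX repost_id_idx (id);                -- keeps AUTO_INCREMENT happy
```

On a table of about 800,000 rows this rebuilds the clustered index, so it is worth trying on a copy first.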

Change IN ( SELECT ... ) to EXISTS ( SELECT 1 ... ).
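Applied to the derived table in the original query, that change might look like the following. This is a sketch of the inner query only; the viewer ID 1 is taken from the question.

```sql
-- Inner query rewritten with EXISTS instead of IN; the OR branch is unchanged.
SELECT MIN(ir.time) AS min_time, ir.postId
FROM Repost ir
WHERE EXISTS (
          SELECT 1
          FROM users_following uf
          WHERE uf.userId = ir.userId
            AND uf.followerId = 1
      )
   OR ir.userId = 1
GROUP BY ir.postId;
```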

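OR is a performance killer. Time the query with only one side of the OR, then time it with only the other side. If the sum of those two times is faster than your current time, a UNION of the two is the way to put them together. Then simplify each query where practical. Show me the results; I may have more index suggestions.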
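One possible shape for the combined form is sketched below. It uses UNION ALL plus an outer GROUP BY rather than a plain UNION, because a post can appear on both sides with different times and the smaller one has to win. The alias u is just illustrative, and I have not timed this against your data.

```sql
-- Each branch handles one side of the OR and can be timed on its own.
SELECT u.postId, MIN(u.min_time) AS min_time
FROM (
    -- Side 1: reposts by users the viewer (ID 1) follows
    SELECT ir.postId, MIN(ir.time) AS min_time
    FROM Repost ir
    INNER JOIN users_following uf
            ON uf.userId = ir.userId
           AND uf.followerId = 1
    GROUP BY ir.postId

    UNION ALL

    -- Side 2: reposts by the viewer
    SELECT ir.postId, MIN(ir.time) AS min_time
    FROM Repost ir
    WHERE ir.userId = 1
    GROUP BY ir.postId
) u
GROUP BY u.postId;
```

The result has the same columns as the derived table rr, so it could stand in for it in the join against Repost sr.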

