Partitioning a huge chat message table in MySQL

Problem description

I have a table in MySQL with roughly 700 billion rows representing chat messages. The message texts are stored in a separate table.

+------------+------------------+------+-----+---------+----------------+
| Field      | Type             | Null | Key | Default | Extra          |
+------------+------------------+------+-----+---------+----------------+
| ID         | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| Type       | tinyint(4)       | NO   |     | 0       |                |
| FromUserID | int(10) unsigned | NO   | PRI | 0       |                |
| ToUserID   | int(10) unsigned | YES  | MUL | NULL    |                |
| TextID     | int(10) unsigned | YES  | MUL | NULL    |                |
| Ts         | datetime         | YES  |     | NULL    |                |
| IsNew      | tinyint(4)       | NO   |     | 0       |                |
| Direction  | tinyint(4)       | NO   |     | 0       |                |
| NeedStar   | tinyint(4)       | NO   |     | 0       |                |
| NeedSend   | tinyint(4)       | NO   |     | 0       |                |
| DirectID   | int(10) unsigned | YES  | MUL | NULL    |                |
| IdeaID     | int(11)          | YES  |     | NULL    |                |
| FilePos    | int(11)          | NO   |     | 0       |                |
+------------+------------------+------+-----+---------+----------------+

So far it has been partitioned with BY HASH(FromUserID) PARTITIONS 16; and indexed as follows:

+--------------------+--------------+-------------+-----------+-------------+
| Key_name           | Seq_in_index | Column_name | Collation | Cardinality |
+--------------------+--------------+-------------+-----------+-------------+
| PRIMARY            |            1 | ID          | A         |   644937873 |
| PRIMARY            |            2 | FromUserID  | A         |   644937873 |
| DoulikeMessage_k_2 |            1 | FromUserID  | A         |     5971646 |
| DoulikeMessage_k_2 |            2 | ToUserID    | A         |   644937873 |
| DoulikeMessage_k_3 |            1 | DirectID    | A         |          28 |
| ToUserID           |            1 | ToUserID    | A         |    37937521 |
| TextID             |            1 | TextID      | A         |   644937873 |
+--------------------+--------------+-------------+-----------+-------------+

I am considering creating a similar table with different partitioning and indexes and then transferring the records over. The most frequent query against the table is

SELECT  DoulikeMessages.* FROM DoulikeMessages WHERE  1  AND (DoulikeMessages.FROMUSERID = '2048254')  AND DoulikeMessages.TOUSERID >= '1'  AND DoulikeMessages.TYPE <= '1'  ORDER BY Ts DESC;

It sometimes takes up to 20 seconds to run. That is because there is no index on Ts (datetime). I am thinking of doing something like PARTITION BY RANGE( FromUserID ) PARTITIONS 50 and indexing by Ts. Or should I perhaps create subpartitions by date? I could also add something like "Where Ts > now() - 1 Month" to the query, since most of the time only recent messages are needed; the older ones are simply discarded later in the code.
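Roughly, the direction I have in mind would look something like the sketch below (the partition boundaries and the one-month filter are only placeholders to illustrate the idea, not something I have tested):

-- Sketch of the partitioning I'm considering; boundaries are placeholders
ALTER TABLE DoulikeMessages
PARTITION BY RANGE (FromUserID) (
    PARTITION p0   VALUES LESS THAN (1000000),
    PARTITION p1   VALUES LESS THAN (2000000),
    PARTITION pmax VALUES LESS THAN MAXVALUE
);

-- The frequent query, restricted to recent messages only
SELECT m.*
FROM   DoulikeMessages m
WHERE  m.FromUserID = 2048254
  AND  m.ToUserID >= 1
  AND  m.Type <= 1
  AND  m.Ts > NOW() - INTERVAL 1 MONTH
ORDER BY m.Ts DESC;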

I should also mention that there was an attempt to partition the table manually (creating 50 new tables split by FromUserID and changing the API to use them), but that seems like a bad idea to me. I don't see how that approach would be of any benefit.

Tags: mysql, database-partitioning

Solution


Partitioning is not a tool to make your queries faster. That is what indexes are for.

Access to your data is currently inefficient. While you have an index on FromUserID, to get to the actual data (you want *), MySQL has to look it up in the table. But there, the rows are ordered by the auto-increment ID, which, as far as any single user is concerned, is essentially random order.

So MySQL first reads a row of your index, learns where in the table the full row is stored, jumps to that position in the table, reads the data there, and repeats with the next row of the index.
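One way to see this access path for yourself (the exact output depends on your MySQL version and your data, so treat this only as a sketch):

EXPLAIN
SELECT m.*
FROM   DoulikeMessages m
WHERE  m.FromUserID = 2048254
  AND  m.ToUserID >= 1
  AND  m.Type <= 1
ORDER BY m.Ts DESC;

-- With the current indexes this will typically show a lookup on one of the
-- FromUserID/ToUserID indexes plus "Using filesort" for the ORDER BY Ts.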

What's more, data in the table is stored in blocks, typically of 16 KB. So to read one row from the table, which is about 41 bytes, MySQL actually has to read 16 KB, possibly from disk. And those 41 useful bytes also occupy 16 KB of the buffer pool (the MySQL cache), which can force you to reread data from disk instead of memory much sooner than necessary, because older data gets evicted earlier.

Or in other words: to cache your whole table in memory, you need about 28 GB of RAM (assuming you meant 700 million rows when you wrote 700 billion rows: 700 million × 41 bytes ≈ 28 GB). If only 1% of your users are currently active, in the worst case you may still need 28 GB of RAM, because more than 99% of the cached data can be unusable, since you only need 41 bytes out of each 16 KB block. There is some overlap (e.g. most blocks will contain several rows that you need), but you get the idea. If blocks contained mostly data for the same user, you might only need about 0.28 GB to cache the data for the 1% of active users, and would have the rest of the buffer pool to cache other data (or enough to keep it all in memory).
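You can check how your table compares to the buffer pool with something like this (the reported sizes are approximate, and I'm assuming the table is called DoulikeMessages as in your query):

-- Approximate on-disk size of the table data and its indexes, in GB
SELECT ROUND(DATA_LENGTH  / POW(1024, 3), 1) AS data_gb,
       ROUND(INDEX_LENGTH / POW(1024, 3), 1) AS index_gb
FROM   information_schema.TABLES
WHERE  TABLE_NAME = 'DoulikeMessages';

-- Size of the InnoDB buffer pool, in bytes
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';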

Just to emphasize: reading from disk is a really, really expensive operation. Just run your query twice for a random user (one that isn't in memory yet). The first time it is read from disk, the second time from memory. Depending on your MySQL version, you may need to add SQL_NO_CACHE to get an accurate result, i.e. to see the actual effect of the buffer pool rather than just the effect of the query cache.
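For example (the user id is just a placeholder, pick one that hasn't been queried recently; SQL_NO_CACHE only has an effect on versions that still have the query cache):

-- Run twice: the first run reads from disk, the second from the buffer pool
SELECT SQL_NO_CACHE m.*
FROM   DoulikeMessages m
WHERE  m.FromUserID = 2048254
  AND  m.ToUserID >= 1
  AND  m.Type <= 1
ORDER BY m.Ts DESC;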

To optimize your access, you should have all the data you need at hand as soon as you look up a FromUserID. You can use a covering index, which means you create an index that contains all the columns you need. If you need all columns to cover *, this could mean an index (FromUserID, Ts, ToUserID, Type, ..., FilePos). It will take about as much disk space as your current table data, but it allows MySQL to read a row from that index, realize it already has all the data, and move on to the next row of the index, without ever having to consult the table for the complete row. This is much faster.
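A sketch of such a covering index (the index name is made up; and since InnoDB appends the primary key columns ID and FromUserID to every secondary index anyway, you don't have to list ID yourself):

ALTER TABLE DoulikeMessages
  ADD INDEX covering_from_ts (FromUserID, Ts, ToUserID, Type, TextID,
                              IsNew, Direction, NeedStar, NeedSend,
                              DirectID, IdeaID, FilePos);

Putting Ts directly after FromUserID also lets MySQL return the rows for one user already sorted, so the ORDER BY Ts DESC no longer needs a filesort.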

Add to that the effect that MySQL reads 16 KB in one go: if the required data is not cached and has to be read from disk, you have already read the next 16 KB / 41 bytes ≈ 400 rows into memory, i.e. up to 399 fewer disk reads. Whether you actually see this effect depends on how much cache memory you currently have. But 20 seconds for an extremely simple query indicates that you don't have your whole table in memory.

You could also consider changing your primary key to something starting with FromUserID, maybe (FromUserID, Ts, ToUserID). While that would save the disk space of the separate covering index, the decision is primarily a logical one: whether this (or some other combination of your columns) really is a primary key, i.e. unique and not null, depends on your data model. You should not do it just to save disk space. If you do it, this would, like the covering index, also allow you to read all data for a user in one go, without having to consult a different data source, because InnoDB stores the table rows in primary key order.
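If it does fit your data model, the change could look roughly like this (ID stays in the key to guarantee uniqueness; the extra KEY (ID) is needed because ID is an auto_increment column; and Ts and ToUserID would have to become NOT NULL):

ALTER TABLE DoulikeMessages
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (FromUserID, Ts, ToUserID, ID),
  ADD KEY (ID);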

Depending on how many messages these queries return, disk speed (SSD or HDD), the amount of RAM/buffer pool, and server load, the execution time should then be measured in milliseconds instead of seconds.

