首页 > 解决方案 > 加快从庞大的日志表中收集查询的最佳方法是什么?

问题描述

我有 MySQL 数据库日志表,该表每天增加 5m 的数据,我在从该表中收集数据以进行一些分析计数时遇到问题。

我列出了问题的详细信息如下:

这是我的日志表:

CREATE TABLE `details` (
    `id` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    `provider` VARCHAR(25) NULL COLLATE 'utf8mb4_unicode_ci',
    `DLR_Status` VARCHAR(30) NULL COLLATE 'utf8mb4_unicode_ci',
    `source` VARCHAR(30) NULL COLLATE 'utf8mb4_bin',
    `Destination` VARCHAR(30) NULL COLLATE 'utf8mb4_unicode_ci',
    `msg` VARCHAR(1000) NULL COLLATE 'utf8mb4_unicode_ci',
    `timestamp` TIMESTAMP NULL,
    `msg_timestamp` INT NOT NULL,
    `created_at` TIMESTAMP NULL,
    `updated_at` TIMESTAMP NULL,
    PRIMARY KEY (`id`, `msg_timestamp`) USING BTREE
)
COLLATE='utf8mb4_unicode_ci'
AUTO_INCREMENT=24169513
/*!50100 PARTITION BY RANGE (`msg_timestamp`)
(PARTITION p2016_02 VALUES LESS THAN (1456779600) ENGINE = InnoDB,
 PARTITION p2016_03 VALUES LESS THAN (1459458000) ENGINE = InnoDB,
 PARTITION p2016_04 VALUES LESS THAN (1462050000) ENGINE = InnoDB,
 PARTITION p2016_05 VALUES LESS THAN (1464728400) ENGINE = InnoDB,
 PARTITION p2016_06 VALUES LESS THAN (1467320400) ENGINE = InnoDB,
 PARTITION p2016_07 VALUES LESS THAN (1469998800) ENGINE = InnoDB,
 PARTITION p2016_08 VALUES LESS THAN (1472677200) ENGINE = InnoDB,
 PARTITION p2016_09 VALUES LESS THAN (1475269200) ENGINE = InnoDB,
 PARTITION p2016_10 VALUES LESS THAN (1477947600) ENGINE = InnoDB,
 PARTITION p2016_11 VALUES LESS THAN (1480539600) ENGINE = InnoDB,
 PARTITION p2016_12 VALUES LESS THAN (1483218000) ENGINE = InnoDB,
 PARTITION p2017_01 VALUES LESS THAN (1485896400) ENGINE = InnoDB,
 PARTITION p2017_02 VALUES LESS THAN (1488315600) ENGINE = InnoDB,
 PARTITION p2017_03 VALUES LESS THAN (1490994000) ENGINE = InnoDB,
 PARTITION p2017_04 VALUES LESS THAN (1493586000) ENGINE = InnoDB,
 PARTITION p2017_05 VALUES LESS THAN (1496264400) ENGINE = InnoDB,
 PARTITION p2017_06 VALUES LESS THAN (1498856400) ENGINE = InnoDB,
 PARTITION p2017_07 VALUES LESS THAN (1501534800) ENGINE = InnoDB,
 PARTITION p2017_08 VALUES LESS THAN (1504213200) ENGINE = InnoDB,
 PARTITION p2017_09 VALUES LESS THAN (1506805200) ENGINE = InnoDB,
 PARTITION p2017_10 VALUES LESS THAN (1509483600) ENGINE = InnoDB,
 PARTITION p2017_11 VALUES LESS THAN (1512075600) ENGINE = InnoDB,
 PARTITION p2017_12 VALUES LESS THAN (1514754000) ENGINE = InnoDB,
 PARTITION p2018_01 VALUES LESS THAN (1517432400) ENGINE = InnoDB,
 PARTITION p2018_02 VALUES LESS THAN (1519851600) ENGINE = InnoDB,
 PARTITION p2018_03 VALUES LESS THAN (1522530000) ENGINE = InnoDB,
 PARTITION p2018_04 VALUES LESS THAN (1525122000) ENGINE = InnoDB,
 PARTITION p2018_05 VALUES LESS THAN (1527800400) ENGINE = InnoDB,
 PARTITION p2018_06 VALUES LESS THAN (1530392400) ENGINE = InnoDB,
 PARTITION p2018_07 VALUES LESS THAN (1533070800) ENGINE = InnoDB,
 PARTITION p2018_08 VALUES LESS THAN (1535749200) ENGINE = InnoDB,
 PARTITION p2018_09 VALUES LESS THAN (1538341200) ENGINE = InnoDB,
 PARTITION p2018_10 VALUES LESS THAN (1541019600) ENGINE = InnoDB,
 PARTITION p2018_11 VALUES LESS THAN (1543611600) ENGINE = InnoDB,
 PARTITION p2018_12 VALUES LESS THAN (1546290000) ENGINE = InnoDB,
 PARTITION p2019_01 VALUES LESS THAN (1548968400) ENGINE = InnoDB,
 PARTITION p2019_02 VALUES LESS THAN (1551387600) ENGINE = InnoDB,
 PARTITION p2019_03 VALUES LESS THAN (1554066000) ENGINE = InnoDB,
 PARTITION p2019_04 VALUES LESS THAN (1556658000) ENGINE = InnoDB,
 PARTITION p2019_05 VALUES LESS THAN (1559336400) ENGINE = InnoDB,
 PARTITION p2019_06 VALUES LESS THAN (1561928400) ENGINE = InnoDB,
 PARTITION p2019_07 VALUES LESS THAN (1564606800) ENGINE = InnoDB,
 PARTITION p2019_08 VALUES LESS THAN (1567285200) ENGINE = InnoDB,
 PARTITION p2019_09 VALUES LESS THAN (1569877200) ENGINE = InnoDB,
 PARTITION p2019_10 VALUES LESS THAN (1572555600) ENGINE = InnoDB,
 PARTITION p2019_11 VALUES LESS THAN (1575147600) ENGINE = InnoDB,
 PARTITION p2019_12 VALUES LESS THAN (1577826000) ENGINE = InnoDB,
 PARTITION p2020_01 VALUES LESS THAN (1580504400) ENGINE = InnoDB,
 PARTITION p2020_02 VALUES LESS THAN (1583010000) ENGINE = InnoDB,
 PARTITION p2020_03 VALUES LESS THAN (1585688400) ENGINE = InnoDB,
 PARTITION p2020_04 VALUES LESS THAN (1588280400) ENGINE = InnoDB,
 PARTITION p2020_05 VALUES LESS THAN (1590958800) ENGINE = InnoDB,
 PARTITION p2020_06 VALUES LESS THAN (1593550800) ENGINE = InnoDB,
 PARTITION p2020_07 VALUES LESS THAN (1596229200) ENGINE = InnoDB,
 PARTITION p2020_08 VALUES LESS THAN (1598907600) ENGINE = InnoDB,
 PARTITION p2020_09 VALUES LESS THAN (1601499600) ENGINE = InnoDB,
 PARTITION p2020_10 VALUES LESS THAN (1604178000) ENGINE = InnoDB,
 PARTITION p2020_11 VALUES LESS THAN (1606770000) ENGINE = InnoDB,
 PARTITION p2020_12 VALUES LESS THAN (1609448400) ENGINE = InnoDB,
 PARTITION p2021_01 VALUES LESS THAN (1612126800) ENGINE = InnoDB,
 PARTITION p2021_02 VALUES LESS THAN (1614546000) ENGINE = InnoDB,
 PARTITION p2021_03 VALUES LESS THAN (1617224400) ENGINE = InnoDB,
 PARTITION p2021_04 VALUES LESS THAN (1619816400) ENGINE = InnoDB,
 PARTITION p2021_05 VALUES LESS THAN (1622494800) ENGINE = InnoDB,
 PARTITION p2021_06 VALUES LESS THAN (1625086800) ENGINE = InnoDB,
 PARTITION p2021_07 VALUES LESS THAN (1627765200) ENGINE = InnoDB,
 PARTITION p2021_08 VALUES LESS THAN (1630443600) ENGINE = InnoDB,
 PARTITION p2021_09 VALUES LESS THAN (1633035600) ENGINE = InnoDB,
 PARTITION p2021_10 VALUES LESS THAN (1635714000) ENGINE = InnoDB,
 PARTITION p2021_11 VALUES LESS THAN (1638306000) ENGINE = InnoDB,
 PARTITION p2021_12 VALUES LESS THAN (1640984400) ENGINE = InnoDB,
 PARTITION p2022_01 VALUES LESS THAN (1643662800) ENGINE = InnoDB,
 PARTITION p2022_02 VALUES LESS THAN (1646082000) ENGINE = InnoDB,
 PARTITION p2022_03 VALUES LESS THAN (1648760400) ENGINE = InnoDB,
 PARTITION p2022_04 VALUES LESS THAN (1651352400) ENGINE = InnoDB,
 PARTITION p2022_05 VALUES LESS THAN (1654030800) ENGINE = InnoDB,
 PARTITION p2022_06 VALUES LESS THAN (1656622800) ENGINE = InnoDB,
 PARTITION p2022_07 VALUES LESS THAN (1659301200) ENGINE = InnoDB,
 PARTITION p2022_08 VALUES LESS THAN (1661979600) ENGINE = InnoDB,
 PARTITION p2022_09 VALUES LESS THAN (1664571600) ENGINE = InnoDB,
 PARTITION p2022_10 VALUES LESS THAN (1667250000) ENGINE = InnoDB,
 PARTITION p2022_11 VALUES LESS THAN (1669842000) ENGINE = InnoDB,
 PARTITION p2022_12 VALUES LESS THAN MAXVALUE ENGINE = InnoDB)  */;

它包含如下日志数据:

ID 提供者 DLR_状态 资源 目的地 味精 时间戳 msg_timestamp
1 KDD 完毕 网站 01332456 免费送货 2019-12-01 12:00:13 1575201613
2 KDD 完毕 用电话 01322422 花费300 2019-12-01 12:00:37 1575201637
. . . . . . . .
. . . . . . . .

问题是当我从这个表中选择一些计数时

SELECT SQL_CALC_FOUND_ROWS DLR_status,count(*) as c 
FROM sms_details 
group by DLR_status;

给出结果需要很长时间,并且某些查询会给出 504 Gateway Time-out 错误,例如此查询

SELECT SQL_CALC_FOUND_ROWS Destination,count(*) as c 
FROM sms_details 
WHERE msg_timestamp >= UNIX_TIMESTAMP("2019-10-01") and msg_timestamp < UNIX_TIMESTAMP("2019-12-01") group by Destination;

我已经在我的表中使用了分区,并且我尝试为某些列建立索引,但这对每天增加的数据造成了很大的问题。

那么以下最佳实践是什么:

标签: mysqlquery-optimizationpartitioning

解决方案


收缩模式-- 更小 --> 更少的 I/O --> 更快。

timestamp vs msg_timestamp -- 这些似乎是相同的,只是格式不同。因此,折腾其中之一。

规范化通过减少数据量来加快插入速度。大多数VARCHARs可以用 2 字节 SMALLINT UNSIGNED 或 3 字节 MEDIUMINT UNSIGNED 替换。

未来的分区——这样的分区不要超过一个;会SELECTs浪费时间打开它们却一无所获。

分区太多——在某个限制(可能是 50 个)下,拥有大量分区会减慢速度。

批处理是最好的加速。见LOAD DATAINSERT ... VALUES (...), (...), ...。在后一种情况下,我推荐 1000 行的批次。(超出此范围会导致收益递减,并且可能存在一些限制。)如果数据来自多个来源,请解释;然后我们可以进一步讨论。

分区对于清除“旧”数据非常有用,因为DROP PARTITION它比DELETE. 见http://mysql.rjweb.org/doc.php/partitionmaint

折腾 created_atupdated_at; 它们可能没用。(同样,越小越快。)

没有时不需要SQL_CALC_FOUND_ROWSLIMIT;只需观察返回了多少行。重新考虑用户的需求。(如果需要,请回来进行更多讨论。)

如果您有DLR_status 计数将是一个完整的索引扫描INDEX(DLR_status)。并考虑将该列设为 ENUM,使其只有 1 个字节。(如果有多个值和/或越来越多的值,则“标准化”。)

查询 2需要INDEX(Destination, msg_timestmap)

大么?24M 行/5 年 --> 每秒少于 1 行。100 行/秒是我开始担心“高速摄取”的地方。也就是说,我认为插入没有问题。另一方面,选择可能是个问题。你给我们看了两个;让我们看看更多。我不想一次推荐一个索引;我宁愿设计一组索引来优化处理所有可能的查询。 特别是因为它可能涉及重新设计分区

汇总表是在“数据仓库”中进行快速分析的绝佳方式。见http://mysql.rjweb.org/doc.php/summarytables


推荐阅读