mysql - 加快从庞大的日志表中收集查询的最佳方法是什么?
问题描述
我有 MySQL 数据库日志表,该表每天增加 5m 的数据,我在从该表中收集数据以进行一些分析计数时遇到问题。
我列出了问题的详细信息如下:
这是我的日志表:
CREATE TABLE `details` (
`id` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
`provider` VARCHAR(25) NULL COLLATE 'utf8mb4_unicode_ci',
`DLR_Status` VARCHAR(30) NULL COLLATE 'utf8mb4_unicode_ci',
`source` VARCHAR(30) NULL COLLATE 'utf8mb4_bin',
`Destination` VARCHAR(30) NULL COLLATE 'utf8mb4_unicode_ci',
`msg` VARCHAR(1000) NULL COLLATE 'utf8mb4_unicode_ci',
`timestamp` TIMESTAMP NULL,
`msg_timestamp` INT NOT NULL,
`created_at` TIMESTAMP NULL,
`updated_at` TIMESTAMP NULL,
PRIMARY KEY (`id`, `msg_timestamp`) USING BTREE
)
COLLATE='utf8mb4_unicode_ci'
AUTO_INCREMENT=24169513
/*!50100 PARTITION BY RANGE (`msg_timestamp`)
(PARTITION p2016_02 VALUES LESS THAN (1456779600) ENGINE = InnoDB,
PARTITION p2016_03 VALUES LESS THAN (1459458000) ENGINE = InnoDB,
PARTITION p2016_04 VALUES LESS THAN (1462050000) ENGINE = InnoDB,
PARTITION p2016_05 VALUES LESS THAN (1464728400) ENGINE = InnoDB,
PARTITION p2016_06 VALUES LESS THAN (1467320400) ENGINE = InnoDB,
PARTITION p2016_07 VALUES LESS THAN (1469998800) ENGINE = InnoDB,
PARTITION p2016_08 VALUES LESS THAN (1472677200) ENGINE = InnoDB,
PARTITION p2016_09 VALUES LESS THAN (1475269200) ENGINE = InnoDB,
PARTITION p2016_10 VALUES LESS THAN (1477947600) ENGINE = InnoDB,
PARTITION p2016_11 VALUES LESS THAN (1480539600) ENGINE = InnoDB,
PARTITION p2016_12 VALUES LESS THAN (1483218000) ENGINE = InnoDB,
PARTITION p2017_01 VALUES LESS THAN (1485896400) ENGINE = InnoDB,
PARTITION p2017_02 VALUES LESS THAN (1488315600) ENGINE = InnoDB,
PARTITION p2017_03 VALUES LESS THAN (1490994000) ENGINE = InnoDB,
PARTITION p2017_04 VALUES LESS THAN (1493586000) ENGINE = InnoDB,
PARTITION p2017_05 VALUES LESS THAN (1496264400) ENGINE = InnoDB,
PARTITION p2017_06 VALUES LESS THAN (1498856400) ENGINE = InnoDB,
PARTITION p2017_07 VALUES LESS THAN (1501534800) ENGINE = InnoDB,
PARTITION p2017_08 VALUES LESS THAN (1504213200) ENGINE = InnoDB,
PARTITION p2017_09 VALUES LESS THAN (1506805200) ENGINE = InnoDB,
PARTITION p2017_10 VALUES LESS THAN (1509483600) ENGINE = InnoDB,
PARTITION p2017_11 VALUES LESS THAN (1512075600) ENGINE = InnoDB,
PARTITION p2017_12 VALUES LESS THAN (1514754000) ENGINE = InnoDB,
PARTITION p2018_01 VALUES LESS THAN (1517432400) ENGINE = InnoDB,
PARTITION p2018_02 VALUES LESS THAN (1519851600) ENGINE = InnoDB,
PARTITION p2018_03 VALUES LESS THAN (1522530000) ENGINE = InnoDB,
PARTITION p2018_04 VALUES LESS THAN (1525122000) ENGINE = InnoDB,
PARTITION p2018_05 VALUES LESS THAN (1527800400) ENGINE = InnoDB,
PARTITION p2018_06 VALUES LESS THAN (1530392400) ENGINE = InnoDB,
PARTITION p2018_07 VALUES LESS THAN (1533070800) ENGINE = InnoDB,
PARTITION p2018_08 VALUES LESS THAN (1535749200) ENGINE = InnoDB,
PARTITION p2018_09 VALUES LESS THAN (1538341200) ENGINE = InnoDB,
PARTITION p2018_10 VALUES LESS THAN (1541019600) ENGINE = InnoDB,
PARTITION p2018_11 VALUES LESS THAN (1543611600) ENGINE = InnoDB,
PARTITION p2018_12 VALUES LESS THAN (1546290000) ENGINE = InnoDB,
PARTITION p2019_01 VALUES LESS THAN (1548968400) ENGINE = InnoDB,
PARTITION p2019_02 VALUES LESS THAN (1551387600) ENGINE = InnoDB,
PARTITION p2019_03 VALUES LESS THAN (1554066000) ENGINE = InnoDB,
PARTITION p2019_04 VALUES LESS THAN (1556658000) ENGINE = InnoDB,
PARTITION p2019_05 VALUES LESS THAN (1559336400) ENGINE = InnoDB,
PARTITION p2019_06 VALUES LESS THAN (1561928400) ENGINE = InnoDB,
PARTITION p2019_07 VALUES LESS THAN (1564606800) ENGINE = InnoDB,
PARTITION p2019_08 VALUES LESS THAN (1567285200) ENGINE = InnoDB,
PARTITION p2019_09 VALUES LESS THAN (1569877200) ENGINE = InnoDB,
PARTITION p2019_10 VALUES LESS THAN (1572555600) ENGINE = InnoDB,
PARTITION p2019_11 VALUES LESS THAN (1575147600) ENGINE = InnoDB,
PARTITION p2019_12 VALUES LESS THAN (1577826000) ENGINE = InnoDB,
PARTITION p2020_01 VALUES LESS THAN (1580504400) ENGINE = InnoDB,
PARTITION p2020_02 VALUES LESS THAN (1583010000) ENGINE = InnoDB,
PARTITION p2020_03 VALUES LESS THAN (1585688400) ENGINE = InnoDB,
PARTITION p2020_04 VALUES LESS THAN (1588280400) ENGINE = InnoDB,
PARTITION p2020_05 VALUES LESS THAN (1590958800) ENGINE = InnoDB,
PARTITION p2020_06 VALUES LESS THAN (1593550800) ENGINE = InnoDB,
PARTITION p2020_07 VALUES LESS THAN (1596229200) ENGINE = InnoDB,
PARTITION p2020_08 VALUES LESS THAN (1598907600) ENGINE = InnoDB,
PARTITION p2020_09 VALUES LESS THAN (1601499600) ENGINE = InnoDB,
PARTITION p2020_10 VALUES LESS THAN (1604178000) ENGINE = InnoDB,
PARTITION p2020_11 VALUES LESS THAN (1606770000) ENGINE = InnoDB,
PARTITION p2020_12 VALUES LESS THAN (1609448400) ENGINE = InnoDB,
PARTITION p2021_01 VALUES LESS THAN (1612126800) ENGINE = InnoDB,
PARTITION p2021_02 VALUES LESS THAN (1614546000) ENGINE = InnoDB,
PARTITION p2021_03 VALUES LESS THAN (1617224400) ENGINE = InnoDB,
PARTITION p2021_04 VALUES LESS THAN (1619816400) ENGINE = InnoDB,
PARTITION p2021_05 VALUES LESS THAN (1622494800) ENGINE = InnoDB,
PARTITION p2021_06 VALUES LESS THAN (1625086800) ENGINE = InnoDB,
PARTITION p2021_07 VALUES LESS THAN (1627765200) ENGINE = InnoDB,
PARTITION p2021_08 VALUES LESS THAN (1630443600) ENGINE = InnoDB,
PARTITION p2021_09 VALUES LESS THAN (1633035600) ENGINE = InnoDB,
PARTITION p2021_10 VALUES LESS THAN (1635714000) ENGINE = InnoDB,
PARTITION p2021_11 VALUES LESS THAN (1638306000) ENGINE = InnoDB,
PARTITION p2021_12 VALUES LESS THAN (1640984400) ENGINE = InnoDB,
PARTITION p2022_01 VALUES LESS THAN (1643662800) ENGINE = InnoDB,
PARTITION p2022_02 VALUES LESS THAN (1646082000) ENGINE = InnoDB,
PARTITION p2022_03 VALUES LESS THAN (1648760400) ENGINE = InnoDB,
PARTITION p2022_04 VALUES LESS THAN (1651352400) ENGINE = InnoDB,
PARTITION p2022_05 VALUES LESS THAN (1654030800) ENGINE = InnoDB,
PARTITION p2022_06 VALUES LESS THAN (1656622800) ENGINE = InnoDB,
PARTITION p2022_07 VALUES LESS THAN (1659301200) ENGINE = InnoDB,
PARTITION p2022_08 VALUES LESS THAN (1661979600) ENGINE = InnoDB,
PARTITION p2022_09 VALUES LESS THAN (1664571600) ENGINE = InnoDB,
PARTITION p2022_10 VALUES LESS THAN (1667250000) ENGINE = InnoDB,
PARTITION p2022_11 VALUES LESS THAN (1669842000) ENGINE = InnoDB,
PARTITION p2022_12 VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */;
它包含如下日志数据:
ID | 提供者 | DLR_状态 | 资源 | 目的地 | 味精 | 时间戳 | msg_timestamp |
---|---|---|---|---|---|---|---|
1 | KDD | 完毕 | 网站 | 01332456 | 免费送货 | 2019-12-01 12:00:13 | 1575201613 |
2 | KDD | 完毕 | 用电话 | 01322422 | 花费300 | 2019-12-01 12:00:37 | 1575201637 |
. | . | . | . | . | . | . | . |
. | . | . | . | . | . | . | . |
问题是当我从这个表中选择一些计数时
SELECT SQL_CALC_FOUND_ROWS DLR_status,count(*) as c
FROM sms_details
group by DLR_status;
给出结果需要很长时间,并且某些查询会给出 504 Gateway Time-out 错误,例如此查询
SELECT SQL_CALC_FOUND_ROWS Destination,count(*) as c
FROM sms_details
WHERE msg_timestamp >= UNIX_TIMESTAMP("2019-10-01") and msg_timestamp < UNIX_TIMESTAMP("2019-12-01") group by Destination;
我已经在我的表中使用了分区,并且我尝试为某些列建立索引,但这对每天增加的数据造成了很大的问题。
那么以下最佳实践是什么:
- 加快执行时间
- 关心插入速度
解决方案
收缩模式-- 更小 --> 更少的 I/O --> 更快。
timestamp vs msg_timestamp -- 这些似乎是相同的,只是格式不同。因此,折腾其中之一。
规范化通过减少数据量来加快插入速度。大多数VARCHARs
可以用 2 字节 SMALLINT UNSIGNED 或 3 字节 MEDIUMINT UNSIGNED 替换。
未来的分区——这样的分区不要超过一个;会SELECTs
浪费时间打开它们却一无所获。
分区太多——在某个限制(可能是 50 个)下,拥有大量分区会减慢速度。
批处理是最好的加速。见LOAD DATA
或INSERT ... VALUES (...), (...), ...
。在后一种情况下,我推荐 1000 行的批次。(超出此范围会导致收益递减,并且可能存在一些限制。)如果数据来自多个来源,请解释;然后我们可以进一步讨论。
分区对于清除“旧”数据非常有用,因为DROP PARTITION
它比DELETE
. 见http://mysql.rjweb.org/doc.php/partitionmaint
折腾 created_at
和updated_at
; 它们可能没用。(同样,越小越快。)
没有时不需要SQL_CALC_FOUND_ROWSLIMIT
;只需观察返回了多少行。重新考虑用户的需求。(如果需要,请回来进行更多讨论。)
如果您有DLR_status 计数将是一个完整的索引扫描INDEX(DLR_status)
。并考虑将该列设为 ENUM,使其只有 1 个字节。(如果有多个值和/或越来越多的值,则“标准化”。)
查询 2需要INDEX(Destination, msg_timestmap)
。
大么?24M 行/5 年 --> 每秒少于 1 行。100 行/秒是我开始担心“高速摄取”的地方。也就是说,我认为插入没有问题。另一方面,选择可能是个问题。你给我们看了两个;让我们看看更多。我不想一次推荐一个索引;我宁愿设计一组索引来优化处理所有可能的查询。 特别是因为它可能涉及重新设计分区。
汇总表是在“数据仓库”中进行快速分析的绝佳方式。见http://mysql.rjweb.org/doc.php/summarytables
推荐阅读
- terraform - 如何从 terraform 0.11 自动升级到 terraform 0.14?
- javascript - Shopify 主题上的无限加载按钮
- angular - 即使我放置了 ngModel,Ng-Select 也没有分配默认值
- c++ - 在visual studio中使用native cpp testframework测试c代码LNK2019
- ansible - Ansible lineinfile 追加
- android - Delphi 10.4.1 如何在JMediaSession.setCallback 中使用Android JMediaSession_Callback
- paypal - PayPal 窗口未关闭且未调用 onApprove
- javascript - 如何通过 RxJs 合并或分组 toPromise?
- string - 如何处理 CHARACTER 变量,超过允许的最大值 2000?
- python - 如何使用 `type()` 内置函数传递类关键字参数?