SELECT statement optimization in MySQL

Problem description

I'm looking for a way to make my SELECT query faster than it is now, because I feel it should be possible.

Here is the query:

SELECT r.id_customer, ROUND(AVG(tp.percentile_weighted), 2) AS percentile
FROM tag_rating AS r USE INDEX (value_date_add)
JOIN tag_product AS tp ON (tp.id_pair = r.id_pair)
WHERE 
r.value = 1 AND
r.date_add > '2020-08-08 11:56:00'
GROUP BY r.id_customer

Here is the EXPLAIN SELECT output:

+----+-------------+-------+--------+----------------+----------------+---------+---------------+--------+---------------------------------------------------------------------+
| id | select_type | table | type   | possible_keys  | key            | key_len | ref           | rows   | Extra                                                               |
+----+-------------+-------+--------+----------------+----------------+---------+---------------+--------+---------------------------------------------------------------------+
| 1  | SIMPLE      | r     | ref    | value_date_add | value_date_add | 1       | const         | 449502 | Using index condition; Using where; Using temporary; Using filesort |
+----+-------------+-------+--------+----------------+----------------+---------+---------------+--------+---------------------------------------------------------------------+
| 1  | SIMPLE      | tp    | eq_ref | PRIMARY        | PRIMARY        | 4       | dev.r.id_pair | 1      |                                                                     |
+----+-------------+-------+--------+----------------+----------------+---------+---------------+--------+---------------------------------------------------------------------+

And here are the tables:

CREATE TABLE `tag_product` (
  `id_pair` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `id_product` int(10) unsigned NOT NULL,
  `id_user_tag` int(10) unsigned NOT NULL,
  `status` tinyint(3) NOT NULL,
  `percentile` decimal(8,4) unsigned NOT NULL,
  `percentile_weighted` decimal(8,4) unsigned NOT NULL,
  `elo` int(10) unsigned NOT NULL,
  `date_add` datetime NOT NULL,
  `date_upd` datetime NOT NULL,
  PRIMARY KEY (`id_pair`),
  UNIQUE KEY `id_product_id_user_tag` (`id_product`,`id_user_tag`),
  KEY `status` (`status`),
  KEY `id_user_tag` (`id_user_tag`),
  CONSTRAINT `tag_product_ibfk_5` FOREIGN KEY (`id_user_tag`) REFERENCES `user_tag` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
CREATE TABLE `tag_rating` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `id_customer` int(10) unsigned NOT NULL,
  `id_pair` int(10) unsigned NOT NULL,
  `id_duel` int(10) unsigned NOT NULL,
  `value` tinyint(4) NOT NULL,
  `date_add` datetime NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `id_duel_id_pair` (`id_duel`,`id_pair`),
  KEY `id_pair_id_customer` (`id_pair`,`id_customer`),
  KEY `value` (`value`),
  KEY `value_date_add` (`value`,`date_add`),
  KEY `id_customer_value_date_add` (`id_customer`,`value`,`date_add`),
  CONSTRAINT `tag_rating_ibfk_3` FOREIGN KEY (`id_pair`) REFERENCES `tag_product` (`id_pair`) ON DELETE CASCADE ON UPDATE CASCADE,
  CONSTRAINT `tag_rating_ibfk_6` FOREIGN KEY (`id_duel`) REFERENCES `tag_rating_duel` (`id_duel`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

Table tag_product has about 250k rows, and tag_rating has about 1M rows.

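My problem is that the query takes about 0.8 s on average on my machine. Ideally I'd like to get it under 0.5 s, while also assuming the tables may grow 10 times larger. Because the query has a date condition (I only want rows less than a month old), the number of rows it touches should stay roughly the same.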

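Can this be sped up with some tricks (that is, without restructuring my tables)? When I modify the statement slightly (dropping the join to the smaller table) to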

SELECT r.id_customer, COUNT(*)
FROM tag_rating AS r USE INDEX (value_date_add)
WHERE 
r.value = 1 AND
r.date_add > '2020-08-08 11:56:00'
GROUP BY r.id_customer;

Here is the EXPLAIN SELECT output:

+----+-------------+-------+------+----------------+----------------+---------+-------+--------+---------------------------------------------------------------------+
| id | select_type | table | type | possible_keys  | key            | key_len | ref   | rows   | Extra                                                               |
+----+-------------+-------+------+----------------+----------------+---------+-------+--------+---------------------------------------------------------------------+
| 1  | SIMPLE      | r     | ref  | value_date_add | value_date_add | 1       | const | 449502 | Using index condition; Using where; Using temporary; Using filesort |
+----+-------------+-------+------+----------------+----------------+---------+-------+--------+---------------------------------------------------------------------+

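it takes about 0.25 s, which is great. So the JOIN makes the query 3 times slower. Is that unavoidable? Since I'm joining on the primary key, I feel it shouldn't slow the query down by a factor of 3.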
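One way to see where the extra time goes is to compare the session handler counters for the two queries. This is a suggested diagnostic, not part of the original post. With the `eq_ref` join, each of the roughly 450k matching tag_rating rows should cost about one primary-key lookup into tag_product, and those lookups should show up in `Handler_read_key`. `FLUSH STATUS` may require the RELOAD privilege.

```sql
-- Reset this session's status counters (may require the RELOAD privilege).
FLUSH STATUS;

SELECT r.id_customer, ROUND(AVG(tp.percentile_weighted), 2) AS percentile
FROM tag_rating AS r USE INDEX (value_date_add)
JOIN tag_product AS tp ON (tp.id_pair = r.id_pair)
WHERE r.value = 1
  AND r.date_add > '2020-08-08 11:56:00'
GROUP BY r.id_customer;

-- Handler_read_key:  index lookups, including the eq_ref probes into tag_product.PRIMARY
-- Handler_read_next: index entries read while walking the value_date_add range
SHOW SESSION STATUS LIKE 'Handler_read%';
```

Running the same steps for the `COUNT(*)` query without the JOIN and comparing `Handler_read_key` gives a rough count of the extra lookups the JOIN adds.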

--- UPDATE ---

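The query below is the one I actually run. There are about 1000 distinct id_customer values, and that number is expected to grow. Exactly half of the rows have value=1. So far, query time seems to grow linearly with the number of rows in the rating table.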

Adding id_pair to the end of the id_customer_value_date_add or value_id_customer_date_add index didn't help.

SELECT r.id_customer, ROUND(AVG(tp.percentile_weighted), 2) AS percentile
FROM tag_rating AS r USE INDEX (id_customer_value_date_add)
JOIN tag_product AS tp ON (tp.id_pair = r.id_pair)
WHERE 
r.value = 1 AND
r.id_customer IN (2593179,1461878,2318871,2654090,2840415,2852531,2987432,3473275,3960453,3961798,4129734,4191571,4202912,4204817,4211263,4248789,765650,1341317,1430380,2116196,3367674,3701901,3995273,4118307,4136114,4236589,783262,913493,1034296,2626574,3574634,3785772,2825128,4157953,3331279,4180367,4208685,4287879,1038898,1445750,1975108,3658055,4185296,4276189,428693,4248631,1892448,3773855,2901524,3830868,3934786) AND
r.date_add > '2020-08-08 11:56:00'
GROUP BY r.id_customer

Here is the EXPLAIN SELECT output:

+----+-------------+-------+--------+----------------------------+----------------------------+---------+----------------------------------+--------+--------------------------+
| id | select_type | table | type   | possible_keys              | key                        | key_len | ref                              | rows   | Extra                    |
+----+-------------+-------+--------+----------------------------+----------------------------+---------+----------------------------------+--------+--------------------------+
| 1  | SIMPLE      | r     | range  | id_customer_value_date_add | id_customer_value_date_add | 10      |                                  | 558906 | Using where; Using index |
+----+-------------+-------+--------+----------------------------+----------------------------+---------+----------------------------------+--------+--------------------------+
| 1  | SIMPLE      | tp    | eq_ref | PRIMARY,status             | PRIMARY                    | 4       | dev.r.id_pair                    | 1      | Using where              |
+----+-------------+-------+--------+----------------------------+----------------------------+---------+----------------------------------+--------+--------------------------+

Any hints appreciated. Thanks.

Tags: mysql, sql, optimization, query-optimization

Solution


INDEX(value, date_add, id_customer, id_pair)

would be "covering", which gives both queries an extra performance boost. It also applies to Gordon's formulation.
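Here is a minimal sketch of adding the suggested index. It assumes the tag_rating table from the question, and the index name `value_date_customer_pair` is hypothetical. If the index is covering, EXPLAIN should list it under `key` for table `r` and show "Using index" in `Extra`. That means the tag_rating side of the query is answered from the index alone, without reading the table rows.

```sql
-- Hypothetical index name. Column order follows the answer:
-- equality column (value), then the range column (date_add),
-- then the columns the query only reads (id_customer, id_pair).
ALTER TABLE tag_rating
  ADD INDEX value_date_customer_pair (value, date_add, id_customer, id_pair);

-- Check the plan without an index hint, so the optimizer is free to pick the new index.
EXPLAIN
SELECT r.id_customer, ROUND(AVG(tp.percentile_weighted), 2) AS percentile
FROM tag_rating AS r
JOIN tag_product AS tp ON (tp.id_pair = r.id_pair)
WHERE r.value = 1
  AND r.date_add > '2020-08-08 11:56:00'
GROUP BY r.id_customer;
```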

Meanwhile, get rid of these:

KEY `value` (`value`),
KEY `value_date_add` (`value`,`date_add`),

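because they may keep the optimizer from choosing the new index. Any other queries that use those indexes will happily use the new one instead, since (value) and (value, date_add) are leftmost prefixes of it.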
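A sketch of dropping the two redundant indexes, assuming the new (value, date_add, id_customer, id_pair) index from the answer has already been added. Remove or change any `USE INDEX (value_date_add)` hint in the application's queries, such as the ones in the question. MySQL rejects an index hint that names an index that doesn't exist (error 1176, "Key ... doesn't exist in table").

```sql
-- Both indexes are leftmost prefixes of (value, date_add, id_customer, id_pair),
-- so queries that used them can use the wider index instead.
ALTER TABLE tag_rating
  DROP INDEX `value`,
  DROP INDEX `value_date_add`;
```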

If you aren't using tag_rating.id in some other way, drop it and promote the UNIQUE key (`id_duel_id_pair`) to be the PRIMARY KEY.
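A sketch of that change, assuming nothing (application code, other tables) refers to `tag_rating.id`. This rebuilds the table, so try it on a copy first. The old UNIQUE key is dropped in a second statement, after the new PRIMARY KEY exists, so the foreign key on `id_duel` always has an index that starts with that column.

```sql
-- Replace the surrogate key with the natural key (id_duel, id_pair).
-- Both columns are NOT NULL, and the existing UNIQUE key already guarantees uniqueness.
ALTER TABLE tag_rating
  DROP PRIMARY KEY,
  DROP COLUMN id,
  ADD PRIMARY KEY (id_duel, id_pair);

-- The UNIQUE key now duplicates the PRIMARY KEY, which also serves the FK on id_duel.
ALTER TABLE tag_rating
  DROP INDEX id_duel_id_pair;
```

One side effect in InnoDB is that every secondary index implicitly stores the primary-key columns. After this change, any secondary index on tag_rating already carries id_pair.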

