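php - Why does the word "and" make MySQL full-text search so slow?
Problem description
I wrote a CodeIgniter class that searches four database tables for a search string. The searches use MySQL natural-language full-text search. The SQL that performs the search is used as a prepared statement. This is what the SQL looks like: the named parameter :clean_string is replaced with the quoted string to search for, and the other named parameters are replaced with integers that weight each table according to its importance: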
SELECT c_i, c_t, seo_title, SUM(score) AS score
FROM (
SELECT c.id_code AS c_i, c.title AS c_t, c.seo_title, (MATCH(title) AGAINST (:clean_string IN NATURAL LANGUAGE MODE)) * :career_title_factor AS score, 'q_ct' AS qid
FROM careers c
WHERE MATCH(title) AGAINST (:clean_string IN NATURAL LANGUAGE MODE)
UNION
SELECT c.id_code AS c_i, c.title AS c_t, c.seo_title, AVG(MATCH(ts.task) AGAINST (:clean_string IN NATURAL LANGUAGE MODE)) * :career_task_statement_factor AS score, 'q_ts' AS qid
FROM tasks ts, careers c
WHERE ts.id_code = c.id_code
AND MATCH(ts.task) AGAINST (:clean_string IN NATURAL LANGUAGE MODE)
GROUP BY c_i
UNION
SELECT c.id_code AS c_i, c.title AS c_t, c.seo_title, AVG(MATCH(oat.alternate_title) AGAINST (:clean_string IN NATURAL LANGUAGE MODE)) * :career_alternate_title_factor AS score, 'q_at' AS qid
FROM other_titles oat, careers c
WHERE oat.id_code = c.id_code
AND MATCH(oat.alternate_title) AGAINST (:clean_string IN NATURAL LANGUAGE MODE)
GROUP BY c_i
UNION
SELECT c.id_code AS c_i, c.title AS c_t, c.seo_title, AVG(MATCH(od.description) AGAINST (:clean_string IN NATURAL LANGUAGE MODE)) * :career_occupation_data_factor AS score, 'q_od' AS qid
FROM other_data od, careers c
WHERE od.id_code = c.id_code
AND MATCH(od.description) AGAINST (:clean_string IN NATURAL LANGUAGE MODE)
GROUP BY c_i
) AS union_query
GROUP BY c_i
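For most searches this query runs well and fast, even for randomly generated nonsense strings. However, when the word "and" appears in my search string, it runs extremely slowly. Why is that?
To demonstrate this, I turned off the MySQL query cache and ran the following test code: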
public function test() {
    $new_elapsed = 0;
    $search_terms = array(
        "video games",
        "game video",
        "software",
        "nautical architect",
        "CEO",
        "pig and blanket machine",
        "art gallery",
        "rock and roll",
        "guitar god magic metal",
        "mott the hoople"
    );
    for($i=0; $i<10; $i++) {
        $search = $search_terms[$i];
        echo "search term is $search\n";
        $start = microtime(true);
        $results_new = MY_search::search_new($this->db, $search);
        $new_elapsed = (microtime(true) - $start);
        echo "new has " . sizeof($results_new) . " matches\n";
        echo "new_elapsed: $new_elapsed\n";
        echo "--\n";
    }
}
As you can see from the results below, the two test queries containing the word "and" run far slower, roughly a hundred times slower.
search term is video games
new has 76 matches
new_elapsed: 0.23601198196411
--
search term is game video
new has 73 matches
new_elapsed: 0.27281093597412
--
search term is software
new has 124 matches
new_elapsed: 0.41503596305847
--
search term is nautical architect
new has 24 matches
new_elapsed: 0.11621713638306
--
search term is CEO
new has 2 matches
new_elapsed: 0.012537002563477
--
search term is pig and blanket machine
new has 1078 matches
new_elapsed: 19.989203929901
--
search term is art gallery
new has 53 matches
new_elapsed: 0.19855809211731
--
search term is rock and roll
new has 1077 matches
new_elapsed: 16.845540046692
--
search term is guitar god magic metal
new has 140 matches
new_elapsed: 0.66281390190125
--
search term is mott the hoople
new has 0 matches
new_elapsed: 0.0023038387298584
--
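As a rough check on the "hundred times" figure, the timings above can be compared directly. The sketch below is only illustrative: `median` is a hypothetical helper, and using the median of the other searches as the baseline is an assumption rather than anything the test itself does. With these numbers, both "and" searches come out a little under a hundred times the baseline.

```php
<?php
// Elapsed times in seconds, copied from the test output above.
$timings = array(
    "video games"             => 0.23601198196411,
    "game video"              => 0.27281093597412,
    "software"                => 0.41503596305847,
    "nautical architect"      => 0.11621713638306,
    "CEO"                     => 0.012537002563477,
    "pig and blanket machine" => 19.989203929901,
    "art gallery"             => 0.19855809211731,
    "rock and roll"           => 16.845540046692,
    "guitar god magic metal"  => 0.66281390190125,
    "mott the hoople"         => 0.0023038387298584,
);

// Hypothetical helper: median of a non-empty list of numbers.
function median(array $values) {
    sort($values);
    $n = count($values);
    $mid = (int) floor($n / 2);
    return ($n % 2 === 1)
        ? $values[$mid]
        : ($values[$mid - 1] + $values[$mid]) / 2;
}

// Split the timings by whether the term contains the standalone word "and".
$with_and = array();
$without_and = array();
foreach ($timings as $term => $seconds) {
    if (preg_match('/\band\b/i', $term)) {
        $with_and[$term] = $seconds;
    } else {
        $without_and[] = $seconds;
    }
}

// Baseline (an assumption): the median of the searches without "and".
$baseline = median($without_and);
$ratios = array();
foreach ($with_and as $term => $seconds) {
    $ratios[$term] = $seconds / $baseline;
    printf("%s: %.0fx the median of the other searches\n", $term, $ratios[$term]);
}
```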
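It is worth noting that "the" is included in the stopwords and that innodb_ft_min_token_size is set to 3.
Why is this happening? I worry that "and" may be interpreted by the full-text search engine as some kind of operator. These long search times are unacceptable for my application. I am tempted to a) add "and" to my stopwords or b) filter the word out of any search string in my code, but I am concerned that this slow-query problem could be triggered by other words as well.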
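For reference, option (b) could look something like the sketch below. `strip_search_words` is a hypothetical helper, not part of CodeIgniter or MySQL, and the word list is an assumption: so far only "and" is known to be slow.

```php
<?php
// Hypothetical helper: removes the given words from a search string
// before it is passed on to the full-text query.
function strip_search_words($search, array $words_to_strip) {
    // Split on whitespace and drop empty pieces.
    $tokens = preg_split('/\s+/', trim($search), -1, PREG_SPLIT_NO_EMPTY);
    $blocked = array_map('strtolower', $words_to_strip);
    $kept = array();
    foreach ($tokens as $token) {
        // Whole-word, case-insensitive comparison, so "Sand" is kept.
        if (!in_array(strtolower($token), $blocked, true)) {
            $kept[] = $token;
        }
    }
    return implode(' ', $kept);
}

// Assumed list: only "and" has been observed to cause the slowdown.
$words_to_strip = array('and');
echo strip_search_words('pig and blanket machine', $words_to_strip), "\n";
echo strip_search_words('rock and roll', $words_to_strip), "\n";
```

This only works around the symptom. If the slowdown comes from any word that matches a large number of rows, other common words that are not stopwords would need the same treatment, which is exactly the concern raised above.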
Edit: Following Raymond's comment, I am posting some EXPLAIN output here. I focused on one of the inner queries, because it is very slow on its own.
EXPLAIN SELECT c.id_code AS c_i, c.title AS c_t, c.seo_title, AVG(MATCH(ts.task) AGAINST ('pig and whistle' IN NATURAL LANGUAGE MODE)) * 3 AS score, 'q_ts' AS qid
FROM tasks ts, careers c
WHERE ts.id_code = c.id_code
AND MATCH(ts.task) AGAINST ('pig and whistle' IN NATURAL LANGUAGE MODE)
GROUP BY c_i
This yields:
id  select_type  table  type      possible_keys  key   key_len  ref   rows  Extra
1   SIMPLE       ts     fulltext  id_code,task   task  0        NULL  1     Using where; Using temporary; Using filesort
1   SIMPLE       c      ALL       id_code        NULL  NULL     NULL  1110  Using where
Raymond suggested that the query is not ANSI compliant, so I modified the GROUP BY clause to add the other columns and tried again. It is still slow:
EXPLAIN SELECT c.id_code AS c_i, c.title AS c_t, c.seo_title, AVG(MATCH(ts.task) AGAINST ('pig and whistle' IN NATURAL LANGUAGE MODE)) * 3 AS score, 'q_ts' AS qid
FROM tasks ts, careers c
WHERE ts.id_code = c.id_code
AND MATCH(ts.task) AGAINST ('pig and whistle' IN NATURAL LANGUAGE MODE)
GROUP BY c_i, c_t, seo_title
The EXPLAIN output is identical.
If I EXPLAIN the entire query (with the UNION statements and so on), I get more detailed results:
id    select_type   table           type      possible_keys            key              key_len  ref   rows  Extra
1     PRIMARY       <derived2>      ALL       NULL                     NULL             NULL     NULL  3332  Using temporary; Using filesort
2     DERIVED       c               fulltext  title                    title            0        NULL  1     Using where
3     UNION         ts              fulltext  id_code,task             task             0        NULL  1     Using where; Using temporary; Using filesort
3     UNION         c               ALL       id_code                  NULL             NULL     NULL  1110  Using where
4     UNION         oat             fulltext  id_code,alternate_title  alternate_title  0        NULL  1     Using where; Using temporary; Using filesort
4     UNION         c               ALL       id_code                  NULL             NULL     NULL  1110  Using where
5     UNION         od              fulltext  PRIMARY,description      description      0        NULL  1     Using where; Using temporary; Using filesort
5     UNION         c               ALL       id_code                  NULL             NULL     NULL  1110  Using where
NULL  UNION RESULT  <union2,3,4,5>  ALL       NULL                     NULL             NULL     NULL  NULL  Using temporary
Edit 2: Raymond also asked for the table definitions. Here they are:
CREATE TABLE `careers` (
`id` int(6) unsigned NOT NULL AUTO_INCREMENT,
`id_code` varchar(12) NOT NULL DEFAULT '',
`title` varchar(250) NOT NULL DEFAULT '',
`title_singular` varchar(250) NOT NULL DEFAULT '',
`seo_title` varchar(150) NOT NULL,
`cat_id` varchar(4) NOT NULL DEFAULT '',
`occ_code` varchar(10) DEFAULT NULL,
`forum_id` int(6) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE KEY `id_code` (`id_code`),
KEY `cat_id` (`cat_id`),
KEY `forum_id` (`forum_id`),
FULLTEXT KEY `title` (`title`)
) ENGINE=MyISAM AUTO_INCREMENT=1111 DEFAULT CHARSET=latin1
CREATE TABLE `tasks` (
`id_code` char(10) NOT NULL,
`task_id` decimal(8,0) NOT NULL,
`task` varchar(1000) NOT NULL,
`task_type` varchar(12) DEFAULT NULL,
`incumbents_responding` decimal(4,0) DEFAULT NULL,
`date_updated` date NOT NULL,
`domain_source` varchar(30) NOT NULL,
PRIMARY KEY (`task_id`),
KEY `id_code` (`id_code`),
FULLTEXT KEY `task` (`task`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
CREATE TABLE `other_titles` (
`id_code` char(10) NOT NULL,
`alternate_title` varchar(150) NOT NULL,
`short_title` varchar(150) DEFAULT NULL,
`sources` varchar(50) NOT NULL,
KEY `id_code` (`id_code`),
FULLTEXT KEY `alternate_title` (`alternate_title`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
CREATE TABLE `other_data` (
`id_code` char(10) NOT NULL,
`title` varchar(150) NOT NULL,
`description` varchar(1000) NOT NULL,
PRIMARY KEY (`id_code`),
FULLTEXT KEY `description` (`description`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Solution