首页 > 解决方案 > 为什么“and”这个词会使 MySQL 全文搜索如此缓慢?

问题描述

我编写了一个 CodeIgniter 类来搜索四个数据库表中的一些搜索字符串。这些搜索使用MySQL 自然语言全文搜索。实际执行搜索的 SQL 被用作预准备语句。这就是 SQL 的样子,命名参数:clean_string被要搜索的带引号的字符串替换,其他命名参数被整数替换,以根据它们的重要性对各个表进行加权:

SELECT c_i, c_t, seo_title, SUM(score) AS score
FROM (

SELECT c.id_code AS c_i, c.title AS c_t, c.seo_title, (MATCH(title) AGAINST (:clean_string IN NATURAL LANGUAGE MODE)) * :career_title_factor AS score, 'q_ct' AS qid
FROM careers c
WHERE MATCH(title) AGAINST (:clean_string IN NATURAL LANGUAGE MODE)

UNION

SELECT c.id_code AS c_i, c.title AS c_t, c.seo_title, AVG(MATCH(ts.task) AGAINST (:clean_string IN NATURAL LANGUAGE MODE)) * :career_task_statement_factor AS score, 'q_ts' AS qid
FROM tasks ts, careers c
WHERE ts.id_code = c.id_code
    AND MATCH(ts.task) AGAINST (:clean_string IN NATURAL LANGUAGE MODE)
GROUP BY c_i

UNION

SELECT c.id_code AS c_i, c.title AS c_t, c.seo_title, AVG(MATCH(oat.alternate_title) AGAINST (:clean_string IN NATURAL LANGUAGE MODE)) * :career_alternate_title_factor AS score, 'q_at' AS qid
FROM other_titles oat, careers c
WHERE oat.id_code = c.id_code
    AND MATCH(oat.alternate_title) AGAINST (:clean_string IN NATURAL LANGUAGE MODE)
GROUP BY c_i

UNION

SELECT c.id_code AS c_i, c.title AS c_t, c.seo_title, AVG(MATCH(od.description) AGAINST (:clean_string IN NATURAL LANGUAGE MODE)) * :career_occupation_data_factor AS score, 'q_od' AS qid
FROM other_data od, careers c
WHERE od.id_code = c.id_code
    AND MATCH(od.description) AGAINST (:clean_string IN NATURAL LANGUAGE MODE)
GROUP BY c_i

) AS union_query
GROUP BY c_i

对于大多数搜索,此查询运行良好且快速——甚至是随机生成的无意义字符串。但是,当单词出现在我的搜索字符串中时,它的运行速度非常慢。为什么会这样?

为了证明这一点,我关闭了 MySQL 查询缓存并运行以下测试代码:

public function test() {
    $new_elapsed = 0;

    $search_terms = array(
        "video games",
        "game video",
        "software",
        "nautical architect",
        "CEO",
        "pig and blanket machine",
        "art gallery",
        "rock and roll",
        "guitar god magic metal",
        "mott the hoople"
    );


    for($i=0; $i<10; $i++) {
        $search = $search_terms[$i];
        echo "search term is $search\n";

        $start = microtime(true);

        $results_new = MY_search::search_new($this->db, $search);
        $new_elapsed = (microtime(true) - $start);
        echo "new has " . sizeof($results_new) . " matches\n";
        echo "new_elapsed: $new_elapsed\n";

        echo "--\n";
    }
}

正如您从此处的搜索结果中看到的那样,带有单词and的两个测试查询的运行速度要慢很多。好像慢了一百倍。

search term is video games
new has 76 matches
new_elapsed: 0.23601198196411
--
search term is game video
new has 73 matches
new_elapsed: 0.27281093597412
--
search term is software
new has 124 matches
new_elapsed: 0.41503596305847
--
search term is nautical architect
new has 24 matches
new_elapsed: 0.11621713638306
--
search term is CEO
new has 2 matches
new_elapsed: 0.012537002563477
--
search term is pig and blanket machine
new has 1078 matches
new_elapsed: 19.989203929901
--
search term is art gallery
new has 53 matches
new_elapsed: 0.19855809211731
--
search term is rock and roll
new has 1077 matches
new_elapsed: 16.845540046692
--
search term is guitar god magic metal
new has 140 matches
new_elapsed: 0.66281390190125
--
search term is mott the hoople
new has 0 matches
new_elapsed: 0.0023038387298584
--

值得注意的是,“the”包含在停用词中,并且innodb_ft_min_token_size设置为 3。

为什么会这样?我担心“and”可能会被全文搜索引擎解释为某种运算符。这些较长的搜索词对于我的应用程序是不可接受的。我很想a)在我的停用词中添加“and”或b)从我的代码中的任何搜索字符串中过滤这个词,但我担心这个慢查询问题可能是由其他词触发的。

编辑:根据 Raymond 的评论,我在这里发布了一些解释声明。我专注于内部查询之一,因为它本身非常慢

EXPLAIN SELECT c.id_code AS c_i, c.title AS c_t, c.seo_title, AVG(MATCH(ts.task) AGAINST ('pig and whistle' IN NATURAL LANGUAGE MODE)) * 3 AS score, 'q_ts' AS qid
FROM tasks ts, careers c
WHERE ts.id_code = c.id_code
    AND MATCH(ts.task) AGAINST ('pig and whistle' IN NATURAL LANGUAGE MODE)
GROUP BY c_i

这产生:

id  select_type     table   type    possible_keys   key     key_len     ref     rows    Extra   
1   SIMPLE  ts  fulltext    id_code,task    task    0   NULL    1   Using where; Using temporary; Using filesort
1   SIMPLE  c   ALL     id_code     NULL    NULL    NULL    1110    Using where

Raymond 建议该查询不符合 ANSI,因此我修改了 group by 子句以添加其他列并再次尝试。它仍然很慢:

EXPLAIN SELECT c.id_code AS c_i, c.title AS c_t, c.seo_title, AVG(MATCH(ts.task) AGAINST ('pig and whistle' IN NATURAL LANGUAGE MODE)) * 3 AS score, 'q_ts' AS qid
FROM tasks ts, careers c
WHERE ts.id_code = c.id_code
    AND MATCH(ts.task) AGAINST ('pig and whistle' IN NATURAL LANGUAGE MODE)
GROUP BY c_i, c_t, seo_title

解释输出是相同的。

如果我解释整个查询(使用 UNION 语句等),我会得到更详细的结果:

id  select_type         table   type    possible_keys   key     key_len     ref     rows    Extra   
1   PRIMARY         <derived2>  ALL     NULL    NULL    NULL    NULL    3332    Using temporary; Using filesort
2   DERIVED         c   fulltext    title   title   0   NULL    1   Using where
3   UNION       ts  fulltext    id_code,task    task    0   NULL    1   Using where; Using temporary; Using filesort
3   UNION       c   ALL     id_code     NULL    NULL    NULL    1110    Using where
4   UNION       oat     fulltext    id_code,alternate_title     alternate_title     0   NULL    1   Using where; Using temporary; Using filesort
4   UNION       c   ALL     id_code     NULL    NULL    NULL    1110    Using where
5   UNION       od  fulltext    PRIMARY,description     description     0   NULL    1   Using where; Using temporary; Using filesort
5   UNION       c   ALL     id_code     NULL    NULL    NULL    1110    Using where
NULL    UNION RESULT        <union2,3,4,5>  ALL     NULL    NULL    NULL    NULL    NULL    Using temporary

编辑 2:Raymond 还要求提供表格定义。他们来了:

CREATE TABLE `careers` (
 `id` int(6) unsigned NOT NULL AUTO_INCREMENT,
 `id_code` varchar(12) NOT NULL DEFAULT '',
 `title` varchar(250) NOT NULL DEFAULT '',
 `title_singular` varchar(250) NOT NULL DEFAULT '',
 `seo_title` varchar(150) NOT NULL,
 `cat_id` varchar(4) NOT NULL DEFAULT '',
 `occ_code` varchar(10) DEFAULT NULL,
 `forum_id` int(6) unsigned NOT NULL DEFAULT '0',
 PRIMARY KEY (`id`),
 UNIQUE KEY `id_code` (`id_code`),
 KEY `cat_id` (`cat_id`),
 KEY `forum_id` (`forum_id`),
 FULLTEXT KEY `title` (`title`)
) ENGINE=MyISAM AUTO_INCREMENT=1111 DEFAULT CHARSET=latin1


CREATE TABLE `tasks` (
 `id_code` char(10) NOT NULL,
 `task_id` decimal(8,0) NOT NULL,
 `task` varchar(1000) NOT NULL,
 `task_type` varchar(12) DEFAULT NULL,
 `incumbents_responding` decimal(4,0) DEFAULT NULL,
 `date_updated` date NOT NULL,
 `domain_source` varchar(30) NOT NULL,
 PRIMARY KEY (`task_id`),
 KEY `id_code` (`id_code`),
 FULLTEXT KEY `task` (`task`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8


CREATE TABLE `other_titles` (
 `id_code` char(10) NOT NULL,
 `alternate_title` varchar(150) NOT NULL,
 `short_title` varchar(150) DEFAULT NULL,
 `sources` varchar(50) NOT NULL,
 KEY `id_code` (`id_code`),
 FULLTEXT KEY `alternate_title` (`alternate_title`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8



CREATE TABLE `other_data` (
 `id_code` char(10) NOT NULL,
 `title` varchar(150) NOT NULL,
 `description` varchar(1000) NOT NULL,
 PRIMARY KEY (`id_code`),
 FULLTEXT KEY `description` (`description`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

标签: phpmysqlfull-text-searchcodeigniter-3

解决方案


推荐阅读