Elasticsearch search: bool + must query

Problem description

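Can someone tell me why this Elasticsearch query returns the results below? The query has a bool + must clause, which should match only when the field nn exactly matches the string "softo". The query looks like this: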

"query":{
        "bool":{
            "must":[
                {"match":{"nn":"softo"}}
            ],
            "should":[
                {"match":{"nn":"sro"}},
                {"match":{"nn":"as"}},
                {"match":{"nn":"no"}},
                {"match":{"nn":"vos"}},
                {"match":{"nn":"ks"}}
            ]
        }
    }

It returns results that do not contain "softo" in the nn field, for example:

            {
                "_index": "search_2",
                "_type": "doc",
                "_id": "17053188",
                "_score": 129.76167,
                "_source": {
                    "nn": "zo soz kovo zts nova as zts elektronika as",
                    "nazov": "ZO SOZ KOVO,ZŤS NOVA a.s.,ZTS ELEKTRONIKA a.s."
                }
            },
            {
                "_index": "search_2",
                "_type": "doc",
                "_id": "45732078",
                "_score": 126.953285,
                "_source": {
                    "nn": "agentura socialnych sluzieb   ass no",
                    "nazov": "Agentúra sociálnych služieb - ASS n.o."
                }
            }

I don't understand why it returns a result like "zo soz kovo zts nova as zts elektronika as", which does not contain the string "softo" at all. The mapping for the nn field looks like this:

{
    "search_2": {
        "aliases": {
            "search": {}
        },
        "mappings": {
            "doc": {
                "dynamic": "strict",
                "properties": { 
                    "nn": {
                        "type": "text",
                        "boost": 10,
                        "analyzer": "autocomplete"
                    }
                }
            }
        },
        "settings": {
            "index": {
                "refresh_interval": "-1",
                "number_of_shards": "4",
                "provided_name": "search_2",
                "creation_date": "1539693645683",
                "analysis": {
                    "filter": {
                        "synonym_filter": {
                            "ignore_case": "true",
                            "type": "synonym",
                            "synonyms_path": "synonyms/sk_SK.txt"
                        },
                        "lemmagen_filter_sk": {
                            "type": "lemmagen",
                            "lexicon": "sk"
                        },
                        "stopwords_SK": {
                            "ignore_case": "true",
                            "type": "stop",
                            "stopwords_path": "stopwords/slovak.txt"
                        },
                        "remove_duplicities": {
                            "type": "unique",
                            "only_on_same_position": "true"
                        },
                        "autocomplete_filter": {
                            "type": "edge_ngram",
                            "min_gram": "2",
                            "max_gram": "20"
                        }
                    },
                    "analyzer": {
                        "autocomplete": {
                            "filter": [
                                "stopwords_SK",
                                "lowercase",
                                "stopwords_SK",
                                "autocomplete_filter"
                            ],
                            "type": "custom",
                            "tokenizer": "standard"
                        },
                        "lower_ascii": {
                            "filter": [
                                "lowercase",
                                "asciifolding"
                            ],
                            "type": "custom",
                            "tokenizer": "standard"
                        },
                        "suggestion": {
                            "filter": [
                                "stopwords_SK",
                                "lowercase",
                                "stopwords_SK",
                                "asciifolding"
                            ],
                            "type": "custom",
                            "tokenizer": "standard"
                        }
                    }
                },
                "number_of_replicas": "1",
                "uuid": "eyxXza0pQxWeQCpXih8ngg",
                "version": {
                    "created": "6020399"
                }
            }
        }
    }
}

Tags: elasticsearch, search, match

Solution


You get these results because of the autocomplete analyzer applied to the field nn. I will explain using the following field value:

"nn": "zo soz kovo zts nova as zts elektronika as"

The tokens generated for the value above will be:

zo, so, soz, ko, kov, kovo, zt, zts, no, nov, nova, as, zt, zts, el, ele, elek, elekt, elektr, elektro, elektron, elektroni, elektronik, elektronika, as
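The token list above can be reproduced with a rough Python sketch of the edge_ngram filter (min_gram 2, max_gram 20, as in the mapping). This is only an approximation of the autocomplete analyzer: it splits on whitespace instead of using the standard tokenizer, and it skips the stopwords_SK filter because the contents of stopwords/slovak.txt are not shown. The function names edge_ngrams and autocomplete_tokens are made up for this illustration.

```python
def edge_ngrams(word, min_gram=2, max_gram=20):
    """Return the prefixes of `word` whose length is between min_gram and max_gram."""
    upper = min(len(word), max_gram)
    return [word[:n] for n in range(min_gram, upper + 1)]


def autocomplete_tokens(text):
    """Approximate the analyzer: lowercase, split on whitespace, then edge n-grams.

    Assumption: stopword removal is ignored, since the stopword file is unknown.
    """
    tokens = []
    for word in text.lower().split():
        tokens.extend(edge_ngrams(word))
    return tokens


doc_tokens = autocomplete_tokens("zo soz kovo zts nova as zts elektronika as")
print(", ".join(doc_tokens))
```

Note that the word "soz" alone already contributes the short prefix "so" to the index.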

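Now, by default the match query applies the same analyzer to the search text, and the default operator between the resulting tokens is OR. So {"match":{"nn":"softo"}} effectively behaves like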

{
  "match": {
    "nn": "so OR sof OR soft OR softo"
  }
}

As you can see, one of the tokens generated for the nn field is so, and therefore the document matches.
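The overlap can be checked with a small Python sketch that models the OR semantics as a set intersection between the query tokens and the document tokens. It rests on the same simplifying assumptions as the explanation: whitespace splitting stands in for the standard tokenizer, the stopword filter is ignored, and the helper names edge_ngrams and analyze are hypothetical.

```python
def edge_ngrams(word, min_gram=2, max_gram=20):
    """Return the set of prefixes of `word` with length min_gram..max_gram."""
    upper = min(len(word), max_gram)
    return {word[:n] for n in range(min_gram, upper + 1)}


def analyze(text):
    """Approximate the autocomplete analyzer (stopword removal is not modeled)."""
    tokens = set()
    for word in text.lower().split():
        tokens |= edge_ngrams(word)
    return tokens


query_tokens = analyze("softo")
doc1_tokens = analyze("zo soz kovo zts nova as zts elektronika as")
doc2_tokens = analyze("agentura socialnych sluzieb   ass no")

# With OR between query tokens, one shared token is enough for a match.
shared1 = query_tokens & doc1_tokens
shared2 = query_tokens & doc2_tokens
print(sorted(query_tokens))
print(sorted(shared1), sorted(shared2))

# For contrast: if the query text were not split into n-grams,
# the single term "softo" would not be found in either document.
print("softo" in doc1_tokens, "softo" in doc2_tokens)
```

In this sketch both returned documents share only the token "so" with the query, coming from "soz" in the first and "socialnych" in the second.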

