Searching for keywords containing spaces in ElasticSearch 7.6

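Problem description

I am trying to implement a search based on city names in ElasticSearch 7.6, but I have a problem with names that contain spaces, as in the following example: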

Query: "toronto, new mexico, paris, lisbona, new york, sedro-woolley".

This is my mapping:

mapping = {
    "mappings": {
        "properties": {
            "date": { 
                "type": "date" 
            },
            "description": { 
                "type": "text", 
                "fielddata": True 
            },
        }
    }
}

This is my query:

{
    "query" : {
        "match": { "description": escaped_keywords }
    },
    "highlight" : {
        "pre_tags" : ["<match>"],
        "post_tags" : ["</match>"],
        "fields" : {
            "description" : {"number_of_fragments" : 0 }
        }
    }
}

escaped_keywords contains the previous keywords in escaped form, as follows: "toronto new\\ mexico paris lisbona new\\ york sedro\\-woolley"
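As a hedged sketch of the setup described above, the escaped string and the request body could be assembled as follows. The helper names `escape_keyword` and `build_query` are hypothetical and do not appear in the original post; only the shape of the query and the escaped string are taken from it.

```python
import re

def escape_keyword(keyword):
    # Prefix every space or hyphen with a backslash, as in the post's example string.
    return re.sub(r"[ \-]", lambda m: "\\" + m.group(0), keyword)

def build_query(keywords):
    # Join the escaped keywords into the single string given to the match query.
    escaped_keywords = " ".join(escape_keyword(k) for k in keywords)
    return {
        "query": {"match": {"description": escaped_keywords}},
        "highlight": {
            "pre_tags": ["<match>"],
            "post_tags": ["</match>"],
            "fields": {"description": {"number_of_fragments": 0}},
        },
    }

cities = ["toronto", "new mexico", "paris", "lisbona", "new york", "sedro-woolley"]
body = build_query(cities)
print(body["query"]["match"]["description"])
```

Printing the query string makes it easy to confirm what is actually sent to the match query before looking at how it is analyzed.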

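The query works for single-word city names and for names with a dash, but not for names containing a space (new york, new mexico), which are split into (new, york, new, mexico).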

I also tried putting parentheses around the cities that contain spaces, as in toronto (new mexico) paris lisbona (new york) sedro\\-woolley, but the result did not change.

EDIT: Highlighting does not work for names with a dash either. It returns the split words (e.g. [sedro, woolley] instead of [sedro-woolley]).
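The splitting can be reproduced outside Elasticsearch with a rough approximation. The mapping sets no analyzer on description, so the default standard analyzer applies; it breaks text on whitespace and most punctuation and lowercases the result, and a match query does not interpret backslash escapes or parentheses. The regex below is a simplified stand-in for that behaviour (an assumption for illustration, not the real Unicode segmentation rules), and `approx_standard_analyzer` is a hypothetical name.

```python
import re

def approx_standard_analyzer(text):
    # Simplified stand-in: keep runs of letters/digits and lowercase them.
    # Backslashes, spaces, hyphens and parentheses all act as separators.
    return [token.lower() for token in re.findall(r"[^\W_]+", text)]

escaped_keywords = "toronto new\\ mexico paris lisbona new\\ york sedro\\-woolley"
tokens = approx_standard_analyzer(escaped_keywords)
print(tokens)

with_parentheses = "toronto (new mexico) paris lisbona (new york) sedro\\-woolley"
print(approx_standard_analyzer(with_parentheses) == tokens)
```

Under this approximation both spellings of the query reduce to the same nine single-word tokens, which is consistent with the behaviour reported above.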

EDIT 2: My intent is to use highlight tags to match a dynamic list of keywords (e.g. "new york", "toronto", "sedro-woolley"). This is a data sample:

{
    "_index": "test_stackoverflow",
    "_type": "_doc",
    "_id": "x4nKv3EBQE6DGGITWX-O",
    "_version": 1,
    "_seq_no": 0,
    "_primary_term": 1,
    "found": true,
    "_source": {
        "title": "Best places: New Mexico and Sedro-Woolley",
        "description": "This is an example text containing some cities like New York and Toronto. So, there are also Milton-Freewater and Las Vegas!"
    }
}

Tags: elasticsearch, elasticsearch-query

Solution


You need to define a custom analyzer with a char filter that removes spaces and hyphens (-), so that the generated tokens match your requirement.

Index definition (the char filter maps \u0020, the space, and \u002D, the hyphen, to nothing)

{
    "settings": {
        "analysis": {
            "char_filter": {
                "my_space_char_filter": {
                    "type": "mapping",
                    "mappings": [
                        "\\u0020=>",
                        "\\u002D=>"
                    ]
                }
            },
            "analyzer": {
                "splcharanalyzer": {
                    "char_filter": [
                        "my_space_char_filter"
                    ],
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase"
                    ]
                }
            }
        }
    },
    "mappings" :{
        "properties" :{
            "title" :{
                "type" : "text",
                "analyzer" : "splcharanalyzer"
            }
        }
    }
}
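Since the question builds its mapping as a Python dict, the same index definition can be written that way too. This is a minimal sketch: the variable names are illustrative, and how the body is sent to the cluster (for instance through a client library) is deliberately left out. Note that in Python source "\\u0020=>" is the literal text \u0020=>, which is what the mapping char filter expects.

```python
import json

index_body = {
    "settings": {"analysis": {
        "char_filter": {"my_space_char_filter": {
            "type": "mapping",
            # \u0020 is the space and \u002D the hyphen; both are mapped to nothing.
            "mappings": ["\\u0020=>", "\\u002D=>"],
        }},
        "analyzer": {"splcharanalyzer": {
            "char_filter": ["my_space_char_filter"],
            "tokenizer": "standard",
            "filter": ["lowercase"],
        }},
    }},
    "mappings": {"properties": {
        "title": {"type": "text", "analyzer": "splcharanalyzer"},
    }},
}

# Serialize to the JSON body shown above (comments are dropped, escapes are kept).
payload = json.dumps(index_body, indent=2)
print(payload)
```

Keeping the explanatory comments in Python and serializing with json.dumps avoids putting annotations inside the JSON body itself, where they would not be valid.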

Tokens generated by the custom splcharanalyzer

POST myindex/_analyze

{
  "analyzer": "splcharanalyzer",
  "text": "toronto, new mexico, paris, lisbona, new york, sedro-woolley"
}

{
    "tokens": [
        {
            "token": "toronto",
            "start_offset": 0,
            "end_offset": 7,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "newmexico",
            "start_offset": 9,
            "end_offset": 19,
            "type": "<ALPHANUM>",
            "position": 1
        },
        {
            "token": "paris",
            "start_offset": 21,
            "end_offset": 26,
            "type": "<ALPHANUM>",
            "position": 2
        },
        {
            "token": "lisbona",
            "start_offset": 28,
            "end_offset": 35,
            "type": "<ALPHANUM>",
            "position": 3
        },
        {
            "token": "newyork",
            "start_offset": 37,
            "end_offset": 45,
            "type": "<ALPHANUM>",
            "position": 4
        },
        {
            "token": "sedrowoolley",
            "start_offset": 47,
            "end_offset": 60,
            "type": "<ALPHANUM>",
            "position": 5
        }
    ]
}
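The token list above can be approximated in plain Python to see why it comes out this way. This is a sketch under assumptions: the regex is a simplified stand-in for the standard tokenizer, and `approx_splcharanalyzer` is a hypothetical name. It also suggests a caveat worth checking with _analyze on real data: the sample tokens are separated by the commas, so running text without such separators may collapse into one long token once every space is removed.

```python
import re

def approx_splcharanalyzer(text):
    # Char filter step: delete every space and hyphen before tokenizing.
    filtered = text.replace(" ", "").replace("-", "")
    # Simplified tokenizer plus lowercase filter: runs of letters/digits.
    return [token.lower() for token in re.findall(r"[^\W_]+", filtered)]

sample = "toronto, new mexico, paris, lisbona, new york, sedro-woolley"
print(approx_splcharanalyzer(sample))

# Free-running text has no commas to separate the city names.
print(approx_splcharanalyzer("cities like New York and Toronto"))
```

In this approximation the first call reproduces the six tokens listed above, while the second yields a single merged token, so the approach looks best suited to delimiter-separated values.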

Search query

{
    "query": {
        "match" : {
            "title" : {
                "query" : "sedro-woolley"
            }
        }
    }
}

Search result

 "hits": [
            {
                "_index": "white",
                "_type": "_doc",
                "_id": "1",
                "_score": 0.2876821,
                "_source": {
                    "title": "toronto, new mexico, paris, lisbona, new york, sedro-woolley"
                }
            }
        ]

Searching for york alone does not yield any result, because the indexed token is newyork.

{
    "query": {
        "match" : {
            "title" : {
                "query" : "york"
            }
        }
    }
}

Result

 "hits": {
        "total": {
            "value": 0,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    }
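Both outcomes follow from the query text being analyzed with the same analyzer as the field. The sketch below models that with a simplified analyzer and a set intersection; `analyze` and `matches` are hypothetical helpers, and the intersection only mimics a match query with its default OR operator, ignoring scoring.

```python
import re

def analyze(text):
    # Same simplification as the custom analyzer: strip spaces and hyphens,
    # split on remaining punctuation, lowercase.
    filtered = text.replace(" ", "").replace("-", "")
    return [token.lower() for token in re.findall(r"[^\W_]+", filtered)]

def matches(query, document):
    # A hit requires at least one analyzed query token among the document tokens.
    return bool(set(analyze(query)) & set(analyze(document)))

title = "toronto, new mexico, paris, lisbona, new york, sedro-woolley"
for query in ("sedro-woolley", "new york", "york"):
    print(query, matches(query, title))
```

Here sedro-woolley and new york become sedrowoolley and newyork and find their indexed counterparts, whereas york has no matching token.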
