elasticsearch - Searching for keywords containing spaces in ElasticSearch 7.6
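Problem description
I am trying to implement a search based on city names in ElasticSearch 7.6, but I have a problem with names that contain spaces, as in the following example: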
Query: "toronto, new mexico, paris, lisbona, new york, sedro-woolley".
This is my mapping:
mapping = {
    "mappings": {
        "properties": {
            "date": {
                "type": "date"
            },
            "description": {
                "type": "text",
                "fielddata": True
            },
        }
    }
}
This is my query:
{
    "query" : {
        "match": { "description": escaped_keywords }
    },
    "highlight" : {
        "pre_tags" : ["<match>"],
        "post_tags" : ["</match>"],
        "fields" : {
            "description" : {"number_of_fragments" : 0 }
        }
    }
}
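escaped_keywords contains the previous keywords, escaped as follows: "toronto new\\ mexico paris lisbona new\\ york sedro\\-woolley"
The query works for single-word city names and for names with a hyphen, but not for names containing a space (new york, new mexico), which are split into (new, york, new, mexico).
I also tried wrapping the names that contain spaces in parentheses, like this: toronto (new mexico) paris lisbona (new york) sedro\\-woolley
but the result did not change.
EDIT: Highlighting also does not work for names containing a hyphen. It returns the split words (e.g. [sedro, woolley] instead of [sedro-woolley]).
EDIT 2: My intention is to match a dynamic list of keywords (e.g. "new york", "toronto", "sedro-woolley") and mark them with highlight tags. This is a data sample: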
{
    "_index": "test_stackoverflow",
    "_type": "_doc",
    "_id": "x4nKv3EBQE6DGGITWX-O",
    "_version": 1,
    "_seq_no": 0,
    "_primary_term": 1,
    "found": true,
    "_source": {
        "title": "Best places: New Mexico and Sedro-Woolley",
        "description": "This is an example text containing some cities like New York and Toronto. So, there are also Milton-Freewater and Las Vegas!"
    }
}
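Solution
You need to define a custom analyzer with a char filter that removes spaces and hyphens (-), so that the generated tokens match your requirement.
Index definition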
{
    "settings": {
        "analysis": {
            "char_filter": {
                "my_space_char_filter": {
                    "type": "mapping",
                    "mappings": [
                        "\\u0020=>",
                        "\\u002D=>"
                    ]
                }
            },
            "analyzer": {
                "splcharanalyzer": {
                    "char_filter": [
                        "my_space_char_filter"
                    ],
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase"
                    ]
                }
            }
        }
    },
    "mappings" :{
        "properties" :{
            "title" :{
                "type" : "text",
                "analyzer" : "splcharanalyzer"
            }
        }
    }
}
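Because the question builds its mapping as a Python dict, the same index definition could also be assembled in Python. The following is a minimal sketch: the helper name build_index_body and its parameters are hypothetical and not part of any Elasticsearch client, and the code only constructs the request body without contacting a cluster. In the mapping rules, \u0020 is the space character and \u002D is the hyphen.

```python
import json


def build_index_body(strip_chars=(" ", "-"), field="title"):
    """Assemble an index body whose analyzer deletes strip_chars before tokenizing.

    Hypothetical helper for illustration; it only builds a dict.
    """
    # Each rule has the form "\uXXXX=>", i.e. map one character to nothing.
    rules = ["\\u{:04X}=>".format(ord(ch)) for ch in strip_chars]
    return {
        "settings": {
            "analysis": {
                "char_filter": {
                    "my_space_char_filter": {"type": "mapping", "mappings": rules}
                },
                "analyzer": {
                    "splcharanalyzer": {
                        "char_filter": ["my_space_char_filter"],
                        "tokenizer": "standard",
                        "filter": ["lowercase"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {field: {"type": "text", "analyzer": "splcharanalyzer"}}
        },
    }


body = build_index_body()
# Serializing shows the same JSON escapes as in the index definition above.
print(json.dumps(body, indent=2))
```

The resulting dict can be sent as the body of the index creation request with whichever client is in use.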
Tokens generated by the custom splcharanalyzer
POST myindex/_analyze
{
    "analyzer": "splcharanalyzer",
    "text": "toronto, new mexico, paris, lisbona, new york, sedro-woolley"
}
{
    "tokens": [
        {
            "token": "toronto",
            "start_offset": 0,
            "end_offset": 7,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "newmexico",
            "start_offset": 9,
            "end_offset": 19,
            "type": "<ALPHANUM>",
            "position": 1
        },
        {
            "token": "paris",
            "start_offset": 21,
            "end_offset": 26,
            "type": "<ALPHANUM>",
            "position": 2
        },
        {
            "token": "lisbona",
            "start_offset": 28,
            "end_offset": 35,
            "type": "<ALPHANUM>",
            "position": 3
        },
        {
            "token": "newyork",
            "start_offset": 37,
            "end_offset": 45,
            "type": "<ALPHANUM>",
            "position": 4
        },
        {
            "token": "sedrowoolley",
            "start_offset": 47,
            "end_offset": 60,
            "type": "<ALPHANUM>",
            "position": 5
        }
    ]
}
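To make the effect of the char filter easier to see without a running cluster, the analysis chain can be imitated in plain Python. This is only a rough sketch: approx_analyze is a hypothetical function, and it approximates the standard tokenizer by splitting on every character that is not a letter or digit, which is simpler than the real Unicode segmentation rules.

```python
import re


def approx_analyze(text, strip_chars=""):
    """Roughly mimic: char filter -> standard tokenizer -> lowercase filter.

    Assumption: tokens are maximal runs of letters and digits.
    """
    # Char filter step: delete the listed characters before tokenizing.
    for ch in strip_chars:
        text = text.replace(ch, "")
    # Tokenizer and lowercase steps.
    return [tok.lower() for tok in re.findall(r"[^\W_]+", text)]


cities = "toronto, new mexico, paris, lisbona, new york, sedro-woolley"
escaped = "toronto new\\ mexico paris lisbona new\\ york sedro\\-woolley"

default_tokens = approx_analyze(cities)
escaped_tokens = approx_analyze(escaped)
custom_tokens = approx_analyze(cities, strip_chars=" -")
prose_tokens = approx_analyze("cities like New York and Toronto", strip_chars=" -")

print(default_tokens)  # ['toronto', 'new', 'mexico', ..., 'sedro', 'woolley']
print(custom_tokens)   # ['toronto', 'newmexico', 'paris', 'lisbona', 'newyork', 'sedrowoolley']
print(prose_tokens)    # ['citieslikenewyorkandtoronto']
```

In this approximation the backslash-escaped string yields the same split tokens as the unescaped one, which matches the behaviour described in the question. The last line also shows a limitation worth keeping in mind: once every space is removed, running prose collapses into a single token, so this char filter appears better suited to comma-separated values than to free text.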
Search query
{
    "query": {
        "match" : {
            "title" : {
                "query" : "sedro-woolley"
            }
        }
    }
}
Search result
"hits": [
    {
        "_index": "white",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
            "title": "toronto, new mexico, paris, lisbona, new york, sedro-woolley"
        }
    }
]
Searching for new or york does not produce any results.
{
    "query": {
        "match" : {
            "title" : {
                "query" : "york"
            }
        }
    }
}
Result
"hits": {
    "total": {
        "value": 0,
        "relation": "eq"
    },
    "max_score": null,
    "hits": []
}
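For the dynamic keyword list mentioned in the question, the request body could be generated along the following lines. This is a sketch under assumptions: build_highlight_query is a hypothetical helper, the target field is assumed to be mapped with splcharanalyzer, and the highlight section is copied from the question without verifying how fragments are rendered with this analyzer.

```python
def build_highlight_query(keywords, field="title"):
    """Assemble a match query plus highlight section for a list of keywords.

    Hypothetical helper; it only builds the request body as a dict.
    """
    return {
        "query": {
            # Keywords are joined with commas: the char filter removes spaces,
            # so a space-separated list would be glued into a single token.
            "match": {field: {"query": ", ".join(keywords)}}
        },
        "highlight": {
            "pre_tags": ["<match>"],
            "post_tags": ["</match>"],
            "fields": {field: {"number_of_fragments": 0}},
        },
    }


request_body = build_highlight_query(["new york", "toronto", "sedro-woolley"])
print(request_body["query"]["match"]["title"]["query"])
# new york, toronto, sedro-woolley
```

Joining with a comma rather than a space is the design choice that matters here, because the comma is what still separates the keywords after the char filter has run.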