Elasticsearch highlighted text is missing text when the field contains an exclamation mark

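Problem description

When searching text and requesting highlighting for the results, if the matched document field contains an exclamation mark, the returned highlighted text omits the part of the text that contains the exclamation mark.

Elasticsearch version 7.1.1

Document: { "name" : "Yahoo! Inc [Please refer to Altaba Inc and Verizon Communications Inc]"}, searched with an "inc" wildcard query with highlighting enabled.

Expected: the highlighted text should be: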

"Yahoo! <em>Inc</em> [Please refer to Altaba <em>Inc</em> and Verizon Communications <em>Inc</em>]"

Actual: "Yahoo!" is missing from the response. I got:

"<em>Inc</em> [Please refer to Altaba <em>Inc</em> and Verizon Communications <em>Inc</em>]"

I think this is related to the ! character. If I remove it, everything works fine.

Steps to reproduce:

Add a document to a new index

POST test/_doc/ { "name" : "Yahoo! Inc [Please refer to Altaba Inc and Verizon Communications Inc]" }

No other settings or mappings

Run the query

GET test/_search { "query": { "bool": { "should": [ { "wildcard": { "name": { "wildcard": "inc*" } } } ] } }, "highlight": { "fields": { "name" : {} } } }

This returns the following result:

"hits" : [ { "_index" : "test", "_type" : "_doc", "_id" : "511tP3ABoqekxkoUshVf", "_score" : 1.0, "_source" : { "name" : "Yahoo! Inc [Please refer to Altaba Inc and Verizon Communications Inc]" }, "highlight" : { "name" : [ "<em>Inc</em> [Please refer to Altaba <em>Inc</em> and Verizon Communications <em>Inc</em>]" ] } } ]

Expected highlight:

"Yahoo! <em>Inc</em> [Please refer to Altaba <em>Inc</em> and Verizon Communications <em>Inc</em>]"

Tags: elasticsearch

Solution


This is expected behavior: by default, Elasticsearch highlighting returns only parts of the searched text (fragments), not the whole field. See: https://www.elastic.co/guide/en/elasticsearch/reference/7.1/search-request-highlighting.html#unified-highlighter

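By default the unified highlighter breaks the text into sentences, and ! and . are treated as the end of a sentence. "Yahoo!" therefore becomes a fragment of its own that contains no match, so the highlighter does not return it.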
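
To make the effect concrete, here is a rough Python sketch of the idea. It is not Elasticsearch's implementation: the regex-based sentence splitter is only a crude stand-in for the real boundary scanner, and the function names split_sentences and highlight_fragments are made up for this illustration.

```python
import re

TEXT = "Yahoo! Inc [Please refer to Altaba Inc and Verizon Communications Inc]"


def split_sentences(text):
    # Simplified assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    # The real highlighter uses a locale-aware boundary scanner with richer rules.
    return re.split(r"(?<=[.!?])\s+", text)


def highlight_fragments(text, prefix):
    # Emulates a prefix wildcard such as "inc*" with a case-insensitive word match.
    pattern = re.compile(r"\b(" + re.escape(prefix) + r"\w*)", re.IGNORECASE)
    fragments = []
    for sentence in split_sentences(text):
        # Only sentences that contain a match are returned as fragments.
        if pattern.search(sentence):
            fragments.append(pattern.sub(r"<em>\1</em>", sentence))
    return fragments


for fragment in highlight_fragments(TEXT, "inc"):
    print(fragment)
```

In this sketch "Yahoo!" ends up as a separate sentence without a match, so it never appears in the returned fragments, which mirrors the result in the question.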

In my case the searched field holds a name, which is short, so I added "number_of_fragments" : 0 to force the highlighter to return the entire field content.

"highlight": {
  "fields": {
     "name" : {"number_of_fragments" : 0}
  }
}
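
For reference, the complete request body could be assembled along these lines. This is only a sketch: the helper name build_highlight_search is hypothetical, the query is simplified to a bare wildcard query using the "value" parameter instead of the bool/should wrapper from the question, and sending the body to the cluster (for example with an HTTP client against test/_search) is left out.

```python
import json


def build_highlight_search(field, wildcard, number_of_fragments=0):
    # number_of_fragments = 0 asks the highlighter to return the whole field
    # content instead of splitting it into fragments.
    return {
        "query": {"wildcard": {field: {"value": wildcard}}},
        "highlight": {
            "fields": {field: {"number_of_fragments": number_of_fragments}}
        },
    }


body = build_highlight_search("name", "inc*")
print(json.dumps(body, indent=2))
```

Returning the whole field is reasonable here because the field is short; for long text fields it would return the full content of every hit.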

See also: https://github.com/elastic/elasticsearch/issues/52333

