Elasticsearch completion field does not return suggestions when searching with tokens returned in the _analyze API response

Problem description

I am trying to implement autocomplete using the Elasticsearch completion field suggester.

Step 1: Create a test_index:

curl --location --request PUT 'http://localhost:9200/test_index?pretty' \
--header 'Content-Type: application/json' \
--data-raw '{"settings": {"number_of_shards": 1, "max_ngram_diff": 7, "number_of_replicas": "0", "analysis": {"filter": {"edge_ngram_completion_filter": {"token_chars": ["whitespace", "digit"], "min_gram": "3", "type": "edge_ngram", "max_gram": "10"}}, "analyzer": {"edge_ngram_completion": {"filter": ["lowercase", "edge_ngram_completion_filter"], "type": "custom", "tokenizer": "standard"}}}}, "mappings": {"properties": {"id": {"type": "integer"}, "name": {"type": "text", "fields": {"raw": {"type": "keyword"}, "suggest": {"type": "completion", "analyzer": "edge_ngram_completion", "search_analyzer": "simple", "preserve_separators": true, "preserve_position_increments": true, "max_input_length": 100}}}}}}
'

Step 2: Index the following document:

curl --location --request POST 'http://localhost:9200/test_index/_doc?pretty' \
--header 'Content-Type: application/json' \
--data-raw '{
    "name": "PANTOCID DSR CAP",
    "id": 1
}'

Step 3: When I call the _analyze API on "PANTOCID DSR CAP", I get the tokens ["pan", "pant", "panto", "pantoc", "pantoci", "pantocid", "dsr", "cap"]:

curl --location --request POST 'http://localhost:9200/test_index/_analyze?pretty' \
--header 'Content-Type: application/json' \
--data-raw '{
  "analyzer" : "edge_ngram_completion",
  "text" : "PANTOCID DSR CAP"  
}
'

Step 4: But when I search with "dsr", I do not get any suggestions:

curl --location --request POST 'http://localhost:9200/test_index/_search?pretty' \
--header 'Content-Type: application/json' \
--data-raw '{
  "suggest": {
    "egde_ngram_suggest" : {
       "text": "dsr", 
       "completion" : {
            "field" : "name.suggest"
       }
    }
  }
}
'

Why is that? I mean, if the searched text is one of the generated tokens, it should produce a suggestion match, right? Am I missing something here?

Any help is appreciated. Thanks in advance.

Tags: elasticsearch, autocomplete

Solution


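What may be confusing is the _analyze step. You did name the correct analyzer, but that only shows how the analyzer tokenizes free text, not how the completion field indexes it. To verify the field's actual tokenization, pass the field itself instead of the analyzer: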

curl --location --request POST 'http://localhost:9200/test_index/_analyze?pretty' \
--header 'Content-Type: application/json' \
--data-raw '{
  "field" : "name.suggest",
  "text" : "PANTOCID DSR CAP"  
}
'

When you run that, you'll see that the text was n-grammed from the very beginning:

pandsrcap
pant dsr cap
...

None of these token variations starts with dsr; every one of them keeps the pan prefix.
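A rough way to see why "dsr" cannot match is to model the indexed inputs in plain Python. This is only a sketch under simplifying assumptions: whitespace splitting stands in for the standard tokenizer, a single space stands in for the separator the completion field places between tokens, and the helper names (edge_ngrams, indexed_inputs, suggests) are made up for illustration. It is not how Elasticsearch is implemented internally.

```python
from itertools import product


def edge_ngrams(token, min_gram=3, max_gram=10):
    """Return the prefixes of `token` whose length is between min_gram and max_gram.

    Tokens shorter than min_gram yield an empty list in this model.
    """
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]


def indexed_inputs(text):
    """Model the strings a completion field would hold for `text`.

    Assumption: every word contributes its edge n-grams at the same position,
    and each indexed input is one path through those positions.
    """
    positions = [edge_ngrams(tok) for tok in text.lower().split()]
    return [" ".join(path) for path in product(*positions)]


def suggests(text, query):
    """True if `query` is a prefix of at least one modelled input."""
    q = query.lower()
    return any(candidate.startswith(q) for candidate in indexed_inputs(text))


if __name__ == "__main__":
    for candidate in indexed_inputs("PANTOCID DSR CAP"):
        print(candidate)
    print(suggests("PANTOCID DSR CAP", "pan"))
    print(suggests("PANTOCID DSR CAP", "dsr"))
```

In this model every input begins with an n-gram of "pantocid", so a query has to start with "pan" to match anything.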

What this tells us is that the completion field works properly -- it's meant for autocomplete implementations, not for middle-of-the-text searches like you seem to aim for.
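If matching on a later word such as "dsr" is really needed, one possible workaround is to index every word-level suffix of the name as a separate input, since a completion field accepts an array of inputs. Below is a minimal sketch in Python that only builds the document body. It assumes a dedicated completion field, hypothetically named name_suggest here; that field is not part of the mapping above and would have to be added, probably with a plain analyzer instead of the edge n-gram one.

```python
import json


def suffix_inputs(name):
    """Return every word-level suffix of `name`, longest first."""
    words = name.split()
    return [" ".join(words[i:]) for i in range(len(words))]


def build_document(doc_id, name):
    """Build a document body whose completion inputs cover each word start.

    `name_suggest` is a hypothetical field name chosen for this example.
    """
    return {
        "id": doc_id,
        "name": name,
        "name_suggest": {"input": suffix_inputs(name)},
    }


if __name__ == "__main__":
    print(json.dumps(build_document(1, "PANTOCID DSR CAP"), indent=2))
```

With inputs like these, a prefix query for "dsr" has an input that starts with it ("DSR CAP"), at the cost of a larger suggester index.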

