elasticsearch - Elasticsearch autosuggest based on prefixes and a custom tokenizer
Problem description
I am currently building an autosuggest feature using ngrams.
I have the following filter and analyzer:
"nGram_filter": {
  "type": "nGram",
  "min_gram": 3,
  "max_gram": 10,
  "token_chars": [
    "letter",
    "digit",
    "punctuation",
    "symbol"
  ]
}
"nGram_analyzer": {
  "type": "custom",
  "tokenizer": "whitespace",
  "filter": [
    "lowercase",
    "asciifolding",
    "nGram_filter"
  ]
}
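Now, when I tokenize the sample data test_table_for analyzers and search for the strings test, table, or analyzers, I get that record back. So I know the tokens are being created by the filter I specified, and it is working.
But I need to add one more feature: prefix matching. For example, when I search for test_table (10 chars) I get a result, because the max n-gram is 10, but when I try test_table_for it returns zero results, because the record test_table_for analyzers has no such token.
How can I add prefix-based matching to the existing n-gram analyzer? That is, searches should keep matching up to 10 characters anywhere in the record (which works today), and a search string should also produce a suggestion when it matches the record from its start.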
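The zero-result case can be reproduced outside Elasticsearch. The following is a minimal Python sketch, not Elasticsearch's implementation: it assumes the analyzer behaves like a whitespace split plus lowercasing plus 3-to-10-character n-grams (asciifolding is skipped because the sample is plain ASCII), and that the search string is looked up as one whole term. The helper names ngrams and analyze are made up for illustration.

```python
def ngrams(token, min_gram=3, max_gram=10):
    """Return every substring of `token` with a length from min_gram to max_gram."""
    grams = set()
    for size in range(min_gram, max_gram + 1):
        # When size exceeds len(token) the inner range is empty, so nothing is added.
        for start in range(len(token) - size + 1):
            grams.add(token[start:start + size])
    return grams


def analyze(text):
    """Rough stand-in for nGram_analyzer: whitespace split, lowercase, n-grams."""
    tokens = set()
    for word in text.lower().split():
        tokens |= ngrams(word)
    return tokens


indexed = analyze("test_table_for analyzers")

# Print each search string, its length, and whether it exists as an indexed token.
for query in ["test", "table", "analyzers", "test_table", "test_table_for"]:
    print(query, len(query), query in indexed)
```

Running it prints True for test, table, analyzers, and test_table, and False for test_table_for, whose 14 characters exceed max_gram.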
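Solution
This is not possible with a single analyzer. You have to create another field whose edge_ngram tokens are used for prefix search. The index mapping below adds that field and also includes your current analyzer.
Index mapping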
{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 30
        },
        "nGram_filter": {
          "type": "nGram",
          "min_gram": 3,
          "max_gram": 10,
          "token_chars": [
            "letter",
            "digit",
            "punctuation",
            "symbol"
          ]
        }
      },
      "analyzer": {
        "prefixanalyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "autocomplete_filter"
          ]
        },
        "ngramanalyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "nGram_filter"
          ]
        }
      }
    },
    "index.max_ngram_diff": 30
  },
  "mappings": {
    "properties": {
      "title_prefix": {
        "type": "text",
        "analyzer": "prefixanalyzer",
        "search_analyzer": "standard"
      },
      "title": {
        "type": "text",
        "analyzer": "ngramanalyzer",
        "search_analyzer": "standard"
      }
    }
  }
}
You can now use the analyze API to confirm the prefix tokens:
{
  "analyzer": "prefixanalyzer",
  "text": "test_table_for analyzers"
}
The token test_table_for is present as well, as shown below:
{"tokens":[{"token":"t","start_offset":0,"end_offset":14,"type":"<ALPHANUM>","position":0},{"token":"te","start_offset":0,"end_offset":14,"type":"<ALPHANUM>","position":0},{"token":"tes","start_offset":0,"end_offset":14,"type":"<ALPHANUM>","position":0},{"token":"test","start_offset":0,"end_offset":14,"type":"<ALPHANUM>","position":0},{"token":"test_","start_offset":0,"end_offset":14,"type":"<ALPHANUM>","position":0},{"token":"test_t","start_offset":0,"end_offset":14,"type":"<ALPHANUM>","position":0},{"token":"test_ta","start_offset":0,"end_offset":14,"type":"<ALPHANUM>","position":0},{"token":"test_tab","start_offset":0,"end_offset":14,"type":"<ALPHANUM>","position":0},{"token":"test_tabl","start_offset":0,"end_offset":14,"type":"<ALPHANUM>","position":0},{"token":"test_table","start_offset":0,"end_offset":14,"type":"<ALPHANUM>","position":0},{"token":"test_table_","start_offset":0,"end_offset":14,"type":"<ALPHANUM>","position":0},{"token":"test_table_f","start_offset":0,"end_offset":14,"type":"<ALPHANUM>","position":0},{"token":"test_table_fo","start_offset":0,"end_offset":14,"type":"<ALPHANUM>","position":0},{"token":"test_table_for","start_offset":0,"end_offset":14,"type":"<ALPHANUM>","position":0},{"token":"a","start_offset":15,"end_offset":24,"type":"<ALPHANUM>","position":1},{"token":"an","start_offset":15,"end_offset":24,"type":"<ALPHANUM>","position":1},{"token":"ana","start_offset":15,"end_offset":24,"type":"<ALPHANUM>","position":1},{"token":"anal","start_offset":15,"end_offset":24,"type":"<ALPHANUM>","position":1},{"token":"analy","start_offset":15,"end_offset":24,"type":"<ALPHANUM>","position":1},{"token":"analyz","start_offset":15,"end_offset":24,"type":"<ALPHANUM>","position":1},{"token":"analyze","start_offset":15,"end_offset":24,"type":"<ALPHANUM>","position":1},{"token":"analyzer","start_offset":15,"end_offset":24,"type":"<ALPHANUM>","position":1},{"token":"analyzers","start_offset":15,"end_offset":24,"type":"<ALPHANUM>","position":1}]}
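The token list above can be approximated in a few lines of Python. This is only a sketch under assumptions: splitting on whitespace stands in for the standard tokenizer (which, per the response above, keeps test_table_for as one token), and the function names edge_ngrams and prefix_analyze are hypothetical, not part of any Elasticsearch client.

```python
def edge_ngrams(token, min_gram=1, max_gram=30):
    """Return the prefixes of `token`, from min_gram up to max_gram characters."""
    longest = min(len(token), max_gram)
    return [token[:size] for size in range(min_gram, longest + 1)]


def prefix_analyze(text):
    """Rough stand-in for prefixanalyzer on this sample text:
    split on whitespace, lowercase, then emit edge n-grams per word."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(edge_ngrams(word))
    return tokens


tokens = prefix_analyze("test_table_for analyzers")

print(len(tokens))                 # number of prefix tokens produced
print("test_table_for" in tokens)  # is the full 14-character word indexed?
```

It yields 23 tokens, the same count as the analyze response, and test_table_for is among them because max_gram (30) is longer than the word.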
Now you can use a multi_match query, which returns the desired search result, as shown below:
Search query
{
  "query": {
    "multi_match": {
      "query": "test_table_for",
      "fields": [
        "title",
        "title_prefix"
      ]
    }
  }
}
Search result
"hits": [
  {
    "_index": "so_63981157",
    "_type": "_doc",
    "_id": "1",
    "_score": 0.45920232,
    "_source": {
      "title_prefix": "test_table_for analyzers",
      "title": "test_table_for analyzers"
    }
  }
]
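To see why the hit comes back, here is a rough Python simulation of the two fields. It assumes that multi_match with default settings returns a document when any query term matches in any listed field, that the standard search analyzer leaves each of these sample queries as a single lowercase term, and it approximates tokenization with a whitespace split. All function names are illustrative and not part of Elasticsearch.

```python
def ngrams(token, min_gram=3, max_gram=10):
    """Substrings of `token` with lengths from min_gram to max_gram (title field)."""
    return {token[i:i + n]
            for n in range(min_gram, max_gram + 1)
            for i in range(len(token) - n + 1)}


def edge_ngrams(token, min_gram=1, max_gram=30):
    """Prefixes of `token` up to max_gram characters (title_prefix field)."""
    return {token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)}


def index_field(text, gram_fn):
    """Collect the tokens one field would hold for `text`."""
    tokens = set()
    for word in text.lower().split():
        tokens |= gram_fn(word)
    return tokens


def matching_fields(query, doc_tokens):
    """Return the fields in which at least one whole query term is an indexed token."""
    terms = query.lower().split()
    return sorted(field for field, tokens in doc_tokens.items()
                  if any(term in tokens for term in terms))


source = "test_table_for analyzers"
doc_tokens = {
    "title": index_field(source, ngrams),
    "title_prefix": index_field(source, edge_ngrams),
}

for query in ["table", "test_table", "test_table_for"]:
    print(query, matching_fields(query, doc_tokens))
```

In this simulation table matches only title, test_table matches both fields, and test_table_for matches only title_prefix, which is why querying both fields covers the infix case and the long-prefix case together.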