elasticsearch - Is there any way to restrict Elasticsearch to match only the closest token? [edge n-gram, fuzziness]
Problem description
Using a tokenizer, fuzziness and edge n-grams, I have three documents:
- "Star Trek I"
- "Star Trekian"
- "Star Trakian: A Star Trek Documentary"
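A fuzzy search for "Star Trek" gives "Star Trekian" a higher score than "Star Trek I" because of the extra token match on "Trek" (=> "Trekian"). Is the best way to counter this to reduce fuzziness, or to drop it entirely?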
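One way to see where the extra match comes from is to list which edge n-grams a fuzzy "trek" can reach. The sketch below is a simplified model, not Lucene's actual scoring: it assumes the index terms for a word are exactly its edge n-grams (as the `edge_n_gram` filter in the settings produces), that "trek" is allowed one edit (what `AUTO:4,7` gives a four-letter term), and it uses plain Levenshtein distance (Elasticsearch also counts transpositions by default, which does not change these particular examples). The helper names are hypothetical.

```python
def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 50) -> list[str]:
    """Prefixes of `token`, mimicking an edge_ngram token filter."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert / delete / substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def fuzzy_hits(query_term: str, indexed_word: str, max_edits: int = 1) -> list[str]:
    """Edge n-grams of `indexed_word` within `max_edits` edits of `query_term`."""
    return [g for g in edge_ngrams(indexed_word)
            if levenshtein(query_term, g) <= max_edits]


for word in ("trek", "trekian"):
    print(word, fuzzy_hits("trek", word))
```

In this model "trek" reaches `tre` and `trek` in "Star Trek I", but also `treki` in "Star Trekian", so if every matching term adds to the score, "Star Trekian" gets one more hit. With `max_edits=0` only `trek` itself matches in both words, so the two documents tie on that term.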
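Also, "Star Trakian: A Star Trek Documentary" scores higher because it matches both "Trak" and "Trek". Is there a way to match only the best token, or some other way to score it the same as "Star Trek I" (since both contain "Star Trek")?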
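The difference between "count every fuzzy hit" and "keep only the best token per query term" can be made concrete with a toy model. This is a hypothetical sketch, not an Elasticsearch feature: `sum_score` awards one point per distinct index term a query term reaches (roughly what boolean similarity does when all expansions are kept), while `best_score` looks only at the closest index term for each query term, with arbitrary weights of 2 for an exact match and 1 for a one-edit match. Tokenization is approximated with a `\w+` regex plus edge n-grams.

```python
import re


def edge_ngrams(token, min_gram=1, max_gram=50):
    """Prefixes of `token`, like an edge_ngram filter."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def levenshtein(a, b):
    """Dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def index_terms(text):
    """Distinct edge n-grams of the lowercased words (rough autocomplete stand-in)."""
    terms = set()
    for token in re.findall(r"\w+", text.lower()):
        terms.update(edge_ngrams(token))
    return terms


def sum_score(query, text, max_edits=1):
    """One point for every index term any query term reaches within max_edits."""
    terms = index_terms(text)
    return sum(1 for q in query.lower().split() for t in terms
               if levenshtein(q, t) <= max_edits)


def best_score(query, text, max_edits=1):
    """Score each query term by its closest index term only (hypothetical weights)."""
    terms = index_terms(text)
    score = 0
    for q in query.lower().split():
        d = min(levenshtein(q, t) for t in terms)
        if d <= max_edits:
            score += max_edits + 1 - d  # exact match -> 2, one edit -> 1
    return score


DOCS = ["Star Trek I", "Star Trekian", "Star Trakian: A Star Trek Documentary"]
for doc in DOCS:
    print(f"{doc!r}: sum={sum_score('Star Trek', doc)} best={best_score('Star Trek', doc)}")
```

Under the summing model "Star Trek I" scores 4 while the other two score 5, matching the ordering described above; under the best-token model all three tie, because each contains an exact `star` and an exact `trek` term (for "Star Trekian", `trek` is an edge n-gram of "trekian").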
Edit:
Mappings and settings:
PUT /stackoverflow
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "edge_n_gram": {
          "type": "edge_ngram",
          "min_gram": "1",
          "max_gram": "50"
        }
      },
      "analyzer": {
        "autocomplete": {
          "filter": [
            "lowercase",
            "asciifolding",
            "edge_n_gram"
          ],
          "type": "custom",
          "tokenizer": "autocomplete"
        },
        "autocomplete_search": {
          "filter": [
            "lowercase",
            "asciifolding"
          ],
          "type": "custom",
          "tokenizer": "char_group"
        },
        "full_word": {
          "filter": [
            "lowercase",
            "asciifolding"
          ],
          "type": "custom",
          "tokenizer": "char_group"
        }
      },
      "tokenizer": {
        "autocomplete": {
          "type": "standard"
        },
        "char_group": {
          "type": "char_group",
          "tokenize_on_chars": [
            "whitespace",
            "-",
            "."
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "search_field_full": {
        "type": "text",
        "similarity": "boolean",
        "fields": {
          "raw": {
            "type": "text",
            "similarity": "boolean",
            "analyzer": "full_word",
            "search_analyzer": "autocomplete_search"
          }
        },
        "analyzer": "autocomplete",
        "search_analyzer": "autocomplete_search"
      }
    }
  }
}
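To reason about which terms end up in each field, the two index-time analyzers can be roughly imitated in plain Python. This is an approximation under stated assumptions: the `standard` tokenizer is replaced by a `\w+` regex (the real one follows Unicode word-boundary rules), and `asciifolding` is approximated by stripping combining accents after NFKD normalization. The function names mirror the analyzer names in the settings but are otherwise hypothetical.

```python
import re
import unicodedata


def ascii_fold(text: str) -> str:
    """Approximate asciifolding: drop combining accents after NFKD decomposition."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def autocomplete(text: str, min_gram: int = 1, max_gram: int = 50) -> list[str]:
    """Rough stand-in for the `autocomplete` analyzer: split into words,
    lowercase, fold to ASCII, then emit edge n-grams of each token."""
    grams = []
    for token in re.findall(r"\w+", text):
        token = ascii_fold(token.lower())
        grams.extend(token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1))
    return grams


def full_word(text: str) -> list[str]:
    """Stand-in for `full_word` (and `autocomplete_search`): char_group split on
    whitespace, '-' and '.', then lowercase and ASCII folding."""
    tokens = re.split(r"[\s\-.]+", text)
    return [ascii_fold(t.lower()) for t in tokens if t]


print(autocomplete("Star Trek I"))
print(full_word("Star Trakian: A Star Trek Documentary"))
```

Because the `char_group` tokenizer only splits on whitespace, `-` and `.`, the colon stays attached to `trakian:` in the `.raw` subfield: the `"fuzziness": 0` clause cannot match "trakian" there, while the `"fuzziness": 1` clause can (one deletion). The `_analyze` API shows the real token streams if you want to confirm this against the cluster.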
Documents:
POST stackoverflow/_doc/
{
  "search_field_full": "Star Trek I"
}
POST stackoverflow/_doc/
{
  "search_field_full": "Star Trakian: A Star Trek Documentary"
}
POST stackoverflow/_doc/
{
  "search_field_full": "Star Trekian"
}
Query:
GET stackoverflow/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "fields": [
              "search_field_full"
            ],
            "fuzziness": "AUTO:4,7",
            "max_expansions": 500,
            "minimum_should_match": 2,
            "operator": "or",
            "query": "Star Trek",
            "type": "best_fields"
          }
        }
      ],
      "should": [
        {
          "multi_match": {
            "fields": [
              "search_field_full.raw^30"
            ],
            "fuzziness": 0,
            "operator": "or",
            "query": "Star Trek",
            "type": "best_fields"
          }
        },
        {
          "multi_match": {
            "fields": [
              "search_field_full.raw^20"
            ],
            "fuzziness": 1,
            "operator": "or",
            "query": "Star Trek",
            "type": "best_fields"
          }
        }
      ]
    }
  }
}
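The `must` clause uses `"fuzziness": "AUTO:4,7"`. As documented for Elasticsearch, `AUTO:[low],[high]` means a query term shorter than `low` must match exactly, a term of `low` to `high - 1` characters may differ by one edit, and a longer term by two. The hypothetical helper below just restates that rule, so it is easy to check how many edits each query term gets.

```python
def auto_fuzziness(term: str, low: int = 4, high: int = 7) -> int:
    """Max edits allowed for `term` under "AUTO:low,high": exact below `low`,
    one edit for lengths low..high-1, two edits from `high` characters on."""
    n = len(term)
    if n < low:
        return 0
    if n < high:
        return 1
    return 2


for term in ("i", "star", "trek", "trekian", "documentary"):
    print(term, auto_fuzziness(term))
```

Both query terms, "star" and "trek", are four characters long, so under `AUTO:4,7` each is allowed one edit, the same as the explicit `"fuzziness": 1` clause on `.raw`.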