Is there any way to restrict Elasticsearch to match only the closest token? [Edge n-gram, fuzziness]

Problem description

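I am using a custom tokenizer, fuzziness, and an edge n-gram filter, and I have three documents indexed: "Star Trek I", "Star Trakian: A Star Trek Documentary", and "Star Trekian".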

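A fuzzy search for "Star Trek" gives "Star Trekian" a higher score than "Star Trek I", because additional tokens fuzzily match "Trek". Is reducing fuzziness, or disabling it entirely, the best way to counter this?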

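In addition, "Star Trakian: A Star Trek Documentary" scores higher because it matches both "Trak" and "Trek". Is there a way to match only the best token, or some other approach, so that it is scored the same as "Star Trek I" (since both contain "Star Trek")?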
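
To make the effect concrete, here is a rough Python sketch (not Elasticsearch itself) that approximates the index-time analysis with lowercase edge n-grams and lists which indexed tokens fall within one edit of the query term "trek". The helper names are hypothetical, the word splitting only approximates the standard tokenizer, and plain Levenshtein distance is used, whereas Lucene's fuzzy matching also counts a transposition as a single edit by default.

```python
import re


def edge_ngrams(token, min_gram=1, max_gram=50):
    """Return the prefixes of `token`, mimicking an edge_ngram filter (1..50)."""
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]


def levenshtein(a, b):
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def fuzzy_matches(text, term, max_edits=1):
    """Indexed edge n-grams of `text` within `max_edits` of `term`."""
    # Assumption: splitting on non-word characters stands in for the
    # standard tokenizer plus the lowercase filter.
    words = re.findall(r"\w+", text.lower())
    grams = {g for w in words for g in edge_ngrams(w)}
    return sorted(g for g in grams if levenshtein(g, term) <= max_edits)


DOCS = [
    "Star Trek I",
    "Star Trakian: A Star Trek Documentary",
    "Star Trekian",
]

if __name__ == "__main__":
    for doc in DOCS:
        print(doc, "->", fuzzy_matches(doc, "trek"))
```

Under these assumptions, "Star Trek I" matches on the tokens "tre" and "trek", while the other two documents each match on one additional token ("trak" and "treki" respectively), which is consistent with the ranking described above.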

Edit:

Mappings and settings:

PUT /stackoverflow
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "edge_n_gram": {
          "type": "edge_ngram",
          "min_gram": "1",
          "max_gram": "50"
        }
      },
      "analyzer": {
        "autocomplete": {
          "filter": [
            "lowercase",
            "asciifolding",
            "edge_n_gram"
          ],
          "type": "custom",
          "tokenizer": "autocomplete"
        },
        "autocomplete_search": {
          "filter": [
            "lowercase",
            "asciifolding"
          ],
          "type": "custom",
          "tokenizer": "char_group"
        },
        "full_word": {
          "filter": [
            "lowercase",
            "asciifolding"
          ],
          "type": "custom",
          "tokenizer": "char_group"
        }
      },
      "tokenizer": {
        "autocomplete": {
          "type": "standard"
        },
        "char_group": {
          "type": "char_group",
          "tokenize_on_chars": [
            "whitespace",
            "-",
            "."
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "search_field_full": {
        "type": "text",
        "similarity": "boolean",
        "fields": {
          "raw": {
            "type": "text",
            "similarity": "boolean",
            "analyzer": "full_word",
            "search_analyzer": "autocomplete_search"
          }
        },
        "analyzer": "autocomplete",
        "search_analyzer": "autocomplete_search"
      }
    }
  }
}

Documents:

POST stackoverflow/_doc/
{
  "search_field_full": "Star Trek I"
}

POST stackoverflow/_doc/
{
  "search_field_full": "Star Trakian: A Star Trek Documentary"
}

POST stackoverflow/_doc/
{
  "search_field_full": "Star Trekian"
}

Query:

GET stackoverflow/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "fields": [
              "search_field_full"
            ],
            "fuzziness": "AUTO:4,7",
            "max_expansions": 500,
            "minimum_should_match": 2,
            "operator": "or",
            "query": "Star Trek",
            "type": "best_fields"
          }
        }
      ],
      "should": [
        {
          "multi_match": {
            "fields": [
              "search_field_full.raw^30"
            ],
            "fuzziness": 0,
            "operator": "or",
            "query": "Star Trek",
            "type": "best_fields"
          }
        },
        {
          "multi_match": {
            "fields": [
              "search_field_full.raw^20"
            ],
            "fuzziness": 1,
            "operator": "or",
            "query": "Star Trek",
            "type": "best_fields"
          }
        }
      ]
    }
  }
}
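
For reference, the setting "fuzziness": "AUTO:4,7" in the must clause derives the allowed edit distance from the length of each query term. The sketch below restates that rule in Python; the function name is hypothetical and the code only mirrors the documented AUTO:low,high behaviour, it does not call Elasticsearch.

```python
def allowed_edits(term, low=4, high=7):
    """Edit distance permitted for `term` under fuzziness AUTO:low,high."""
    length = len(term)
    if length < low:
        return 0  # short terms must match exactly
    if length < high:
        return 1  # mid-length terms may differ by one edit
    return 2      # long terms may differ by two edits


if __name__ == "__main__":
    for term in ("star", "trek"):
        print(term, allowed_edits(term))
```

Both query terms, "star" and "trek", are four characters long, so each is allowed one edit in the must clause.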

Tags: elasticsearch

Solution

