Elasticsearch edge_ngram not searching correctly

Problem description

As shown here: [screenshot of query hits]

I search for "hey", and one of the records returned is "hello".

Another example: [screenshot of query hits]

Again, I search for "infrared", and a record whose content is "This is a message at index: 1" is returned.

These are the index settings:

settings analysis: {
  filter: {
    # Emits the leading 2 to 20 characters of every token ("hey" -> "he", "hey")
    edge_ngram_filter: {
      type: "edge_ngram",
      min_gram: "2",
      max_gram: "20"
    }
  },
  analyzer: {
    # Standard tokenizer, then lowercase, then expand each word into its edge n-grams
    edge_ngram_analyzer: {
      type: "custom",
      tokenizer: "standard",
      filter: ["lowercase", "edge_ngram_filter"]
    }
  }
} do
  mappings dynamic: true do
    indexes :content, type: :text, analyzer: "edge_ngram_analyzer"
    # indexes :chat_id, type: :long
  end
end
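To make the settings above concrete, here is a minimal plain-Ruby sketch of what edge_ngram_analyzer does to a piece of text. It is an approximation, not Elasticsearch itself: the helper name edge_ngram_analyze is hypothetical, and the standard tokenizer is assumed to behave like "split on anything that is not a letter or digit" (the real tokenizer follows Unicode word-boundary rules).

```ruby
# Hypothetical helper approximating "edge_ngram_analyzer" from the settings above.
# Assumption: the standard tokenizer is modeled as splitting on non-alphanumerics.
def edge_ngram_analyze(text, min_gram: 2, max_gram: 20)
  words = text.scan(/[[:alnum:]]+/)          # rough stand-in for the standard tokenizer
  words.map(&:downcase).flat_map do |word|   # "lowercase" filter
    # "edge_ngram_filter": prefixes of length min_gram..max_gram;
    # words shorter than min_gram yield no grams at all
    upper = [word.length, max_gram].min
    (min_gram..upper).map { |n| word[0, n] }
  end
end

p edge_ngram_analyze("hey")    # => ["he", "hey"]
p edge_ngram_analyze("Hello")  # => ["he", "hel", "hell", "hello"]
```

With min_gram set to 2, every word contributes its two-letter prefix, which is where unrelated words start to overlap.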

Tags: ruby-on-rails, elasticsearch

Solution


Based on the index mapping you defined, the tokens generated for "hey" would be:

GET /<index>/_analyze
{
  "analyzer": "edge_ngram_analyzer",
  "text": "hey"
}

Response:

{
  "tokens": [
    {
      "token": "he",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 0
    },
    {
      "token": "hey",
      "start_offset": 0,
      "end_offset": 3,
      "type": "word",
      "position": 1
    }
  ]
}

The tokens generated for "hello" would be:

GET /<index>/_analyze
{
  "analyzer": "edge_ngram_analyzer",
  "text": "hello"
}

Response:

{
  "tokens": [
    {
      "token": "he",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 0
    },
    {
      "token": "hel",
      "start_offset": 0,
      "end_offset": 3,
      "type": "word",
      "position": 1
    },
    {
      "token": "hell",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 2
    },
    {
      "token": "hello",
      "start_offset": 0,
      "end_offset": 5,
      "type": "word",
      "position": 3
    }
  ]
}

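Both words produce the token "he". Because no search_analyzer is set, the query text "hey" is analyzed with the same edge_ngram_analyzer (into "he" and "hey"), and a match query returns every document that shares at least one token with it, so searching for "hey" matches both documents. The same thing happens with "infrared" and "index", which share the token "in".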
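The matching behavior described above can be reproduced with a small simulation. This is a sketch under stated assumptions: the helpers edge_ngram_analyze and matching_docs are hypothetical, the standard tokenizer is approximated by splitting on non-alphanumerics, and the match query is modeled with its default "or" operator (a document matches if it shares any token with the analyzed query).

```ruby
require "set"

# Hypothetical approximation of "edge_ngram_analyzer" (min_gram 2, max_gram 20).
# Assumption: the standard tokenizer is modeled as splitting on non-alphanumerics.
def edge_ngram_analyze(text, min_gram: 2, max_gram: 20)
  text.scan(/[[:alnum:]]+/).map(&:downcase).flat_map do |word|
    (min_gram..[word.length, max_gram].min).map { |n| word[0, n] }
  end
end

# Hypothetical model of a "match" query with the default "or" operator. With no
# search_analyzer, the query text goes through the same analyzer as the documents.
def matching_docs(docs, query)
  query_tokens = edge_ngram_analyze(query).to_set
  docs.select { |doc| edge_ngram_analyze(doc).any? { |t| query_tokens.include?(t) } }
end

docs = ["hey", "hello", "This is a message at index: 1"]
p matching_docs(docs, "hey")       # => ["hey", "hello"]   (shared token "he")
p matching_docs(docs, "infrared")  # => ["This is a message at index: 1"]   (shared token "in")
```

Both surprising hits from the question fall out of the two-letter prefixes alone.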


Change your index mapping to:

{
    "settings": {
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "tokenizer": "my_tokenizer",
                    "filter": ["lowercase"]
                }
            },
            "tokenizer": {
                "my_tokenizer": {
                    "type": "edge_ngram",
                    "min_gram": 3,            // note this
                    "max_gram": 10,
                    "token_chars": [
                        "letter",
                        "digit"
                    ]
                }
            }
        },
        "max_ngram_diff": 10
    },
    "mappings": {
        "properties": {
            "content": {
                "type": "text",
                "analyzer": "my_analyzer"
            }
        }
    }
}
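A rough model of what my_tokenizer in the mapping above produces is sketched below. The helper name my_tokenizer is reused here as a hypothetical Ruby method; it assumes token_chars ["letter", "digit"] means any other character splits words, and it leaves case untouched (lowercasing would come from a separate token filter).

```ruby
# Hypothetical model of the "my_tokenizer" edge_ngram tokenizer
# (min_gram 3, max_gram 10, token_chars letter + digit).
# Assumption: any character that is not a letter or digit splits words.
def my_tokenizer(text, min_gram: 3, max_gram: 10)
  text.scan(/[[:alnum:]]+/).flat_map do |word|
    # Prefixes of length min_gram..max_gram; words shorter than min_gram yield nothing
    (min_gram..[word.length, max_gram].min).map { |n| word[0, n] }
  end
end

p my_tokenizer("hey")    # => ["hey"]
p my_tokenizer("hello")  # => ["hel", "hell", "hello"]
# The two-letter "he" token that both words used to share is no longer produced.
```

Raising min_gram from 2 to 3 is what removes the overlap between "hey" and "hello".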

Now, using the Analyze API:

GET /<index>/_analyze

{
  "analyzer" : "my_analyzer",
  "text" : "hey"
}

The tokens will be:

{
  "tokens": [
    {
      "token": "hey",
      "start_offset": 0,
      "end_offset": 3,
      "type": "word",
      "position": 0
    }
  ]
}

Index data:

{
  "content": "hey"
}
{
  "content": "hello"
}

Search query:

{
  "query":{
    "match":{
      "content":"hey"
    }
  }
}
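The effect of this query on the two indexed documents can be simulated in a few lines. This is a sketch, not the Elasticsearch implementation: my_analyzer and match_query are hypothetical helpers, my_analyzer is modeled by its edge_ngram tokenizer only (min_gram 3, max_gram 10, letters and digits, case handling ignored), and the match query is assumed to use its default "or" operator with the same analyzer at search time.

```ruby
# Hypothetical model of "my_analyzer": the edge_ngram tokenizer alone
# (min_gram 3, max_gram 10, letters/digits); case handling is ignored here.
def my_analyzer(text, min_gram: 3, max_gram: 10)
  text.scan(/[[:alnum:]]+/).flat_map do |word|
    (min_gram..[word.length, max_gram].min).map { |n| word[0, n] }
  end
end

# Hypothetical "match" query: a document matches if any of its tokens
# equals any token of the analyzed query text.
def match_query(docs, field, query)
  query_tokens = my_analyzer(query)
  docs.select { |doc| (my_analyzer(doc[field]) & query_tokens).any? }
end

docs = [{ "content" => "hey" }, { "content" => "hello" }]
p match_query(docs, "content", "hey").map { |d| d["content"] }  # => ["hey"]
p match_query(docs, "content", "he").map { |d| d["content"] }   # => []
```

One side effect of min_gram 3 is visible in the last line: a two-letter query produces no tokens at all, and in Elasticsearch a match query whose text analyzes to nothing returns no hits by default (zero_terms_query: none).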

Search result:

"hits": [
      {
        "_index": "66754045",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.8713851,
        "_source": {
          "content": "hey"
        }
      }
    ]
