ElasticSearch - Unable To Search Using Fuzzy Match Query For Underscore in Value (ES Fuzzy not matching underscore value)

Problem Description

Suppose I have three documents in my Elasticsearch index, for example:

1: {
    "name": "test_2602"
   }
2: {
    "name": "test-2602"
   }
3: {
    "name": "test 2602"
   }

Now when I search using the fuzzy match query given below:

{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "must": [
              {
                "match": {
                  "name": {
                    "query": "test-2602",
                    "fuzziness": "2",
                    "prefix_length": 0,
                    "max_expansions": 50,
                    "fuzzy_transpositions": true,
                    "lenient": false,
                    "zero_terms_query": "NONE",
                    "boost": 1
                  }
                }
              }
            ],
            "disable_coord": false,
            "adjust_pure_negative": true,
            "boost": 1
          }
        }
      ],
      "disable_coord": false,
      "adjust_pure_negative": true,
      "boost": 1
    }
  }
}

In response I only get the following two documents (whether I search with the name value "test", "test 2602", or "test-2602"):

  {
    "name": "test-2602"
  },
  {
    "name": "test 2602"
  }

I do not get the document with the name "test_2602" (the value containing the underscore does not match). I want the third document, with the name "test_2602", to be included as well. However, if I search for the name "test_2602", then in response I get only:

 {
   "name": "test_2602"
 }

I need to fetch all three documents whenever I search for the name "test", "test 2602", "test-2602", or "test_2602".

Tags: elasticsearch, fuzzy

Solution

You are getting only two documents in your search because, by default, Elasticsearch uses the standard analyzer, which tokenizes "test-2602" and "test 2602" into the two tokens test and 2602. "test_2602", however, is kept as the single token test_2602, because the standard analyzer does not split on underscores.

You can check the generated tokens using the Analyze API:

GET /_analyze
{
  "analyzer" : "standard",
  "text" : "test_2602"
}

The generated token will be:

{
  "tokens": [
    {
      "token": "test_2602",
      "start_offset": 0,
      "end_offset": 9,
      "type": "<ALPHANUM>",
      "position": 0
    }
  ]
}
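The splitting behavior described above can be approximated outside of Elasticsearch with a short Python sketch. This is an assumption-laden simplification of the standard analyzer (the real one follows Unicode text segmentation rules), but it captures the point at issue here: `\w` matches letters, digits, and the underscore, so splitting on non-`\w` characters breaks on hyphens and spaces while keeping underscores inside a token.

```python
import re

def standard_tokens(text):
    # rough approximation (an assumption, not the real Lucene tokenizer):
    # lowercase, then split on runs of characters that are neither
    # letters, digits, nor underscores
    return [t for t in re.split(r"[^\w]+", text.lower()) if t]

print(standard_tokens("test-2602"))  # ['test', '2602']
print(standard_tokens("test 2602"))  # ['test', '2602']
print(standard_tokens("test_2602"))  # ['test_2602']
```

This mirrors the Analyze API output above: "test_2602" survives as a single token, so a fuzzy match against the separate tokens test and 2602 never reaches it.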

To match all three documents, query the name.keyword sub-field instead of name. The keyword sub-field is not analyzed, so each whole value is stored as a single token, and the fuzzy match compares the full strings (notice the ".keyword" after the name field). Try out the query below:

Index Mapping:

{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

Search Query:

{
  "query": {
    "match": {
      "name.keyword": {
        "query": "test_2602",
        "fuzziness":2
      }
    }
  }
}
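Why does a fuzziness of 2 now match all three names? On the keyword field, each whole value is a single term, and "test 2602" and "test-2602" are each one single-character substitution away from "test_2602", i.e. at Levenshtein edit distance 1, well within the allowed 2. A plain-Python edit-distance sketch (not an Elasticsearch API) illustrates this:

```python
def levenshtein(a, b):
    # classic dynamic-programming edit distance
    # (insertions, deletions, substitutions each cost 1)
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

query = "test_2602"
for name in ["test_2602", "test 2602", "test-2602", "test"]:
    print(name, levenshtein(query, name))
# test_2602 0, test 2602 1, test-2602 1, test 5
```

Note that "test" alone is at edit distance 5 from "test_2602", beyond the maximum fuzziness of 2, so a fuzzy query for just "test" against the keyword field would still not return these documents.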

Search Result:

"hits": [
      {
        "_index": "66572330",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.9808291,
        "_source": {
          "name": "test_2602"
        }
      },
      {
        "_index": "66572330",
        "_type": "_doc",
        "_id": "3",
        "_score": 0.8718481,
        "_source": {
          "name": "test 2602"
        }
      },
      {
        "_index": "66572330",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.8718481,
        "_source": {
          "name": "test-2602"
        }
      }
    ]
