ElasticSearch does not recognize numbers

Problem description

I use this configuration for search and mapping:

PUT :9200/subscribers

{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "tokenizer": "autocomplete",
          "filter": [
            "lowercase"
          ]
        },
        "autocomplete_search": {
          "tokenizer": "lowercase"
        }
      },
      "tokenizer": {
        "autocomplete": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 10,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "id": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        },
        "name": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        },
        "contact_number": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        }
      }
    }
  }
}

But when I add a new object:

POST :9200/subscribers/doc/?pretty

{
  "id": "1421997",
  "name": "John 333 Martin",
  "contact_number":"+43fdsds*543254365"
}

and search across multiple fields like this:

POST :9200/subscribers/doc/_search

{
    "query": {
        "multi_match": {
            "query": "Joh",
            "fields": [
                "name",
                "id",
                "contact_number"
            ],
            "type": "best_fields"
        }
    }
}

it successfully returns "John 333 Martin". But when I use "query": "333", "query": "+43fds" or "query": "14219", it returns nothing. This is strange, because I also configured the tokenizer to keep digits:

"token_chars": [
  "letter",
  "digit"
]

What should I do so that searching across all fields also returns results that contain numbers?


Update:

Calling GET :9200/subscribers/_analyze with

{
  "analyzer": "autocomplete",
  "text": "+43fdsds*543254365"
}

returns exactly the expected tokens, such as "43", "43f", "43fd" and "43fds". But the search does not match them. Could my search query be wrong?
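For reference, the index-time side can be approximated outside Elasticsearch. The sketch below is a hypothetical Python emulation of the "autocomplete" analyzer from the question (edge_ngram tokenizer with min_gram 2, max_gram 10 and token_chars letter and digit, followed by the lowercase filter). It is an assumption-laden simplification: only ASCII letters and digits count as token characters, whereas Elasticsearch uses Unicode character classes. It reproduces the prefixes listed above, including the digit-only grams.

```python
import re

# Hypothetical helper: rough, ASCII-only emulation of the "autocomplete"
# analyzer (edge_ngram, min_gram=2, max_gram=10, token_chars=[letter, digit],
# then a lowercase filter). Not the real Elasticsearch implementation.
def autocomplete_tokens(text, min_gram=2, max_gram=10):
    tokens = []
    # token_chars=[letter, digit]: every other character acts as a separator.
    for word in re.findall(r"[A-Za-z0-9]+", text):
        # Emit prefixes of length min_gram..max_gram, capped at the word length.
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append(word[:n].lower())
    return tokens

print(autocomplete_tokens("+43fdsds*543254365"))
print(autocomplete_tokens("John 333 Martin"))
```

Every gram is a prefix of a letter/digit run, so "43fds" and "333" are in the index, but "fds" is not.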

Tags: elasticsearch

Solution


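Your search uses a different analyzer from the one that created the tokens in the inverted index. Because you use the lowercase tokenizer as the search_analyzer, the digits are stripped from your query: that tokenizer splits text at every character that is not a letter, so digits never become tokens. See below: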

POST _analyze
{
  "tokenizer": "lowercase",
  "text":     "+43fdsds*543254365"
}

produces:

{
  "tokens" : [
    {
      "token" : "fdsds",
      "start_offset" : 3,
      "end_offset" : 8,
      "type" : "word",
      "position" : 0
    }
  ]
}
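A rough Python emulation of the lowercase tokenizer makes the effect visible for every query in the question. The helper name lowercase_tokenizer is hypothetical, and it only treats ASCII letters as letters (the real tokenizer is Unicode-aware); for this input it yields the same token and offsets as the _analyze output above.

```python
import re

# Hypothetical helper: ASCII-only emulation of Elasticsearch's "lowercase"
# tokenizer, which splits on every non-letter character and lowercases.
# Returns (token, start_offset, end_offset) tuples like the _analyze API.
def lowercase_tokenizer(text):
    return [(m.group().lower(), m.start(), m.end())
            for m in re.finditer(r"[A-Za-z]+", text)]

print(lowercase_tokenizer("+43fdsds*543254365"))

# What the search side actually sends to the index for each query:
for query in ["Joh", "333", "+43fds", "14219"]:
    tokens = [tok for tok, _start, _end in lowercase_tokenizer(query)]
    print(f"{query!r} -> {tokens}")
```

A query that produces no tokens ("333", "14219") cannot match anything, and "fds" is not one of the indexed edge n-grams (they all start at the beginning of "43fdsds"), so only "Joh" finds the document.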

Instead, use the standard analyzer as your search_analyzer, i.e. modify your mapping as shown below, and it will work as expected:

"mappings": {
    "doc": {
      "properties": {
        "id": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "standard"
        },
        "name": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "standard"
        },
        "contact_number": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "standard"
        }
      }
    }
  }

With the standard analyzer,

POST _analyze
{
  "analyzer": "standard",
  "text":     "+43fdsds*543254365"
}

produces:

{
  "tokens" : [
    {
      "token" : "43fdsds",
      "start_offset" : 1,
      "end_offset" : 8,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "543254365",
      "start_offset" : 9,
      "end_offset" : 18,
      "type" : "<NUM>",
      "position" : 1
    }
  ]
}
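To tie both sides together, here is an end-to-end sketch in Python that checks, for each query from the question, whether any search-time token appears among the document's indexed edge n-grams. All three analyzers are simplified, hypothetical ASCII-only emulations; in particular, standard_search is only a crude stand-in for the standard analyzer, which follows the Unicode text segmentation rules (UAX #29). The three fields are merged into one term set, which is enough here because a match in any field returns the document.

```python
import re

# Hypothetical, simplified (ASCII-only) emulations:
#  - index_terms: edge_ngram (min_gram=2, max_gram=10, letters+digits) + lowercase
#  - lowercase_search: the "lowercase" tokenizer (letters only)
#  - standard_search: crude stand-in for the "standard" analyzer
def index_terms(text, min_gram=2, max_gram=10):
    terms = set()
    for word in re.findall(r"[A-Za-z0-9]+", text):
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            terms.add(word[:n].lower())
    return terms

def lowercase_search(text):
    return [w.lower() for w in re.findall(r"[A-Za-z]+", text)]

def standard_search(text):
    return [w.lower() for w in re.findall(r"[A-Za-z0-9]+", text)]

doc = {"id": "1421997", "name": "John 333 Martin",
       "contact_number": "+43fdsds*543254365"}

# All terms indexed for the document, across the three fields.
indexed = set().union(*(index_terms(v) for v in doc.values()))

def matches(query, search_analyzer):
    # The document matches if at least one query token is an indexed term.
    return any(tok in indexed for tok in search_analyzer(query))

for q in ["Joh", "333", "+43fds", "14219"]:
    print(f"{q!r}: lowercase={matches(q, lowercase_search)}, "
          f"standard={matches(q, standard_search)}")
```

With the lowercase search side only "Joh" matches; with the standard-like search side all four queries match. The sketch also shows a limit of this setup: grams stop at max_gram (10 characters), so a search token longer than that would still find no indexed term.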
