Elasticsearch multi-match query does not ignore special characters

Problem description

I have a name field whose value is "abc_name". When I search for "abc_" I get the correct result, but when I search for "abc_@#£&-#&" I still get the same result. I want my query to ignore these special characters, which do not actually match the document.

My mapping for the field looks like this, and I want to keep this structure because otherwise it affects my other search behavior:

  "name": {
         "type": "text",
           "fields": {
           "keyword": {
                 "type": "keyword",
                    "ignore_above": 256
                     }
                       },
                     "analyzer": "autocomplete",
                 "search_analyzer": "standard"
                            }
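
The autocomplete analyzer referenced above is not shown in the question. For context only, a common edge_ngram-based definition that such a mapping typically points at might look like the sketch below; the index name and gram sizes are placeholders, not the asker's actual settings:

PUT my_index                                       <----- hypothetical index name
{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "autocomplete_edge_ngram"]
        }
      },
      "filter": {
        "autocomplete_edge_ngram": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 10
        }
      }
    }
  }
}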

Tags: node.js, elasticsearch, graphql

Solution


Please see the example below, where I created a custom analyzer that fits your use case.

Sample mapping:

PUT some_test_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": { 
          "type": "custom",
          "tokenizer": "custom_tokenizer",
          "filter": ["lowercase", "3_5_edge_ngram"]
        }
      },
      "tokenizer": {
        "custom_tokenizer": { 
          "type": "pattern",
          "pattern": "\\w+_+[^a-zA-Z\\d\\s_]+|\\s+".      <---- Note this pattern
        }
      },
      "filter": {
        "3_5_edge_ngram": {
          "type": "edge_ngram",
          "min_gram": 3,
          "max_gram": 5
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "my_field":{
        "type": "text",
        "analyzer": "my_custom_analyzer"
      }
    }
  }
}

The pattern above simply discards anything of the form abc_$%^^##, so such a token is never indexed.
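
If you want to see what the pattern does before creating the index, the Analyze API also accepts an inline tokenizer definition; a minimal sketch reusing the exact pattern from the mapping above:

POST _analyze
{
  "tokenizer": {
    "type": "pattern",
    "pattern": "\\w+_+[^a-zA-Z\\d\\s_]+|\\s+"
  },
  "text": "abc_name efg_!@#!@# 1213_adav"
}

The pattern tokenizer treats every match of the pattern as a separator, so efg_!@#!@# (together with the surrounding whitespace) is swallowed and only abc_name and 1213_adav come back as tokens.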

Please note how the analyzer works:

  • The tokenizer runs first.
  • The edge_ngram filter is then applied to the tokens the tokenizer produces.

To verify this, you can first remove the edge_ngram filter from the mapping above and check which tokens the Analyze API generates, as shown below:

POST some_test_index/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "abc_name asda efg_!@#!@# 1213_adav"
}

Tokens generated:

{
  "tokens" : [
    {
      "token" : "abc_name",
      "start_offset" : 0,
      "end_offset" : 8,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "asda",
      "start_offset" : 9,
      "end_offset" : 13,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "1213_adav",
      "start_offset" : 25,
      "end_offset" : 34,
      "type" : "word",
      "position" : 2
    }
  ]
}

Note that the token efg_!@#!@# has been dropped.

I added the edge_ngram filter so that a search for abc_ also matches, because the only token the tokenizer produces for that text is abc_name.
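
To confirm this with the edge_ngram filter left in place (assuming the some_test_index mapping from above), analyze just abc_name through the full custom analyzer; it should come back as the edge n-grams abc, abc_ and abc_n, which is exactly why a query for abc_ finds the document:

POST some_test_index/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "abc_name"
}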

Sample document:

POST some_test_index/_doc/1
{
  "my_field": "abc_name asda efg_!@#!@# 1213_adav"
}

Search requests:

Use case 1:

POST some_test_index/_search
{
  "query": {
    "match": {
      "my_field": "abc_"
    }
  }
}

Use case 2:

POST some_test_index/_search
{
  "query": {
    "match": {
      "my_field": "efg_!@#!@#"
    }
  }
}

Responses:

Response for use case 1:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.47992462,
    "hits" : [
      {
        "_index" : "some_test_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.47992462,
        "_source" : {
          "my_field" : "abc_name asda efg_!@#!@# 1213_adav"
        }
      }
    ]
  }
}

Response for use case 2 (no hits, since efg_!@#!@# matches the tokenizer pattern and therefore produces no tokens, either at index time or at search time):

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

Updated answer:

Create your mapping along the lines of the index I've created below and let me know whether it works:

PUT some_test_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": { 
          "type": "custom",
          "tokenizer": "punctuation",
          "filter": ["lowercase"]
        }
      },
      "tokenizer": {
        "punctuation": { 
          "type": "pattern",
          "pattern": "\\w+_+[^a-zA-Z\\d\\s_]+|\\s+"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "my_field":{
        "type": "text",
        "analyzer": "autocompete",                   <----- Assuming you have already this in setting
        "search_analyzer": "my_custom_analyzer".     <----- Note this
      }
    }
  }
}
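
Once this index is in place, a quick way to check the original problem string is to run it through the search analyzer (a small verification sketch using the sample index name from above):

POST some_test_index/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "abc_@#£&-#&"
}

The whole string matches the tokenizer pattern, so no tokens are produced at search time; a match query whose input analyzes to zero tokens returns no documents, which is the behavior you are after.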

Please give it a try and let me know whether this works for all your use cases.

