elasticsearch: searching by special characters

Problem description

I have a set of phrases such as [remix], [18+], and so on. How can I search by a single character (for example "[") and find all of these variants? My current analyzer configuration looks like this:

{
  "analysis": {
    "analyzer": {
      "bigram_analyzer": {
        "type": "custom",
        "tokenizer": "keyword",
        "filter": [
          "lowercase",
          "bigram_filter"
        ]
      },
      "full_text_analyzer": {
        "type": "custom",
        "tokenizer": "ngram_tokenizer",
        "filter": [
          "lowercase"
        ]
      }
    },
    "filter": {
      "bigram_filter": {
        "type": "edge_ngram",
        "max_gram": 2
      }
    },
    "tokenizer": {
      "ngram_tokenizer": {
        "type": "ngram",
        "min_gram": 3,
        "max_gram": 3,
        "token_chars": [
          "letter",
          "digit",
          "symbol",
          "punctuation"
        ]
      }
    }
  }
}

The mapping is done at the Java entity level using the Spring Boot Data Elasticsearch starter.

Tags: java, spring, elasticsearch, spring-data-elasticsearch

Solution


If I understand your question correctly - you want an autocomplete-style analyzer that returns every term starting with [ (or any other character). To do that, you can create a custom analyzer based on ngram autocomplete. Here is an example:

Here is the test index:

PUT /testing-index-v3
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
        "filter": {
            "autocomplete_filter": {
                "type": "edge_ngram",
                "min_gram": 1,
                "max_gram": 15
            }
        },
        "analyzer": {
            "autocomplete": {
                "type": "custom",
                "tokenizer": "keyword",
                "filter": [
                    "lowercase",
                    "autocomplete_filter"
                ]
            }
        }
    }
  },
  "mappings": {
    "properties": {
      "term": { 
        "type": "text",
        "analyzer": "autocomplete"
        
      }
    }
  }
}

Here are the sample documents:

POST /testing-index-v3/_doc
{
  "term": "[+18]"
}

POST testing-index-v3/_doc
{
  "term": "[remix]"
}

POST testing-index-v3/_doc
{
  "term": "test"
}

And finally our search:

GET testing-index-v3/_search
{
  "query": {
    "match": {
      "term": {
        "query": "[remi",
        "analyzer": "keyword", 
        "fuzziness": 0
      }
    }
  }
}

As you can see, I chose the keyword tokenizer for the autocomplete analyzer, combined with an edge_ngram filter with min_gram: 1 and max_gram: 15. This means a term is split into tokens like:

input-query = i, in, inp, inpu, input .. and so on, up to 15 tokens. This splitting only happens at index time. Looking at the query, we also specify the keyword analyzer - that analyzer is used at search time, so the query string is matched against the indexed tokens as a single, exact token. Here are some example searches and results:
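To see exactly which tokens the autocomplete analyzer emits at index time, you can run the _analyze API against the index created above:

GET testing-index-v3/_analyze
{
  "analyzer": "autocomplete",
  "text": "[remix]"
}

The response should list the tokens [, [r, [re, [rem, [remi, [remix and [remix], which is why a search for "[" matches both bracketed documents.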

GET testing-index-v3/_search
{
  "query": {
    "match": {
      "term": {
        "query": "[",
        "analyzer": "keyword", 
        "fuzziness": 0
      }
    }
  }
}

result:

    "hits" : [
          {
            "_index" : "testing-index-v3",
            "_type" : "_doc",
            "_id" : "w5c_IHsBGGZ-oIJIi-6n",
            "_score" : 0.7040055,
            "_source" : {
              "term" : "[remix]"
            }
          },
          {
            "_index" : "testing-index-v3",
            "_type" : "_doc",
            "_id" : "xJc_IHsBGGZ-oIJIju7m",
            "_score" : 0.7040055,
            "_source" : {
              "term" : "[+18]"
            }
          }
        ]

GET testing-index-v3/_search
{
  "query": {
    "match": {
      "term": {
        "query": "[+",
        "analyzer": "keyword", 
        "fuzziness": 0
      }
    }
  }
}

result:

    "hits" : [        
          {
            "_index" : "testing-index-v3",
            "_type" : "_doc",
            "_id" : "xJc_IHsBGGZ-oIJIju7m",
            "_score" : 0.7040055,
            "_source" : {
              "term" : "[+18]"
            }
          }
        ]

Hope this answer helps. Good luck on your elasticsearch adventures!
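As for the Spring Data Elasticsearch side of the question: the same settings can be attached at the entity level with the @Setting and @Field annotations. A minimal sketch - the class name, index name, and settings-file path are illustrative, and the referenced JSON file is assumed to contain the "settings" block from the index definition above:

```java
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;
import org.springframework.data.elasticsearch.annotations.Setting;

// Hypothetical entity; /es/autocomplete-settings.json on the classpath
// would hold the analysis settings shown earlier in this answer.
@Document(indexName = "testing-index-v3")
@Setting(settingPath = "/es/autocomplete-settings.json")
public class Term {

    @Id
    private String id;

    // Index with the custom autocomplete analyzer, but search with the
    // keyword analyzer so the query string stays a single token.
    @Field(type = FieldType.Text, analyzer = "autocomplete", searchAnalyzer = "keyword")
    private String term;

    // getters and setters omitted
}
```

With searchAnalyzer set on the field, queries built through Spring Data repositories get the same index-time/search-time analyzer split that the raw match query above achieves with its explicit "analyzer": "keyword" parameter.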

