Case-insensitive search with wildcards in Elasticsearch

Problem description

I have just started working with Elasticsearch. I have an index "new_index" with the following mapping:

"new_index" : {
    "aliases" : { },
    "mappings" : {
      "current" : {
        "properties" : {
          "did" : {
            "type" : "integer"
          },
          "fil_date" : {
            "type" : "double"
          },
          "file_nr" : {
            "type" : "double"
          },
          "id" : {
            "type" : "integer"
          },
          "mark_text" : {
            "type" : "text"
          },
          "mark_type_id" : {
            "type" : "text"
          },
          "markdescr" : {
            "type" : "text"
          },
          "markdescrtext" : {
            "type" : "text"
          },
          "niceclmain" : {
            "type" : "double"
          },
          "owname" : {
            "type" : "text"
          },
          "statusapplication" : {
            "type" : "text"
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1527665866982",
        "number_of_shards" : "5",
        "number_of_replicas" : "1",
        "uuid" : "Py5uWzVTRYqcZuCLcwm-BQ",
        "version" : {
          "created" : "6020499"
        },
        "provided_name" : "new_index"
      }
    }
  }

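Now I want to search on the field "mark_text". I have two kinds of searches:
1. If I search for "smart", the results should contain only the whole word "smart", matched case-insensitively.
2. The search should behave like SQL LIKE "%smart%", also case-insensitively.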

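I already have a query for the second search case. However, I would like to know whether there is a single solution that covers both search cases.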
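
To make the two requirements concrete, here is a minimal sketch in plain Python (no Elasticsearch involved) of the behavior each search case asks for. The function names and the sample values are hypothetical and only illustrate the intended semantics.

```python
import re

def whole_word_match(text, word):
    """Case 1: True if `word` occurs in `text` as a complete word, ignoring case."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

def substring_match(text, fragment):
    """Case 2: True if `fragment` occurs anywhere in `text`, ignoring case (LIKE '%fragment%')."""
    return fragment.lower() in text.lower()

# Hypothetical sample values for mark_text (not taken from the real index).
samples = ["SMART", "Smart Home", "smartphone", "Outsmarted", "Start"]

case1 = [s for s in samples if whole_word_match(s, "smart")]
case2 = [s for s in samples if substring_match(s, "smart")]

print(case1)  # ['SMART', 'Smart Home']
print(case2)  # ['SMART', 'Smart Home', 'smartphone', 'Outsmarted']
```

Case 1 rejects "smartphone" because "smart" is only part of a longer word there, while case 2 accepts it.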

Edit: the query I use for search case 1 is:

GET _search
{
  "query": {
    "bool": {
      "must" : [
        {
          "match": {
            "mark_text": "smart"
          }
        }  
      ]
    }
  }
}

Query for search case 2:

GET _search
{
  "query": {
    "bool": {
      "must" : [
        {
          "wildcard": {
            "mark_text": "*smart*"
          }
        }  
      ]
    }
  }
}
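
Wildcard patterns are term-level and are not passed through the field's analyzer, so a pattern such as "*Smart*" would be compared as-is against lowercased terms. A minimal sketch of a client-side helper that builds the request body above from user input is shown below. The helper name `build_contains_query` is hypothetical and not part of any Elasticsearch client; it assumes the indexed terms are lowercase.

```python
import json

def build_contains_query(field, user_input):
    """Build a wildcard query body that behaves like LIKE '%input%', ignoring case.

    Hypothetical helper; assumes the terms indexed for `field` are lowercase.
    """
    # Escape the characters that the wildcard syntax treats specially.
    escaped = "".join("\\" + ch if ch in "\\*?" else ch for ch in user_input)
    # The pattern is not analyzed, so lowercase it here to match lowercased terms.
    pattern = "*" + escaped.lower() + "*"
    return {"query": {"bool": {"must": [{"wildcard": {field: pattern}}]}}}

body = build_contains_query("mark_text", "Smart")
print(json.dumps(body, indent=2))
```

On a "text" field the pattern is matched against individual tokens rather than the whole field value, so a pattern that spans several words would likely not match there.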

Tags: elasticsearch

Solution


I created a new index and added the mapping and settings shown below:

{
  "new_index5" : {
    "aliases" : { },
    "mappings" : {
      "current" : {
        "properties" : {
          "did" : {
            "type" : "integer"
          },
          "fil_date" : {
            "type" : "double"
          },
          "file_nr" : {
            "type" : "double"
          },
          "filing_date" : {
            "type" : "double"
          },
          "id" : {
            "type" : "integer"
          },
          "mark_identification" : {
            "type" : "keyword",
            "normalizer" : "lowercase_normalizer"
          },
          "mark_text" : {
            "type" : "keyword",
            "normalizer" : "lowercase_normalizer"
          },
          "mark_type_id" : {
            "type" : "text"
          },
          "markdescr" : {
            "type" : "text"
          },
          "markdescrtext" : {
            "type" : "text"
          },
          "niceclmain" : {
            "type" : "double"
          },
          "owname" : {
            "type" : "keyword",
            "normalizer" : "lowercase_normalizer"
          },
          "party_name" : {
            "type" : "keyword",
            "normalizer" : "lowercase_normalizer"
          },
          "primary_code" : {
            "type" : "text"
          },
          "registration_date" : {
            "type" : "double"
          },
          "registration_number" : {
            "type" : "double"
          },
          "serial_number" : {
            "type" : "double"
          },
          "status_code" : {
            "type" : "text"
          },
          "statusapplication" : {
            "type" : "text"
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "number_of_shards" : "5",
        "provided_name" : "new_index5",
        "creation_date" : "1527686957833",
        "analysis" : {
          "normalizer" : {
            "lowercase_normalizer" : {
              "filter" : [
                "lowercase"
              ],
              "type" : "custom",
              "char_filter" : [ ]
            }
          }
        },
        "number_of_replicas" : "1",
        "uuid" : "9YdUrs1cSBuqDJmvSPOm6g",
        "version" : {
          "created" : "6020499"
        }
      }
    }
  }
} 
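
As a rough illustration of what the mapping above changes, the following sketch models the two field types in plain Python. It is a simplification under stated assumptions: `lowercase_normalizer` stands in for the custom normalizer defined in the settings, `simple_text_analyzer` is only an approximation of a default "text" field, and the sample values are hypothetical.

```python
import re

def lowercase_normalizer(value):
    """Simplified model of the custom normalizer: the whole value stays one term, lowercased."""
    return value.lower()

def simple_text_analyzer(value):
    """Rough stand-in for a default text field: split into words, lowercase each one."""
    return [token.lower() for token in re.findall(r"\w+", value)]

# Hypothetical mark_text values.
docs = ["SMART", "Smart", "Smart Home", "smartphone"]

# keyword + normalizer: each document contributes exactly one lowercased term.
keyword_terms = {doc: lowercase_normalizer(doc) for doc in docs}
# text: each word of a document becomes its own lowercased term.
text_terms = {doc: simple_text_analyzer(doc) for doc in docs}

# The input of a match query is normalized the same way before comparison.
query = lowercase_normalizer("Smart")
keyword_hits = [doc for doc, term in keyword_terms.items() if term == query]
text_hits = [doc for doc, terms in text_terms.items() if query in terms]

print(keyword_hits)  # ['SMART', 'Smart']
print(text_hits)     # ['SMART', 'Smart', 'Smart Home']
```

Under this model, a match on the normalized keyword field finds only documents whose entire value equals "smart" in any letter case, whereas a "text" field also finds values that merely contain the word.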

I then added an aggregation to the query for my first search case, as follows:

GET _search
{
  "query": {
    "bool": {
      "must" : [
        {
          "match": {
              "mark_text": "smart"
          }
        }
      ]
    }
  },
  "aggs": {
    "mark_texts": {
      "terms": {
        "field": "mark_text"
      }
    }
  }
}

It returned results that include "smart" regardless of letter case.

For the second search case, I use a fuzzy query.
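
Note that fuzzy matching is based on edit distance, which is not the same as a LIKE "%smart%" substring search. The sketch below uses plain Levenshtein distance as a simplification of what a fuzzy query computes and compares it with a substring test. The candidate terms are hypothetical, and `MAX_EDITS` reflects the upper bound of 2 edits that Elasticsearch allows for fuzziness.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]

MAX_EDITS = 2  # largest edit distance a fuzzy query accepts

# Hypothetical indexed terms, compared against the search term "smart".
results = {}
for term in ["smart", "start", "smarts", "smartphone"]:
    distance = edit_distance("smart", term)
    results[term] = (distance, distance <= MAX_EDITS, "smart" in term)
    print(term, results[term])
```

In this sketch "start" is within fuzzy range although it does not contain "smart", while "smartphone" contains "smart" but is 5 edits away, so the two approaches can return different result sets.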

I still do not understand how the aggregation and the normalizer solve my problem, but I am working on understanding it.

