How to trim all whitespace in an Elasticsearch normalizer

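Problem description

I noticed that a normalizer with the trim filter does not trim every whitespace character; for example, \u2007 is not trimmed. Is there a way to trim all whitespace characters in a normalizer? I tried attaching a pattern replace character filter to the normalizer, but this does not seem to be supported: https://github.com/elastic/elasticsearch/issues/28605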

Tags: elasticsearch

Solution

Here is a working example with index data, mapping, search query, and search result.

Index mapping:

{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "filter": [
            "lowercase",
            "trim"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "foo": {
        "type": "keyword",
        "normalizer": "my_normalizer"
      }
    }
  }
}

Analyze API:

GET /67331196/_analyze
{
  "normalizer" : "my_normalizer",
  "text" : " Hello"
}

The generated token is:

{
  "tokens": [
    {
      "token": "hello",
      "start_offset": 0,
      "end_offset": 6,
      "type": "word",
      "position": 0
    }
  ]
}
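
The token above shows that an ordinary leading space is removed. As for why \u2007 survives, a likely explanation is that the trim filter relies on Java's Character.isWhitespace, which deliberately excludes the non-breaking spaces \u00A0, \u2007 and \u202F. That is my understanding of the underlying Lucene filter, not something stated in the question, so treat it as an assumption. The sketch below imitates that rule in Python; java_is_whitespace and trim_like_filter are hypothetical helper names, not Elasticsearch APIs.

```python
import unicodedata

# Non-breaking spaces that Java's Character.isWhitespace does NOT count as whitespace.
NON_BREAKING = {"\u00a0", "\u2007", "\u202f"}
# Control characters that Java's Character.isWhitespace does count as whitespace.
CONTROL_WS = set("\t\n\u000b\f\r\u001c\u001d\u001e\u001f")


def java_is_whitespace(ch: str) -> bool:
    """Approximate Java's Character.isWhitespace for a single character."""
    if ch in CONTROL_WS:
        return True
    is_separator = unicodedata.category(ch) in {"Zs", "Zl", "Zp"}
    return is_separator and ch not in NON_BREAKING


def trim_like_filter(text: str) -> str:
    """Trim leading and trailing characters that pass java_is_whitespace.

    Assumption: this mirrors how the trim token filter behaves.
    """
    start, end = 0, len(text)
    while start < end and java_is_whitespace(text[start]):
        start += 1
    while end > start and java_is_whitespace(text[end - 1]):
        end -= 1
    return text[start:end]


if __name__ == "__main__":
    print(repr(trim_like_filter(" Hello ")))            # regular spaces are removed
    print(repr(trim_like_filter("\u2007Hello\u2007")))  # figure spaces are kept
```

Under this assumption, the behaviour in the question is expected rather than a bug: \u2007 is a space separator in Unicode, but it is not whitespace for the purposes of trimming.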

Index data:

{
    "foo":" HellO "
}
{
    "foo":"hello "
}
{
    "foo":"hellO"
}

Search query:

{
  "query": {
    "term": {
      "foo": "hello"
    }
  }
}

Search result:

"hits": [
      {
        "_index": "67331196",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.105360515,
        "_source": {
          "foo": " hello"
        }
      },
      {
        "_index": "67331196",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.105360515,
        "_source": {
          "foo": " Hello"
        }
      },
      {
        "_index": "67331196",
        "_type": "_doc",
        "_id": "3",
        "_score": 0.105360515,
        "_source": {
          "foo": " HellO "
        }
      }
    ]
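
The example above only covers ordinary spaces. If characters such as \u2007 also have to go, one possible workaround is to clean the value in the client before it reaches Elasticsearch. This is a suggestion of mine, not part of the original answer. A minimal sketch in Python, where normalize_keyword is a hypothetical helper name and the lowercase and trim steps simply mirror the filters in my_normalizer:

```python
def normalize_keyword(value: str) -> str:
    """Lowercase the value and strip all leading/trailing Unicode whitespace.

    Python's str.strip() with no arguments removes every character for which
    str.isspace() is true, which includes \u00a0, \u2007 and \u202f.
    """
    return value.lower().strip()


if __name__ == "__main__":
    # Apply the same cleaning to documents before indexing and to query terms.
    doc = {"foo": normalize_keyword("\u2007 HellO \u00a0")}
    term = normalize_keyword("\u2007hello")
    print(doc, repr(term))
```

Note that cleaning the value before indexing also changes what is stored in _source, whereas a normalizer leaves _source untouched. The same function would have to be applied to query terms as well, since the server-side trim would still leave \u2007 in place.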
