Analyzer that ignores accents and plural/singular forms in Elasticsearch

Problem description

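I am struggling to make my search queries ignore accents and plural/singular forms. I copied the Spanish analyzer from the page below and kept only the stemmer: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-lang-analyzer.html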

Here is my Python code (I bulk-load the data from a CSV afterwards):

settings={
  "settings": {
    "analysis": {
      "filter": {
        "spanish_stemmer": {
          "type":       "stemmer",
          "language":   "light_spanish"
        }
      },
      "analyzer": {
        "rebuilt_spanish": {
          "tokenizer":  "standard",
          "filter": [
            "lowercase",
            "spanish_stemmer"
          ]
        }
      }
    }
  }
}
    
# es is assumed to be an existing elasticsearch-py client (elasticsearch.Elasticsearch)
es.indices.create(index="activities", body=settings)

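However, when I send a GET query from Insomnia for geometrico, geométrico, geométricos, or geometricos, I get 0 results, even though there is a document with the Title "Cuerpos geométricos". It should match, because I want accents and plural/singular forms to make no difference. Any ideas?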

The GET query I send:

{
    "query": {
        "function_score": {
            "query": {
                "multi_match": {
                    "query": "geométricos",
                    "fields": [
                        "Descripcion",
                        "Nombre",
                        "Tags"
                    ],
                    "analyzer": "rebuilt_spanish"
                }
            }
        }
    }
}

Tags: python, elasticsearch, elasticsearch-py

Solution


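You need to add the ASCII folding token filter (asciifolding) to your analyzer's token filters; see the official documentation for details. Your analyzer should then look like the definition given below.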
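To make the effect of ASCII folding concrete, here is a minimal pure-Python sketch that approximates it by decomposing characters and dropping the combining marks. It is only an illustration: the helper name fold_accents is hypothetical, and Elasticsearch's asciifolding filter uses its own character mapping rather than Unicode normalization.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Approximate ASCII folding: decompose (NFKD), then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Accented and unaccented spellings collapse to the same string.
print(fold_accents("geométricos"))
print(fold_accents("geométrico") == fold_accents("geometrico"))
```

Once both the indexed text and the query text go through the same folding step, the accent no longer decides whether two terms match.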

Analyzer:

{
  "settings": {
    "analysis": {
      "filter": {
        "spanish_stemmer": {
          "type":     "stemmer",
          "language": "light_spanish"
        }
      },
      "analyzer": {
        "rebuilt_spanish": {
          "tokenizer": "standard",
          "filter": [
            "asciifolding",
            "lowercase",
            "spanish_stemmer"
          ]
        }
      }
    }
  }
}
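One thing the settings alone do not do is attach the analyzer to any field: an analyzer defined under analysis is used at index time only if a field mapping (or a default analyzer) references it. The sketch below builds an index body that maps the three fields from the question to rebuilt_spanish. The text field type, the typeless mappings layout (Elasticsearch 7 or later), and the build_index_body helper are assumptions, not part of the original post. The final call is left commented out because it needs a running cluster.

```python
def build_index_body(fields=("Descripcion", "Nombre", "Tags")):
    """Return index settings plus mappings that apply rebuilt_spanish to each field."""
    analysis = {
        "filter": {
            "spanish_stemmer": {"type": "stemmer", "language": "light_spanish"}
        },
        "analyzer": {
            "rebuilt_spanish": {
                "tokenizer": "standard",
                "filter": ["asciifolding", "lowercase", "spanish_stemmer"],
            }
        },
    }
    # Assumption: every searched field is a plain text field.
    properties = {
        name: {"type": "text", "analyzer": "rebuilt_spanish"} for name in fields
    }
    return {
        "settings": {"analysis": analysis},
        "mappings": {"properties": properties},
    }


body = build_index_body()
# es.indices.create(index="activities", body=body)  # as in the question; needs a live cluster
```

With the analyzer set in the mapping, the same analyzer is used at search time by default, so the "analyzer" parameter in the multi_match query becomes optional. An index that was created without this mapping has to be recreated and its documents reindexed.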
