Adding exclusions for multi-word synonyms in Elasticsearch

Problem description

I have the following synonyms (simplified for this example):

"synonyms": {
"type": "synonym_graph",
"expand": true,
"lenient": true,
"tokenizer": "standard",
"synonyms": [
    "french => french, ethnicity",
    "toast => toast, cheese sandwich"
]}

What I want to achieve is the following:

If a user searches for "french", I want them to receive all documents containing "french" and/or "ethnicity".

However, if the user searches for "french toast", I want them to receive only documents containing "french toast", and not "ethnicity toast".

Using the _analyze API:

GET test-xxx/_analyze
{
  "text": "french toast" ,
  "analyzer": "synonyms"
}

I get the following:

{
  "tokens" : [
    {
      "token" : "french",
      "start_offset" : 0,
      "end_offset" : 6,
      "type" : "SYNONYM",
      "position" : 0
    },
    {
      "token" : "ethnic",
      "start_offset" : 0,
      "end_offset" : 6,
      "type" : "SYNONYM",
      "position" : 0
    },
    {
      "token" : "toast",
      "start_offset" : 7,
      "end_offset" : 12,
      "type" : "SYNONYM",
      "position" : 1,
      "positionLength" : 2
    },
    {
      "token" : "chees",
      "start_offset" : 7,
      "end_offset" : 12,
      "type" : "SYNONYM",
      "position" : 1
    },
    {
      "token" : "sandwich",
      "start_offset" : 7,
      "end_offset" : 12,
      "type" : "SYNONYM",
      "position" : 2
    }
  ]
}

If I add "french toast" as a single explicit synonym, it seems to ignore the "french" and "toast" synonyms:

"synonyms": {
    "type": "synonym_graph",
    "expand": true,
    "lenient": true,
    "tokenizer": "standard",
    "synonyms": [
        "french toast => french toast",
        "french => french, ethnicity",
        "toast => toast, cheese sandwich"
    ]}

which results in:

{
  "tokens" : [
    {
      "token" : "french",
      "start_offset" : 0,
      "end_offset" : 12,
      "type" : "SYNONYM",
      "position" : 0
    },
    {
      "token" : "toast",
      "start_offset" : 0,
      "end_offset" : 12,
      "type" : "SYNONYM",
      "position" : 1
    }
  ]
}

But it still returns documents containing "french" and/or "toast", whereas I want to receive only those containing "french toast".

Any suggestions?

Tags: elasticsearch, elasticsearch-analyzers

Solution


The tokenizer you are using is the standard one.

The standard tokenizer provides grammar-based tokenization.

It will split "french toast are delicious" into ["french", "toast", "are", "delicious"].
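You can verify that split directly with the _analyze API:

```json
POST _analyze
{
  "tokenizer": "standard",
  "text": "french toast are delicious"
}
```

This returns the four tokens french, toast, are and delicious at positions 0 through 3.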

If you are searching for "french toast" in the text above, you can either pass the operator as "AND" in a match query, or use match_phrase:

{
  "query": {
    "match_phrase": {
      "text": "french toast" --> phrase should match
    }
  }
}


{
  "query": {
    "match": {
      "text": {
        "query": "french toast",
        "operator": "and" --> both tokens must be present
      }
    }
  }
}
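Combined with the explicit "french toast => french toast" rule from the question, a full search request might look like this (the index and field names are taken from the earlier examples; passing the analyzer explicitly in the query is optional if "synonyms" is already configured as the field's search analyzer):

```json
GET test-xxx/_search
{
  "query": {
    "match": {
      "text": {
        "query": "french toast",
        "operator": "and",
        "analyzer": "synonyms"
      }
    }
  }
}
```

With the phrase collapsed to the two tokens french and toast and the operator set to "and", only documents containing both terms match; match_phrase additionally requires them to be adjacent.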
