Getting a "failed to build synonyms" message when trying to build a synonym filter

Problem

I am using Elasticsearch 6.8 and Python 3.7.

I am trying to create my own synonyms that map emoticons to text. For example, ":-)" should map to "happy smiley".

I am trying to build the synonyms and create the index with the following code:

def create_analyzer(es_api, index_name, doc_type):
    body = {
        "settings": {
            "index": {
                "analysis": {
                    "filter": {
                        "synonym_filter": {
                            "type": "synonym",
                            "synonyms": [
                                ":-), happy-smiley",
                                ":-(, sad-smiley"
                            ]
                        }
                    },
                    "analyzer": {
                        "synonym_analyzer": {
                            "tokenizer": "standard",
                            "filter": ["lowercase", "synonym_filter"]
                        }
                    }
                }
            }
        },
        "mappings": {
            doc_type: {
                "properties": {
                    "tweet": {"type": "text", "fielddata": "true"},
                    "existence": {"type": "text"},
                    "confidence": {"type": "float"}
                }
            }
        }
    }
    res = es_api.indices.create(index=index_name, body=body)

But I get this error:

elasticsearch.exceptions.RequestError: RequestError(400, 'illegal_argument_exception', 'failed to build synonyms')

What is going wrong, and how can I fix it?

Tags: elasticsearch

Solution


I can tell you what went wrong and (UPDATE) how to fix it.

If you run this query in Kibana Dev Tools or via cURL you will see the cause of the error; the Python client seems to cut off the error details, so you never see the root cause.
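That said, you do not have to leave Python to see it: in the elasticsearch-py client the exception's info attribute carries the full deserialized response body, including the nested "caused_by" chain. A minimal sketch (assuming a local client and the body dict from the question):

from elasticsearch import Elasticsearch
from elasticsearch.exceptions import RequestError

es = Elasticsearch()  # assumes a local node on the default port

try:
    es.indices.create(index="st_t3", body=body)  # body as defined in the question
except RequestError as e:
    print(e.error)  # short reason only: 'failed to build synonyms'
    print(e.info)   # full response body with the nested "caused_by" chain

In Dev Tools the same request looks like this: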

PUT st_t3
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "synonym_filter": {
            "type": "synonym",
            "synonyms": [
              ":-), happy-smiley",
              ":-(, sad-smiley"
            ]
          }
        },
        "analyzer": {
          "synonym_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "synonym_filter"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "tweet": {
        "type": "text",
        "fielddata": "true"
      },
      "existence": {
        "type": "text"
      },
      "confidence": {
        "type": "float"
      }
    }
  }
}

The response:

{
  "error": {
    "root_cause": [
      {
        "type": "remote_transport_exception",
        "reason": "[127.0.0.1:9301][indices:admin/create]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "failed to build synonyms",
    "caused_by": {
      "type": "parse_exception",
      "reason": "parse_exception: Invalid synonym rule at line 1",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "term: :-) was completely eliminated by analyzer"
      }
    }
  },
  "status": 400
}

So the cause is "reason": "term: :-) was completely eliminated by analyzer". The synonym rules are themselves run through the analyzer's tokenizer when the filter is built, and the standard tokenizer discards a punctuation-only token such as ":-)", so the synonym rule cannot be built.
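You can confirm this from Python: running the standard tokenizer on the emoticon alone returns an empty token list, which is exactly what "completely eliminated by analyzer" means (a minimal sketch, assuming a local client):

from elasticsearch import Elasticsearch

es = Elasticsearch()  # assumes a local node on the default port

# The standard tokenizer strips punctuation, so the emoticon yields no tokens.
res = es.indices.analyze(body={"tokenizer": "standard", "text": ":-)"})
print(res["tokens"])  # -> []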

UPDATE

It can be done with a char_filter instead: a "mapping" char filter replaces the emoticons with text before tokenization, so nothing gets eliminated.

Example:

PUT st_t3
{
  "settings": {
    "index": {
      "analysis": {
        "char_filter": {
          "happy_filter": {
            "type": "mapping",
            "mappings": [
              ":-) => happy-smiley",
              ":-( => sad-smiley"
            ]
          }
        },
        "analyzer": {
          "smile_analyzer": {
            "type": "custom",
            "char_filter": [
              "happy_filter"
            ],
            "tokenizer": "standard",
            "filter": [
              "lowercase"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "tweet": {
        "type": "text",
        "fielddata": "true"
      },
      "existence": {
        "type": "text"
      },
      "confidence": {
        "type": "float"
      }
    }
  }
}

Test

POST st_t3/_analyze
{
  "text": ":-) test",
  "analyzer": "smile_analyzer"
}

Response:

{
  "tokens" : [
    {
      "token" : "happy",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "smiley",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "test",
      "start_offset" : 4,
      "end_offset" : 8,
      "type" : "<ALPHANUM>",
      "position" : 2
    }
  ]
}
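For completeness, here is a sketch of how the function from the question might look with the char_filter approach. It keeps the 6.8-style typed mapping from the question; treat it as an untested adaptation rather than a verified fix:

def create_analyzer(es_api, index_name, doc_type):
    # Emoticons are rewritten by a "mapping" char_filter before
    # tokenization, so the standard tokenizer never sees ":-)".
    body = {
        "settings": {
            "index": {
                "analysis": {
                    "char_filter": {
                        "happy_filter": {
                            "type": "mapping",
                            "mappings": [
                                ":-) => happy-smiley",
                                ":-( => sad-smiley"
                            ]
                        }
                    },
                    "analyzer": {
                        "smile_analyzer": {
                            "type": "custom",
                            "char_filter": ["happy_filter"],
                            "tokenizer": "standard",
                            "filter": ["lowercase"]
                        }
                    }
                }
            }
        },
        "mappings": {
            doc_type: {
                "properties": {
                    "tweet": {"type": "text", "fielddata": "true"},
                    "existence": {"type": "text"},
                    "confidence": {"type": "float"}
                }
            }
        }
    }
    return es_api.indices.create(index=index_name, body=body)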
