elasticsearch - Getting a "failed to build synonyms" message when trying to build a synonym filter
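Problem description
I am using Elasticsearch 6.8 and Python 3.7.
I am trying to create my own synonyms that map emoticons to text. For example, ":-)" should map to "happy-smiley".
I am trying to build the synonyms and create the index with the following code: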
def create_analyzer(es_api, index_name, doc_type):
    # Index settings with a synonym token filter, plus the field mappings.
    body = {
        "settings": {
            "index": {
                "analysis": {
                    "filter": {
                        "synonym_filter": {
                            "type": "synonym",
                            "synonyms": [
                                ":-), happy-smiley",
                                ":-(, sad-smiley"
                            ]
                        }
                    },
                    "analyzer": {
                        "synonym_analyzer": {
                            "tokenizer": "standard",
                            "filter": ["lowercase", "synonym_filter"]
                        }
                    }
                }
            }
        },
        "mappings": {
            doc_type: {
                "properties": {
                    "tweet": {"type": "text", "fielddata": "true"},
                    "existence": {"type": "text"},
                    "confidence": {"type": "float"}
                }
            }
        }
    }
    # This call raises the RequestError shown below.
    res = es_api.indices.create(index=index_name, body=body)
But I get this error:
elasticsearch.exceptions.RequestError: RequestError(400, 'illegal_argument_exception', 'failed to build synonyms')
What is going wrong, and how can I fix it?
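Solution
I can tell you what is going wrong and (update) how to fix it.
If you run this request in Dev Tools or via cURL, you will see the cause of the error. I think the Python client truncates the error details, which is why you don't see the cause.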
PUT st_t3
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "synonym_filter": {
            "type": "synonym",
            "synonyms": [
              ":-), happy-smiley",
              ":-(, sad-smiley"
            ]
          }
        },
        "analyzer": {
          "synonym_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "synonym_filter"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "tweet": {
        "type": "text",
        "fielddata": "true"
      },
      "existence": {
        "type": "text"
      },
      "confidence": {
        "type": "float"
      }
    }
  }
}
Response:
{
  "error": {
    "root_cause": [
      {
        "type": "remote_transport_exception",
        "reason": "[127.0.0.1:9301][indices:admin/create]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "failed to build synonyms",
    "caused_by": {
      "type": "parse_exception",
      "reason": "parse_exception: Invalid synonym rule at line 1",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "term: :-) was completely eliminated by analyzer"
      }
    }
  },
  "status": 400
}
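So the cause, "reason": "term: :-) was completely eliminated by analyzer", means that the analyzer removes every character of ":-)" before the synonym filter can see it. The synonym rules are parsed with the analyzer's own tokenizer, and the standard tokenizer discards punctuation, so a synonym filter cannot be built from these characters in this analyzer.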
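On the Python side, the same detail can usually be recovered from the exception itself. The sketch below only walks the nested "caused_by" chain of an error body like the one above; `innermost_reason` and `sample` are names made up for illustration. It assumes the response body is available as a dict, which in elasticsearch-py is normally exposed as `e.info` on the caught `RequestError` (treat that as an assumption and check your client version).

```python
def innermost_reason(error_body):
    """Follow the nested "caused_by" chain and return the deepest reason."""
    node = error_body["error"]
    while "caused_by" in node:
        node = node["caused_by"]
    return node["reason"]


# Trimmed copy of the error response shown above.
sample = {
    "error": {
        "type": "illegal_argument_exception",
        "reason": "failed to build synonyms",
        "caused_by": {
            "type": "parse_exception",
            "reason": "parse_exception: Invalid synonym rule at line 1",
            "caused_by": {
                "type": "illegal_argument_exception",
                "reason": "term: :-) was completely eliminated by analyzer",
            },
        },
    },
    "status": 400,
}

print(innermost_reason(sample))

# Assumed usage with a live client (not run here):
#   except RequestError as e:
#       print(innermost_reason(e.info))
```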
Update
It can be done with a char_filter, which rewrites the text before it reaches the tokenizer.
Example:
PUT st_t3
{
  "settings": {
    "index": {
      "analysis": {
        "char_filter": {
          "happy_filter": {
            "type": "mapping",
            "mappings": [
              ":-) => happy-smiley",
              ":-( => sad-smiley"
            ]
          }
        },
        "analyzer": {
          "smile_analyzer": {
            "type": "custom",
            "char_filter": [
              "happy_filter"
            ],
            "tokenizer": "standard",
            "filter": [
              "lowercase"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "tweet": {
        "type": "text",
        "fielddata": "true"
      },
      "existence": {
        "type": "text"
      },
      "confidence": {
        "type": "float"
      }
    }
  }
}
Test
POST st_t3/_analyze
{
  "text": ":-) test",
  "analyzer": "smile_analyzer"
}
Response
{
  "tokens" : [
    {
      "token" : "happy",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "smiley",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "test",
      "start_offset" : 4,
      "end_offset" : 8,
      "type" : "<ALPHANUM>",
      "position" : 2
    }
  ]
}
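Since the question creates the index from Python, the char_filter settings above could be carried over roughly as follows. This is a sketch, not a tested deployment: `build_index_body` and the `"tweet_doc"` type name are hypothetical, and the mapping is nested under the document type as in the question's Elasticsearch 6.8 code.

```python
def build_index_body(doc_type):
    """Index settings using a mapping char_filter instead of a synonym filter."""
    return {
        "settings": {
            "index": {
                "analysis": {
                    "char_filter": {
                        "happy_filter": {
                            "type": "mapping",
                            "mappings": [
                                ":-) => happy-smiley",
                                ":-( => sad-smiley",
                            ],
                        }
                    },
                    "analyzer": {
                        "smile_analyzer": {
                            "type": "custom",
                            "char_filter": ["happy_filter"],
                            "tokenizer": "standard",
                            "filter": ["lowercase"],
                        }
                    },
                }
            }
        },
        "mappings": {
            # As in the question's 6.8 code, the document type is a key here.
            doc_type: {
                "properties": {
                    "tweet": {"type": "text", "fielddata": "true"},
                    "existence": {"type": "text"},
                    "confidence": {"type": "float"},
                }
            }
        },
    }


body = build_index_body("tweet_doc")  # "tweet_doc" is a placeholder type name
# With a connected client, as in the question (not run here):
#   es_api.indices.create(index=index_name, body=body)
```

Note that neither the mapping above nor this sketch assigns smile_analyzer to a field. For the emoticon replacement to apply when documents are indexed, the relevant field (for example "tweet") would also need an "analyzer": "smile_analyzer" setting.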