elasticsearch - Elasticsearch analyzer to remove quoted sentences
Problem description
I'm trying to create an analyzer that removes a quoted sentence from a document (or replaces it with whitespace or an empty string).
For example, given the text: this is my "test document"
I'd like the term vector to be: [this, is, my]
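Solution
Daniel's answer is correct, but it lacks the corresponding regular expression and replacement, so I am providing them here, along with a test against your text.
The index settings using a pattern_replace character filter are as follows.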
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "my_char_filter"
          ],
          "filter": [
            "lowercase"
          ]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "pattern_replace",
          "pattern": "\"(.*?)\"",
          "replacement": ""
        }
      }
    }
  }
}
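To see what the pattern does before creating an index, here is a minimal Python sketch that applies the same regular expression to a raw string. It is only an approximation: Elasticsearch evaluates the pattern with Java regular expressions, and the helper names strip_quoted and rough_tokens are hypothetical, with a plain \w+ split standing in for the standard tokenizer and lowercase filter.

```python
import re

# Same regex as the "pattern" in my_char_filter: an opening double quote,
# a lazy run of characters, and the closing double quote.
QUOTED = re.compile(r'"(.*?)"')


def strip_quoted(text, replacement=""):
    """Approximate the pattern_replace char filter on a raw string."""
    return QUOTED.sub(replacement, text)


def rough_tokens(text):
    """Rough stand-in for the standard tokenizer plus the lowercase filter."""
    return re.findall(r"\w+", strip_quoted(text).lower())


print(rough_tokens('this is my "test document"'))  # ['this', 'is', 'my']

# With an empty replacement, words touching the quotes are glued together;
# a single space as the replacement keeps them apart.
print(strip_quoted('foo"bar"baz'))       # foobaz
print(strip_quoted('foo"bar"baz', " "))  # foo baz
```

If quoted text can sit directly against other words, using " " as the replacement instead of "" is probably the safer choice. Also note that . does not match a newline by default, so a quotation that spans several lines would be left in place by this pattern.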
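Then use the Analyze API to generate the tokens. Because my_analyzer is a custom analyzer defined in the index settings, the request has to target that index; replace <your_index> with the name of the index created with the settings above:
POST <your_index>/_analyze
{
  "text": "this is my \"test document\"",
  "analyzer": "my_analyzer"
}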
Output of the above API call:
{
  "tokens": [
    {
      "token": "this",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "is",
      "start_offset": 5,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "my",
      "start_offset": 8,
      "end_offset": 10,
      "type": "<ALPHANUM>",
      "position": 2
    }
  ]
}