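java - Elasticsearch: search by special characters
Problem description
I have a set of phrases like the following: [remix], [18+], and so on. How can I search by a single character, for example "[", to find all of these variants? Right now I have the following analyzer configuration: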
{
"analysis": {
"analyzer": {
"bigram_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"lowercase",
"bigram_filter"
]
},
"full_text_analyzer": {
"type": "custom",
"tokenizer": "ngram_tokenizer",
"filter": [
"lowercase"
]
}
},
"filter": {
"bigram_filter": {
"type": "edge_ngram",
"max_gram": 2
}
},
"tokenizer": {
"ngram_tokenizer": {
"type": "ngram",
"min_gram": 3,
"max_gram": 3,
"token_chars": [
"letter",
"digit",
"symbol",
"punctuation"
]
}
}
}
}
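The mapping is done at the Java entity level using the Spring Boot Data Elasticsearch starter.
Solution
If I understand your question correctly, you want to implement an autocomplete analyzer that returns any term starting with [ or any other character. To do that, you can create a custom analyzer that uses edge n-grams for autocomplete. Here is an example.
Here is the test index: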
PUT /testing-index-v3
{
"settings": {
"number_of_shards": 1,
"analysis": {
"filter": {
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 15
}
},
"analyzer": {
"autocomplete": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"lowercase",
"autocomplete_filter"
]
}
}
}
},
"mappings": {
"properties": {
"term": {
"type": "text",
"analyzer": "autocomplete"
}
}
}
}
Here are the documents to index:
POST /testing-index-v3/_doc
{
"term": "[+18]"
}
POST testing-index-v3/_doc
{
"term": "[remix]"
}
POST testing-index-v3/_doc
{
"term": "test"
}
And finally, our search:
GET testing-index-v3/_search
{
"query": {
"match": {
"term": {
"query": "[remi",
"analyzer": "keyword",
"fuzziness": 0
}
}
}
}
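As you can see, I chose the keyword tokenizer for the autocomplete analyzer. I'm using an edge_ngram filter with min_gram: 1 and max_gram: 15, which means each indexed value is split into tokens like this:
input-query
= i, in, inp, inpu, input, ...
and so on, up to 15 characters per token. This is only needed at index time. Looking at the query, we also specify the keyword analyzer. That analyzer is used at search time, so the query text is matched as-is against the indexed tokens. Here are some example searches and results: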
GET testing-index-v3/_search
{
"query": {
"match": {
"term": {
"query": "[",
"analyzer": "keyword",
"fuzziness": 0
}
}
}
}
result:
"hits" : [
{
"_index" : "testing-index-v3",
"_type" : "_doc",
"_id" : "w5c_IHsBGGZ-oIJIi-6n",
"_score" : 0.7040055,
"_source" : {
"term" : "[remix]"
}
},
{
"_index" : "testing-index-v3",
"_type" : "_doc",
"_id" : "xJc_IHsBGGZ-oIJIju7m",
"_score" : 0.7040055,
"_source" : {
"term" : "[+18]"
}
}
]
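The result above follows from the tokens stored at index time. The following is a minimal Java sketch that emulates what the autocomplete analyzer does to each value (keyword tokenizer, lowercase, then edge n-grams of 1 to 15 characters). It is not Elasticsearch code: the class `EdgeNgramDemo` and the method `edgeNgrams` are hypothetical names used only for illustration, and the sketch assumes plain ASCII input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Rough emulation of the "autocomplete" analyzer: keyword tokenizer + lowercase + edge_ngram. */
final class EdgeNgramDemo {

    /** Returns the prefixes of the whole input that are minGram to maxGram characters long. */
    static List<String> edgeNgrams(String input, int minGram, int maxGram) {
        // keyword tokenizer: the whole input stays a single token; the lowercase filter follows
        String token = input.toLowerCase(Locale.ROOT);
        List<String> grams = new ArrayList<>();
        int longest = Math.min(maxGram, token.length());
        for (int len = minGram; len <= longest; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        // "[" is among the tokens of the first two documents but not of "test"
        for (String doc : List.of("[remix]", "[+18]", "test")) {
            System.out.println(doc + " -> " + edgeNgrams(doc, 1, 15));
        }
    }
}
```

Because "[" is one of the stored tokens for both "[remix]" and "[+18]", a query for "[" finds both documents, while "test" has no such token.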
GET testing-index-v3/_search
{
"query": {
"match": {
"term": {
"query": "[+",
"analyzer": "keyword",
"fuzziness": 0
}
}
}
}
result:
"hits" : [
{
"_index" : "testing-index-v3",
"_type" : "_doc",
"_id" : "xJc_IHsBGGZ-oIJIju7m",
"_score" : 0.7040055,
"_source" : {
"term" : "[+18]"
}
}
]
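To see why the query specifies the keyword analyzer, it may help to compare it with what would happen if the query text were also split into edge n-grams (a text field falls back to its index analyzer at search time when no search analyzer is given). The sketch below is a simplified in-memory simulation, not Elasticsearch code. The class and method names are hypothetical, and it assumes the default OR behavior of a match query, where any shared token is enough for a hit.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Simplified simulation of matching against edge n-gram tokens. */
final class SearchAnalyzerDemo {

    /** Index-time tokens: lowercased prefixes of the whole value, 1 to 15 characters long. */
    static Set<String> indexTokens(String value) {
        String token = value.toLowerCase(Locale.ROOT);
        Set<String> tokens = new LinkedHashSet<>();
        for (int len = 1; len <= Math.min(15, token.length()); len++) {
            tokens.add(token.substring(0, len));
        }
        return tokens;
    }

    /** Keyword analyzer at search time: the query is one unmodified token. */
    static List<String> searchWithKeyword(List<String> docs, String query) {
        List<String> hits = new ArrayList<>();
        for (String doc : docs) {
            if (indexTokens(doc).contains(query)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    /** Autocomplete analyzer at search time: the query is split too, and any shared token matches. */
    static List<String> searchWithAutocomplete(List<String> docs, String query) {
        List<String> hits = new ArrayList<>();
        for (String doc : docs) {
            Set<String> shared = indexTokens(query);
            shared.retainAll(indexTokens(doc));
            if (!shared.isEmpty()) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("[+18]", "[remix]", "test");
        System.out.println(searchWithKeyword(docs, "[+"));      // prints [[+18]]
        System.out.println(searchWithAutocomplete(docs, "[+")); // prints [[+18], [remix]]
    }
}
```

With the keyword analyzer only "[+18]" matches, as in the result above. If the query were split into "[" and "[+", the shared token "[" would also pull in "[remix]". Note that the keyword analyzer does not lowercase the query, so an uppercase query such as "[R" would not match the lowercased index tokens.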
I hope this answer helps. Good luck on your Elasticsearch adventures!