elasticsearch - Search with the Russian text analyzer doesn't work
Problem description
I have a very simple Elasticsearch model:
[ElasticsearchType(RelationName = "example")]
public class ElasticModel
{
    [Text(Name = "description", Analyzer = "Russian", Index = true, SearchAnalyzer = "Russian")]
    public string Description { get; set; }
}
Then I initialize my index with the following code:
protected ICreateIndexRequest ConfigureIndex(CreateIndexDescriptor indexDescriptor,
    Func<IndexSettingsDescriptor, IPromise<IIndexSettings>> selectorOfIndexSettings)
{
    ICreateIndexRequest returnValue;
    returnValue = indexDescriptor.Settings(selectorOfIndexSettings);
    return returnValue;
}
await _client.Indices.CreateAsync(completeIndexName, indexDescriptor => ConfigureIndex(indexDescriptor, selector));
Then I initialize my model with the following value and try to search:
var document = new ElasticModel()
{
    Description = "В Москве все выходные будут дожди"
};
var responseDoc = await _client.IndexAsync(new IndexRequest<T>(document, completeIndexName));
var responseSearch = await _client.SearchAsync<ElasticModel>(s => s.Index(completeIndexName)
    .Query(q => q.QueryString(c => c
        .Query("выходной")
    )));
But the result is empty. When I send the following request to my Elasticsearch server:
POST {{ElasticSearchAddress}}/_analyze
{
"analyzer": "russian",
"text": "В Москве все выходные будут дожди"
}
I see the expected result:
{
"tokens": [
{
"token": "москв",
"start_offset": 2,
"end_offset": 8,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "выходн",
"start_offset": 13,
"end_offset": 21,
"type": "<ALPHANUM>",
"position": 3
},
{
"token": "будут",
"start_offset": 22,
"end_offset": 27,
"type": "<ALPHANUM>",
"position": 4
},
{
"token": "дожд",
"start_offset": 28,
"end_offset": 33,
"type": "<ALPHANUM>",
"position": 5
}
]
}
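Can anyone explain why the search from my C# code doesn't use the Russian analyzer and doesn't return the results I expect?
Update:
A request to /elastictest100/_search with the body: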
{
"query": {
"multi_match" : {
"query": "выходные будут",
"fields": [ "description" ],
"analyzer": "russian"
}
}
}
returns:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 3,
"relation": "eq"
},
"max_score": 0.13353139,
"hits": [
{
"_index": "mediadev-elastictest100",
"_type": "_doc",
"_id": "G2FzRnMBhdWoY2X4fmQo",
"_score": 0.13353139,
"_source": {
"description": "В Москве все выходные будут дожди"
}
},
{
"_index": "mediadev-elastictest100",
"_type": "_doc",
"_id": "HGGLRnMBhdWoY2X4AGSV",
"_score": 0.13353139,
"_source": {
"description": "В Москве все выходные будут дожди"
}
},
{
"_index": "mediadev-elastictest100",
"_type": "_doc",
"_id": "HWGMRnMBhdWoY2X4tGSY",
"_score": 0.13353139,
"_source": {
"description": "В Москве все выходные будут дожди"
}
}
]
}
}
And with the body:
{
"query": {
"multi_match" : {
"query": "выходной будет",
"fields": [ "description" ],
"analyzer": "russian"
}
}
}
returns:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 0,
"relation": "eq"
},
"max_score": null,
"hits": []
}
}
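Solution
I'm not familiar with NEST code, but I can give you some pointers for debugging the problem.
- Try printing the JSON of the final search query so that you can easily test it against the REST search endpoint and check whether the correct query is being generated.
- A match query is analyzed with the same analyzer that was used at index time, whereas a term query is not analyzed at all, which can cause this kind of problem. Ultimately, to get search results, the tokens produced at search time must match the tokens produced at index time.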
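The token comparison in the second point can be made on the server itself. The `_analyze` endpoint accepts a `field` parameter, in which case it uses whichever analyzer is actually mapped for that field in the index rather than one named in the request. A minimal sketch, assuming the index is named mediadev-elastictest100 as shown in the search hits in the question (substitute your own index name):

```
POST {{ElasticSearchAddress}}/mediadev-elastictest100/_analyze
{
  "field": "description",
  "text": "В Москве все выходные будут дожди"
}
```

If this returns unstemmed tokens such as "выходные" instead of "выходн", the field is not being analyzed with the russian analyzer at index time, and a query term stemmed to "выходн" would have nothing to match.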
The easiest approach is to inspect the search JSON and hit your index directly through the ES REST endpoints to find the root cause.
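One direct check along these lines is to read back the mapping that the index actually has. This is only a sketch, again assuming the index name mediadev-elastictest100 taken from the search hits in the question:

```
GET {{ElasticSearchAddress}}/mediadev-elastictest100/_mapping
```

If the description field appears there without "analyzer": "russian", it was indexed with the default standard analyzer. One possible explanation, which the question does not confirm, is that ConfigureIndex passes only settings to the index descriptor, so the attribute mapping on ElasticModel may never have been sent and the field may have been mapped dynamically. That would be consistent with the update: "будут" is the same token under both analyzers and matches, while the stem "выходн" would not match an unstemmed "выходные". In NEST the attribute mapping is usually applied with something like .Map<ElasticModel>(m => m.AutoMap()) in the create-index call. It may also be worth checking the analyzer name, since the built-in analyzer is registered as lowercase russian while the attribute uses "Russian".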