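elasticsearch - Return the number of documents based on the number of words in a field string
Problem description
How can I return the number of documents in which the "words" list has more than 2 elements and "word_combination" contains more than 3 words? Is there a way to count the words in a string?
Example: if (length of "words" > 2) AND ("words.word_combination" has more than 3 words), return the document.
I have many documents stored. The structure of one document looks like this: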
"_source" : {
"group_words" : [
{
"amount" : 1140,
"words" : [
{
"relevance_score" : 56,
"points" : 66461,
"bits" : 100,
"word_combination" : "cat dog"
},
{
"relevance_score" : 84,
"points" : 45202,
"bits" : 990,
"word_combination" : "cat dog elephant"
},
{
"relevance_score" : 99,
"points" : 30974,
"bits" : 70,
"word_combination" : "elephant cat mouse leopard"
}
],
"group" : "whatever"
},
{
"amount" : 1320,
"words" : [
{
"relevance_score" : 25,
"points" : 53396,
"bits" : 70,
"word_combination" : "lion elephant"
},
{
"relevance_score" : 66,
"points" : 52166,
"bits" : 20,
"word_combination" : "lion mouse fish cat dog"
},
{
"relevance_score" : 82,
"points" : 49316,
"bits" : 810,
"word_combination" : "elephant cat mouse leopard dog lion"
},
{
"relevance_score" : 87,
"points" : 127705,
"bits" : 290,
"word_combination" : "elephant cat mouse leopard tiger lion"
}
],
"group" : "whatever"
},
{
"amount" : 11260,
"words" : [
{
"relevance_score" : 0,
"points" : 37909,
"bits" : 9000,
"word_combination" : "elephant cat mouse leopard tiger lion monkey"
},
{
"relevance_score" : 3,
"points" : 35782,
"bits" : 540,
"word_combination" : "elephant"
}
],
"group" : "whatever"
}
]
}
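To make the intended condition concrete, here is a minimal client-side Python sketch that applies it to a _source like the one above. It assumes one reading of the question: a group qualifies when its words list has more than 2 entries and at least one word_combination in that group has more than 3 words. Word counting is approximated by splitting on whitespace, which is not exactly what an Elasticsearch analyzer does. The function names are hypothetical.

```python
from typing import Any


def word_count(text: str) -> int:
    # Approximation only: split on runs of whitespace (not an Elasticsearch analyzer)
    return len(text.split())


def group_matches(group: dict[str, Any], min_words: int = 2, min_tokens: int = 3) -> bool:
    # Assumed reading of the question: more than `min_words` entries in "words"
    # AND at least one "word_combination" with more than `min_tokens` words
    words = group.get("words", [])
    return len(words) > min_words and any(
        word_count(w["word_combination"]) > min_tokens for w in words
    )


def document_matches(source: dict[str, Any]) -> bool:
    # A document matches if any of its group_words entries matches
    return any(group_matches(g) for g in source.get("group_words", []))


# Trimmed copy of the first and last groups from the sample document
sample = {
    "group_words": [
        {"words": [
            {"word_combination": "cat dog"},
            {"word_combination": "cat dog elephant"},
            {"word_combination": "elephant cat mouse leopard"},
        ]},
        {"words": [
            {"word_combination": "elephant cat mouse leopard tiger lion monkey"},
            {"word_combination": "elephant"},
        ]},
    ]
}
print([group_matches(g) for g in sample["group_words"]])  # [True, False]
print(document_matches(sample))  # True
```

The second group fails only because it has 2 entries, even though one of its word_combination values has 7 words.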
Solution
Regarding the number of elements in the words array, my suggestion is to store that number in an additional field, words_count, at index time.
{
"amount" : 1140,
"words_count": 3, <--- add this
"words" : [
{
"relevance_score" : 56,
"points" : 66461,
"bits" : 100,
"word_combination" : "cat dog"
},
{
"relevance_score" : 84,
"points" : 45202,
"bits" : 990,
"word_combination" : "cat dog elephant"
},
{
"relevance_score" : 99,
"points" : 30974,
"bits" : 70,
"word_combination" : "elephant cat mouse leopard"
}
],
"group" : "whatever"
},
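A minimal Python sketch of that index-time step, assuming documents are prepared as plain dicts before being sent to Elasticsearch. The helper name add_words_count is hypothetical; it works on a copy and sets words_count on each group to the length of its words list.

```python
import copy
from typing import Any


def add_words_count(source: dict[str, Any]) -> dict[str, Any]:
    # Work on a deep copy so the caller's document is not mutated
    doc = copy.deepcopy(source)
    for group in doc.get("group_words", []):
        # Store the number of elements of the "words" array as a plain integer
        group["words_count"] = len(group.get("words", []))
    return doc


doc = add_words_count({
    "group_words": [
        {"amount": 1140, "words": [{"word_combination": "cat dog"}] * 3, "group": "whatever"},
        {"amount": 11260, "words": [{"word_combination": "elephant"}] * 2, "group": "whatever"},
    ]
})
print([g["words_count"] for g in doc["group_words"]])  # [3, 2]
```

Storing the count as an ordinary integer field means a plain range query can filter on it, rather than computing array lengths in a script at query time.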
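Regarding the number of words (or tokens) in the word_combination field, there is a data type called token_count that exists exactly for this purpose. Simply define your mapping like this: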
...
"word_combination": {
"type": "text",
"fields": {
"count": {
"type": "token_count",
"analyzer": "standard"
}
}
}
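Then, in your query, you can access word_combination.count (group_words.words.word_combination.count in the full document structure above), which will contain the number of tokens present in the word_combination field, as produced by the specified analyzer.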
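Putting both suggestions together, here is a sketch of the mapping and query bodies as Python dicts. It assumes, beyond the original answer, that group_words is mapped as nested and words_count as integer. With the default object mapping, Elasticsearch flattens the group_words array, so the two range conditions could be satisfied by different groups of the same document. The bodies are only built here, not sent. To get the number of matching documents, the query body could be sent to the _count endpoint of the index.

```python
import json

# Assumption: group_words is "nested" so both conditions apply within one group
MAPPING = {
    "mappings": {
        "properties": {
            "group_words": {
                "type": "nested",
                "properties": {
                    "amount": {"type": "integer"},
                    "group": {"type": "keyword"},
                    "words_count": {"type": "integer"},  # filled at index time
                    "words": {
                        "properties": {
                            "relevance_score": {"type": "integer"},
                            "points": {"type": "integer"},
                            "bits": {"type": "integer"},
                            "word_combination": {
                                "type": "text",
                                # Sub-field holding the token count from the standard analyzer
                                "fields": {
                                    "count": {"type": "token_count", "analyzer": "standard"}
                                },
                            },
                        }
                    },
                },
            }
        }
    }
}


def build_query(min_words: int = 2, min_tokens: int = 3) -> dict:
    # Both range filters run inside the same nested group_words document
    return {
        "query": {
            "nested": {
                "path": "group_words",
                "query": {
                    "bool": {
                        "filter": [
                            {"range": {"group_words.words_count": {"gt": min_words}}},
                            {"range": {"group_words.words.word_combination.count": {"gt": min_tokens}}},
                        ]
                    }
                },
            }
        }
    }


print(json.dumps(build_query(), indent=2))
```

Using filter rather than must skips scoring, which is enough when only the document count matters.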