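python - How do I find the distinct values in an array across all indices with elasticsearch-dsl?
Problem description
I am using elasticsearch-dsl in Django. I have defined a DocType document with a Keyword field that holds a list of values.
Here is my code.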
from elasticsearch_dsl import DocType, Text, Keyword


class ProductIndex(DocType):
    """
    Index for products
    """
    id = Keyword()
    slug = Keyword()
    name = Text()
    filter_list = Keyword()
Here filter_list is an array containing multiple values. Now I have a list of distinct values, say sample_filter_list, and some of its elements may appear in the filter_list array of some products. Given this sample_filter_list, I want all the unique elements of filter_list across every product whose filter_list has a non-empty intersection with sample_filter_list.
For example, I have 5 products whose filter_list values are:
1) ['a', 'b', 'c']
2) ['d', 'e', 'f']
3) ['g', 'h', 'i']
4) ['j', 'k', 'l']
5) ['m', 'n', 'o']
If my sample_filter_list is ['a', 'd', 'g', 'j', 'm'],
then Elasticsearch should return an array containing the distinct elements,
i.e. ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o']
Solution
This answer is not specific to Django; it applies to Elasticsearch in general.
Suppose you have an Elasticsearch index some_index2 with the following mapping:
PUT some_index2
{
  "mappings": {
    "some_type": {
      "dynamic_templates": [
        {
          "strings": {
            "mapping": {
              "type": "string"
            },
            "match_mapping_type": "string"
          }
        }
      ],
      "properties": {
        "field1": {
          "type": "string"
        },
        "field2": {
          "type": "string"
        }
      }
    }
  }
}
You have also indexed the following documents:
{
  "field1": "id1",
  "field2": ["a", "b", "c", "d"]
}
{
  "field1": "id2",
  "field2": ["e", "f", "g"]
}
{
  "field1": "id3",
  "field2": ["e", "l", "k"]
}
As you stated, you want all the distinct values of field2 (filter_list in your case). You can get them with an Elasticsearch terms aggregation:
GET some_index2/_search
{
  "aggs": {
    "some_name": {
      "terms": {
        "field": "field2",
        "size": 10000
      }
    }
  },
  "size": 0
}
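The query above aggregates over every document, while the question also asks to restrict the result to products whose filter_list intersects sample_filter_list. A minimal sketch of how that request body could be built in Python follows. The helper name build_distinct_values_body, the aggregation name distinct_values, and the choice of a bool/filter terms clause are assumptions made for illustration and are not part of the original answer. It relies on filter_list being a Keyword field, as in the question's DocType.

```python
import json


def build_distinct_values_body(field, sample_values, max_buckets=10000):
    """Build a search body (hypothetical helper, not from the original answer).

    The terms filter keeps only documents whose `field` contains at least one
    of `sample_values`; the terms aggregation then yields one bucket per
    distinct value of `field` among those documents.
    """
    return {
        "size": 0,  # no hits needed, only the aggregation
        "query": {
            "bool": {
                "filter": [{"terms": {field: list(sample_values)}}]
            }
        },
        "aggs": {
            "distinct_values": {
                "terms": {"field": field, "size": max_buckets}
            }
        },
    }


body = build_distinct_values_body("filter_list", ["a", "d", "g", "j", "m"])
print(json.dumps(body, indent=2))
```

The resulting dict can be sent as the body of a _search request. With elasticsearch-dsl it should also be possible to load it through Search.from_dict(body), though that has not been checked against the library version used in the question.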
This returns a result like:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "some_name": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "e", "doc_count": 2 },
        { "key": "a", "doc_count": 1 },
        { "key": "b", "doc_count": 1 },
        { "key": "c", "doc_count": 1 },
        { "key": "d", "doc_count": 1 },
        { "key": "f", "doc_count": 1 },
        { "key": "g", "doc_count": 1 },
        { "key": "k", "doc_count": 1 },
        { "key": "l", "doc_count": 1 }
      ]
    }
  }
}
Here buckets contains one entry per distinct value, with doc_count giving the number of documents that contain it.
You can iterate through the buckets and read each value from its "key" field.
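As a rough illustration of that iteration, the sketch below pulls the keys out of a response that has already been parsed into a Python dict. The function name distinct_values is hypothetical, and sample_response is a trimmed copy of the result shown above rather than output from a live cluster.

```python
def distinct_values(response, agg_name="some_name"):
    """Return the sorted distinct values from a terms aggregation response."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return sorted(bucket["key"] for bucket in buckets)


# Trimmed copy of the response shown above (only the fields that are used).
sample_response = {
    "aggregations": {
        "some_name": {
            "buckets": [
                {"key": "e", "doc_count": 2},
                {"key": "a", "doc_count": 1},
                {"key": "b", "doc_count": 1},
                {"key": "c", "doc_count": 1},
                {"key": "d", "doc_count": 1},
                {"key": "f", "doc_count": 1},
                {"key": "g", "doc_count": 1},
                {"key": "k", "doc_count": 1},
                {"key": "l", "doc_count": 1},
            ]
        }
    }
}

print(distinct_values(sample_response))
```

Sorting is optional; Elasticsearch returns the buckets ordered by doc_count, which is why "e" comes first in the raw response.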
Hope this is what you need.