首页 > 解决方案 > 如何使用elasticsearch-dsl在所有索引中找到数组中的不同值?

问题描述

我在 django 中使用 elasticsearch-dsl。我定义了一个 DocType 文档和一个包含值列表的关键字。

这是我的代码。

from elasticsearch_dsl import DocType, Text, Keyword

class ProductIndex(DocType):
    """
    Index for products
    """
    id = Keyword()
    slug = Keyword()
    name = Text()
    filter_list = Keyword()

filter_list 是这里包含多个值的数组。现在我有一些值,比如 sample_filter_list,它们是不同的值,其中一些元素可以存在于某些产品的 filter_list 数组中。因此,给定这个 sample_filter_list,我想要 filter_list 与 sample_filter_list 交集不为空的所有产品的所有唯一元素。

for example I have 5 products whose filter_list is like :
1) ['a', 'b', 'c']
2) ['d', 'e', 'f']
3) ['g', 'h', 'i']
4) ['j', 'k', 'l']
5) ['m', 'n', 'o']
and if my sample filter_list is ['a', 'd', 'g', 'j', 'm']
then elasticsearch should return an array containg distinct element 
i.e. ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o']

标签: pythonelasticsearchelasticsearch-dsl

解决方案


            Writing Answer not specific to django but general,
            Suppose you have some ES index some_index2 with mapping

            PUT some_index2
            {
              "mappings": {
                "some_type": {
                  "dynamic_templates": [
                    {
                      "strings": {
                        "mapping": {
                          "type": "string"
                        },
                        "match_mapping_type": "string"
                      }
                    }
                  ],
                  "properties": {
                    "field1": {
                      "type": "string"
                    },
                    "field2": {
                      "type": "string"
                    }
                  }
                }
              }
            }

        Also you have inserted the documents 
        {
            "field1":"id1",
            "field2":["a","b","c","d]
        }
        {
            "field1":"id2",
            "field2":["e","f","g"]
        }
        {
            "field1":"id3",
            "field2":["e","l","k"]
        }

    Now as you stated you want all the distinct values of field2(filter_list) in your case, You can easily get that by using ElasticSearch term aggregation

    GET some_index2/_search
    {
    "aggs": {
      "some_name": {
        "terms": {
          "field": "field2",
          "size": 10000
        }
      }
    },
    "size": 0
    }

    Which will give you result as:

    {
      "took": 2,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
      },
      "hits": {
        "total": 3,
        "max_score": 0,
        "hits": []
      },
      "aggregations": {
        "some_name": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            {
              "key": "e",
              "doc_count": 2
            },
            {
              "key": "a",
              "doc_count": 1
            },
            {
              "key": "b",
              "doc_count": 1
            },
            {
              "key": "c",
              "doc_count": 1
            },
            {
              "key": "d",
              "doc_count": 1
            },
            {
              "key": "f",
              "doc_count": 1
            },
            {
              "key": "g",
              "doc_count": 1
            },
            {
              "key": "k",
              "doc_count": 1
            },
            {
              "key": "l",
              "doc_count": 1
            }
          ]
        }
      }
    }

    where buckets contains the list of all the distinct values.
    you can easily iterate through bucket and find the value under KEY.

Hope this is what is required to you.

推荐阅读