How to design an index for "who viewed me" and "who I viewed" information

Problem description

UserA views UserB
UserA views UserC
UserD views UserA

Who Viewed You queries:-
Who Viewed You should show UserD for UserA
Who Viewed You should show UserA for UserB
Who Viewed You should show UserA for UserC

Viewed By Me queries:-
Viewed By Me should show UserA for UserD

How should we model the users index to get the information above?

users index contains first_name, last_name, gender, ...

Tags: elasticsearch

Solution


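I would simply store an array on the visitor's document (or on the visited user's document, whichever side has the lower cardinality).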

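I expect those documents could get large, so to optimize (and avoid heavy update traffic) I would keep a log-only "visits_logs" index governed by an ILM (index lifecycle management) policy with a short delete phase: one index per day, keeping a week of data before deletion.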
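
As a rough illustration of that retention setup, the sketch below builds a policy body in Python, assuming the lifecycle policy mentioned above means Elasticsearch ILM. The policy name `visits_logs_policy` and the helper `build_ilm_policy` are hypothetical; the exact thresholds should be tuned to your data volume.

```python
import json

def build_ilm_policy(rollover_after="1d", delete_after="7d"):
    """Return an ILM policy body: roll over daily, delete about a week later."""
    return {
        "policy": {
            "phases": {
                # Hot phase: start a new backing index once the current one is a day old.
                "hot": {
                    "actions": {
                        "rollover": {"max_age": rollover_after}
                    }
                },
                # Delete phase: drop an index once it has aged past the threshold.
                "delete": {
                    "min_age": delete_after,
                    "actions": {"delete": {}}
                },
            }
        }
    }

policy = build_ilm_policy()
# Body for: PUT _ilm/policy/visits_logs_policy  (hypothetical policy name)
print(json.dumps(policy, indent=2))
```

Note that when rollover is used, the delete phase's `min_age` is counted from the rollover time, so an event may live slightly longer than seven days.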

{"visitor": "userA", "visited": "userB", "@timestamp": 12345678990}  

Then, at night, use a transform with a hand-written aggregation to populate an aggregated index for each period:

POST visits/_doc
{
  "visitor": "UserA",
  "@timestamp": "today",
  "visited": {
    "users": ["UserB", "UserC", "UserD"],
    "quantity": 3
  }
}

The details really depend on your actual use case and data volume, but I think this is a robust solution.
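
To make the nightly roll-up concrete, here is a minimal Python sketch of the grouping step, run in memory rather than as an Elasticsearch transform. The function name `roll_up_visits` is hypothetical, and it assumes `quantity` means the number of distinct users visited in the period, which the answer does not spell out.

```python
from collections import defaultdict

def roll_up_visits(log_events, period):
    """Group raw visit events into one summary document per visitor."""
    visited_by_visitor = defaultdict(set)
    for event in log_events:
        # A set collapses repeat visits to the same user within the period.
        visited_by_visitor[event["visitor"]].add(event["visited"])

    docs = []
    for visitor in sorted(visited_by_visitor):
        users = sorted(visited_by_visitor[visitor])
        docs.append({
            "visitor": visitor,
            "@timestamp": period,
            "visited": {"users": users, "quantity": len(users)},
        })
    return docs

# Raw events shaped like the visits_logs documents, using the question's scenario.
events = [
    {"visitor": "UserA", "visited": "UserB", "@timestamp": 1},
    {"visitor": "UserA", "visited": "UserC", "@timestamp": 2},
    {"visitor": "UserD", "visited": "UserA", "@timestamp": 3},
    {"visitor": "UserA", "visited": "UserB", "@timestamp": 4},  # repeat visit
]
summary_docs = roll_up_visits(events, "2020-10-01")
for doc in summary_docs:
    print(doc)
```

Each resulting dictionary has the same shape as the aggregated document shown above and could be indexed as-is.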

Update: the queries would be as follows (the examples run against an index named test):

If you want to know all the users that UserA visited:

GET test/_search
{
  "query": {
    "match": {
      "visitor": "UserA"
    }
  }
}

The response looks like this; you only need to merge the visited.users arrays:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.4700036,
    "hits" : [
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "5k-z3XQBDjdqjSSDl_K5",
        "_score" : 0.4700036,
        "_source" : {
          "@timestamp" : "today",
          "visited" : {
            "users" : [
              "UserB",
              "UserC",
              "UserD"
            ],
            "quantity" : 3
          },
          "visitor" : "UserA"
        }
      },
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "Ksaz3XQBk-8NpR_boPe2",
        "_score" : 0.4700036,
        "_source" : {
          "@timestamp" : "today",
          "visited" : {
            "users" : [
              "UserB",
              "UserC",
              "UserD"
            ],
            "quantity" : 3
          },
          "visitor" : "UserA"
        }
      }
    ]
  }
}
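
Merging the arrays on the client side could look like the sketch below. It assumes the search response has already been parsed into a Python dictionary; `merge_visited` is a hypothetical helper, not part of any Elasticsearch client.

```python
def merge_visited(response):
    """Return the distinct visited users across all hits, in first-seen order."""
    seen = set()
    merged = []
    for hit in response["hits"]["hits"]:
        for user in hit["_source"]["visited"]["users"]:
            if user not in seen:
                seen.add(user)
                merged.append(user)
    return merged

# Trimmed-down version of the response above: two summary documents for UserA.
response = {
    "hits": {
        "hits": [
            {"_source": {"visitor": "UserA",
                         "visited": {"users": ["UserB", "UserC", "UserD"], "quantity": 3}}},
            {"_source": {"visitor": "UserA",
                         "visited": {"users": ["UserB", "UserC", "UserD"], "quantity": 3}}},
        ]
    }
}
visited_by_user_a = merge_visited(response)
print(visited_by_user_a)
```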

If you want to find out who visited UserB:

GET test/_search
{
  "query": {
    "match": {
      "visited.users": "UserB"
    }
  },
  "_source": ["@timestamp", "visitor"]
}

The answer is the visitor field of each hit.

You can get a more refined result with an aggregation:

GET test/_search
{
  "size": 0, 
  "query": {
    "match": {
      "visited.users": "UserB"
    }
  },
  "aggs": {
    "visitors": {
      "terms": {
        "field": "visitor.keyword",
        "size": 10
      }
    }
  }
}

The result looks like this:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "visitors" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "UserA",
          "doc_count" : 2
        }
      ]
    }
  }
}

And for the visits themselves (the users that UserA visited):

GET test/_search
{
  "size": 0, 
  "query": {
    "match": {
      "visitor": "UserA"
    }
  },
  "aggs": {
    "visits": {
      "terms": {
        "field": "visited.users.keyword",
        "size": 10
      }
    }
  }
}

The result is as follows:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "visits" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "UserB",
          "doc_count" : 2
        },
        {
          "key" : "UserC",
          "doc_count" : 2
        },
        {
          "key" : "UserD",
          "doc_count" : 2
        }
      ]
    }
  }
}
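
Reading either aggregation result on the client side comes down to collecting the bucket keys. The sketch below assumes a response already parsed into a dictionary; `bucket_keys` is a hypothetical helper name.

```python
def bucket_keys(response, agg_name):
    """Return the bucket keys of a terms aggregation, in the order returned."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [bucket["key"] for bucket in buckets]

# Trimmed-down versions of the two aggregation responses shown above.
who_visited_user_b = {
    "aggregations": {"visitors": {"buckets": [{"key": "UserA", "doc_count": 2}]}}
}
visited_by_user_a = {
    "aggregations": {"visits": {"buckets": [
        {"key": "UserB", "doc_count": 2},
        {"key": "UserC", "doc_count": 2},
        {"key": "UserD", "doc_count": 2},
    ]}}
}

visitors = bucket_keys(who_visited_user_b, "visitors")
visits = bucket_keys(visited_by_user_a, "visits")
print(visitors, visits)
```

Keep in mind that `doc_count` here counts matching summary documents (two in this test index), not individual visits.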
