elasticsearch - How to design the index to get "who viewed me" and "viewed by me" information
Problem description
UserA views UserB
UserA views UserC
UserD views UserA
Who Viewed You queries:-
Who Viewed You should show UserD for UserA
Who Viewed You should show UserA for UserB
Who Viewed You should show UserA for UserC
Viewed By Me queries:-
Viewed By Me should show UserA for UserD
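To pin down the two query directions before modelling anything, here is a minimal in-memory Python sketch (plain Python, not Elasticsearch code) that stores the three view events above as (viewer, viewed) pairs and answers both questions. The function names are hypothetical.

```python
# Each tuple is (viewer, viewed), taken from the three view events above.
VIEWS = [
    ("UserA", "UserB"),
    ("UserA", "UserC"),
    ("UserD", "UserA"),
]


def who_viewed_you(user: str, views: list[tuple[str, str]]) -> list[str]:
    """Users who viewed `user` (the "Who Viewed You" list)."""
    return sorted({viewer for viewer, viewed in views if viewed == user})


def viewed_by_me(user: str, views: list[tuple[str, str]]) -> list[str]:
    """Users that `user` viewed (the "Viewed By Me" list)."""
    return sorted({viewed for viewer, viewed in views if viewer == user})


print(who_viewed_you("UserA", VIEWS))  # ['UserD']
print(viewed_by_me("UserD", VIEWS))    # ['UserA']
```

Both lists come from the same edges read in opposite directions, which is why the index design below only needs to make one side of the pair cheap to search.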
How should we model the users index to get the above information?
users index contains first_name, last_name, gender, ...
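Solution
I would just store an array in the visitor field (or in the visited field, whichever has the lower cardinality).
I guess these documents could get huge, so to optimize (and avoid lots of updates) I would have a log-only "visits_logs" index with an ILM (index lifecycle management) policy that has a short delete phase (one index per day, keeping one week of data before deletion):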
{"visitor": "userA", "visited": "userB", "@timestamp": 12345678990}
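The one-index-per-day, keep-a-week scheme can be sketched in Python as below. The `visits_logs-YYYY.MM.DD` naming pattern, the epoch-millisecond timestamp, and the helper names are assumptions for illustration; in a real cluster the deletion would be done by the ILM delete phase, not by client code. The sketch only shows where each log line goes and which daily indices fall outside the one-week window.

```python
from datetime import date, datetime, timedelta, timezone

RETENTION_DAYS = 7           # keep one week of raw logs, as described above
PREFIX = "visits_logs"       # hypothetical daily-index prefix


def daily_index_name(day: date) -> str:
    """Hypothetical naming scheme: one raw-log index per calendar day (UTC)."""
    return f"{PREFIX}-{day:%Y.%m.%d}"


def log_entry(visitor: str, visited: str, when: datetime) -> tuple[str, dict]:
    """Return (target index, log document) for a single profile view."""
    doc = {
        "visitor": visitor,
        "visited": visited,
        "@timestamp": int(when.timestamp() * 1000),  # epoch milliseconds
    }
    return daily_index_name(when.astimezone(timezone.utc).date()), doc


def indices_to_delete(existing: list[str], today: date) -> list[str]:
    """Daily log indices that are RETENTION_DAYS old or older."""
    cutoff = today - timedelta(days=RETENTION_DAYS)
    stale = []
    for name in existing:
        if not name.startswith(PREFIX + "-"):
            continue  # ignore unrelated indices such as "users"
        day = datetime.strptime(name[len(PREFIX) + 1:], "%Y.%m.%d").date()
        if day <= cutoff:
            stale.append(name)
    return sorted(stale)


index, doc = log_entry("UserA", "UserB", datetime(2020, 10, 8, 12, 0, tzinfo=timezone.utc))
print(index)  # visits_logs-2020.10.08
```

Because each view is a new append-only document, the hot path never updates an existing document; the only cost is the nightly rollup and the daily index deletion.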
Then, at night, use a transform (or a manual aggregation) to populate an aggregated index for each period:
POST visits/_doc
{
"visitor": "UserA",
"@timestamp": "today",
"visited": {
"users": ["UserB", "UserC", "UserD"],
"quantity": 3
}
}
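The nightly rollup from raw log entries to the per-visitor summary document above can be sketched in Python. This is a plain client-side aggregation, not the Elasticsearch transform API. It assumes the input is a list of log documents shaped like the `visits_logs` example, that `period` is a hypothetical label for the rolled-up day, and that `quantity` is the number of distinct visited users (this matches the example, but the original does not define it).

```python
from collections import defaultdict


def rollup_visits(log_docs: list[dict], period: str) -> list[dict]:
    """Group raw {"visitor", "visited"} log entries into one summary doc per visitor."""
    visited_by_visitor: dict[str, set[str]] = defaultdict(set)
    for entry in log_docs:
        visited_by_visitor[entry["visitor"]].add(entry["visited"])

    summaries = []
    for visitor in sorted(visited_by_visitor):
        users = sorted(visited_by_visitor[visitor])  # repeated views collapse to one
        summaries.append({
            "visitor": visitor,
            "@timestamp": period,
            "visited": {"users": users, "quantity": len(users)},
        })
    return summaries
```

Each returned dict would be one document in the aggregated index, so a busy visitor costs one document per period instead of one per view.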
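The details really depend on your actual use case and data volume, but I think this is a robust solution.
Update: the queries would be:
If you want to know all the users UserA has visited: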
GET test/_search
{
"query": {
"match": {
"visitor": "UserA"
}
}
}
The response will look like this; you just need to merge the visited arrays:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 0.4700036,
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "5k-z3XQBDjdqjSSDl_K5",
"_score" : 0.4700036,
"_source" : {
"@timestamp" : "today",
"visited" : {
"users" : [
"UserB",
"UserC",
"UserD"
],
"quantity" : 3
},
"visitor" : "UserA"
}
},
{
"_index" : "test",
"_type" : "_doc",
"_id" : "Ksaz3XQBk-8NpR_boPe2",
"_score" : 0.4700036,
"_source" : {
"@timestamp" : "today",
"visited" : {
"users" : [
"UserB",
"UserC",
"UserD"
],
"quantity" : 3
},
"visitor" : "UserA"
}
}
]
}
}
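Merging the `visited.users` arrays on the client could look like the following Python sketch. It assumes the search response has already been parsed into a dict (here with `json.loads` on a trimmed copy of the response above) and removes duplicates across hits while keeping first-seen order.

```python
import json

# Trimmed copy of the response above: only the fields the merge needs.
RESPONSE = json.loads("""
{"hits": {"hits": [
  {"_source": {"visitor": "UserA", "visited": {"users": ["UserB", "UserC", "UserD"], "quantity": 3}}},
  {"_source": {"visitor": "UserA", "visited": {"users": ["UserB", "UserC", "UserD"], "quantity": 3}}}
]}}
""")


def merge_visited_users(response: dict) -> list[str]:
    """Concatenate visited.users across hits and drop duplicates, keeping first-seen order."""
    all_users = (
        user
        for hit in response["hits"]["hits"]
        for user in hit["_source"]["visited"]["users"]
    )
    return list(dict.fromkeys(all_users))  # dict keys keep insertion order


print(merge_visited_users(RESPONSE))  # ['UserB', 'UserC', 'UserD']
```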
If you want to get "who visited UserB":
GET test/_search
{
"query": {
"match": {
"visited.users": "UserB"
}
},
"_source": ["@timestamp", "visitor"]
}
The answer is the visitor field of each hit.
You can get a more refined result with an aggregation:
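Reading the visitors out of those hits is a one-liner; the Python sketch below shows it. The response here is an assumed example built from the two documents indexed above, with `_source` limited to `@timestamp` and `visitor` as in the request. Note that a search returns only the top `size` hits (10 by default), so this collects visitors from that page only.

```python
import json

# Assumed response shape for the "who visited UserB" query, based on the two
# summary documents indexed above; _source holds only @timestamp and visitor.
RESPONSE = json.loads("""
{"hits": {"hits": [
  {"_source": {"@timestamp": "today", "visitor": "UserA"}},
  {"_source": {"@timestamp": "today", "visitor": "UserA"}}
]}}
""")


def visitors_from_hits(response: dict) -> list[str]:
    """Distinct visitor values among the returned page of hits."""
    return sorted({hit["_source"]["visitor"] for hit in response["hits"]["hits"]})


print(visitors_from_hits(RESPONSE))  # ['UserA']
```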
GET test/_search
{
"size": 0,
"query": {
"match": {
"visited.users": "UserB"
}
},
"aggs": {
"visitors": {
"terms": {
"field": "visitor.keyword",
"size": 10
}
}
}
}
The result looks like:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"visitors" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "UserA",
"doc_count" : 2
}
]
}
}
}
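To make clear what the `terms` aggregation is counting, here is a small in-memory Python emulation (not how Elasticsearch executes it). It keeps the summary documents whose `visited.users` contains the target user, counts matching documents per `visitor`, and orders buckets by count descending with ties broken by key, similar to the default `terms` ordering. The `size` parameter mirrors the request above.

```python
from collections import Counter


def visitors_of(docs: list[dict], user: str, size: int = 10) -> list[dict]:
    """Emulate a terms agg on `visitor` over docs whose visited.users contains `user`."""
    counts = Counter(doc["visitor"] for doc in docs if user in doc["visited"]["users"])
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": n} for key, n in ordered[:size]]


# The two summary documents indexed above (same content, two documents).
DOCS = [
    {"visitor": "UserA", "visited": {"users": ["UserB", "UserC", "UserD"], "quantity": 3}},
    {"visitor": "UserA", "visited": {"users": ["UserB", "UserC", "UserD"], "quantity": 3}},
]
print(visitors_of(DOCS, "UserB"))  # [{'key': 'UserA', 'doc_count': 2}]
```

The `doc_count` of 2 is the number of summary documents that match, not the number of individual views.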
And for the visits made by UserA:
GET test/_search
{
"size": 0,
"query": {
"match": {
"visitor": "UserA"
}
},
"aggs": {
"visits": {
"terms": {
"field": "visited.users.keyword",
"size": 10
}
}
}
}
The result is:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"visits" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "UserB",
"doc_count" : 2
},
{
"key" : "UserC",
"doc_count" : 2
},
{
"key" : "UserD",
"doc_count" : 2
}
]
}
}
}
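On the client, the buckets are usually flattened into a simple mapping. The Python sketch below does that for a trimmed copy of the response above; the helper name is hypothetical. It also reports whether the result is complete: a non-zero `sum_other_doc_count` means some terms did not fit into the requested `size`.

```python
import json

# Trimmed copy of the aggregation response above.
RESPONSE = json.loads("""
{"aggregations": {"visits": {
  "doc_count_error_upper_bound": 0,
  "sum_other_doc_count": 0,
  "buckets": [
    {"key": "UserB", "doc_count": 2},
    {"key": "UserC", "doc_count": 2},
    {"key": "UserD", "doc_count": 2}
  ]
}}}
""")


def bucket_counts(response: dict, agg_name: str) -> tuple[dict[str, int], bool]:
    """Flatten a terms aggregation into {key: doc_count} plus a completeness flag."""
    agg = response["aggregations"][agg_name]
    counts = {bucket["key"]: bucket["doc_count"] for bucket in agg["buckets"]}
    # sum_other_doc_count > 0: documents fell into terms beyond `size`.
    return counts, agg["sum_other_doc_count"] == 0


counts, complete = bucket_counts(RESPONSE, "visits")
print(counts, complete)  # {'UserB': 2, 'UserC': 2, 'UserD': 2} True
```

If the flag comes back `False`, raising `size` in the request (or paging with a composite aggregation) would be the next step.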