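mongodb - How to group consecutive elements in mongo?
Question
How can I group consecutive elements by a property in mongo? These are my mongo documents (events), and I want to group them by type, in sequence.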
{
"_id" : ObjectId("5d1b68d708f3870049d9cc37"),
"controllerId" : "80058c2b-9525-4f7f-8e26-faea4ad92b15",
"id" : "76800453-e72e-410b-accc-cf47cd2773a1",
"type" : "controller_connection_status",
"timestamp" : 1562077399832.0,
}
/* 2 */
{
"_id" : ObjectId("5d1b68db08f3870049d9cc39"),
"controllerId" : "80058c2b-9525-4f7f-8e26-faea4ad92b15",
"id" : "fabd883c-6971-4977-b3fc-31679c2b85dd",
"type" : "controller_connection_status",
"timestamp" : 1562077402916.0,
}
/* 3 */
{
"_id" : ObjectId("5d1b68db08f3870049d9cc3a"),
"controllerId" : "80058c2b-9525-4f7f-8e26-faea4ad92b15",
"siteId" : "226168be-866c-11e8-adc0-fa7ae01bbebc",
"id" : "98decbea-8288-4df5-807d-14e90f929df2",
"type" : "controller_added",
"timestamp" : 1562077402920.0,
}
/* 4 */
{
"_id" : ObjectId("5d1b690908f3870049d9cc3c"),
"controllerId" : "80058c2b-9525-4f7f-8e26-faea4ad92b15",
"id" : "b5ad6199-8805-43fd-bd7e-80f0410e744a",
"type" : "controller_connection_status",
"timestamp" : 1562077449904.0,
}
/* 5 */
{
"_id" : ObjectId("5d1b690d08f3870049d9cc3d"),
"controllerId" : "80058c2b-9525-4f7f-8e26-faea4ad92b15",
"id" : "276ea325-0eec-47a2-8e0e-3805ed34b80b",
"type" : "controller_error",
"timestamp" : 1562077452975.0,
}
/* 6 */
{
"_id" : ObjectId("5d1b694508f3870049d9cc3f"),
"controllerId" : "80058c2b-9525-4f7f-8e26-faea4ad92b15",
"id" : "03ce803b-6b2e-49fe-8f0d-4feee44251e9",
"type" : "controller_error",
"timestamp" : 1562077509904.0,
}
/* 7 */
{
"_id" : ObjectId("5d1b694908f3870049d9cc41"),
"controllerId" : "80058c2b-9525-4f7f-8e26-faea4ad92b15",
"id" : "b144a04f-8201-4945-b2c4-faef5b41866e",
"type" : "controller_connection_status",
"timestamp" : 1562077512974.0,
}
/* 8 */
{
"_id" : ObjectId("5d1b698208f3870049d9cc42"),
"controllerId" : "80058c2b-9525-4f7f-8e26-faea4ad92b15",
"id" : "235874f3-c017-4ea8-abaf-8c5edf1b317a",
"type" : "controller_connection_status",
"timestamp" : 1562077569903.0,
}
/* 9 */
{
"_id" : ObjectId("5d1b698508f3870049d9cc43"),
"controllerId" : "80058c2b-9525-4f7f-8e26-faea4ad92b15",
"id" : "4fb3706f-d195-4ded-87b9-8482c712825c",
"type" : "controller_connection_status",
"timestamp" : 1562077572973.0,
"createdAt" : ISODate("2019-07-02T14:26:13.120Z"),
"updatedAt" : ISODate("2019-07-02T14:26:13.120Z"),
"__v" : 0
}
/* 10 */
{
"_id" : ObjectId("5d1b69bd08f3870049d9cc45"),
"controllerId" : "80058c2b-9525-4f7f-8e26-faea4ad92b15",
"id" : "e743baef-1701-436a-baf2-8367a0917c81",
"type" : "controller_removed",
"timestamp" : 1562077629903.0,
}
The output I want:
timestamp         type                          count
(last timestamp)  controller_connection_status  2
--                controller_added              1
--                controller_connection_status  1
--                controller_error              2
--                controller_connection_status  3
--                controller_removed            1
What I have tried so far:
db.getCollection('events').aggregate([
{
'$match': {
'controllerId': '80058c2b-9525-4f7f-8e26-faea4ad92b15'
}
},
{
'$group': {
'_id': '$type',
'type': {
'$first': '$type'
},
'timestamp': {
'$last': '$timestamp'
},
'count': {
'$sum': 1,
}
}
},
{
'$sort': {
'timestamp': -1
}
}
])
My output:
timestamp         type                          count
(last timestamp)  controller_connection_status  6
--                controller_added              1
--                controller_error              2
--                controller_removed            1
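Solution
You can use $graphLookup to group consecutive documents into arrays. It needs a collection to look up from; in your case that can be a view.
The view uses the $zip operator to aggregate the documents into previous/next pairs: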
db.createView("events-view", "original_collection", [
{ $sort: { timestamp: 1 } },
{ $group: { _id: null, docs: { $push: "$$ROOT" } } },
{ $project: {
pair: { $zip: {
inputs:[ { $concatArrays: [ [false], "$docs" ]} , "$docs" ]
} }
} },
{ $unwind: "$pair" },
{ $project: {
prev: { $arrayElemAt: [ "$pair", 0 ] },
next: { $arrayElemAt: [ "$pair", 1 ] }
} },
{ $project: {
_id: "$prev._id",
prev: 1,
next: 1,
sameType: { $eq: ["$prev.type", "$next.type"] }
} },
]);
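The pairing step can be easier to follow outside the aggregation framework. The plain JavaScript sketch below mirrors what the $zip stage does with `[false, ...docs]` and `docs`; the function name `pairWithPrevious` is made up for illustration and is not part of MongoDB.

```javascript
// Hypothetical helper: pairs each document with its predecessor,
// mirroring $zip over [ [false, ...docs], docs ].
function pairWithPrevious(docs) {
  // Prepend `false` so the first document has a placeholder predecessor.
  const shifted = [false, ...docs];
  // $zip stops at the shorter input by default, so iterate over `docs`.
  return docs.map((next, i) => {
    const prev = shifted[i];
    return {
      prev,
      next,
      // A missing prev.type never equals next.type, as in the $eq stage.
      sameType: prev !== false && prev.type === next.type
    };
  });
}
```

For three events of types a, a, b this yields three pairs whose `sameType` flags are false, true, false, matching the shape of the view output.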
It should look like this:
{
"prev" : false,
"next" : {
"_id" : ObjectId("5d1b68d708f3870049d9cc37"),
"controllerId" : "80058c2b-9525-4f7f-8e26-faea4ad92b15",
"id" : "76800453-e72e-410b-accc-cf47cd2773a1",
"type" : "controller_connection_status",
"timestamp" : 1562077399832.0
},
"sameType" : false
},
{
"prev" : {
"_id" : ObjectId("5d1b68d708f3870049d9cc37"),
"controllerId" : "80058c2b-9525-4f7f-8e26-faea4ad92b15",
"id" : "76800453-e72e-410b-accc-cf47cd2773a1",
"type" : "controller_connection_status",
"timestamp" : 1562077399832.0
},
"next" : {
"_id" : ObjectId("5d1b68db08f3870049d9cc39"),
"controllerId" : "80058c2b-9525-4f7f-8e26-faea4ad92b15",
"id" : "fabd883c-6971-4977-b3fc-31679c2b85dd",
"type" : "controller_connection_status",
"timestamp" : 1562077402916.0
},
"_id" : ObjectId("5d1b68d708f3870049d9cc37"),
"sameType" : true
},
{
"prev" : {
"_id" : ObjectId("5d1b68db08f3870049d9cc39"),
"controllerId" : "80058c2b-9525-4f7f-8e26-faea4ad92b15",
"id" : "fabd883c-6971-4977-b3fc-31679c2b85dd",
"type" : "controller_connection_status",
"timestamp" : 1562077402916.0
},
"next" : {
"_id" : ObjectId("5d1b68db08f3870049d9cc3a"),
"controllerId" : "80058c2b-9525-4f7f-8e26-faea4ad92b15",
"siteId" : "226168be-866c-11e8-adc0-fa7ae01bbebc",
"id" : "98decbea-8288-4df5-807d-14e90f929df2",
"type" : "controller_added",
"timestamp" : 1562077402920.0
},
"_id" : ObjectId("5d1b68db08f3870049d9cc39"),
"sameType" : false
},
etc...
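You can then query the view, grouping documents by type and latest timestamp and following each chain for as long as the "sameType" condition holds. The longest chain of documents is the count you are looking for: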
db.getCollection("events-view").aggregate([
{ $graphLookup: {
from: "events-view",
startWith: "$next._id",
connectFromField: "next._id",
connectToField: "_id",
restrictSearchWithMatch: { "sameType": true },
as: "chain"
} },
{ $project: {
_id: "$next._id",
type: "$next.type",
chain: { $concatArrays: [ [{ next: "$next" }], "$chain" ] }
} },
{ $addFields: {
chainLength: { $size: "$chain" },
timestamp: { $max: { $map: {
input: "$chain",
in: "$$this.next.timestamp"
} } }
} },
{ $group: {
_id: {type: "$type", timestamp: "$timestamp"},
count: {$max: "$chainLength"}
} },
{ $sort: { "_id.timestamp": 1 } },
{ $project: {
_id: 0,
timestamp: "$_id.timestamp",
type: "$_id.type",
count: 1
} }
])
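To check what the pipeline is expected to return, the same grouping can be written as a single pass in plain JavaScript. This is only a sketch for verifying results or for small result sets already loaded on the client; the function name `groupConsecutiveByType` is an assumption, and the sample data keeps only the `type` and `timestamp` fields of the ten events from the question.

```javascript
// Hypothetical helper: collapses runs of consecutive events with the same
// type into { timestamp, type, count }, where timestamp is the last one in the run.
function groupConsecutiveByType(events) {
  const sorted = [...events].sort((a, b) => a.timestamp - b.timestamp);
  const runs = [];
  for (const event of sorted) {
    const last = runs[runs.length - 1];
    if (last && last.type === event.type) {
      // Same type as the previous event: extend the current run.
      last.count += 1;
      last.timestamp = event.timestamp;
    } else {
      // Type changed (or first event): start a new run.
      runs.push({ timestamp: event.timestamp, type: event.type, count: 1 });
    }
  }
  return runs;
}

// The ten events from the question, reduced to the fields that matter here.
const sampleEvents = [
  { type: "controller_connection_status", timestamp: 1562077399832 },
  { type: "controller_connection_status", timestamp: 1562077402916 },
  { type: "controller_added", timestamp: 1562077402920 },
  { type: "controller_connection_status", timestamp: 1562077449904 },
  { type: "controller_error", timestamp: 1562077452975 },
  { type: "controller_error", timestamp: 1562077509904 },
  { type: "controller_connection_status", timestamp: 1562077512974 },
  { type: "controller_connection_status", timestamp: 1562077569903 },
  { type: "controller_connection_status", timestamp: 1562077572973 },
  { type: "controller_removed", timestamp: 1562077629903 }
];

const runs = groupConsecutiveByType(sampleEvents);
console.log(runs.map(r => r.count)); // [ 2, 1, 1, 2, 3, 1 ]
```

The counts match the desired output in the question: 2, 1, 1, 2, 3, 1.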
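It should be mentioned that the query will be slow: the longer the chains, the worse the performance. Also keep in mind that the $graphLookup stage must stay within the 100 megabyte memory limit. For larger collections you should set the allowDiskUse option to true; note that, per the MongoDB documentation, $graphLookup itself ignores that option, so it only helps the other stages of the pipeline.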
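If the server is MongoDB 5.0 or newer (an assumption; the answer above does not state a version), the same runs can probably be computed in one pass with $setWindowFields, without the view or the recursive lookup. The sketch below is an alternative technique, not the method described above, and it has not been run against the question's data. The helper name `buildRunsPipeline` and the field names `prevType` and `runId` are made up for illustration.

```javascript
// Hypothetical helper: builds an aggregation pipeline that counts runs of
// consecutive events with the same type. Assumes MongoDB 5.0+.
function buildRunsPipeline(controllerId) {
  return [
    { $match: { controllerId } },
    // Attach the type of the previous event (ordered by timestamp).
    { $setWindowFields: {
        sortBy: { timestamp: 1 },
        output: {
          prevType: { $shift: { output: "$type", by: -1, default: null } }
        }
    } },
    // Running count of type changes: events in the same run share a runId.
    { $setWindowFields: {
        sortBy: { timestamp: 1 },
        output: {
          runId: {
            $sum: { $cond: [{ $eq: ["$type", "$prevType"] }, 0, 1] },
            window: { documents: ["unbounded", "current"] }
          }
        }
    } },
    // One output row per run: its type, last timestamp, and length.
    { $group: {
        _id: "$runId",
        type: { $first: "$type" },
        timestamp: { $max: "$timestamp" },
        count: { $sum: 1 }
    } },
    { $sort: { timestamp: 1 } },
    { $project: { _id: 0, timestamp: 1, type: 1, count: 1 } }
  ];
}

// Usage in the mongo shell (collection name taken from the question):
// db.getCollection("events").aggregate(
//   buildRunsPipeline("80058c2b-9525-4f7f-8e26-faea4ad92b15"))
```

The idea is to mark each event where the type changes and take a running sum of those marks, so every run gets its own identifier that an ordinary $group can use.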