Aggregation query to get a count of the latest element from nested subdocuments

Problem description

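I have a MongoDB database with many users, and one of the subdocuments I track is file uploads and their status as they move through a review process. Every file upload will eventually have an attachment status. I want to pull some metrics: the total number of uploaded files in each current status. I started building an aggregation query that takes the latest attachmentStatus subdocument from each uploaded file and counts them.

The data structure is as follows: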

 "userName": "johnDoe",
 "email": "johnDoe@gmail.com",
 "uploads" : [
                {
                        "_id" : ObjectId("adh12451e0012ce9da0"),
                        "fileName" : "TestDoc.txt",
                        "fileType" : "text/plain",
                        "created" : ISODate("2021-01-06T15:26:14.166Z"),
                        "attachmentStatus" : [ ]
                },
                {
                        "_id" : ObjectId("5ff5d6c066cacc0012ed655a"),
                        "fileName" : "testerABC.txt",
                        "fileType" : "text/plain",
                        "created" : ISODate("2021-01-06T15:26:56.027Z"),
                        "attachmentStatus" : [
                                {
                                        "_id" : ObjectId("60884f733f88bd00129b9ad4"),
                                        "status" : "Uploaded",
                                        "date" : ISODate("2021-04-22T02:23:00Z")
                                },
                                {
                                        "_id" : ObjectId("60884f733f88bd00129b9ad5"),
                                        "status" : "Processing",
                                        "date" : ISODate("2021-04-26T04:54:00Z")
                                }
                        ]
                },
                {
                        "_id" : ObjectId("6075c82a19fdcc0012f81907"),
                        "fileName" : "Test file.docx",
                        "fileType" : "application/word",
                        "created" : ISODate("2021-04-13T16:34:50.955Z"),
                        "attachmentStatus" : [
                                {
                                        "_id" : ObjectId("72844f733f88bd11479b9ad7"),
                                        "status" : "Uploaded",
                                        "date" : ISODate("2021-04-23T03:42:00Z")
                                },
                                {
                                        "_id" : ObjectId("724986d73f88bd00147c9wt8"),
                                        "status" : "Completed",
                                        "date" : ISODate("2021-04-24T01:37:00Z")
                                }
                        ]
                }
        ]

 "userName": "janeDoe",
 "email": "janeDoe@gmail.com",
 "uploads" : [
                {
                        "_id" : ObjectId("ej9784652h0012ce9da0"),
                        "fileName" : "myResume.txt",
                        "fileType" : "text/plain",
                        "created" : ISODate("2021-02-13T12:36:14.166Z"),
                        "attachmentStatus" : [
                                {
                                        "_id" : ObjectId("15dhdf6f88bd00147c9wt8"),
                                        "status" : "Completed",
                                        "date" : ISODate("2021-04-24T01:37:00Z")
                                }
                  ]
                }, 

How can I pull the latest attachment status for each uploaded file and then summarize the statuses?

I want something like this:

{ "status" : "Completed", "Count" : 2 }
{ "status" : "Processing", "Count" : 1 }
...

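I am very close with this aggregation query, but it grabs every status instead of only the single latest status of each file (one current status per file).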

db.myDB.aggregate([
{
  "$match" : {
    "uploads.attachmentStatus": {
      "$elemMatch": { "status": { "$exists": true } }
      }
    }
},
{ $unwind: "$uploads"},
{ $unwind: "$uploads.attachmentStatus"},
{
  $sortByCount: "$uploads.attachmentStatus.status"
},
{
  $project: {
    _id:0,
    status: "$_id",
    Count: "$count"
  }
}
]).pretty();

Any suggestions?

Tags: mongodb, group-by, mongodb-query, aggregation-framework, subdocument

Solution


Demo - https://mongoplayground.net/p/zzOR9qhqny0


  • { $sort: { "uploads.attachmentStatus.date": -1 } } sorts the unwound status entries by date in descending order, so the latest one comes first.

  • { $group: { _id: "$uploads._id", status: { $first: "$uploads.attachmentStatus.status" } } } groups the records by uploads._id and takes the first status in each group, which after the descending sort by date is the latest status.
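
To see why sorting before grouping picks the latest status, the two stages above can be mimicked in plain JavaScript. This is only an in-memory sketch, not MongoDB code: the `rows` array stands in for the documents after both `$unwind` stages, and the names `uploadId` and `latestStatusPerUpload` are hypothetical, chosen for illustration.

```javascript
// Rows as they might look after both $unwind stages:
// one row per (upload, status entry). The flat field names
// are simplified stand-ins for the nested paths in the query.
const rows = [
  { uploadId: "A", status: "Uploaded",   date: new Date("2021-04-22T02:23:00Z") },
  { uploadId: "A", status: "Processing", date: new Date("2021-04-26T04:54:00Z") },
  { uploadId: "B", status: "Uploaded",   date: new Date("2021-04-23T03:42:00Z") },
  { uploadId: "B", status: "Completed",  date: new Date("2021-04-24T01:37:00Z") },
];

function latestStatusPerUpload(rows) {
  // Counterpart of { $sort: { date: -1 } }: newest entries first.
  const sorted = [...rows].sort((a, b) => b.date - a.date);
  // Counterpart of $group with $first: keep the first row seen per upload.
  const latest = new Map();
  for (const row of sorted) {
    if (!latest.has(row.uploadId)) latest.set(row.uploadId, row.status);
  }
  return latest;
}

const latest = latestStatusPerUpload(rows);
console.log(Object.fromEntries(latest));
```

Without the sort, "first per group" would simply be whichever entry happened to arrive first, which is why the order of the two stages matters.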

Query

{ $sort: { "uploads.attachmentStatus.date": -1 } },
{ $group: { _id: "$uploads._id", status: { $first: "$uploads.attachmentStatus.status" } } },

Complete query

db.collection.aggregate([
  { $match: { "uploads.attachmentStatus": { "$elemMatch": {  "status": { "$exists": true } } } } },
  { $unwind: "$uploads" },
  { $unwind: "$uploads.attachmentStatus" },
  { $sort: { "uploads.attachmentStatus.date": -1 } },
  { $group: { _id: "$uploads._id", status: { $first: "$uploads.attachmentStatus.status" } } },
  { $sortByCount: "$status" },
  { $project: { _id: 0, status: "$_id", Count: "$count" } }
])
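
As a sanity check of what the complete query should return for the sample data in the question, here is a plain JavaScript sketch that computes the same summary in memory. It is an illustration under assumptions, not a replacement for the pipeline: the upload ids are shortened placeholders rather than the original ObjectIds, dates are kept as ISO strings, and `countLatestStatuses` is a hypothetical helper name.

```javascript
// Sample data from the question, reduced to the fields the pipeline uses.
// The _id values are placeholders, not the original ObjectIds.
const users = [
  { userName: "johnDoe", uploads: [
    { _id: "u1", attachmentStatus: [] },
    { _id: "u2", attachmentStatus: [
      { status: "Uploaded",   date: "2021-04-22T02:23:00Z" },
      { status: "Processing", date: "2021-04-26T04:54:00Z" } ] },
    { _id: "u3", attachmentStatus: [
      { status: "Uploaded",  date: "2021-04-23T03:42:00Z" },
      { status: "Completed", date: "2021-04-24T01:37:00Z" } ] } ] },
  { userName: "janeDoe", uploads: [
    { _id: "u4", attachmentStatus: [
      { status: "Completed", date: "2021-04-24T01:37:00Z" } ] } ] },
];

function countLatestStatuses(users) {
  const counts = new Map();
  for (const user of users) {
    for (const upload of user.uploads) {
      // Like $unwind on an empty array, an upload without
      // any status entries contributes nothing to the counts.
      if (upload.attachmentStatus.length === 0) continue;
      // Pick the entry with the greatest date, which is what
      // the descending $sort followed by $first selects.
      const latest = upload.attachmentStatus.reduce((a, b) =>
        new Date(b.date) > new Date(a.date) ? b : a);
      counts.set(latest.status, (counts.get(latest.status) || 0) + 1);
    }
  }
  // Like $sortByCount plus $project: descending count, renamed fields.
  return [...counts.entries()]
    .map(([status, Count]) => ({ status, Count }))
    .sort((a, b) => b.Count - a.Count);
}

const summary = countLatestStatuses(users);
console.log(summary);
```

On this data the sketch yields a count of 2 for "Completed" and 1 for "Processing", matching the output requested in the question. The upload with an empty attachmentStatus array is not counted, just as `$unwind` drops it in the pipeline by default.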
