首页 > 解决方案 > 如何为 mongodb 编写相关的 $lookup 查询?

问题描述

有一些推文数据,例如:

    {
        "id": 1432568863334539264,
        "created_at": "2021-08-31 05:00:07",
        "text": "Most of New Orleans went dark on Sunday after Hurricane Ida took out eight transmission lines. A new natural gas po… ",
        "user_id": 807095
    },
{
        "id": 1432578972844531714,
        "created_at": "2021-08-31 05:40:17",
        "text": "RT @nytimes: Most of New Orleans went dark on Sunday after Hurricane Ida took out eight transmission lines. A new natural gas power plant,…",
        "user_id": 1414848022849245186,
        "retweet_id": 1432568863334539264,
        "retweet_user_id": 807095,
        "user_mentions": [
            {
                "id": 807095,
                "indices": [
                    3,
                    11
                ]
            }
        ]
    },
{
        "id": 1432578589107625990,
        "created_at": "2021-08-31 05:38:46",
        "text": "RT @nytimes: Most of New Orleans went dark on Sunday after Hurricane Ida took out eight transmission lines. A new natural gas power plant,…",
        "user_id": 1345847262010875915,
        "retweet_id": 1432568863334539264,
        "retweet_user_id": 807095,
        "user_mentions": [
            {
                "id": 807095,
                "indices": [
                    3,
                    11
                ]
            }
        ]
    }

我想使用单个查找或聚合来解决:

查找在发布后的第一个小时内收到最多转发的一般推文。打印出推文 ID 和它在第一个小时内收到的转发次数。

我没有工作的脚本:

db.tweets_hurricane.aggregate([
    {$match: {retweet_id:{$exists:true}}},
    {$project: {_id: 0, id: '$id', retweet_id: '$retweet_id', format_date: { $toDate: '$created_at'}, format_date_end: { $add: [{$toDate: '$created_at'}, 1*60*60*1000]}}},
    {$group: {_id: '$retweet_id',  count: {$sum: {$switch: {branches: [{case: {$lt: ['$format_date', db.tweets_hurricane.findOne({id: '$retweet_id'}, {format_date_end:1})]}, then: 1}], default: 0}}}}},
    {$sort: {count: -1}}
    
])

标签: mongodb

解决方案


示例数据使解决方案难以理解,但它应该可以工作:

db.tweets_hurricane.aggregate([
   // Join tweets and re-tweets
   { $match: { retweet_id: { $exists: false } } },
   {
      $lookup:
         {
            from: "tweets_hurricane",
            localField: "id",
            foreignField: "retweet_id",
            as: "retweets"
         }
   },
   // filter retweets created in the first hour after it is published
   {
      $set: {
         retweets: {
            $filter: {
               input: "$retweets",
               as: "tweet",
               cond: { $lt: ["$$tweet.created_at", { $add: ["$created_at", 1000 * 60 * 60] }] }
            }
         }
      }
   },
   // count number of retweets
   { $set: { retweet_count: { $size: "$retweets" } } },
   // limit to most retweeted tweet
   { $sort: { retweet_count: -1 } },
   { $limit: 1 },
   // finalize output
   {
      $project: {
         id: 1,
         retweet_count: 1
      }
   }
])

推荐阅读