MongoDB Aggregator - Group unique items that occur within 30 seconds

Problem description

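The problem I'm trying to solve: I have a collection of documents in MongoDB that represent the moments a user enters and exits a page. My goal is to group these into "sessions".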

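Defining a session: a session is any block of unique documents that occurred within 30 seconds of each other. Documents are unique if they have the same uid, documentId, and clientType.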
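To make the definition concrete, here is a minimal sketch of the gap rule in plain JavaScript, outside MongoDB. The helper name `splitIntoSessions` and the constant `SESSION_GAP_MS` are hypothetical, and the sketch assumes that a gap strictly greater than 30 seconds between consecutive events starts a new session.

```javascript
// Hypothetical helper: split events into sessions using a 30-second gap rule.
const SESSION_GAP_MS = 30 * 1000;

function splitIntoSessions(events, gapMs = SESSION_GAP_MS) {
  // Bucket events by the uid/documentId/clientType key from the definition.
  const byKey = new Map();
  for (const event of events) {
    const key = JSON.stringify([event.uid, event.documentId, event.clientType]);
    if (!byKey.has(key)) byKey.set(key, []);
    byKey.get(key).push(event);
  }

  const sessions = [];
  for (const group of byKey.values()) {
    // Sort chronologically so consecutive events can be compared.
    const sorted = [...group].sort(
      (a, b) => Date.parse(a.occurredAt) - Date.parse(b.occurredAt)
    );
    let current = [];
    for (const event of sorted) {
      const previous = current[current.length - 1];
      // Assumption: a gap strictly greater than gapMs opens a new session.
      if (
        previous &&
        Date.parse(event.occurredAt) - Date.parse(previous.occurredAt) > gapMs
      ) {
        sessions.push(current);
        current = [];
      }
      current.push(event);
    }
    if (current.length > 0) sessions.push(current);
  }
  return sessions;
}
```

Each returned session is an array of the original event objects in chronological order.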

The goal is to turn this:

[
    {
        "_id": "1",
        "interactionType": "pageEnter",
        "uid": "u0",
        "documentId": "d0",
        "clientType": "web",
        "routePath": "/d0",
        "occurredAt": "2020-06-12T17:00:22.000Z"
    },
    {
        "_id": "2",
        "interactionType": "pageExit",
        "uid": "u0",
        "documentId": "d0",
        "clientType": "web",
        "routePath": "/d0",
        "occurredAt": "2020-06-12T17:00:32.000Z"
    },
    {
        "_id": "3",
        "interactionType": "pageEnter",
        "uid": "u0",
        "documentId": "d0",
        "clientType": "web",
        "routePath": "/d0/a",
        "occurredAt": "2020-06-12T17:00:42.000Z"
    },
    {
        "_id": "4",
        "interactionType": "pageExit",
        "uid": "u0",
        "documentId": "d0",
        "clientType": "web",
        "routePath": "/d0/a",
        "occurredAt": "2020-06-12T17:00:52.000Z"
    },
    {
        "_id": "5",
        "interactionType": "pageEnter",
        "uid": "u0",
        "documentId": "d0",
        "clientType": "web",
        "routePath": "/d0",
        "occurredAt": "2020-06-12T17:03:42.000Z"
    },
    {
        "_id": "6",
        "interactionType": "pageExit",
        "uid": "u0",
        "documentId": "d0",
        "clientType": "web",
        "routePath": "/d0",
        "occurredAt": "2020-06-12T17:03:52.000Z"
    }
]

into this:

[
    {
        "_id": "d0-u0-web-2020-06-12T17:00:42.000Z",
        "uid": "u0",
        "documentId": "d0",
        "lastViewedAt": "2020-06-12T17:00:42.000Z",
        "totalDurationMilli": 20000,
        "history": [
            {
                "routePath": "/d0",
                "clientType": "web",
                "totalDurationMilli": 10000
            },
            {
                "routePath": "/d0/a",
                "clientType": "web",
                "totalDurationMilli": 10000
            }
        ]
    },
    {
        "_id": "d0-u0-web-2020-06-12T17:03:42.000Z",
        "uid": "u0",
        "documentId": "d0",
        "lastViewedAt": "2020-06-12T17:03:42.000Z",
        "totalDurationMilli": 10000,
        "history": [
            {
                "routePath": "/d0",
                "clientType": "web",
                "totalDurationMilli": 10000
            }
        ]
    }
]

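Note that the two "session" documents have the same documentId but different history groups. That's because, as stated above, I want to split the data so that sessions are at least 30 seconds apart.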
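The target shape can also be sketched in plain JavaScript for a single session's events. The helper name `buildSessionDocument` is hypothetical. Two assumptions are inferred from the sample output rather than stated in the question: `lastViewedAt` (and the `_id` suffix) is the time of the last pageEnter in the session, and each pageExit closes the most recent pageEnter on the same routePath. The input is assumed to be non-empty.

```javascript
// Hypothetical helper: shape one session's events into the target document.
function buildSessionDocument(sessionEvents) {
  const sorted = [...sessionEvents].sort(
    (a, b) => Date.parse(a.occurredAt) - Date.parse(b.occurredAt)
  );
  const history = [];
  const openEnters = new Map(); // routePath -> time (ms) of the pending pageEnter
  let lastEnterAt = null;

  for (const event of sorted) {
    const at = Date.parse(event.occurredAt);
    if (event.interactionType === 'pageEnter') {
      openEnters.set(event.routePath, at);
      lastEnterAt = event.occurredAt;
    } else if (
      event.interactionType === 'pageExit' &&
      openEnters.has(event.routePath)
    ) {
      // Assumption: a pageExit closes the latest pageEnter on the same routePath.
      history.push({
        routePath: event.routePath,
        clientType: event.clientType,
        totalDurationMilli: at - openEnters.get(event.routePath),
      });
      openEnters.delete(event.routePath);
    }
  }

  const { uid, documentId, clientType } = sorted[0];
  return {
    // Assumption: the _id suffix is the last pageEnter time, as in the sample.
    _id: [documentId, uid, clientType, lastEnterAt].join('-'),
    uid,
    documentId,
    lastViewedAt: lastEnterAt,
    totalDurationMilli: history.reduce((sum, h) => sum + h.totalDurationMilli, 0),
    history,
  };
}
```

This produces one history entry per enter/exit pair; whether repeated visits to the same routePath should be merged is not specified in the question.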

So far, my aggregator looks like this:

[
    // Filter by pageEnter and pageExit
    { $match: { interactionType: { $in: ['pageEnter', 'pageExit'] } } },

    // Sort by occurredAt
    { $sort: { occurredAt: 1 } },

    // Group by special id and compose history.
    {
        $group: {
            _id: {
                uid: '$uid',
                documentId: '$documentId',
                clientType: '$clientType',
            },
            history: { $push: '$$ROOT' },
        },
    },

    // Project fields for final document.
    {
        $project: {
            _id: { $concat: ['$_id.documentId', '-', '$_id.uid', '-', '$_id.clientType', '-', { $arrayElemAt: ['$history.occurredAt', 0] }] },
            uid: { $arrayElemAt: ['$history.uid', 0] },
            documentId: { $arrayElemAt: ['$history.documentId', 0] },
            lastViewedAt: { $arrayElemAt: ['$history.occurredAt', 0] },
            totalDurationMilli: 'unknown',
            history: 1,
        },
    },
]

This outputs the following (MongoDB playground):

    {
        _id: "d0-u0-web-2020-06-12T17:00:22.000Z",
        uid: "u0",
        documentId: "d0",
        lastViewedAt: "2020-06-12T17:00:22.000Z",
        totalDurationMilli: 'unknown',
        history: [
            {
                "_id": "1",
                "interactionType": "pageEnter",
                "uid": "u0",
                "documentId": "d0",
                "clientType": "web",
                "routePath": "/d0",
                "occurredAt": "2020-06-12T17:00:22.000Z"
            },
            {
                "_id": "2",
                "interactionType": "pageExit",
                "uid": "u0",
                "documentId": "d0",
                "clientType": "web",
                "routePath": "/d0",
                "occurredAt": "2020-06-12T17:00:32.000Z"
            },
            {
                "_id": "3",
                "interactionType": "pageEnter",
                "uid": "u0",
                "documentId": "d0",
                "clientType": "web",
                "routePath": "/d0/a",
                "occurredAt": "2020-06-12T17:00:42.000Z"
            },
            {
                "_id": "4",
                "interactionType": "pageExit",
                "uid": "u0",
                "documentId": "d0",
                "clientType": "web",
                "routePath": "/d0/a",
                "occurredAt": "2020-06-12T17:00:52.000Z"
            },
            {
                "_id": "5",
                "interactionType": "pageEnter",
                "uid": "u0",
                "documentId": "d0",
                "clientType": "web",
                "routePath": "/d0",
                "occurredAt": "2020-06-12T17:03:42.000Z"
            },
            {
                "_id": "6",
                "interactionType": "pageExit",
                "uid": "u0",
                "documentId": "d0",
                "clientType": "web",
                "routePath": "/d0",
                "occurredAt": "2020-06-12T17:03:52.000Z"
            }
        ]
    }

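My biggest problem is that I can't figure out how to group these items into sessions correctly. Is there any specific helper I can use to solve this?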

Tags: mongodb, aggregation-framework

Solution
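One possible approach, offered as a sketch rather than a confirmed answer: on MongoDB 5.0 or later, `$setWindowFields` with `$shift` can compare each event to the previous one in the same uid/documentId/clientType partition, and a running `$sum` can then number the sessions. The field names `occurredAtDate`, `previousAt`, `startsSession`, and `sessionNumber` are hypothetical, and the sketch assumes `occurredAt` is stored as an ISO string, as in the sample. It has not been run against the original data.

```javascript
// Sketch only: assumes MongoDB 5.0+ ($setWindowFields, $shift).
const sessionGapMillis = 30 * 1000;

// Partition key taken from the question's definition of "unique".
const partitionKey = {
  uid: '$uid',
  documentId: '$documentId',
  clientType: '$clientType',
};

const sessionPipeline = [
  { $match: { interactionType: { $in: ['pageEnter', 'pageExit'] } } },

  // The sample stores occurredAt as a string; convert it so dates can be subtracted.
  { $set: { occurredAtDate: { $toDate: '$occurredAt' } } },

  // Attach the previous event's timestamp within the same partition.
  {
    $setWindowFields: {
      partitionBy: partitionKey,
      sortBy: { occurredAtDate: 1 },
      output: {
        previousAt: { $shift: { output: '$occurredAtDate', by: -1 } },
      },
    },
  },

  // Flag events that open a session: first in the partition, or gap over 30 s.
  {
    $set: {
      startsSession: {
        $cond: [
          {
            $or: [
              { $eq: [{ $ifNull: ['$previousAt', null] }, null] },
              {
                $gt: [
                  { $subtract: ['$occurredAtDate', '$previousAt'] },
                  sessionGapMillis,
                ],
              },
            ],
          },
          1,
          0,
        ],
      },
    },
  },

  // A running sum of the flags gives every event a session number.
  {
    $setWindowFields: {
      partitionBy: partitionKey,
      sortBy: { occurredAtDate: 1 },
      output: {
        sessionNumber: {
          $sum: '$startsSession',
          window: { documents: ['unbounded', 'current'] },
        },
      },
    },
  },

  // Group by the original key plus the session number.
  {
    $group: {
      _id: { ...partitionKey, sessionNumber: '$sessionNumber' },
      history: { $push: '$$ROOT' },
    },
  },
];
```

The final `$project` stage from the question could then be applied to each group to build the `_id`, `lastViewedAt`, and duration fields. On versions before 5.0, window stages are not available, so the gap detection would need to happen in application code or with a `$reduce` over the pushed history array.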

