MongoDB aggregation is slow on indexed fields

Problem description

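I have a collection with about 2.5M documents. The collection size is 14.1 GB, the storage size is 4.2 GB, and the average object size is 5.8 KB. I created two separate indexes on two of the fields, dataSourceName and version (text fields), and tried to run an aggregation query that lists their "group by" values. (What I'm trying to achieve: select dsn, v from collection group by dsn, v)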

db.getCollection("the-collection").aggregate(
    [
        { 
            "$group" : {
                "_id" : {
                    "dataSourceName" : "$dataSourceName", 
                    "version" : "$version"
                }
            }
        }
    ], 
    { 
        "allowDiskUse" : false
    }
);

Even though MongoDB takes up about 10 GB of RAM on the server, the fields are indexed, and nothing else at all is running, the aggregation takes about 40 seconds.

I tried creating a new index that contains both fields, in that order, but the query doesn't seem to use it:

{ 
    "stages" : [
        {
            "$cursor" : {
                "query" : {

                }, 
                "fields" : {
                    "dataSourceName" : NumberInt(1), 
                    "version" : NumberInt(1), 
                    "_id" : NumberInt(0)
                }, 
                "queryPlanner" : {
                    "plannerVersion" : NumberInt(1), 
                    "namespace" : "db.the-collection", 
                    "indexFilterSet" : false, 
                    "parsedQuery" : {

                    }, 
                    "winningPlan" : {
                        "stage" : "COLLSCAN", 
                        "direction" : "forward"
                    }, 
                    "rejectedPlans" : [

                    ]
                }
            }
        }, 
        {
            "$group" : {
                "_id" : {
                    "dataSourceName" : "$dataSourceName", 
                    "version" : "$version"
                }
            }
        }
    ], 
    "ok" : 1.0
}

I'm using MongoDB 3.6.5 64-bit on Windows, so it should be using the index: https://docs.mongodb.com/master/core/aggregation-pipeline/#pipeline-operators-and-indexes

As @Alex-Blex suggested, I tried adding a sort, but I got an out-of-memory error:

The following error occurred while attempting to execute the aggregate query

Mongo Server error (MongoCommandException): Command failed with error 16819: 'Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting. Aborting operation. Pass allowDiskUse:true to opt in.' on server server-address:port. 

The full response is:
{ 
    "ok" : 0.0, 
    "errmsg" : "Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting. Aborting operation. Pass allowDiskUse:true to opt in.", 
    "code" : NumberInt(16819), 
    "codeName" : "Location16819"
}

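My mistake, I had tried it on the wrong collection... After adding a sort that matches the index, the query now uses the index. It's still not fast, though: it took ~10 seconds to return the results.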

New explain output:

{ 
    "stages" : [
        {
            "$cursor" : {
                "query" : {

                }, 
                "sort" : {
                    "dataSourceName" : NumberInt(1), 
                    "version" : NumberInt(1)
                }, 
                "fields" : {
                    "dataSourceName" : NumberInt(1), 
                    "version" : NumberInt(1), 
                    "_id" : NumberInt(0)
                }, 
                "queryPlanner" : {
                    "plannerVersion" : NumberInt(1), 
                    "namespace" : "....", 
                    "indexFilterSet" : false, 
                    "parsedQuery" : {

                    }, 
                    "winningPlan" : {
                        "stage" : "PROJECTION", 
                        "transformBy" : {
                            "dataSourceName" : NumberInt(1), 
                            "version" : NumberInt(1), 
                            "_id" : NumberInt(0)
                        }, 
                        "inputStage" : {
                            "stage" : "IXSCAN", 
                            "keyPattern" : {
                                "dataSourceName" : NumberInt(1), 
                                "version" : NumberInt(1)
                            }, 
                            "indexName" : "dataSourceName_1_version_1", 
                            "isMultiKey" : false, 
                            "multiKeyPaths" : {
                                "dataSourceName" : [

                                ], 
                                "version" : [

                                ]
                            }, 
                            "isUnique" : false, 
                            "isSparse" : false, 
                            "isPartial" : false, 
                            "indexVersion" : NumberInt(2), 
                            "direction" : "forward", 
                            "indexBounds" : {
                                "dataSourceName" : [
                                    "[MinKey, MaxKey]"
                                ], 
                                "version" : [
                                    "[MinKey, MaxKey]"
                                ]
                            }
                        }
                    }, 
                    "rejectedPlans" : [

                    ]
                }
            }
        }, 
        {
            "$group" : {
                "_id" : {
                    "dataSourceName" : "$dataSourceName", 
                    "version" : "$version"
                }
            }
        }
    ], 
    "ok" : 1.0
}

Tags: mongodb, aggregate-functions

Solution


The page you're referring to says exactly the opposite:

The $match and $sort pipeline operators can take advantage of an index

Your first stage is $group, which is neither $match nor $sort.

Try sorting in the first stage to trigger the use of the index:

db.getCollection("the-collection").aggregate(
    [
        { $sort: { dataSourceName:1, version:1 } },
        { 
            "$group" : {
                "_id" : {
                    "dataSourceName" : "$dataSourceName", 
                    "version" : "$version"
                }
            }
        }
    ], 
    { 
        "allowDiskUse" : false
    }
);
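Why the sorted version is still not instant: the second explain output shows a PROJECTION over an IXSCAN (no FETCH stage) with indexBounds of [MinKey, MaxKey] on both fields. So the query reads the index instead of the 14.1 GB of documents, which explains the speedup over the COLLSCAN, but it still visits every index entry. The standalone JavaScript below is a simplified illustration of that cost model, not MongoDB's actual $group implementation: given entries already ordered by (dataSourceName, version), as a scan of the compound index yields them, one pass emits each distinct pair when the key changes, yet every entry must still be read. The function name distinctSortedPairs and the sample data are hypothetical.

```javascript
// Hypothetical illustration, NOT MongoDB internals: a single pass over entries
// that are already ordered by (dataSourceName, version), like a full scan of
// the { dataSourceName: 1, version: 1 } index.
function distinctSortedPairs(entries) {
  const groups = [];
  let visited = 0;
  let prev = null;
  for (const e of entries) {
    visited++; // every entry is read, even when it repeats the previous key
    if (
      prev === null ||
      e.dataSourceName !== prev.dataSourceName ||
      e.version !== prev.version
    ) {
      // Key changed; because the input is sorted, this pair is new.
      groups.push({ dataSourceName: e.dataSourceName, version: e.version });
      prev = e;
    }
  }
  return { groups, visited };
}

// Usage with made-up sample data.
const sample = [
  { dataSourceName: "a", version: "1" },
  { dataSourceName: "a", version: "1" },
  { dataSourceName: "a", version: "2" },
  { dataSourceName: "b", version: "1" },
  { dataSourceName: "b", version: "1" },
];
const result = distinctSortedPairs(sample);
console.log(result.groups.length, result.visited); // 3 5
```

The work grows with the number of index entries (about 2.5M here), not with the number of distinct pairs, which is consistent with the remaining ~10 seconds.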

Note that this should be a single compound index with the same fields in the same sort order:

db.getCollection("the-collection").createIndex({ dataSourceName:1, version:1 })
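The "same fields, same order" requirement can be sketched as a small check: an index can return results already sorted by a sort spec when the sort keys are a prefix of the index key pattern, in the same order, and their directions either all match or are all reversed (the index is then walked backward). The helper indexSupportsSort below is hypothetical and simplified; it ignores cases MongoDB also handles, such as sorting on later index fields when the query has equality matches on the earlier ones. It only shows why neither of the two separate single-field indexes can serve { dataSourceName: 1, version: 1 }.

```javascript
// Hypothetical, simplified helper: can an index with `keyPattern` return
// documents already ordered by `sortSpec`? Ignores equality-prefix cases.
function indexSupportsSort(keyPattern, sortSpec) {
  const indexKeys = Object.entries(keyPattern);
  const sortKeys = Object.entries(sortSpec);
  if (sortKeys.length > indexKeys.length) return false;

  let sameDirection = true;     // index walked forward
  let reversedDirection = true; // index walked backward
  for (let i = 0; i < sortKeys.length; i++) {
    const [sortField, sortDir] = sortKeys[i];
    const [indexField, indexDir] = indexKeys[i];
    if (sortField !== indexField) return false; // must be a prefix, in order
    if (sortDir !== indexDir) sameDirection = false;
    if (sortDir !== -indexDir) reversedDirection = false;
  }
  return sameDirection || reversedDirection;
}

const sortSpec = { dataSourceName: 1, version: 1 };
console.log(indexSupportsSort({ dataSourceName: 1 }, sortSpec));               // false
console.log(indexSupportsSort({ version: 1 }, sortSpec));                      // false
console.log(indexSupportsSort({ dataSourceName: 1, version: 1 }, sortSpec));   // true
console.log(indexSupportsSort({ dataSourceName: -1, version: -1 }, sortSpec)); // true (reverse walk)
console.log(indexSupportsSort({ version: 1, dataSourceName: 1 }, sortSpec));   // false
```

The real query planner applies richer rules than this sketch, so the MongoDB documentation on using indexes to sort query results remains the authority.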
