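mongodb - Mongo aggregation failing with "Exceeded memory limit for $group"
Problem description
We have a query that fetches the minimum and maximum latitude/longitude, implemented as an aggregation. The collection holds 2 million documents.
Running the aggregation produces the error below. How can we fix this? Would using allowDiskUse: true hurt performance? Or is there an index we could add that would solve the problem?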
2021-04-02T23:57:16.682+0000 I COMMAND [conn2829719] command loc-service.locations command: aggregate { aggregate: "locations", pipeline: [ { $match: { customerId: "8047380094" } }, { $unwind: "$outdoorLocationInfo.location.coordinates" }, { $group: { _id: "$_id", longitude: { $first: "$outdoorLocationInfo.location.coordinates" }, latitude: { $last: "$outdoorLocationInfo.location.coordinates" } } }, { $group: { _id: null, minLongitude: { $min: "$longitude" }, maxLongitude: { $max: "$longitude" }, minLatitude: { $min: "$latitude" }, maxLatitude: { $max: "$latitude" } } } ], cursor: {}, allowDiskUse: false, $db: "loc-service", $clusterTime: { clusterTime: Timestamp(1617407827, 2), signature: { hash: BinData(0, F980F28628AF21C214BD2D3F4B7C48F56ACB47BD), keyId: 6914764447386959875 } }, lsid: { id: UUID("a6e20fee-7714-4460-bdc8-2019425c7ff0") } } planSummary: IXSCAN { customerId: 1, deviceId: 1 } numYields:7900 ok:0 errMsg:"Exceeded memory limit for $group, but didn't allow external sort. Pass allowDiskUse:true to opt in." errName:Location16945 errCode:16945 reslen:313 locks:{ Global: { acquireCount: { r: 8061 } }, Database: { acquireCount: { r: 8060 } }, Collection: { acquireCount: { r: 8060 } } } storage:{} protocol:op_msg 5448ms
Query
db.locations.aggregate([
    {
        $match: {
            customerId: "8047380094"
        }
    },
    {
        $unwind: "$outdoorLocationInfo.location.coordinates"
    },
    {
        $group: {
            _id: "$_id",
            longitude: {
                $first: "$outdoorLocationInfo.location.coordinates"
            },
            latitude: {
                $last: "$outdoorLocationInfo.location.coordinates"
            }
        }
    },
    {
        $group: {
            _id: null,
            minLongitude: {
                $min: "$longitude"
            },
            maxLongitude: {
                $max: "$longitude"
            },
            minLatitude: {
                $min: "$latitude"
            },
            maxLatitude: {
                $max: "$latitude"
            }
        }
    }
])
Our indexes on this collection:
db.locations.getIndexes()
[
    {
        "v" : 2,
        "key" : {
            "_id" : 1
        },
        "name" : "_id_",
        "ns" : "loc-service.locations"
    },
    {
        "v" : 2,
        "key" : {
            "customerId" : 1,
            "deviceId" : 1
        },
        "name" : "customerId_1_deviceId_1",
        "ns" : "loc-service.locations",
        "sparse" : true,
        "background" : true
    },
    {
        "v" : 2,
        "key" : {
            "customerId" : 1,
            "geoHash" : 1
        },
        "name" : "customerId_1_geoHash_1",
        "ns" : "loc-service.locations",
        "sparse" : true,
        "background" : true
    },
    {
        "v" : 2,
        "key" : {
            "customerId" : 1,
            "outdoorLocationInfo.location" : "2dsphere"
        },
        "name" : "customerId_1_outdoorLocationInfo.location_2dsphere",
        "ns" : "loc-service.locations",
        "sparse" : true,
        "background" : true,
        "2dsphereIndexVersion" : 3
    },
    {
        "v" : 2,
        "key" : {
            "customerId" : 1,
            "outdoorLocationInfo.location.coordinates" : 1
        },
        "name" : "customerId_1_outdoorLocationInfo.location.coordinates_1",
        "ns" : "loc-service.locations",
        "sparse" : true,
        "background" : true
    }
]
Sample data:
db.locations.findOne()
{
    "_id" : ObjectId("60551b70a48edf83848607d2"),
    "outdoorLocationInfo" : {
        "location" : {
            "type" : "Point",
            "coordinates" : [
                -95.330024,
                36.262476
            ]
        }
    },
    "customerId" : "2868306879",
    "deviceId" : "6eN7sMEOP1e",
    "geoHash" : "9yknq9qu1rqp",
}
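Thanks
Solution
I think you can simplify the query with $arrayElemAt. The original pipeline groups by _id, which creates one group per matching document, and that intermediate state is most likely what exceeds the $group memory limit. Reading the longitude (index 0) and latitude (index 1) directly from the coordinates array lets a single $group with _id: null compute all four values, so neither the $unwind nor the per-document $group is needed.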
db.locations.aggregate([
    {
        $match: {
            customerId: "8047380094"
        }
    },
    {
        $group: {
            _id: null,
            // coordinates is [longitude, latitude], so index 1 is the latitude
            maxLatitude: {
                $max: {
                    $arrayElemAt: [
                        "$outdoorLocationInfo.location.coordinates",
                        1
                    ]
                }
            },
            // index 0 is the longitude
            maxLongitude: {
                $max: {
                    $arrayElemAt: [
                        "$outdoorLocationInfo.location.coordinates",
                        0
                    ]
                }
            },
            minLatitude: {
                $min: {
                    $arrayElemAt: [
                        "$outdoorLocationInfo.location.coordinates",
                        1
                    ]
                }
            },
            minLongitude: {
                $min: {
                    $arrayElemAt: [
                        "$outdoorLocationInfo.location.coordinates",
                        0
                    ]
                }
            }
        }
    }
])
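As a rough illustration of why a single group keeps memory use small, here is a minimal sketch that is not part of the original answer. The helper names buildBoundingBoxPipeline and boundingBox are hypothetical, introduced only for this example. boundingBox is a plain-JavaScript stand-in for what the $group stage computes, and it assumes every matching document stores a two-element [longitude, latitude] array as in the sample data (unlike MongoDB, it does not skip documents with missing coordinates).

```javascript
// Hypothetical helper: builds the single-$group pipeline for one customer.
function buildBoundingBoxPipeline(customerId) {
  const coords = "$outdoorLocationInfo.location.coordinates";
  // GeoJSON points are stored as [longitude, latitude].
  const lon = { $arrayElemAt: [coords, 0] };
  const lat = { $arrayElemAt: [coords, 1] };
  return [
    { $match: { customerId: customerId } },
    {
      $group: {
        _id: null, // one group for all matching documents
        minLongitude: { $min: lon },
        maxLongitude: { $max: lon },
        minLatitude: { $min: lat },
        maxLatitude: { $max: lat }
      }
    }
  ];
}

// Hypothetical in-memory equivalent of the $group stage above.
// Only four running values are kept, however many documents are scanned.
function boundingBox(docs) {
  const box = {
    minLongitude: Infinity,
    maxLongitude: -Infinity,
    minLatitude: Infinity,
    maxLatitude: -Infinity
  };
  for (const doc of docs) {
    const [lon, lat] = doc.outdoorLocationInfo.location.coordinates;
    box.minLongitude = Math.min(box.minLongitude, lon);
    box.maxLongitude = Math.max(box.maxLongitude, lon);
    box.minLatitude = Math.min(box.minLatitude, lat);
    box.maxLatitude = Math.max(box.maxLatitude, lat);
  }
  return box;
}

// In the mongo shell, against the collection from the question:
//   db.locations.aggregate(buildBoundingBoxPipeline("8047380094"))
// Opting in to disk use, should a pipeline ever need it:
//   db.locations.aggregate(buildBoundingBoxPipeline("8047380094"), { allowDiskUse: true })
```

With only one group, the accumulator state stays small regardless of how many documents match, so allowDiskUse should not be needed for this pipeline. When a stage does exceed the 100 MB per-stage limit, passing { allowDiskUse: true } as the second argument to aggregate lets MongoDB spill to temporary files, which is generally slower than staying in memory.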