MongoDB queries on a large dataset are very slow

Problem description

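I'm working on a deep learning project with a large dataset (nearly 10 million documents) in a customers collection. I filter on the customer columns according to the requirements, and almost every filtered column is a string. I can't put an index on every column (there are 35), because that isn't a good idea. There are also some complex queries, including aggregations such as $group.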
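Since indexing all 35 columns is ruled out, a common compromise is a few compound indexes designed for the most frequent query shapes. A compound index can serve queries that filter on a prefix of its keys, so key order matters. The sketch below uses a hypothetical helper, `esrIndexSpec` (not a MongoDB or Mongoose API), that derives a key pattern for one filter following the Equality, Sort, Range (ESR) ordering guideline from the MongoDB documentation. It assumes, as a simplification, that any operator object such as `{ $gt, $lt }` is a range condition.

```javascript
// Hypothetical helper: build a compound-index key pattern for one query shape,
// ordering keys as Equality fields, then Sort fields, then Range fields (ESR).
function esrIndexSpec(filter, sort = {}) {
  const equality = [];
  const range = [];
  for (const [field, cond] of Object.entries(filter)) {
    // Simplification: a plain value is an equality match; any object whose
    // keys start with "$" (e.g. { $gt: 30, $lt: 50 }) is treated as a range.
    const isOperatorObject =
      cond !== null &&
      typeof cond === "object" &&
      !Array.isArray(cond) &&
      Object.keys(cond).some((k) => k.startsWith("$"));
    (isOperatorObject ? range : equality).push(field);
  }
  const spec = {};
  for (const f of equality) spec[f] = 1;
  // Sort fields keep their requested direction (1 or -1).
  for (const [f, dir] of Object.entries(sort)) if (!(f in spec)) spec[f] = dir;
  for (const f of range) if (!(f in spec)) spec[f] = 1;
  return spec;
}
```

For example, `esrIndexSpec({ rmr: { $gt: 30, $lt: 50 }, demographicsState: "Minnesota" })` puts `demographicsState` before `rmr`; the resulting object could then be passed to `db.customers.createIndex(...)` in the mongo shell.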

Here is a sample document from the collection:

{
    "_id" : ObjectId("5ca35824a7ad6a17e9c6eeb7"),
    "batchId" : 1,
    "demographicsState" : "Minnesota",
    "demographicsGender" : "Female",
    "jobCount" : "0 to 6",
    "jobCreated" : "No",
    "callResolution" : "No",
    "customerEffortScore" : 2,
    "phoneAccessibility" : "90 to 100",
    "callRepTime" : "Just right",
    "hadPriorCallsPastThirtyFiveDays" : "Yes",
    "autoDebitFlag" : "No",
    "servcoName" : "Monitronics",
    "demographicsAge" : "45 to 54",
    "checkedWebsiteFirst" : "No",
    "alarmRelated" : "12-Sensor",
    "reasonPrimary" : "19-Alarm, system or equipment related reason",
    "inInitialTerm" : "Yes",
    "callDuration" : "10 to 19",
    "siteKind" : "Residential",
    "customerSiteTenureDays" : "326",
    "highRisk" : "No",
    "monthsLeftUntilContractRenewal" : "26",
    "nielsen" : "Savvy suburbs",
    "callReason" : "Customer tech support",
    "serviceScheduled" : "-",
    "hadPriorCallsPastFiveDays" : "Yes",
    "dropped" : "No",
    "serviceResolution" : "80 to 89",
    "dept" : 190,
    "serviceRepresentative" : "90 to 100",
    "demographicsIncome" : "50,000 - 74,999",
    "aarpMember" : "No",
    "rmr" : 44.99,
    "satisfactionOverall" : 9,
    "dropYes" : 1,
    "dropNo" : 0,
    "cltv" : 4146.578333333334
}

This is the query I need to fetch the data:

db.customers.aggregate([
    // Stage 1: filter; every condition in the $and array must hold
    {"$match": {
        "$and": [
            {"demographicsState": "Minnesota"},
            {"demographicsGender": "Female"},
            {"jobCount": "0 to 6"},
            {"jobCreated": "Yes"},
            {"callResolution": "No"},
            {"customerEffortScore": {"$gt": 0, "$lt": 8}},
            {"phoneAccessibility": "50 to 60"},
            {"hadPriorCallsPastThirtyFiveDays": "No"},
            {"autoDebitFlag": "Yes"},
            {"alarmRelated": "10-Sensor"},
            {"callDuration": "20 to 29"},
            {"hadPriorCallsPastFiveDays": "Yes"},
            {"demographicsIncome": "50,000 - 74,999"},  // same format as the sample document
            {"aarpMember": "Yes"},
            {"rmr": {"$gt": 30, "$lt": 50}},
            {"dropYes": 1}
        ]
    }},
    // Stage 2: count the matching documents per gender
    {"$group": {"_id": "$demographicsGender", "count": {"$sum": 1}}}
])
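The same pipeline can be generated from a plain filter object, which makes it easier to reuse for other column combinations. This is a sketch with a hypothetical helper name (`buildCountPipeline`); it relies on the fact that fields listed together in one `$match` document are combined with an implicit AND, so an explicit `$and` array is not needed for simple equality and range conditions on distinct fields.

```javascript
// Hypothetical helper: turn a filter object and a column name into a
// two-stage "filter, then count per value" aggregation pipeline.
function buildCountPipeline(filter, groupField) {
  return [
    // $match comes first so the server can use an index and discard
    // non-matching documents before grouping.
    { $match: { ...filter } },
    // Count the remaining documents per distinct value of groupField.
    { $group: { _id: "$" + groupField, count: { $sum: 1 } } },
  ];
}
```

In the mongo shell the result can be passed to `db.customers.aggregate(...)`. Running the same pipeline through `db.customers.explain("executionStats").aggregate(...)` shows whether the `$match` used an index (`IXSCAN`) or scanned the whole collection (`COLLSCAN`).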

I filter and group on every column of the customers schema shown above. Please let me know if anyone has any ideas.
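Because the columns are both filtered and grouped, one option that avoids a separate `aggregate()` call per column is a single `$match` followed by a `$facet` stage holding one `$group` sub-pipeline per column. `buildFacetPipeline` below is a hypothetical helper, not a MongoDB or Mongoose API. Note that only the leading `$match` can use an index; `$facet` and its sub-pipelines cannot, so they work on whatever documents the `$match` passes through.

```javascript
// Hypothetical helper: one filter, then per-value counts for several columns
// computed in a single aggregation via $facet.
function buildFacetPipeline(filter, groupFields) {
  const facets = {};
  for (const field of groupFields) {
    // Each facet is an independent sub-pipeline over the matched documents.
    facets[field] = [
      { $group: { _id: "$" + field, count: { $sum: 1 } } },
      { $sort: { count: -1 } }, // most frequent values first
    ];
  }
  return [{ $match: { ...filter } }, { $facet: facets }];
}
```

The aggregation then returns a single document with one array of `{ _id, count }` entries per requested column.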

Tags: mongodb, mongoose
