MongoDB - How to optimize this find/update

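Problem description

I'm new to MongoDB and Python and have to write a script using pymongo. There is a website where users can run searches. The backend is a MongoDB database in which one collection stores the search history of all users and another collection stores all users.

I need to iterate over all users, count each user's searches from the past 30 days, and store that total in one of the fields on the user document. Below is what I wrote. Is there a way to speed this up, e.g. by using aggregation, multithreading, or making it asynchronous?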

import pymongo
from datetime import datetime, timedelta
from bson.objectid import ObjectId


def lambda_handler(event, context):
    mongohost = '10.0.0.1'
    mongoport = 27017

    mongoclient = pymongo.MongoClient(mongohost, mongoport)
    mongodb = mongoclient["maindb"]
    mongo_search_logs_collection = mongodb["searchlogs"]
    mongo_users_collection = mongodb["users"]

    days_to_subtract_from_today = 30
    search_count_start_date = (datetime.today() - timedelta(days_to_subtract_from_today)).date()

    count = 0

    # Iterate over all users and update searchCount value
    for x in mongo_users_collection.find():

        # Get total searches last X days
        total_search_count = mongo_search_logs_collection.count_documents({
            'createdBy': ObjectId(x['_id']),
            'created': {'$gte': datetime(search_count_start_date.year, search_count_start_date.month, search_count_start_date.day)}
        })

        # Update searchCount value
        mongo_users_collection.update_one({
            '_id': ObjectId(x['_id'])
        }, {
            '$set': {
                'searchCount': total_search_count
            }
        }, upsert=False)

        # Increment counter
        count += 1

    print("Processed " + str(count) + " records")

Tags: python, mongodb, asynchronous, pymongo

Solution


This may be one way to get the job done using aggregation and bulk operations:

import pymongo
from datetime import datetime, timedelta
from bson.objectid import ObjectId


def lambda_handler(event, context):
    mongohost = '10.0.0.1'
    mongoport = 27017

    mongoclient = pymongo.MongoClient(mongohost, mongoport)
    mongodb = mongoclient["maindb"]
    mongo_search_logs_collection = mongodb["searchlogs"]
    mongo_users_collection = mongodb["users"]

    days_to_subtract_from_today = 30
    search_count_start_date = (datetime.today() - timedelta(days_to_subtract_from_today)).date()

    cursor = mongo_search_logs_collection.aggregate([
        {
            "$match":{
                "created": {"$gte": datetime(search_count_start_date.year, search_count_start_date.month, search_count_start_date.day)}
            }
        },
        {
            "$group":{
                "_id": "$createdBy", "searchCount": { "$sum": 1 }
            }
        }
    ])

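    # Build one update per user returned by the aggregation.
    # bulk_write replaces initialize_unordered_bulk_op, which PyMongo 4 removed.
    requests = [
        pymongo.UpdateOne({"_id": res["_id"]}, {"$set": {"searchCount": res["searchCount"]}})
        for res in cursor
    ]

    # bulk_write raises InvalidOperation on an empty list, so skip it in that case.
    if requests:
        mongo_users_collection.bulk_write(requests, ordered=False)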
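
One difference from the loop in the question: the aggregation only returns users who have at least one search in the window, so a user with no recent searches keeps whatever searchCount was stored before, whereas the original loop would write 0. A minimal sketch of one way to cover that case, assuming the list of user ids is small enough to hold in memory; the helper name plan_search_count_updates is hypothetical and not part of pymongo:

```python
def plan_search_count_updates(user_ids, grouped_rows):
    """Return (filter, update) pairs for every user, writing 0 when a user
    does not appear in the aggregation output."""
    # Map each user id to the count produced by the $group stage.
    counts = {row["_id"]: row["searchCount"] for row in grouped_rows}
    return [
        ({"_id": user_id}, {"$set": {"searchCount": counts.get(user_id, 0)}})
        for user_id in user_ids
    ]


# Plain strings stand in for ObjectId values in this example.
pairs = plan_search_count_updates(
    ["u1", "u2"],
    [{"_id": "u1", "searchCount": 3}],
)
```

Each pair can then be turned into a request with pymongo.UpdateOne(*pair) and sent through bulk_write. The user ids could come from something like find({}, {"_id": 1}) on the users collection.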

Let me know if you have any problems or questions, since I haven't tested it ;)
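
If the server is MongoDB 4.2 or newer, the write-back could also happen inside the pipeline with a $merge stage, so no counts travel back to the client at all. This is only a sketch under that version assumption and has not been run against the database from the question. The function name build_search_count_pipeline and its parameters are hypothetical; the field and collection names are taken from the question.

```python
from datetime import datetime, timedelta


def build_search_count_pipeline(now, days=30, users_collection="users"):
    """Build a pipeline that counts searches per user since midnight
    `days` days before `now` and merges the counts into the users collection."""
    start = (now - timedelta(days=days)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return [
        {"$match": {"created": {"$gte": start}}},
        {"$group": {"_id": "$createdBy", "searchCount": {"$sum": 1}}},
        {
            "$merge": {
                "into": users_collection,
                "on": "_id",
                # Add or overwrite searchCount on the matching user document.
                "whenMatched": "merge",
                # Ignore log entries whose user no longer exists.
                "whenNotMatched": "discard",
            }
        },
    ]


pipeline = build_search_count_pipeline(datetime(2024, 1, 31, 15, 30))
# With a live connection: mongo_search_logs_collection.aggregate(pipeline)
```

As with the bulk version, users without any search in the window are not touched by this pipeline. An index on the created field of searchlogs would likely help the $match stage, though that depends on the data.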

