PyMongo Bulk Upsert Performance for 100k Records

Problem Description

I use PyMongo to bulk upsert roughly 100k documents every day. Most of those documents already exist and are updated; only around 100 new documents are created. I am using the code below to perform the bulk write, but at the moment 200 documents take about 36 seconds, which extrapolates to roughly 5 hours for 100k documents ((100,000 / 200) × 36 s = 18,000 s ≈ 5 h). How can I optimize this?

from pymongo import MongoClient, UpdateOne

# Connect to Mongo
client = MongoClient(mongo_key)
db = client.db_name

def send_mongo(data, my_list):
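    """Build one UpdateOne upsert per (p, li) pair and send them to
    db.Collection in unordered batches of 1000."""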

    operations = []
    
    # Loop through new data to prepare it for mongo
    for p, li in zip(data, my_list):

        # core data
        custom_id = li.get_text()
        core_data = p['info']
        rels_data = p['rels']

        operations.append(
            UpdateOne(
                {
                    "cust_id": custom_id
                },
                { 
                    "$set": { 
                        "cust_id": custom_id,
                        "core": core_data,
                        "releases": rels_data,
                    }
                },
                upsert=True
            )
        )

        # Send once every 1000 in batch
        if len(operations) == 1000:
            db.Collection.bulk_write(operations, ordered=False)
            operations = []

    # Flush the final partial batch (fewer than 1000 operations),
    # otherwise the last documents are never written
    if operations:
        db.Collection.bulk_write(operations, ordered=False)

Tags: python, mongodb, pymongo

Solution
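
A first optimization that is commonly recommended for this pattern, sketched below under the assumption that the collection is db.Collection and every upsert filters on cust_id exactly as in the question, is to index cust_id: without an index, each of the 100k UpdateOne filters forces a full collection scan, and that lookup cost typically dominates the runtime rather than the bulk_write itself.

from pymongo import MongoClient, ASCENDING

client = MongoClient(mongo_key)  # mongo_key: the same connection string used in the question
db = client.db_name

# Index the upsert filter field so each UpdateOne can locate its target
# document without scanning the whole collection.
# unique=True is an assumption: the upsert logic in the question implies
# that cust_id identifies exactly one document.
db.Collection.create_index([("cust_id", ASCENDING)], unique=True)

With the index in place it is worth re-timing a single 1000-operation batch before changing anything else; ordered=False (already used in the question) is also worth keeping, since it stops one failed operation from aborting the rest of the batch.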

