MongoDB: How do I insert documents into a collection when they also exist in another collection?

Problem description

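I have two collections, EN_PR2019 and EN_PR2018. They mostly contain the same things, but from different years. After inserting all documents into EN_PR2019, I tried to insert documents that may have the same _id as documents in the collection EN_PR2019. I read that I need to create an index on the collection to be able to have records with the same _id in two different collections. Now I get pymongo.errors.DuplicateKeyError: E11000 duplicate key error collection: Database.EN_PR2018 index: id_1 dup key: { id: null }.

How can I insert records that have the same _id in two different collections, without raising an error or having to deal with duplicates?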

import pymongo

def check_record(collection, record_id):
    """Check if record exists in collection
        Args:
            record_id (str): record _id as in collection
    """
    return collection.find_one({'id': record_id})

def collection_index(collection, index):
    """Checks if index exists for collection, 
    and return a new index if not

        Args:
            collection (str): Name of collection in database
            index (str): Dict key to be used as an index
    """
    if index not in collection.index_information():
        return collection.create_index([(index, pymongo.ASCENDING)], unique=True)

def push_upstream(collection, record_id, record):
    """Insert record in collection
        Args:
            collection (str): Name of collection in database
            record_id (str): record _id to be put for record in collection
            record (dict): Data to be pushed in collection
    """
    return collection.insert_one({"_id": record_id}, {"$set": record})

def update_upstream(collection, record_id, record):
    """Update record in collection
        Args:
            collection (str): Name of collection in database
            record_id (str): record _id as in collection
            record (dict): Data to be updated in collection
    """
    return collection.update_one({"_id": record_id}, {"$set": record}, upsert=True)

def executePushPlayer(db):

    playerstats = load_file(db.playerfile)
    collection = db.DATABASE[db.league + db.season]
    collection_index(collection, 'id')
    for player in playerstats:
        existingPost = check_record(collection, player['id'])
        if existingPost:
            update_upstream(collection, player['id'], player)
        else:
            push_upstream(collection, player['id'], player)

if __name__ == '__main__':
    test = DB('EN_PR', '2018')
    executePushPlayer(test)

Tags: python, mongodb

Solution


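The _id field of every document inserted into a MongoDB database is special: it is always indexed, and that index is unique. Reusing the _id values from one collection in another collection is perfectly reasonable, as long as the uniqueness constraint is not violated within the new collection.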
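
Since uniqueness of _id is checked per collection, a write keyed on _id should not need the extra unique index on id at all. Note that the error names the index id_1 and the key { id: null }, which is the index the question's code creates, not the built-in _id index. In push_upstream, insert_one also receives a second positional argument; pymongo reads that slot as bypass_document_validation rather than as an update document, so it seems only {"_id": record_id} would be stored, leaving id missing. A minimal sketch of an alternative, assuming pymongo's Collection.update_one with upsert=True; the helper name upsert_by_id is hypothetical and not part of the question's code:

```python
def upsert_by_id(collection, record_id, record):
    """Insert or update one document keyed on _id (hypothetical helper).

    `collection` is assumed to be a pymongo Collection, or any object
    exposing update_one(filter, update, upsert=...).
    """
    if record_id is None:
        raise ValueError("record_id must not be None")
    # Leave _id out of $set: it is fixed by the filter and cannot be modified.
    fields = {key: value for key, value in record.items() if key != "_id"}
    # With upsert=True the document is created when no match exists,
    # so no separate find_one / insert_one step is needed.
    return collection.update_one({"_id": record_id}, {"$set": fields}, upsert=True)
```

With a helper like this, the loop in executePushPlayer could call upsert_by_id(collection, player['id'], player) for each player and skip both check_record and collection_index. This assumes each record has at least one field besides _id.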

From the error, my guess is that several of your player['id'] values are empty. That would point to a problem in load_file in your project.
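
If that guess holds, a quick check on the loaded data can show which records lack a usable id before anything is written. A minimal sketch in plain Python; find_missing_ids is a hypothetical helper, and it assumes load_file returns a list of dicts keyed by 'id', as the question's code suggests:

```python
def find_missing_ids(records, key="id"):
    """Return the positions of records whose id is absent, None, or empty."""
    return [
        position
        for position, record in enumerate(records)
        if record.get(key) in (None, "")
    ]


# Made-up sample data standing in for the output of load_file.
players = [{"id": "p1"}, {"name": "no id"}, {"id": None}, {"id": ""}]
print(find_missing_ids(players))  # prints [1, 2, 3]
```

An empty result would suggest the ids themselves are fine and the problem lies in how the documents are written.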

