Multiprocessing + PyMongo leads to [Errno 111]

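Problem description

Good day!

I have just started experimenting with pymongo and multiprocessing. I was given a multicore machine for experiments; it runs Ubuntu 18.04.4 LTS (codename: bionic). As an experiment I tried both Python 3.8 and Python 3.10, and unfortunately the results are similar: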

>7lvv_E mol:na length:29  DNA (28-MER)
ELSE 7lvv_E
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "LoadDataOnSequence.py", line 54, in createCollectionPDB
    x = newCol.insert_one(dict2Write)
  File "/home/username/.local/lib/python3.8/site-packages/pymongo/collection.py", line 698, in insert_one
    self._insert(document,
  File "/home/username/.local/lib/python3.8/site-packages/pymongo/collection.py", line 613, in _insert
    return self._insert_one(
  File "/home/username/.local/lib/python3.8/site-packages/pymongo/collection.py", line 602, in _insert_one
    self.__database.client._retryable_write(
  File "/home/username/.local/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1497, in _retryable_write
    with self._tmp_session(session) as s:
  File "/usr/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/home/username/.local/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1829, in _tmp_session
    s = self._ensure_session(session)
  File "/home/username/.local/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1816, in _ensure_session
    return self.__start_session(True, causal_consistency=False)
  File "/home/username/.local/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1766, in __start_session
    server_session = self._get_server_session()
  File "/home/username/.local/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1802, in _get_server_session
    return self._topology.get_server_session()
  File "/home/username/.local/lib/python3.8/site-packages/pymongo/topology.py", line 496, in get_server_session
    self._select_servers_loop(
  File "/home/username/.local/lib/python3.8/site-packages/pymongo/topology.py", line 215, in _select_servers_loop
    raise ServerSelectionTimeoutError(
pymongo.errors.ServerSelectionTimeoutError: 127.0.0.1:27017: [Errno 111] Connection refused, Timeout: 30s, Topology Description: <TopologyDescription id: 60db2071e53de99692268c6f, topology_type: Single, servers: [<ServerDescription ('127.0.0.1', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('127.0.0.1:27017: [Errno 111] Connection refused')>]>
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "LoadDataOnSequence.py", line 82, in <module>
    myPool.map(createCollectionPDB, listFile("datum/pdb_seqres.txt"))
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
pymongo.errors.ServerSelectionTimeoutError: 127.0.0.1:27017: [Errno 111] Connection refused, Timeout: 30s, Topology Description: <TopologyDescription id: 60db2071e53de99692268c6f, topology_type: Single, servers: [<ServerDescription ('127.0.0.1', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('127.0.0.1:27017: [Errno 111] Connection refused')>]>

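I have modified my code in several ways and retried many times, but with no luck. I have also tried running the code from PyCharm over SSH, and from a local folder (on the multicore machine) containing all the necessary files.

I count the number of cores and create my MongoClient: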

from multiprocessing import *
from pymongo import MongoClient



#Number of cores
x = cpu_count()
print(x)


myClient = MongoClient('mongodb://127.0.0.1:27017/')

I prepare the list to pass to the pool with this function:

def listFile(fileName):
    fOpen = open(fileName)
    listFile = fOpen.readlines()
    arrOfArrs = []
    tmp1 = []
    for i in listFile:
        # print(i)
        if i.startswith(">"):
            if len(tmp1) > 1:
                arrOfArrs.append(tmp1)
            tmp1 = []
            tmp1.append(i.strip())
        else:
            tmp1.append(i.strip())
    #print(listFile)
    return arrOfArrs

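This is how I split a large text file into groups. The real file will be bigger; for testing I am using one of the PDB files from https://www.wwpdb.org/ftp/pdb-ftp-sites (the seqres file; I have not linked the exact file because the link starts a download immediately). I think everything works up to that point. Next comes the function that will be used with Pool: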

def createCollectionPDB(fP):
        lineName = ""
        lineFASTA = ""
        colName = ""
        PDBName = ""
        chainIDName = ""
        typeOfMol = ""
        molLen = ""
        proteinName = ""
        for i in fP:
            print("test", i)
            print(lineName)
            if ">" in i:
                lineName = i.strip()
                print("LINE NAME")
                colName = lineName.split(" ")[0].strip()[1:]
                print("COLNAME", colName)
                PDBName = lineName.split("_")[0].strip()
                chainIDName = colName.split("_")[-1].strip()
                typeOfMol = lineName.split(" ")[1].strip().split(":")[1].strip()
                molLen = lineName.split(" ")[2].strip().split(":")[-1].strip()#[3].split(" ")[0].strip()
                proteinName = lineName.split(" ")[-1].strip()
                print(colName, PDBName, chainIDName, typeOfMol, molLen, proteinName)
            else:
                print("ELSE", colName)
                lineFASTA = i.strip()
                dict2Write={"PDB_ID" : PDBName, "Chain_ID" : chainIDName, "Molecule Type" : typeOfMol, "Length" : molLen, "Protein_Name" : proteinName, "FASTA" : lineFASTA}
                myNewDB = myClient["MyPrjPrj_PDBs"]
                newCol = myNewDB[colName]
                x = newCol.insert_one(dict2Write)
                print("PDB", x.inserted_id)#'''

That one used to work as well. Finally, I run it with multiprocessing:

f1 = listFile("datum/pdb_seqres.txt")
myPool = Pool(processes=x)
myPool.map(createCollectionPDB, f1)
myPool.close()  # close() must come before join(); join() on a running pool raises ValueError
myPool.join()

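I have looked for various solutions, such as changing the Python version, trying different versions of mongo (5.0 and 4.x), and restarting mongo. I have also tried changing the number of processes, which gives almost the same error, although it stops at a different line. Another option I tried was ssh_pymongo, also without luck. The code does work without multiprocessing, although I only ran it that way on a smaller file.

Tags: python-3.x, mongodb, multiprocessing, pymongo

Solution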


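Each process needs its own client, so you most likely need to create the client inside each process rather than creating a single one before starting multiprocessing.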
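As a rough illustration of that advice, here is a minimal sketch that builds the client inside each worker through the `initializer` and `initargs` arguments of `multiprocessing.Pool`. The names `init_worker`, `insert_group`, `load_all` and `client_factory` are hypothetical and not from the question, and the parsing is a simplified stand-in for `createCollectionPDB` (it joins the sequence lines of a group into one document). It assumes `client_factory` would be `pymongo.MongoClient` and that a server is listening on the URI from the question.

```python
from multiprocessing import Pool

_client = None  # assigned separately inside every worker process


def init_worker(client_factory, uri):
    """Pool initializer: runs once in each worker, after the process starts."""
    global _client
    _client = client_factory(uri)  # e.g. MongoClient(uri), created post-fork


def insert_group(group):
    """Insert one [header, sequence...] group using this worker's own client."""
    header, sequence = group[0], "".join(group[1:])
    col_name = header.split(" ")[0][1:]  # ">7lvv_E ..." -> "7lvv_E"
    doc = {
        "PDB_ID": col_name.split("_")[0],
        "Chain_ID": col_name.split("_")[-1],
        "FASTA": sequence,
    }
    _client["MyPrjPrj_PDBs"][col_name].insert_one(doc)
    return col_name


def load_all(groups, client_factory, uri="mongodb://127.0.0.1:27017/", processes=None):
    """Map the groups over a pool whose workers each build their own client."""
    with Pool(processes=processes, initializer=init_worker,
              initargs=(client_factory, uri)) as pool:
        return pool.map(insert_group, groups)
```

With the question's code this would presumably be called as `load_all(listFile("datum/pdb_seqres.txt"), MongoClient)` under an `if __name__ == "__main__":` guard, with no module-level `myClient` left in the parent. Passing the factory rather than a client instance keeps any already-connected client from being copied into the children.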

The question "Forking processes: failure during socket delivery: Broken pipe" has general information on how MongoDB drivers handle forking.

