python-3.x - Multiprocessing + PyMongo leads to [Errno 111]
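Problem description
Good day!
I have just started playing with pymongo and multiprocessing. I was given a multicore machine for experiments, which runs Ubuntu 18.04.4 LTS (codename: bionic). Just for the sake of experiment I tried both Python 3.8 and Python 3.10; unfortunately, the results are similar: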
>7lvv_E mol:na length:29 DNA (28-MER)
ELSE 7lvv_E
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/usr/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
return list(map(*args))
File "LoadDataOnSequence.py", line 54, in createCollectionPDB
x = newCol.insert_one(dict2Write)
File "/home/username/.local/lib/python3.8/site-packages/pymongo/collection.py", line 698, in insert_one
self._insert(document,
File "/home/username/.local/lib/python3.8/site-packages/pymongo/collection.py", line 613, in _insert
return self._insert_one(
File "/home/username/.local/lib/python3.8/site-packages/pymongo/collection.py", line 602, in _insert_one
self.__database.client._retryable_write(
File "/home/username/.local/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1497, in _retryable_write
with self._tmp_session(session) as s:
File "/usr/lib/python3.8/contextlib.py", line 113, in __enter__
return next(self.gen)
File "/home/username/.local/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1829, in _tmp_session
s = self._ensure_session(session)
File "/home/username/.local/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1816, in _ensure_session
return self.__start_session(True, causal_consistency=False)
File "/home/username/.local/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1766, in __start_session
server_session = self._get_server_session()
File "/home/username/.local/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1802, in _get_server_session
return self._topology.get_server_session()
File "/home/username/.local/lib/python3.8/site-packages/pymongo/topology.py", line 496, in get_server_session
self._select_servers_loop(
File "/home/username/.local/lib/python3.8/site-packages/pymongo/topology.py", line 215, in _select_servers_loop
raise ServerSelectionTimeoutError(
pymongo.errors.ServerSelectionTimeoutError: 127.0.0.1:27017: [Errno 111] Connection refused, Timeout: 30s, Topology Description: <TopologyDescription id: 60db2071e53de99692268c6f, topology_type: Single, servers: [<ServerDescription ('127.0.0.1', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('127.0.0.1:27017: [Errno 111] Connection refused')>]>
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "LoadDataOnSequence.py", line 82, in <module>
myPool.map(createCollectionPDB, listFile("datum/pdb_seqres.txt"))
File "/usr/lib/python3.8/multiprocessing/pool.py", line 364, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/usr/lib/python3.8/multiprocessing/pool.py", line 771, in get
raise self._value
pymongo.errors.ServerSelectionTimeoutError: 127.0.0.1:27017: [Errno 111] Connection refused, Timeout: 30s, Topology Description: <TopologyDescription id: 60db2071e53de99692268c6f, topology_type: Single, servers: [<ServerDescription ('127.0.0.1', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('127.0.0.1:27017: [Errno 111] Connection refused')>]>
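I have tried modifying my code in several different ways, with no luck. I have also tried running the code from PyCharm over SSH, and from a local folder (on the multicore machine) containing all the necessary files.
I count the number of cores and create my MongoClient: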
from multiprocessing import *
from pymongo import MongoClient
#Number of cores
x = cpu_count()
print(x)
myClient = MongoClient('mongodb://127.0.0.1:27017/')
I prepare the list to pass using this function:
def listFile(fileName):
    fOpen = open(fileName)
    listFile = fOpen.readlines()
    arrOfArrs = []
    tmp1 = []
    for i in listFile:
        # print(i)
        if i.startswith(">"):
            if len(tmp1) > 1:
                arrOfArrs.append(tmp1)
            tmp1 = []
            tmp1.append(i.strip())
        else:
            tmp1.append(i.strip())
    #print(listFile)
    return arrOfArrs
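This is how I prepare a large text file (in reality there will be a much bigger file; I am just testing with one of the PDB files from https://www.wwpdb.org/ftp/pdb-ftp-sites. I use the seqres file; I did not link the exact file because it downloads immediately). I think everything works fine up to that point. Next comes the function that will be used with Pool: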
def createCollectionPDB(fP):
    lineName = ""
    lineFASTA = ""
    colName = ""
    PDBName = ""
    chainIDName = ""
    typeOfMol = ""
    molLen = ""
    proteinName = ""
    for i in fP:
        print("test", i)
        print(lineName)
        if ">" in i:
            lineName = i.strip()
            print("LINE NAME")
            colName = lineName.split(" ")[0].strip()[1:]
            print("COLNAME", colName)
            PDBName = lineName.split("_")[0].strip()
            chainIDName = colName.split("_")[-1].strip()
            typeOfMol = lineName.split(" ")[1].strip().split(":")[1].strip()
            molLen = lineName.split(" ")[2].strip().split(":")[-1].strip()#[3].split(" ")[0].strip()
            proteinName = lineName.split(" ")[-1].strip()
            print(colName, PDBName, chainIDName, typeOfMol, molLen, proteinName)
        else:
            print("ELSE", colName)
            lineFASTA = i.strip()
            dict2Write={"PDB_ID" : PDBName, "Chain_ID" : chainIDName, "Molecule Type" : typeOfMol, "Length" : molLen, "Protein_Name" : proteinName, "FASTA" : lineFASTA}
            myNewDB = myClient["MyPrjPrj_PDBs"]
            newCol = myNewDB[colName]
            x = newCol.insert_one(dict2Write)
            print("PDB", x.inserted_id)#'''
That one used to work as well. Finally, the multiprocessing part:
f1 = listFile("datum/pdb_seqres.txt")
myPool = Pool(processes=x)
myPool.map(createCollectionPDB, f1)
# close() must be called before join(); join() on a running pool raises ValueError
myPool.close()
myPool.join()
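I have looked into various fixes, such as changing the Python version, trying different versions of mongo (5.0 and 4.x), and restarting mongo. I have also tried changing the number of processes, which left me with almost the same error, although it stopped at a different line. Another option I tried was ssh_pymongo, also with no luck. The code does work without multiprocessing, although in that case I only ran it on a smaller file.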
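[Errno 111] in the traceback is the operating system's "connection refused" error, so one quick check between runs is whether anything accepts TCP connections on 127.0.0.1:27017 at all. The following is a minimal standard-library sketch of such a check; the helper name `port_is_open` is hypothetical and is not part of pymongo or of the original code.

```python
import socket


def port_is_open(host="127.0.0.1", port=27017, timeout=2.0):
    """Return True if a TCP connection to host:port is accepted."""
    try:
        # create_connection() performs the TCP handshake and raises on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError (errno 111 on Linux) and timeouts end up here.
        return False
```

Calling `port_is_open()` before and after a failing run would show whether the server was still listening when the workers gave up. It only tests the TCP port, not whether MongoDB itself is healthy.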
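Solution
Each process needs its own client, so you most likely need to create the client inside each process rather than creating a single one before invoking multiprocessing.
"Forked process: Failure during socket delivery: Broken pipe" contains general information about how MongoDB drivers handle forking.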
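One way to give every worker its own client is the `initializer` argument of `multiprocessing.Pool`, which runs a function once in each worker process after it starts. The sketch below is only an illustration under assumptions: the names `init_worker`, `insert_record`, `run_pool` and the `client_factory` parameter are hypothetical, each record is assumed to be a `(collection_name, document)` pair, and the URI and database name are taken from the question.

```python
from multiprocessing import Pool, cpu_count

MONGO_URI = "mongodb://127.0.0.1:27017/"  # URI used in the question
DB_NAME = "MyPrjPrj_PDBs"                 # database name used in the question

_client = None  # one client per worker process, set by init_worker()


def init_worker(uri=MONGO_URI, client_factory=None):
    """Pool initializer: create this worker's own client after the process starts."""
    global _client
    if client_factory is None:
        # Imported lazily, so pymongo is only loaded when a client is built.
        from pymongo import MongoClient
        client_factory = MongoClient
    _client = client_factory(uri)


def insert_record(record):
    """Insert one (collection_name, document) pair with this worker's client."""
    col_name, document = record
    result = _client[DB_NAME][col_name].insert_one(document)
    return result.inserted_id


def run_pool(records, processes=None):
    """Map insert_record over records; no client is created in the parent."""
    with Pool(processes=processes or cpu_count(), initializer=init_worker) as pool:
        return pool.map(insert_record, records)
```

With a MongoDB server running, this would be called as `run_pool(records)` under an `if __name__ == "__main__":` guard. The point of the design is that no `MongoClient` exists in the parent before the pool is created, so nothing is copied into the child processes.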