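python-3.x - Python multiprocessing seems to be affecting the program's results
Problem description
I have a program that is supposed to visit a given set of URLs and download images. The original program was slow, so I added multiprocessing to speed it up. But now the new program downloads different images than the original one did; it seems to be skipping some URLs. Could this be related to multiprocessing? What happens if two processes try to save a photo to my computer at the same time? Could that cause a problem, so that one of the saves gets ignored?
Original program without multiprocessing: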
import requests

def accessAndSaveFiles(urlSet, user, verboseFlag):
    for url in urlSet:
        ...
        img_data = requests.get(url, allow_redirects=True)
        with open(filePath, 'wb') as f:
            f.write(img_data.content)

def main():
    ...
    accessAndSaveFiles(urlSet, user, verboseFlag)
    ...
New program with multiprocessing:
import multiprocessing
import os
from itertools import repeat
from requests import get

def accessAndSaveFiles(urlSet, user, verboseFlag):
    with multiprocessing.Pool(os.cpu_count()) as pool:
        pool.starmap(processURL, zip(urlSet, repeat(user), repeat(verboseFlag)))

def processURL(url, user, verboseFlag):
    ...
    img_data = get(url, allow_redirects=True)
    with open(filePath, 'wb') as f:
        f.write(img_data.content)

def main():
    ...
    accessAndSaveFiles(urlSet, user, verboseFlag)
    ...
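As a quick sanity check on the argument plumbing in this version: `itertools.repeat` with no count is infinite, so `zip` stops only when `urlSet` runs out, and each URL produces exactly one `(url, user, verboseFlag)` tuple for `starmap` to unpack. The sketch below uses a hypothetical `build_args` helper (not part of the original code) with made-up example URLs to show this. A `set` of strings has no guaranteed iteration order, so the order of downloads can differ between runs, but no URL is dropped at this step.

```python
from itertools import repeat


def build_args(urlSet, user, verboseFlag):
    # Hypothetical helper mirroring the zip(...) call in accessAndSaveFiles.
    # repeat() without a count never ends, so zip() stops when urlSet does.
    return list(zip(urlSet, repeat(user), repeat(verboseFlag)))


# Example URLs are placeholders for illustration only.
urls = {
    'https://example.com/a.jpg',
    'https://example.com/b.jpg',
    'https://example.com/c.jpg',
}
args = build_args(urls, 'me', True)
print(len(args))  # 3: one tuple per URL
```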
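Thanks for any help!
Solution
There isn't enough information here to debug this, but you can debug it yourself by adding some print statements to see what each worker is running. Example: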
import multiprocessing as mp
from itertools import repeat
import time

def accessAndSaveFiles(urlSet, user, verboseFlag):
    with mp.Pool() as pool:
        pool.starmap(processURL, zip(urlSet, repeat(user), repeat(verboseFlag)))

def processURL(url, user, verboseFlag):
    print(mp.current_process().name, url, user, verboseFlag)
    time.sleep(1)  # Simulated work
    print(mp.current_process().name, 'done')

def main():
    accessAndSaveFiles('abcdefghijklmnop', 'me', True)

if __name__ == '__main__':
    main()
Output:
SpawnPoolWorker-2 a me True
SpawnPoolWorker-4 b me True
SpawnPoolWorker-7 c me True
SpawnPoolWorker-1 d me True
SpawnPoolWorker-6 e me True
SpawnPoolWorker-3 f me True
SpawnPoolWorker-5 g me True
SpawnPoolWorker-8 h me True
SpawnPoolWorker-2 done
SpawnPoolWorker-2 i me True
SpawnPoolWorker-4 done
SpawnPoolWorker-4 j me True
SpawnPoolWorker-6 done
SpawnPoolWorker-7 done
SpawnPoolWorker-3 done
SpawnPoolWorker-1 done
SpawnPoolWorker-6 k me True
SpawnPoolWorker-7 l me True
SpawnPoolWorker-3 m me True
SpawnPoolWorker-1 n me True
SpawnPoolWorker-5 done
SpawnPoolWorker-5 o me True
SpawnPoolWorker-8 done
SpawnPoolWorker-8 p me True
SpawnPoolWorker-2 done
SpawnPoolWorker-4 done
SpawnPoolWorker-6 done
SpawnPoolWorker-1 done
SpawnPoolWorker-3 done
SpawnPoolWorker-7 done
SpawnPoolWorker-5 done
SpawnPoolWorker-8 done
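From this you can see that the pool has 8 workers (with no argument, `mp.Pool()` sizes itself from the number of available CPUs), and you can see the three arguments passed to each job. Since there are 16 jobs, each time one of the first 8 finishes, its worker picks up another job, until all of them are done.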
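If the debug prints show every URL being picked up but images are still missing, one hypothesis worth checking is that two URLs map to the same file path. This is an assumption, since the question elides how `filePath` is built. Writes from different processes to *different* files don't interfere, but two writes to the *same* path overwrite each other, so at most one of those images survives. The sequential program would overwrite too, but there the survivor is whichever URL came last in iteration order; with a pool it is whichever download finishes last, which could explain seeing different images. A minimal sketch, assuming a hypothetical `url_to_path` helper that uses the last URL path segment as the filename, has each worker return its `(url, path)` pair so the parent can look for duplicates (the actual download is left out):

```python
import multiprocessing as mp
import posixpath
from collections import Counter
from itertools import repeat
from urllib.parse import urlparse


def url_to_path(url):
    # HYPOTHETICAL: the question elides how filePath is built; here we assume
    # it is the last segment of the URL path (".../cat.jpg" -> "cat.jpg").
    return posixpath.basename(urlparse(url).path)


def processURL(url, user, verboseFlag):
    path = url_to_path(url)
    # The real download-and-save step is omitted; returning the target path
    # lets the parent process see what every job actually did.
    return url, path


def find_collisions(results):
    # A path produced by more than one URL means one file overwrote another,
    # which looks like a "skipped" URL.
    counts = Counter(path for _, path in results)
    return {path: [u for u, p in results if p == path]
            for path, n in counts.items() if n > 1}


def accessAndSaveFiles(urlSet, user, verboseFlag):
    with mp.Pool() as pool:
        # starmap returns one result per input tuple, in input order.
        return pool.starmap(processURL, zip(urlSet, repeat(user), repeat(verboseFlag)))


if __name__ == '__main__':
    # Placeholder URLs: two of them end in the same filename.
    urls = {
        'https://example.com/a/cat.jpg',
        'https://example.com/b/cat.jpg',
        'https://example.com/dog.jpg',
    }
    results = accessAndSaveFiles(urls, 'me', True)
    print(len(results), 'jobs returned')
    print(find_collisions(results))
```

If `find_collisions` reports anything for your real URLs, making the filename unique (for example by including more of the URL) would be the thing to fix, rather than the multiprocessing itself.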