Python multiprocessing seems to be affecting my program's results

Problem description

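I have a program that is supposed to visit a given set of URLs and download images. The original program was slow, so I added multiprocessing to speed it up. But now the new program downloads different images from the original one; it seems to be skipping some URLs. Could this be related to multiprocessing? What happens if two processes try to save a photo to my computer at the same time? Could that cause a problem and make one of them get dropped?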

Original program without multiprocessing:

import requests

def accessAndSaveFiles(urlSet, user, verboseFlag):
    for url in urlSet:
        ...
        img_data = requests.get(url, allow_redirects=True)
        # Close the file deterministically instead of relying on garbage collection
        with open(filePath, 'wb') as f:
            f.write(img_data.content)

def main():
    ...
    accessAndSaveFiles(urlSet, user, verboseFlag)
    ...

New program with multiprocessing:

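import multiprocessing
import os
from itertools import repeat
from requests import get

def accessAndSaveFiles(urlSet, user, verboseFlag):
    # One worker process per CPU; starmap unpacks each (url, user, verboseFlag) tuple
    with multiprocessing.Pool(os.cpu_count()) as pool:
        pool.starmap(processURL, zip(urlSet, repeat(user), repeat(verboseFlag)))

def processURL(url, user, verboseFlag):
    ...
    img_data = get(url, allow_redirects=True)
    # Close the file deterministically instead of relying on garbage collection
    with open(filePath, 'wb') as f:
        f.write(img_data.content)

def main():
    ...
    accessAndSaveFiles(urlSet, user, verboseFlag)
    ...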

Thanks for any help!

Tags: python-3.x, multiprocessing

Solution


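There isn't enough information here to debug this, but you can debug it yourself by adding some print statements to see what each worker is running. Example: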

import multiprocessing as mp
from itertools import repeat
import time

def accessAndSaveFiles(urlSet, user, verboseFlag):
    with mp.Pool() as pool:
        pool.starmap(processURL, zip(urlSet, repeat(user), repeat(verboseFlag)))

def processURL(url, user, verboseFlag):
    print(mp.current_process().name,url,user,verboseFlag)
    time.sleep(1) # Simulated work
    print(mp.current_process().name,'done')

def main():
    accessAndSaveFiles('abcdefghijklmnop', 'me', True)

if __name__ == '__main__':
    main()

Output:

SpawnPoolWorker-2 a me True
SpawnPoolWorker-4 b me True
SpawnPoolWorker-7 c me True
SpawnPoolWorker-1 d me True
SpawnPoolWorker-6 e me True
SpawnPoolWorker-3 f me True
SpawnPoolWorker-5 g me True
SpawnPoolWorker-8 h me True
SpawnPoolWorker-2 done
SpawnPoolWorker-2 i me True
SpawnPoolWorker-4 done
SpawnPoolWorker-4 j me True
SpawnPoolWorker-6 done
SpawnPoolWorker-7 done
SpawnPoolWorker-3 done
SpawnPoolWorker-1 done
SpawnPoolWorker-6 k me True
SpawnPoolWorker-7 l me True
SpawnPoolWorker-3 m me True
SpawnPoolWorker-1 n me True
SpawnPoolWorker-5 done
SpawnPoolWorker-5 o me True
SpawnPoolWorker-8 done
SpawnPoolWorker-8 p me True
SpawnPoolWorker-2 done
SpawnPoolWorker-4 done
SpawnPoolWorker-6 done
SpawnPoolWorker-1 done
SpawnPoolWorker-3 done
SpawnPoolWorker-7 done
SpawnPoolWorker-5 done
SpawnPoolWorker-8 done

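This shows that the pool has 8 workers (Pool() defaults to os.cpu_count() processes, which was 8 on this machine), and you can see the three arguments passed to each job. Since there are 16 jobs, each time one of the first 8 jobs finishes, that worker picks up another job, until all of them are done.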
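Going one step further than printing, you can have each job return what it did and let the parent process check that every URL was handled exactly once and that no two jobs wrote to the same file. The sketch below is a minimal illustration under stated assumptions, not the asker's code: `process_url`, `make_file_path`, `access_and_save_files` and `check_results` are hypothetical names, the file name is assumed to be derived from the URL alone, and the download is replaced by writing placeholder bytes into a temporary directory so it runs without network access.

```python
import multiprocessing as mp
import os
import tempfile
from collections import Counter
from itertools import repeat


def make_file_path(out_dir, url):
    # Hypothetical naming scheme (not from the question): the name depends
    # only on this job's own input, never on state shared between jobs.
    return os.path.join(out_dir, url + '.jpg')


def process_url(url, out_dir):
    # Stand-in for the real download: write placeholder bytes instead of an image.
    file_path = make_file_path(out_dir, url)
    with open(file_path, 'wb') as f:
        f.write(url.encode())
    # Report back what this job did so the parent can verify it.
    return url, file_path, mp.current_process().name


def access_and_save_files(urls, out_dir):
    with mp.Pool() as pool:
        # starmap returns results in input order, one per input tuple.
        return pool.starmap(process_url, zip(urls, repeat(out_dir)))


def check_results(urls, results):
    # URLs that produced no result at all
    done = Counter(url for url, _, _ in results)
    missing = [u for u in urls if done[u] == 0]
    # Paths written by more than one job (later writes replace earlier files)
    paths = Counter(path for _, path, _ in results)
    collisions = [p for p, n in paths.items() if n > 1]
    return missing, collisions


if __name__ == '__main__':
    urls = list('abcdefghijklmnop')
    with tempfile.TemporaryDirectory() as out_dir:
        results = access_and_save_files(urls, out_dir)
        missing, collisions = check_results(urls, results)
        print('jobs:', len(results), 'files on disk:', len(os.listdir(out_dir)))
        print('missing:', missing, 'collisions:', collisions)
```

Run as a script, this should report 16 jobs, 16 files on disk, and no missing URLs or collisions: workers writing to different paths at the same time do not interfere with each other. If the real code reports collisions, the file name is probably computed from something other than the job's own inputs. For example, a module-level counter is copied into each worker process, so two workers can produce the same name and the later write silently replaces the earlier file.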

