Python 3 multiprocessing out-of-memory problem

Problem description

My computer hangs when I use multiprocessing to process a large number of files (more than 10k). How can I fix this?

My machine has 8 cores and I haven't been able to solve this. Please help ^^

from subprocess import Popen, DEVNULL
from multiprocessing import Process

TIMEOUT = 60  # placeholder value; was undefined in the original

def execute_command(runcommand):
    child = Popen(runcommand.split(' '), stdout=DEVNULL, stderr=DEVNULL)
    try:
        child.wait(timeout=TIMEOUT)
    except Exception:
        child.kill()
    return child.returncode



def worker():
    files = ...  # generate file list
    for i in files:
        # ' '.join, not ''.join: the command string must be space-separated
        # so that runcommand.split(' ') recovers the argument list
        cmd = ' '.join(['/usr/bin/blab', 'params', i])
        res = execute_command(cmd)

def main():
    a = []
    for _ in range(4):
        p = Process(target=worker)
        a.append(p)
        p.start()
    
    for i in a:
        i.join()

main()

Tags: python-3.x, multiprocessing, python-3.8

Solution


In your use case, each child process may be leaking resources as it runs (it is hard to tell from the example code), so it may be beneficial to periodically restart the child processes to let the OS reclaim unused memory. multiprocessing.Pool has a built-in mechanism to do this for you (no sense in reinventing the wheel): the maxtasksperchild argument. This is how you would rewrite your example to take advantage of it:

from subprocess import Popen, DEVNULL
from multiprocessing import Pool

TIMEOUT = 60  # placeholder value; was undefined in the original

def execute_command(i):
    # Building the command moved into the child target function,
    # joined with ' ' so that split(' ') recovers the argument list
    runcommand = ' '.join(['/usr/bin/blab', 'params', i])
    child = Popen(runcommand.split(' '), stdout=DEVNULL, stderr=DEVNULL)
    try:
        child.wait(timeout=TIMEOUT)
    except Exception:
        child.kill()
    return child.returncode

def main():
    files = ...  # generate file list
    # More tasks per child can mean more memory usage, but fewer tasks
    # per child means more overhead from re-creating child processes.
    with Pool(4, maxtasksperchild=10) as pool:
        pool.map(execute_command, files)

if __name__ == "__main__": #always use this to prevent multiprocessing code from running on import
    main()
