joblib Parallel returning duplicate arrays

Problem description

import multiprocessing
from joblib import Parallel, delayed
import numpy as np

l1 = [0,1,2,3,4]
l2 = [1,2]

c = np.empty([5,2])

def myfun(item):
    for i,ele in enumerate(l2):
        c[item,i] = item + ele
    return c

results = Parallel(n_jobs=-1, backend="threading")(map(delayed(myfun), l1))

I expected the result to be

[[1., 2.],
 [2., 3.],
 [3., 4.],
 [4., 5.],
 [5., 6.]]

Why am I getting five identical arrays instead of just one?

Tags: python, python-multiprocessing, joblib

Solution


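The issue is that every call to myfun writes into, and then returns, the same global array c. With the threading backend all threads share that one array, so results is a list of five references to a single object, and printing it shows the same array five times.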
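The aliasing can be checked directly. The following is a minimal sketch that reuses the names from the question; the `same_object` variable and the fixed `n_jobs=2` are illustrative choices, not part of the original code.

```python
import numpy as np
from joblib import Parallel, delayed

l1 = [0, 1, 2, 3, 4]
l2 = [1, 2]

c = np.empty([5, 2])

def myfun(item):
    for i, ele in enumerate(l2):
        c[item, i] = item + ele
    return c  # a reference to the global array, not a copy

results = Parallel(n_jobs=2, backend="threading")(
    delayed(myfun)(item) for item in l1
)

# Every element of results is the very same object as c.
same_object = all(r is c for r in results)
print(len(results), same_object)
```

Because threads share memory, nothing is copied on return, which is why the identity check holds for every entry.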

To get a separate result for each item, give each call its own copy of the main array.

Try this code:

from joblib import Parallel, delayed
import numpy as np

l1 = [0,1,2,3,4]
l2 = [1,2]

c = np.zeros([5,2])  # main array, shared by all threads
lst = [np.copy(c) for x in l1]  # one independent copy per item

def myfun(item):
    cc = lst[item]  # private array for this item
    for i,ele in enumerate(l2):
        cc[item,i] = item + ele  # write to this item's copy
        c[item,i] = item + ele  # write to the shared main array
    return cc

results = Parallel(n_jobs=-1, backend="threading")(map(delayed(myfun), l1))

print(results, '\n')  # item arrays
print(c)  # main array

Output

[array([[1., 2.],
        [0., 0.],
        [0., 0.],
        [0., 0.],
        [0., 0.]]), 
 array([[0., 0.],
        [2., 3.],
        [0., 0.],
        [0., 0.],
        [0., 0.]]), 
 array([[0., 0.],
        [0., 0.],
        [3., 4.],
        [0., 0.],
        [0., 0.]]), 
 array([[0., 0.],
        [0., 0.],
        [0., 0.],
        [4., 5.],
        [0., 0.]]), 
 array([[0., 0.],
        [0., 0.],
        [0., 0.],
        [0., 0.],
        [5., 6.]])]

[[1. 2.]
 [2. 3.]
 [3. 4.]
 [4. 5.]
 [5. 6.]]
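If only the combined 5x2 array is needed, another option is to avoid shared state altogether: have each task return just its own row and assemble the rows afterwards. This is a sketch of that alternative, not part of the original answer; `compute_row` and `rows` are hypothetical names, and it relies on joblib returning results in the same order as the inputs.

```python
import numpy as np
from joblib import Parallel, delayed

l1 = [0, 1, 2, 3, 4]
l2 = [1, 2]

def compute_row(item):
    # Build a fresh list per call; no global array is touched.
    return [item + ele for ele in l2]

rows = Parallel(n_jobs=-1, backend="threading")(
    delayed(compute_row)(item) for item in l1
)

# Stack the per-item rows into a single 5x2 float array.
c = np.array(rows, dtype=float)
print(c)
```

Since no task mutates shared data, this version should behave the same way with a process-based backend, where writes to a global array would not propagate back to the parent process.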
