python - joblib Parallel returning duplicate arrays
Problem description
import multiprocessing
from joblib import Parallel, delayed
import numpy as np
l1 = [0,1,2,3,4]
l2 = [1,2]
c = np.empty([5,2])
def myfun(item):
    for i,ele in enumerate(l2):
        c[item,i] = item + ele
    return c
results = Parallel(n_jobs=-1, backend="threading")(map(delayed(myfun), l1))
I expected the result to be:
[[1., 2.],
 [2., 3.],
 [3., 4.],
 [4., 5.],
 [5., 6.]]
Why am I getting five identical arrays instead of just one?
Solution
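The issue is that every call to myfun writes into the same global array c and then returns c itself, so results ends up holding five references to one array object. With the threading backend all threads share that array, and by the time you print results every row has been filled in, so you see the same complete array five times.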
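The threads are not really the cause here; the duplication comes from returning the array object itself. A minimal sketch that reproduces it with a plain serial loop (no joblib), using np.zeros instead of np.empty so the values are deterministic:

```python
import numpy as np

l1 = [0, 1, 2, 3, 4]
l2 = [1, 2]
c = np.zeros([5, 2])  # zeros instead of np.empty so the printout is deterministic

def myfun(item):
    # Writes one row of the shared global array...
    for i, ele in enumerate(l2):
        c[item, i] = item + ele
    # ...but returns the whole array object, not a copy and not just the row
    return c

# Plain serial loop, no parallelism at all: the same aliasing shows up
results = [myfun(item) for item in l1]

print(len(results))                  # 5
print(all(r is c for r in results))  # True: every entry is the same object
print(results[0])                    # already shows all five rows filled in
```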
To get a separate result for each call, give each call its own copy of the array to write into.
Try this code:
import multiprocessing
from joblib import Parallel, delayed
import numpy as np
l1 = [0,1,2,3,4]
l2 = [1,2]
c = np.zeros([5,2])
lst = [np.copy(c) for x in l1] # array for each item
def myfun(item):
    cc = lst[item] # array for this item
    for i,ele in enumerate(l2):
        cc[item,i] = item + ele # array for this item
        c[item,i] = item + ele # main array
    return cc
results = Parallel(n_jobs=-1, backend="threading")(map(delayed(myfun), l1))
print(results, '\n') # item arrays
print(c) # main array
Output:
[array([[1., 2.],
       [0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.]]),
 array([[0., 0.],
       [2., 3.],
       [0., 0.],
       [0., 0.],
       [0., 0.]]),
 array([[0., 0.],
       [0., 0.],
       [3., 4.],
       [0., 0.],
       [0., 0.]]),
 array([[0., 0.],
       [0., 0.],
       [0., 0.],
       [4., 5.],
       [0., 0.]]),
 array([[0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.],
       [5., 6.]])]
[[1. 2.]
 [2. 3.]
 [3. 4.]
 [4. 5.]
 [5. 6.]]
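If what you actually want is the single 5x2 array from the question, a simpler alternative (not the approach used in this answer) is to have each call return only its own row and let joblib collect the rows. Parallel returns results in the same order as its inputs, so the rows can be stacked directly. This is a sketch; row_for is a hypothetical helper name. With the per-item copies from the code above, np.sum(results, axis=0) would also give the same array, since each copy is zero outside its own row.

```python
import numpy as np
from joblib import Parallel, delayed

l1 = [0, 1, 2, 3, 4]
l2 = [1, 2]

def row_for(item):
    # Hypothetical helper: build and return only this item's row;
    # nothing shared is mutated, so no copies are needed
    return [item + ele for ele in l2]

rows = Parallel(n_jobs=-1, backend="threading")(delayed(row_for)(item) for item in l1)

# Results come back in input order, so stacking them gives the expected 5x2 array
c = np.array(rows, dtype=float)
print(c)
```

Because no shared state is modified, this pattern should also work with joblib's default process-based backend, where writes to a global array inside a worker would not be visible in the parent process.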