Parallel jobs with joblib: tasks cannot be pickled

Problem description

I have a zip file containing many .dat files. To each of them I want to apply a function that outputs two results, and I want to save those results, plus the time the function took, in three lists. Order matters. Here is the code that does this without parallel computing:

import time
import zipfile

import numpy as np

result_1 = []
result_2 = []
runtimes = []
args_function = 'some args' # Always the same

with zipfile.ZipFile(zip_file, "r") as zip_ref:
    for name in sorted(zip_ref.namelist()):
        data = np.loadtxt(zip_ref.open(name))
        start_time = time.time()
        a, b = function(data, args_function)
        runtimes.append(time.time() - start_time)

        result_1.append(a)
        result_2.append(b)

This looked embarrassingly parallel to me, so I did:

from joblib import Parallel, delayed

result_1 = []
result_2 = []
runtimes = []
args_function = 'some args' # Always the same

def compute_paralel(name, zip_ref):
    data = np.loadtxt(zip_ref.open(name))
    start_time = time.time()
    a, b = function(data, args_function)
    runtimes.append(time.time() - start_time)

    result_1.append(a)
    result_2.append(b)

with zipfile.ZipFile(zip_file, "r") as zip_ref:
    Parallel(n_jobs=-1)(delayed(compute_paralel)(name, zip_ref) for name in sorted(zip_ref.namelist()))

But it raises the following error: `pickle.PicklingError: Could not pickle the task to send it to the workers.` So I'm not sure how to proceed... Any ideas?

Tags: python, parallel-processing, multiprocessing

Solution
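The error comes from passing `zip_ref` into each delayed call: an open `zipfile.ZipFile` wraps a live file handle and cannot be pickled, and joblib must pickle every task to ship it to a worker process. A second, silent problem is that the appends to `runtimes`, `result_1`, and `result_2` happen in worker processes, so the parent's lists would stay empty even if pickling succeeded. A minimal sketch of one common fix follows: read each member into a NumPy array in the parent, send only the array to the workers, and have each task *return* its results so that `Parallel` collects them (it preserves the order of the input iterable, so sorted order is kept). The `function` below is a stand-in for the real one from the question, which is not shown.

```python
import time
import zipfile

import numpy as np
from joblib import Parallel, delayed

args_function = 'some args'  # placeholder from the question


def function(data, args):
    # Stand-in for the real two-output function from the question.
    return data.sum(), data.mean()


def compute_parallel(data):
    # Runs in a worker process: only the picklable array crosses the
    # process boundary, and results are returned instead of appended
    # to lists that the parent process would never see updated.
    start_time = time.time()
    a, b = function(data, args_function)
    return a, b, time.time() - start_time


def process_zip(zip_file):
    # Read every member in the parent so no ZipFile handle is pickled.
    with zipfile.ZipFile(zip_file, "r") as zip_ref:
        arrays = [np.loadtxt(zip_ref.open(name))
                  for name in sorted(zip_ref.namelist())]
    # Parallel returns results in the same order as the input iterable.
    out = Parallel(n_jobs=-1)(delayed(compute_parallel)(d) for d in arrays)
    result_1, result_2, runtimes = (list(t) for t in zip(*out))
    return result_1, result_2, runtimes
```

If the .dat files are too large to hold in memory at once, an alternative with the same pickling behavior is to pass only `name` to the worker and reopen the zip inside `compute_parallel`, at the cost of one open per task.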

