Python joblib returns `TypeError: cannot pickle 'weakref' object` with the same code but different input data

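Problem description

I am trying to parallelize code that converts strings into a third-party package's objects, using a function defined in that library. However, whether joblib fails depends on the input data I provide. Does the return type of the function matter when using joblib?

To reproduce the problem:

First install the third-party libraries:

pip install joblib music21

Then download the data file test_input.abc (it is 4 KB of text).

This code runs fine as a script: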

from typing import List

import music21
from joblib import Parallel, delayed


def convert_string(string: str, format: str = "abc") -> music21.stream.Score:
    return music21.converter.parse(string, format=format)


def convert_list_of_strings(
    string_list,
    n_jobs=-1,
    prefer=None
) -> List[music21.stream.Score]:
    return Parallel(n_jobs=n_jobs, prefer=prefer)(
        delayed(convert_string)(string) for string in string_list
    )

if __name__ == "__main__":
    string_list = ['T:tune\nM:3/4\nL:1/8\nK:C\nab cd ef|GA BC DE' for _ in range(1000)]
    output = convert_list_of_strings(string_list)
    print(output)

That is, it returns a list of music21.stream.Score objects.

However, if you change the main call to read the attached file, i.e.:

if __name__ == "__main__":
    filepath = "test_input.abc"
    tune_sep = "\n\n"
    with open(filepath, "r") as file_object:
        string_list = file_object.read().strip().split(tune_sep)
    output = convert_list_of_strings(string_list)
    print(output)

This raises the following error:

joblib.externals.loky.process_executor._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/path/to/venv/lib/python3.8/site-packages/joblib/externals/loky/process_executor.py", line 356, in _sendback_result
    result_queue.put(_ResultItem(work_id, result=result,
  File "/path/to/venv/lib/python3.8/site-packages/joblib/externals/loky/backend/queues.py", line 241, in put
    obj = dumps(obj, reducers=self._reducers)
  File "/path/to/venv/lib/python3.8/site-packages/joblib/externals/loky/backend/reduction.py", line 271, in dumps
    dump(obj, buf, reducers=reducers, protocol=protocol)
  File "/path/to/venv/lib/python3.8/site-packages/joblib/externals/loky/backend/reduction.py", line 264, in dump
    _LokyPickler(file, reducers=reducers, protocol=protocol).dump(obj)
  File "/path/to/venv/lib/python3.8/site-packages/joblib/externals/cloudpickle/cloudpickle_fast.py", line 563, in dump
    return Pickler.dump(self, obj)
TypeError: cannot pickle 'weakref' object
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "joblib_test.py", line 50, in <module>
    output = convert_list_of_strings(string_list)
  File "joblib_test.py", line 39, in convert_list_of_strings
    return Parallel(n_jobs=n_jobs, prefer=prefer)(
  File "/path/to/venv/lib/python3.8/site-packages/joblib/parallel.py", line 1054, in __call__
    self.retrieve()
  File "/path/to/venv/lib/python3.8/site-packages/joblib/parallel.py", line 933, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/path/to/venv/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 542, in wrap_future_result
    return future.result(timeout=timeout)
  File "/path/to/venv/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.__get_result()
  File "/path/to/venv/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
TypeError: cannot pickle 'weakref' object

What I have tried to resolve the problem

def convert_string(string: str, format: str = "abc") -> str:
    return music21.converter.freezeStr(music21.converter.parse(string, format=format))

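This does mean the code will run... but now I need to deserialize thousands of objects!

So I am guessing that something inside the returned music21.stream.Score is causing the problem?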

Tags: python, joblib, music21

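Solution

Set music21.sites.WEAKREF_ACTIVE = False (you will probably need to edit music21/sites.py directly) and music21 will not use any weak references. They will probably disappear in v8 anyway (or even sooner, since they are mostly an implementation detail). They were needed when music21 was running in the pre-Python 2.6 era of circular reference counting, but they are no longer necessary.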
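As a rough illustration of why the same code can succeed for one input and fail for another, here is a minimal standard-library sketch. It does not use music21; the dictionaries and the `can_pickle` helper are hypothetical stand-ins for a worker's return value. The point is only that a result pickles fine until it happens to contain a weak reference, which is what joblib trips over when sending the result back from the worker. Which part of the Score holds a weak reference for test_input.abc is not identified here.

```python
import pickle
import weakref


def can_pickle(obj) -> bool:
    """Return True if pickle.dumps(obj) succeeds, False if it raises TypeError."""
    try:
        pickle.dumps(obj)
    except TypeError:
        return False
    return True


# A set supports weak references; it stands in for some parent object.
container = {"measure 1"}

# Hypothetical worker results: same shape, different "parent" contents.
plain_result = {"title": "tune", "parent": None}
strong_result = {"title": "tune", "parent": container}
weak_result = {"title": "tune", "parent": weakref.ref(container)}

print(can_pickle(plain_result))   # True
print(can_pickle(strong_result))  # True
print(can_pickle(weak_result))    # False: weakref objects cannot be pickled
```

With weak references switched off, the third case would look like the second one, which is why disabling them lets the Score cross the process boundary.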

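However, your code will not get much of a speedup, because serializing and deserializing a Stream to cross the multiprocessing worker core -> controller core boundary generally takes as long as parsing the file itself, if not longer. I cannot find where I wrote it down, but there is a guide to running music21 in parallel that recommends doing all Stream parsing in the worker cores and passing back only small data structures (number of notes, etc.), not the whole score.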

Oh, and for some of these things, music21's common.parallel library (which uses joblib) will help make common tasks easier:

https://web.mit.edu/music21/doc/moduleReference/moduleCommonParallel.html

