Serializing a spaCy object to JSON

Problem description

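I am trying to serialize a Doc object from spaCy. It looks like not all of the object hierarchy gets serialized. Basically, I want to serialize this object so that I can send it through a REST call.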

A simple test case follows:

import spacy
import jsonpickle

nlp = spacy.load('en_core_web_sm')
print(type(nlp))

text = "This is United States"
doc = nlp(text)
print('Output from noun_chunks before Serialization:')
for chunk in doc.noun_chunks:
    print(chunk)

frozen = jsonpickle.encode(doc)

doc = jsonpickle.decode(frozen)
print(type(doc))

print('Output from noun_chunks after SerDe:')
for chunk in doc.noun_chunks:
    print(chunk)

Error:

> Traceback (most recent call last):
>   File "tests/temp.py", line 19, in <module>
>     for chunk in doc.noun_chunks:
>   File "doc.pyx", line 569, in noun_chunks
> ValueError: [E029] noun_chunks requires the dependency parse, which requires a statistical model to be installed and loaded. For more info, see the documentation: https://spacy.io/usage/models
>
> Process finished with exit code 1

Tags: python-3.x, spacy, jsonpickle

Solution


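The spaCy documentation provides a good example for this problem. Basically, use the pickle library, and note that the whole spaCy Doc object will be pickled, not just the text. Your code then needs to look like this: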

import pickle

import spacy

nlp = spacy.load('en_core_web_sm')
text = "This is United States"
doc = nlp(text)
# Pickle the whole Doc object (annotations included), not just the text
doc_data = pickle.dumps(doc)

An example with the code details can be found here: https://spacy.io/usage/saving-loading#pickle
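Since the goal in the question is to send the object through a REST call, note that pickle.dumps returns raw bytes, which cannot be placed in a JSON body directly. One possible way around this, sketched below, is to base64-encode the pickled bytes. The helper names encode_for_json and decode_from_json and the "doc_pickle" key are hypothetical, chosen only for illustration, and a plain dict stands in for the Doc so the sketch runs without spaCy installed.

```python
import base64
import json
import pickle


def encode_for_json(obj):
    """Pickle obj and wrap the bytes in a JSON string (hypothetical helper)."""
    raw = pickle.dumps(obj)                      # bytes, not JSON-safe
    b64 = base64.b64encode(raw).decode("ascii")  # ASCII text, JSON-safe
    return json.dumps({"doc_pickle": b64})


def decode_from_json(payload):
    """Reverse encode_for_json. Only unpickle data from a trusted source."""
    b64 = json.loads(payload)["doc_pickle"]
    return pickle.loads(base64.b64decode(b64))


# Stand-in object so the sketch runs without spaCy; a Doc is passed the same way.
original = {"text": "This is United States"}
payload = encode_for_json(original)
restored = decode_from_json(payload)
```

When a real Doc is unpickled on the receiving side, spaCy has to be installed there as well.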

Another option is to use doc.to_json() or doc.to_dict() and do generic serialization from there.
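A minimal sketch of the doc.to_json() route might look like the following. It assumes a blank English pipeline (spacy.blank("en")) so that no model download is needed; with a trained pipeline such as en_core_web_sm the resulting dict would carry more annotations.

```python
import json

import spacy

# Assumption: a blank English pipeline, so no trained model is required.
nlp = spacy.blank("en")
text = "This is United States"
doc = nlp(text)

doc_json = doc.to_json()        # plain dict with the text and token offsets
payload = json.dumps(doc_json)  # JSON string, usable as a request body

# The receiving side gets a plain dict back, not a Doc object.
received = json.loads(payload)
```

Because the receiver only has a dict, Doc methods such as noun_chunks are not available on it; any derived data that is needed would have to be computed before sending.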

