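python-3.x - Domain-specific PoS tagger model
Problem description
I'm trying to build a tagger model with domain-specific `.pos_` attributes in spaCy v3.1. The code below runs without errors, but it does not return the `.pos_` attributes. How can I extract them?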
```python
import plac
import random
from pathlib import Path

import spacy
from spacy.training import Example

TAG_MAP = {
    'N': {'pos': 'NOUN'},
    'V': {'pos': 'VERB'},
    'J': {'pos': 'ADJ'}
}

TRAIN_DATA = [
    ('Eu gosto ovos cozidos', {'tags': ['N', 'V', 'N', 'J']}),
    ('Comer presunto azul', {'tags': ['V', 'N', 'J']})
]


@plac.annotations(
    lang=("ISO Code of language to use", "option", "l", str),
    output_dir=("Optional output directory", "option", "o", Path),
    n_iter=("Number of training iterations", "option", "n", int),)
def main(lang="pt", output_dir="./output_2", n_iter=25):
    """Main function to create a new model, set up the pipeline and train
    the tagger. In order to train the tagger with a custom tag map,
    we're creating a new Language instance with a custom vocab.
    """
    nlp = spacy.blank(lang)
    tagger = nlp.add_pipe("tagger")
    for tag, values in TAG_MAP.items():
        tagger.add_label(tag)  # tagger.add_label(tag, values) -> raises an error
    # nlp.initialize() is the v3 replacement for the deprecated nlp.begin_training()
    optimizer = nlp.initialize()
    for i in range(n_iter):
        random.shuffle(TRAIN_DATA)
        losses = {}
        for text, annotations in TRAIN_DATA:
            example = Example.from_dict(nlp.make_doc(text), annotations)
            nlp.update([example], sgd=optimizer, losses=losses)
        print(losses)

    test_text = "Eu gosto ovos passados"

    # Save model to output directory
    if output_dir is not None:
        output_dir = Path(output_dir)
        if not output_dir.exists():
            output_dir.mkdir()
        nlp.to_disk(output_dir)
        print("Saved model to", output_dir)

        # Test the saved model
        print("Loading from", output_dir)
        nlp2 = spacy.load(output_dir)
        doc = nlp2(test_text)
        print("Tags", [(t.text, t.tag_, t.pos_) for t in doc])


if __name__ == "__main__":
    plac.call(main)
```
The final `print` returns:

```
Tags [('Eu', 'N', ''), ('gosto', 'V', ''), ('ovos', 'N', ''), ('passados', 'J', '')]
```
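A likely cause, assuming spaCy v3.x: the `tagger` component only predicts the fine-grained `Token.tag_`. In v2 the tag map copied each tag's `pos` value onto `Token.pos_`, but v3 no longer has a tag map; that mapping is now done by a separate `attribute_ruler` component (alternatively, a `morphologizer` can be trained to predict `pos_` directly). The sketch below trains the tagger on the question's data and then adds an `attribute_ruler` with one `{"TAG": ...}` pattern per entry in `TAG_MAP`. The helper name `train_pipeline` is hypothetical. The ruler is added after `nlp.initialize()`; as far as I know, `AttributeRuler.initialize()` resets the ruler's patterns, so adding it afterwards keeps the patterns safe.

```python
import random
import tempfile

import spacy
from spacy.language import Language
from spacy.training import Example

# Same data as in the question: fine-grained tag -> coarse-grained UD POS.
TAG_MAP = {
    "N": {"pos": "NOUN"},
    "V": {"pos": "VERB"},
    "J": {"pos": "ADJ"},
}

TRAIN_DATA = [
    ("Eu gosto ovos cozidos", {"tags": ["N", "V", "N", "J"]}),
    ("Comer presunto azul", {"tags": ["V", "N", "J"]}),
]


def train_pipeline(n_iter: int = 25) -> Language:
    """Hypothetical helper: train a tagger, then map its tags to POS."""
    nlp = spacy.blank("pt")
    nlp.add_pipe("tagger")
    examples = [
        Example.from_dict(nlp.make_doc(text), annotations)
        for text, annotations in TRAIN_DATA
    ]
    # Passing the examples lets the tagger read its labels from the "tags"
    # data, so no add_label() calls are needed.
    optimizer = nlp.initialize(lambda: examples)
    for _ in range(n_iter):
        random.shuffle(examples)
        losses = {}
        for example in examples:
            nlp.update([example], sgd=optimizer, losses=losses)

    # The tagger only sets Token.tag_. The attribute_ruler copies each tag's
    # POS onto Token.pos_. It is added after nlp.initialize(), so it is never
    # re-initialized and keeps the patterns added here.
    ruler = nlp.add_pipe("attribute_ruler", after="tagger")
    for tag, values in TAG_MAP.items():
        ruler.add(patterns=[[{"TAG": tag}]], attrs={"POS": values["pos"]})
    return nlp


if __name__ == "__main__":
    nlp = train_pipeline()
    with tempfile.TemporaryDirectory() as tmp:
        nlp.to_disk(tmp)
        nlp2 = spacy.load(tmp)  # the ruler's patterns are saved with the pipeline
        doc = nlp2("Eu gosto ovos passados")
        print("Tags", [(t.text, t.tag_, t.pos_) for t in doc])
```

In the question's script, the equivalent change would be to add the `attribute_ruler` and its patterns after the training loop and before `nlp.to_disk(output_dir)`; because the patterns are serialized with the pipeline, `spacy.load(output_dir)` should then return non-empty `pos_` values.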