python - How to make predictions from a trained PyTorch and torchtext model?
Problem description
Generally speaking, after successfully training a text RNN model with PyTorch, using torchtext to handle data loading from the original source, I would like to test it on other data sets (a sort of blink test) that come from different sources but have the same text format.
First I defined a class to handle the data loading.
import pandas as pd

class Dataset(object):
    def __init__(self, config):
        # init what I need
        ...

    def load_data(self, df: pd.DataFrame, *args):
        # implementation below
        # Data format like `(LABEL, TEXT)`
        ...

    def load_data_but_error(self, df: pd.DataFrame):
        # implementation below
        # Data format like `(TEXT)`
        ...
Here is the detail of load_data, which loads the data that I trained on successfully:
# Imports assumed for this snippet (legacy torchtext API)
from sklearn.model_selection import train_test_split
from torchtext import data
from torchtext.vocab import Vectors

TEXT = data.Field(sequential=True, tokenize=tokenizer, lower=True, fix_length=self.config.max_sen_len)
LABEL = data.Field(sequential=False, use_vocab=False)
datafields = [(label_col, LABEL), (data_col, TEXT)]
# split my data to train/test
train_df, test_df = train_test_split(df, test_size=0.33, random_state=random_state)
train_examples = [data.Example.fromlist(i, datafields) for i in train_df.values.tolist()]
train_data = data.Dataset(train_examples, datafields)
# split train to train/val
train_data, val_data = train_data.split(split_ratio=0.8)
# build vocab
TEXT.build_vocab(train_data, vectors=Vectors(w2v_file))
self.word_embeddings = TEXT.vocab.vectors
self.vocab = TEXT.vocab
test_examples = [data.Example.fromlist(i, datafields) for i in test_df.values.tolist()]
test_data = data.Dataset(test_examples, datafields)
self.train_iterator = data.BucketIterator(
    train_data,
    batch_size=self.config.batch_size,
    sort_key=lambda x: len(x.title),
    repeat=False,
    shuffle=True)
self.val_iterator, self.test_iterator = data.BucketIterator.splits(
    (val_data, test_data),
    batch_size=self.config.batch_size,
    sort_key=lambda x: len(x.title),
    repeat=False,
    shuffle=False)
Next is my code (load_data_but_error) to load data from other sources, which causes an error:
TEXT = data.Field(sequential=True, tokenize=tokenizer, lower=True, fix_length=self.config.max_sen_len)
datafields = [('title', TEXT)]
examples = [data.Example.fromlist(i, datafields) for i in df.values.tolist()]
blink_test = data.Dataset(examples, datafields)
self.blink_test = data.BucketIterator(
    blink_test,
    batch_size=self.config.batch_size,
    sort_key=lambda x: len(x.title),
    repeat=False,
    shuffle=True)
When I executed the code, I got the error AttributeError: 'Field' object has no attribute 'vocab'. There is a question about this error here, but it does not match my situation: I already have the vocab from load_data, and I want to reuse it for the blink tests.
My question is: what is the correct way to load new data and feed it to a trained PyTorch model for testing?
Solution
What I needed was:
- Keep TEXT from load_data by assigning it to a class variable, and reuse it in load_data_but_error
- Add train=True to the data.BucketIterator object in the function load_data_but_error
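The idea of keeping TEXT on the instance and reusing it for new data can be sketched in plain Python. This is only an illustration under stated assumptions: SimpleField is a hypothetical stand-in for data.Field (it is not the torchtext API), and load_blink_test is a made-up name for a corrected load_data_but_error. It shows the vocabulary being built once in load_data, stored on the instance, and reused afterwards.

```python
from collections import Counter


class SimpleField:
    """Hypothetical stand-in for a torchtext-style field (not the real API)."""

    def __init__(self, fix_length):
        self.fix_length = fix_length
        # No `vocab` attribute exists until build_vocab() is called,
        # which mirrors the AttributeError from the question.

    def build_vocab(self, texts):
        counts = Counter(tok for text in texts for tok in text.lower().split())
        itos = ["<unk>", "<pad>"] + sorted(counts)
        self.vocab = {tok: idx for idx, tok in enumerate(itos)}

    def numericalize(self, text):
        # Truncate to fix_length, map unknown tokens to <unk>, then pad.
        tokens = text.lower().split()[: self.fix_length]
        ids = [self.vocab.get(tok, self.vocab["<unk>"]) for tok in tokens]
        return ids + [self.vocab["<pad>"]] * (self.fix_length - len(ids))


class Dataset:
    def __init__(self, fix_length=4):
        self.fix_length = fix_length
        self.TEXT = None  # set by load_data, reused afterwards

    def load_data(self, train_texts):
        # Build the vocabulary once, on the training data, and keep the field.
        self.TEXT = SimpleField(self.fix_length)
        self.TEXT.build_vocab(train_texts)
        return [self.TEXT.numericalize(t) for t in train_texts]

    def load_blink_test(self, new_texts):
        # Reuse the fitted field instead of creating a fresh one.
        if self.TEXT is None:
            raise RuntimeError("call load_data before loading new data")
        return [self.TEXT.numericalize(t) for t in new_texts]


dataset = Dataset(fix_length=4)
train_ids = dataset.load_data(["good movie", "bad movie plot"])
blink_ids = dataset.load_blink_test(["good plot twist"])
```

Here the unseen word "twist" falls back to the `<unk>` index, because the new text is encoded with the vocabulary fitted on the training data. Creating a fresh SimpleField inside load_blink_test and calling numericalize on it would fail with an AttributeError, since that field has no vocab, just like the new data.Field in the question.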