AttributeError: 'tensorflow.python.framework.ops.EagerTensor' object has no attribute 'to_tensor'

Problem description

I am fine-tuning a BERT model using the Hugging Face, Keras, and TensorFlow libraries.

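Since yesterday, I have been getting this error when running my code in Google Colab. Oddly, the code used to run without any problems and then suddenly started throwing this error. Even more puzzling, the same code runs fine in my Apple M1 TensorFlow setup. I have not changed anything in my code, yet it no longer runs in Google Colab.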

Both environments have TensorFlow 2.6.0.

I created the code below to reproduce the error. I hope you can shed some light on this.

!pip install transformers
!pip install datasets

import pandas as pd
import numpy as np
import tensorflow as tf
from transformers import AutoTokenizer
from datasets import Dataset

# dummy sentences
sentences = ['the house is blue and big', 'this is fun stuff','what a horrible thing to say']

# create a pandas DataFrame and convert it to a Hugging Face dataset
df = pd.DataFrame({'Text': sentences})
dataset = Dataset.from_pandas(df)

#download bert tokenizer
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')

# tokenize each sentence in dataset
dataset_tok = dataset.map(lambda x: tokenizer(x['Text'], truncation=True, padding=True, max_length=10), batched=True)

# remove original text column and set format
dataset_tok = dataset_tok.remove_columns(['Text']).with_format('tensorflow')

# extract features
features = {x: dataset_tok[x].to_tensor() for x in tokenizer.model_input_names}

Tags: python, tensorflow, google-colaboratory, huggingface-transformers, huggingface-tokenizers

Solution


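As @Harold G suggested, the code works once the to_tensor() call is removed. to_tensor() is a method of tf.RaggedTensor, and the error message shows that dataset_tok[x] is already a dense EagerTensor, which has no such method. Since both environments run the same TensorFlow version, a likely cause is that the unpinned pip install on Colab pulled a newer release of the datasets library, one that returns dense tensors here.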

!pip install transformers
!pip install datasets

import pandas as pd
import numpy as np
import tensorflow as tf
from transformers import AutoTokenizer
from datasets import Dataset

# dummy sentences
sentences = ['the house is blue and big', 'this is fun stuff','what a horrible thing to say']

# create a pandas DataFrame and convert it to a Hugging Face dataset
df = pd.DataFrame({'Text': sentences})
dataset = Dataset.from_pandas(df)

#download bert tokenizer
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')

# tokenize each sentence in dataset
dataset_tok = dataset.map(lambda x: tokenizer(x['Text'], truncation=True, padding=True, max_length=10), batched=True)

# remove original text column and set format
dataset_tok = dataset_tok.remove_columns(['Text']).with_format('tensorflow')

# extract features
features = {x: dataset_tok[x] for x in tokenizer.model_input_names}
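
If the same notebook has to run in environments where the column may come back either as a tf.RaggedTensor or as a dense tensor, one option is to convert only when needed. The following is a minimal sketch, not part of the original answer; to_dense is a hypothetical helper name, and the token ids are made-up sample values.

```python
import tensorflow as tf


def to_dense(t):
    """Return a dense tensor whether t is ragged or already dense.

    Hypothetical helper: only tf.RaggedTensor has to_tensor(), so call it
    only for that type and pass dense tensors through unchanged.
    """
    if isinstance(t, tf.RaggedTensor):
        return t.to_tensor()  # pads shorter rows with 0 by default
    return tf.convert_to_tensor(t)


# Made-up sample values standing in for tokenizer output.
ragged = tf.ragged.constant([[101, 2023, 102], [101, 102]])
dense = tf.constant([[101, 2023, 102], [101, 102, 0]])

ragged_out = to_dense(ragged)  # converted and padded
dense_out = to_dense(dense)    # returned as a dense tensor, no error
```

With such a helper, the last line of the script could be written as features = {x: to_dense(dataset_tok[x]) for x in tokenizer.model_input_names}, which should behave the same under either return type.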
