TextBlob error: too many values to unpack

Problem description

I am trying to run the following code, but I get a "too many values to unpack" error.

The code is:

import csv
import json
import pandas as pd

df = pd.read_csv("job/my_data_frame_test.csv", encoding="utf-8")

df.info()
print(df)
TEXT  text recommended
ABC   yes
DEF   no
from textblob import TextBlob
    
from textblob.classifiers import NaiveBayesClassifier
    
cl = NaiveBayesClassifier(df)

After running this code, I get the following error (full traceback):

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-3d683b8c482a> in <module>
----> 1 cl = NaiveBayesClassifier(df)

/usr/local/lib/python3.8/dist-packages/textblob/classifiers.py in __init__(self, train_set, feature_extractor, format, **kwargs)
    203     def __init__(self, train_set,
    204                  feature_extractor=basic_extractor, format=None, **kwargs):
--> 205         super(NLTKClassifier, self).__init__(train_set, feature_extractor, format, **kwargs)
    206         self.train_features = [(self.extract_features(d), c) for d, c in self.train_set]
    207 

/usr/local/lib/python3.8/dist-packages/textblob/classifiers.py in __init__(self, train_set, feature_extractor, format, **kwargs)
    137         else:  # train_set is a list of tuples
    138             self.train_set = train_set
--> 139         self._word_set = _get_words_from_dataset(self.train_set)  # Keep a hidden set of unique words.
    140         self.train_features = None
    141 

/usr/local/lib/python3.8/dist-packages/textblob/classifiers.py in _get_words_from_dataset(dataset)
     61             return words
     62     all_words = chain.from_iterable(tokenize(words) for words, _ in dataset)
---> 63     return set(all_words)
     64 
     65 def _get_document_tokens(document):

/usr/local/lib/python3.8/dist-packages/textblob/classifiers.py in <genexpr>(.0)
     60         else:
     61             return words
---> 62     all_words = chain.from_iterable(tokenize(words) for words, _ in dataset)
     63     return set(all_words)
     64 

ValueError: too many values to unpack (expected 2)

Tags: python, pandas, dataframe, text-mining, textblob

Solution


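NaiveBayesClassifier() expects a list of tuples of the form (text, label), not a DataFrame. Iterating over a DataFrame yields its column names, so the classifier tries to unpack a string such as "TEXT" into two values and raises the ValueError shown above. Build the list from the two columns instead: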

train = list(zip(df['TEXT'], df['text recommended']))
# [('ABC', 'yes'), ('DEF', 'no')]
cl = NaiveBayesClassifier(train)
# <NaiveBayesClassifier trained on 2 instances>
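
To see why passing the DataFrame directly fails, and how to build the training list a little more defensively, here is a minimal sketch using pandas only. The sample DataFrame stands in for the CSV from the question, and to_train_set is a hypothetical helper introduced for illustration, not part of TextBlob or pandas. It assumes that rows with a missing text or label should simply be skipped.

```python
import pandas as pd

# Hypothetical stand-in for the CSV in the question; the column names
# follow the answer above, and the third row has a missing text on purpose.
df = pd.DataFrame({
    "TEXT": ["ABC", "DEF", None],
    "text recommended": ["yes", "no", "yes"],
})

# Iterating over a DataFrame yields its column labels, not its rows.
print(list(df))  # ['TEXT', 'text recommended']

# Unpacking the 4-character label "TEXT" into two names reproduces the error.
error_message = None
try:
    for words, _ in df:
        pass
except ValueError as exc:
    error_message = str(exc)
print(error_message)


def to_train_set(frame, text_col, label_col):
    """Hypothetical helper: return [(text, label), ...], skipping incomplete rows."""
    clean = frame.dropna(subset=[text_col, label_col])
    return list(zip(clean[text_col].astype(str), clean[label_col].astype(str)))


train = to_train_set(df, "TEXT", "text recommended")
print(train)  # [('ABC', 'yes'), ('DEF', 'no')]
```

The resulting list can then be passed to NaiveBayesClassifier(train) exactly as in the answer above. Dropping incomplete rows is only one possible choice; depending on the data, filling in a default label may be more appropriate.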
