ValueError: blocks[0,:] has incompatible row dimensions

Problem description

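I am trying to extract some text features (word_count, char_count, ...) plus tf-idf features from a Twitter dataset for sentiment analysis. I combine them with scikit-learn's FeatureUnion and feed the result to a classifier inside a Pipeline.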

I get the following error: ValueError: blocks[0,:] has incompatible row dimensions. Got blocks[0,8].shape[0] == 7920, expected 1. Here is the code:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import FeatureUnion, Pipeline

# The feature calculators, Preprocessor, ItemSelector and UrlRemover are custom
# transformers defined elsewhere (not shown in the question).
features_union = FeatureUnion(transformer_list=[
    ('word_count', WordCalculator()),
    ('char_count', CharCalculator()),
    ('avg_word_len', AvdWordLengthCalculater()),
    ('stop_words_count', StopWordsCalculater()),
    ('spl_char_count', SplCharCalculater()),
    ('hash_tag_count', HashTagCalculator()),
    ('num_count', NumericsCalculator()),
    ('cap_letter_count', CapsCalculator()),
    ('tfidf_feature', Pipeline([('preprocessor', Preprocessor()),
                                ('selector', ItemSelector('tweet')),
                                ('count', CountVectorizer()),
                                ('tfidf', TfidfTransformer())])),
])
pipeline = Pipeline([('noise_remover', UrlRemover()),
                     ('features', features_union),
                     ('model', MultinomialNB())])
pipeline.fit(train, train['label'])
```
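One way to avoid the mismatch, sketched under the assumption that the hand-written calculators currently return a flat array of counts, is to make every custom transformer return a 2-D array of shape `(n_samples, 1)`. `WordCountColumn` and `select_tweet` are hypothetical stand-ins for `WordCalculator` and `ItemSelector('tweet')`, the four-row DataFrame is toy data, and `TfidfVectorizer` replaces the `CountVectorizer` + `TfidfTransformer` pair (scikit-learn documents it as equivalent to that combination).

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer


class WordCountColumn(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for WordCalculator: number of words in each tweet."""

    def __init__(self, column="tweet"):
        self.column = column

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X):
        counts = X[self.column].str.split().str.len()
        # Return a 2-D column of shape (n_samples, 1), not a flat (n_samples,) array,
        # so FeatureUnion can stack it next to the sparse tf-idf matrix.
        return counts.to_numpy().reshape(-1, 1)


def select_tweet(df):
    # Hypothetical replacement for ItemSelector('tweet'): returns the text column.
    return df["tweet"]


features = FeatureUnion([
    ("word_count", WordCountColumn()),
    ("tfidf_feature", Pipeline([
        ("selector", FunctionTransformer(select_tweet, validate=False)),
        ("tfidf", TfidfVectorizer()),
    ])),
])
pipeline = Pipeline([("features", features), ("model", MultinomialNB())])

# Toy data using the same column names as the question ('tweet', 'label').
train = pd.DataFrame({
    "tweet": ["love this phone", "worst service ever",
              "great case thanks", "apple won't talk to me"],
    "label": [0, 1, 0, 1],
})
pipeline.fit(train, train["label"])
X = pipeline.named_steps["features"].transform(train)
print(X.shape)  # one word-count column plus one column per tf-idf term
```

The same `.reshape(-1, 1)` would apply to each of the other seven calculators; once every block has one row per tweet, the sparse stacking inside FeatureUnion lines up.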

Here is the full error log:

ValueError                                Traceback (most recent call last)
<ipython-input-33-bb532fc90bb0> in <module>
     14                      ('features', features_union),
     15                      ('model', MultinomialNB())])
---> 16 pipeline.fit(train, train['label'])

~/opt/anaconda3/lib/python3.7/site-packages/sklearn/pipeline.py in fit(self, X, y, **fit_params)
    348             This estimator
    349         """
--> 350         Xt, fit_params = self._fit(X, y, **fit_params)
    351         with _print_elapsed_time('Pipeline',
    352                                  self._log_message(len(self.steps) - 1)):

~/opt/anaconda3/lib/python3.7/site-packages/sklearn/pipeline.py in _fit(self, X, y, **fit_params)
    313                 message_clsname='Pipeline',
    314                 message=self._log_message(step_idx),
--> 315                 **fit_params_steps[name])
    316             # Replace the transformer of the step with the fitted
    317             # transformer. This is necessary when loading the transformer

~/opt/anaconda3/lib/python3.7/site-packages/joblib/memory.py in __call__(self, *args, **kwargs)
    353 
    354     def __call__(self, *args, **kwargs):
--> 355         return self.func(*args, **kwargs)
    356 
    357     def call_and_shelve(self, *args, **kwargs):

~/opt/anaconda3/lib/python3.7/site-packages/sklearn/pipeline.py in _fit_transform_one(transformer, X, y, weight, message_clsname, message, **fit_params)
    726     with _print_elapsed_time(message_clsname, message):
    727         if hasattr(transformer, 'fit_transform'):
--> 728             res = transformer.fit_transform(X, y, **fit_params)
    729         else:
    730             res = transformer.fit(X, y, **fit_params).transform(X)

~/opt/anaconda3/lib/python3.7/site-packages/sklearn/pipeline.py in fit_transform(self, X, y, **fit_params)
    943 
    944         if any(sparse.issparse(f) for f in Xs):
--> 945             Xs = sparse.hstack(Xs).tocsr()
    946         else:
    947             Xs = np.hstack(Xs)

~/opt/anaconda3/lib/python3.7/site-packages/scipy/sparse/construct.py in hstack(blocks, format, dtype)
    463 
    464     """
--> 465     return bmat([blocks], format=format, dtype=dtype)
    466 
    467 

~/opt/anaconda3/lib/python3.7/site-packages/scipy/sparse/construct.py in bmat(blocks, format, dtype)
    584                                                     exp=brow_lengths[i],
    585                                                     got=A.shape[0]))
--> 586                     raise ValueError(msg)
    587 
    588                 if bcol_lengths[j] == 0:

ValueError: blocks[0,:] has incompatible row dimensions. Got blocks[0,8].shape[0] == 7920, expected 1.
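The traceback shows the failure happens where FeatureUnion, having found at least one sparse output, calls `sparse.hstack(Xs)`; SciPy requires every block in a row to have the same number of rows. Here the tf-idf block (`blocks[0,8]`) has 7920 rows while the first block has only 1. A likely cause, assuming the hand-written calculators return a flat 1-D array, is that SciPy's sparse matrix classes turn such an array into a single row of shape `(1, n_samples)`. This minimal sketch reproduces that with toy sizes; `word_count` and `tfidf_block` are illustrative names, not objects from the question.

```python
import numpy as np
from scipy import sparse

n_samples = 5  # stands in for the 7920 tweets in the question

# A hand-written count feature returned as a flat 1-D array, one value per tweet
word_count = np.array([3, 7, 2, 9, 4])

# A sparse block shaped like a tf-idf matrix: (n_samples, n_terms)
tfidf_block = sparse.eye(n_samples, 10, format="csr")

# SciPy's sparse matrix classes turn a 1-D array into a single ROW
row_block = sparse.csr_matrix(word_count)
print(row_block.shape)  # (1, 5)

try:
    sparse.hstack([row_block, tfidf_block])
except ValueError as err:
    # Row counts 1 and 5 disagree; the exact wording depends on the SciPy version
    print("ValueError:", err)

# Reshaped into a column of shape (n_samples, 1), the blocks line up
col_block = sparse.csr_matrix(word_count.reshape(-1, 1))
combined = sparse.hstack([col_block, tfidf_block]).tocsr()
print(combined.shape)  # (5, 11)
```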

Dataset sample:

0   1   0   #fingerprint #Pregnancy Test https://google.com...
1   2   0   Finally a transparant silicon case ^^ Thanks t...
2   3   0   We love this! Would you go? #talk #makememorie...
3   4   0   I'm wired I know I'm George I was made that wa...
4   5   1   What amazing service! Apple won't even talk to...

Dataset shape: (7920, 3)
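Because the dataset has 7920 rows, every member of the union should produce a 2-D block with 7920 rows. A quick way to find the offending members is to fit each entry of `FeatureUnion.transformer_list` on its own and inspect its output shape. `report_union_shapes`, `flat_word_count` and `column_word_count` are hypothetical helpers for this sketch; with the real pipeline you would pass the data as it reaches the union, i.e. after `UrlRemover`.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import FunctionTransformer


def report_union_shapes(union, X, y=None):
    """Fit each FeatureUnion member separately and return {name: output shape}.

    Every member should return a 2-D block with len(X) rows; anything else
    (a flat 1-D array, or a single row) is printed as a warning.
    """
    shapes = {}
    for name, transformer in union.transformer_list:
        out = clone(transformer).fit_transform(X, y)
        shape = np.shape(out)  # works for dense arrays and sparse matrices
        shapes[name] = shape
        if len(shape) != 2 or shape[0] != len(X):
            print(f"{name}: output shape {shape}, expected ({len(X)}, k)")
    return shapes


def flat_word_count(df):
    # Hypothetical buggy calculator: returns shape (n_samples,)
    return df["tweet"].str.split().str.len().to_numpy()


def column_word_count(df):
    # Hypothetical fixed calculator: returns shape (n_samples, 1)
    return flat_word_count(df).reshape(-1, 1)


train = pd.DataFrame({"tweet": ["love this phone", "worst service ever",
                                "great case thanks"]})
union = FeatureUnion([
    ("flat_count", FunctionTransformer(flat_word_count, validate=False)),
    ("column_count", FunctionTransformer(column_word_count, validate=False)),
])
shapes = report_union_shapes(union, train)
print(shapes)
```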

Any help with this would be greatly appreciated.

Tags: python, machine-learning, scikit-learn, sentiment-analysis, feature-extraction

Solution

