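python - ValueError: blocks[0,:] has incompatible row dimensions
Problem description
I am trying to extract some text features (word_count, char_count, ...) and tf-idf from a Twitter dataset for sentiment analysis. I combine them with sklearn's FeatureUnion and feed them to a classifier inside a Pipeline.
I get the following error: ValueError: blocks[0,:] has incompatible row dimensions. Got blocks[0,8].shape[0] == 7920, expected 1. Here is the code: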
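```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import FeatureUnion, Pipeline

# The *Calculator / *Calculater classes, Preprocessor, ItemSelector and
# UrlRemover are custom transformers defined elsewhere (not shown here).
features_union = FeatureUnion(transformer_list=[
    ('word_count', WordCalculator()),
    ('char_count', CharCalculator()),
    ('avg_word_len', AvdWordLengthCalculater()),
    ('stop_words_count', StopWordsCalculater()),
    ('spl_char_count', SplCharCalculater()),
    ('hash_tag_count', HashTagCalculator()),
    ('num_count', NumericsCalculator()),
    ('cap_letter_count', CapsCalculator()),
    # Index 8 in the union: the tf-idf sub-pipeline.
    ('tfidf_feature', Pipeline([
        ('preprocessor', Preprocessor()),
        ('selector', ItemSelector('tweet')),
        ('count', CountVectorizer()),
        ('tfidf', TfidfTransformer()),
    ])),
])

pipeline = Pipeline([
    ('noise_remover', UrlRemover()),
    ('features', features_union),
    ('model', MultinomialNB()),
])
pipeline.fit(train, train['label'])
```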
Here is the full error log:
ValueError Traceback (most recent call last)
<ipython-input-33-bb532fc90bb0> in <module>
14 ('features', features_union),
15 ('model', MultinomialNB())])
---> 16 pipeline.fit(train, train['label'])
~/opt/anaconda3/lib/python3.7/site-packages/sklearn/pipeline.py in fit(self, X, y, **fit_params)
348 This estimator
349 """
--> 350 Xt, fit_params = self._fit(X, y, **fit_params)
351 with _print_elapsed_time('Pipeline',
352 self._log_message(len(self.steps) - 1)):
~/opt/anaconda3/lib/python3.7/site-packages/sklearn/pipeline.py in _fit(self, X, y, **fit_params)
313 message_clsname='Pipeline',
314 message=self._log_message(step_idx),
--> 315 **fit_params_steps[name])
316 # Replace the transformer of the step with the fitted
317 # transformer. This is necessary when loading the transformer
~/opt/anaconda3/lib/python3.7/site-packages/joblib/memory.py in __call__(self, *args, **kwargs)
353
354 def __call__(self, *args, **kwargs):
--> 355 return self.func(*args, **kwargs)
356
357 def call_and_shelve(self, *args, **kwargs):
~/opt/anaconda3/lib/python3.7/site-packages/sklearn/pipeline.py in _fit_transform_one(transformer, X, y, weight, message_clsname, message, **fit_params)
726 with _print_elapsed_time(message_clsname, message):
727 if hasattr(transformer, 'fit_transform'):
--> 728 res = transformer.fit_transform(X, y, **fit_params)
729 else:
730 res = transformer.fit(X, y, **fit_params).transform(X)
~/opt/anaconda3/lib/python3.7/site-packages/sklearn/pipeline.py in fit_transform(self, X, y, **fit_params)
943
944 if any(sparse.issparse(f) for f in Xs):
--> 945 Xs = sparse.hstack(Xs).tocsr()
946 else:
947 Xs = np.hstack(Xs)
~/opt/anaconda3/lib/python3.7/site-packages/scipy/sparse/construct.py in hstack(blocks, format, dtype)
463
464 """
--> 465 return bmat([blocks], format=format, dtype=dtype)
466
467
~/opt/anaconda3/lib/python3.7/site-packages/scipy/sparse/construct.py in bmat(blocks, format, dtype)
584 exp=brow_lengths[i],
585 got=A.shape[0]))
--> 586 raise ValueError(msg)
587
588 if bcol_lengths[j] == 0:
ValueError: blocks[0,:] has incompatible row dimensions. Got blocks[0,8].shape[0] == 7920, expected 1.
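The message can be read as "block 8 (the tf-idf output) has 7920 rows, but an earlier block has only 1". A minimal sketch of how that can happen, assuming (the custom transformers are not shown, so this is a guess) that at least one of them returns a 1-D array with one value per tweet. The names `word_counts`, `row_block` and `tfidf_block` are made up for the illustration:

```python
import numpy as np
from scipy import sparse

n_rows = 7920  # number of tweets, taken from the dataset shape

# Hypothetical output of a custom transformer: a 1-D array, one value per tweet.
word_counts = np.ones(n_rows)

# A sparse matrix built from a 1-D array is a single row, not a column.
row_block = sparse.coo_matrix(word_counts)
print(row_block.shape)  # (1, 7920)

# Stand-in for the tf-idf block: one row per tweet, a few feature columns.
tfidf_block = sparse.csr_matrix(np.ones((n_rows, 5)))

# Stacking a 1-row block next to a 7920-row block fails.
try:
    sparse.hstack([row_block, tfidf_block])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

If this is what is happening, printing the shape of each transformer's output on `train` should show `(7920,)` for the count features rather than `(7920, 1)`.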
Dataset sample:
0 1 0 #fingerprint #Pregnancy Test https://google.com...
1 2 0 Finally a transparant silicon case ^^ Thanks t...
2 3 0 We love this! Would you go? #talk #makememorie...
3 4 0 I'm wired I know I'm George I was made that wa...
4 5 1 What amazing service! Apple won't even talk to...
Dataset shape: (7920, 3)
Any direct help with this would be greatly appreciated.
Solution
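No answer text survives on this page, so the following is only a sketch of a likely fix, not a confirmed one. It assumes the count transformers return 1-D output; FeatureUnion stacks its blocks horizontally, so every transformer needs to return a 2-D result with one row per sample. `WordCounter` and `ColumnSelector` below are hypothetical stand-ins for the question's `WordCalculator` and `ItemSelector`, and the three tweets are made-up sample data:

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import FeatureUnion, Pipeline


class WordCounter(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for WordCalculator: words per tweet."""

    def __init__(self, column="tweet"):
        self.column = column

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        counts = X[self.column].str.split().str.len()
        # Reshape to (n_samples, 1) so the block has one row per tweet.
        return counts.to_numpy().reshape(-1, 1)


class ColumnSelector(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for ItemSelector: picks one text column."""

    def __init__(self, column):
        self.column = column

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # CountVectorizer expects an iterable of strings, so 1-D is fine here.
        return X[self.column]


# Made-up sample data with the same column names as the question.
train = pd.DataFrame({
    "tweet": ["We love this! #talk", "Finally a clear case", "What amazing service!"],
    "label": [0, 0, 1],
})

features_union = FeatureUnion([
    ("word_count", WordCounter()),
    ("tfidf_feature", Pipeline([
        ("selector", ColumnSelector("tweet")),
        ("count", CountVectorizer()),
        ("tfidf", TfidfTransformer()),
    ])),
])

features = features_union.fit_transform(train)
print(features.shape)

pipeline = Pipeline([("features", features_union), ("model", MultinomialNB())])
pipeline.fit(train, train["label"])
predictions = pipeline.predict(train)
```

The same `reshape(-1, 1)` (or returning a one-column DataFrame) would need to be applied in each of the eight count transformers, since a single 1-D block is enough to trigger the error.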