Using ColumnTransformer to add a feature to vectorizer output: dimension error when fitting

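Problem description

I am having trouble adding a feature to vectorized content. I have text content and page counts, and I am using sklearn's ColumnTransformer to add the page counts to the vectorized input, like this: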

training_content = pd.DataFrame({'text': training_text,'pages': training_pages})

The text content and the page counts have the same length:

19872 19872

The resulting DataFrame has this shape:

(19872, 2)

I then use ColumnTransformer to build a feature-preprocessing pipeline:

pipe = ColumnTransformer([('text', TfidfVectorizer(tokenizer=remove_strings_smaller_three_chars_tokenizer,  ngram_range=(1,ngram)), ['text'])], remainder=MinMaxScaler())

pipe = pipe.fit(training_content)

But I get this error:

Traceback (most recent call last):
  File "test_clfs.py", line 336, in <module>
    pipe = pipe.fit(training_content)
  File "/root/semantic_env/lib/python3.7/site-packages/sklearn/compose/_column_transformer.py", line 494, in fit
    self.fit_transform(X, y=y)
  File "/root/semantic_env/lib/python3.7/site-packages/sklearn/compose/_column_transformer.py", line 553, in fit_transform
    return self._hstack(list(Xs))
  File "/root/semantic_env/lib/python3.7/site-packages/sklearn/compose/_column_transformer.py", line 639, in _hstack
    return np.hstack(Xs)
  File "<__array_function__ internals>", line 6, in hstack
  File "/root/semantic_env/lib/python3.7/site-packages/numpy/core/shape_base.py", line 346, in hstack
    return _nx.concatenate(arrs, 1)
  File "<__array_function__ internals>", line 6, in concatenate
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 1 and the array at index 1 has size 19872

Tags: scikit-learn, python-3.7

Solution
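A likely cause is the column selector. In ColumnTransformer, a list such as ['text'] passes a 2D DataFrame to the transformer, while the string scalar 'text' passes a 1D Series. TfidfVectorizer expects a 1D iterable of strings. Iterating over a DataFrame yields its column names, so the vectorizer sees a single document (the string "text") and returns one row. That matches the "size 1" versus "size 19872" mismatch in the traceback. The sketch below shows the change on a small made-up DataFrame. The sample data and ngram_range=(1, 2) are assumptions, and the custom tokenizer from the question is left out because its definition is not shown.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import MinMaxScaler

# Toy stand-in for the real data (hypothetical values, not from the question).
training_content = pd.DataFrame({
    "text": ["first sample document", "second sample document", "third one"],
    "pages": [3, 10, 7],
})

# Iterating over a 2D DataFrame yields column names, not rows. A vectorizer
# given training_content[["text"]] would therefore see one "document".
assert list(training_content[["text"]]) == ["text"]

# Select the column with a string scalar ("text", not ["text"]) so the
# vectorizer receives a 1D Series of documents.
pipe = ColumnTransformer(
    [("text", TfidfVectorizer(ngram_range=(1, 2)), "text")],
    remainder=MinMaxScaler(),  # scales the remaining "pages" column
)

features = pipe.fit_transform(training_content)

# One output row per input row: TF-IDF columns followed by the scaled pages.
print(features.shape)
```

With the question's own code, the equivalent change would be to replace ['text'] with 'text' in the ColumnTransformer definition and keep the tokenizer and ngram_range arguments as they are.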

