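numpy - ValueError: incompatible row dimensions when training a model
Problem description
I am fitting a decision tree on a dataset. Before that, I want to transform one specific column with CountVectorizer. To keep things simple, I am using a pipeline for this.
However, fitting fails with an "incompatible row dimensions" error.
Code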
# Import the libraries
from sklearn.model_selection import train_test_split as tts
from sklearn.feature_extraction.text import CountVectorizer as cv
from sklearn.preprocessing import OneHotEncoder as ohe
from sklearn.compose import ColumnTransformer as ct
from sklearn.pipeline import make_pipeline as mp
from sklearn.tree import DecisionTreeClassifier as dtc
transformer=ct(transformers=[('review_counts',cv(),['verified_reviews']),
('variation_dummies', ohe(),['variation'])
],remainder='passthrough')
pipe= mp(transformer,dtc(random_state=42))
x= data[['rating','variation','verified_reviews']].copy()
y= data.feedback
x_train,x_test,y_train,y_test= tts(x,y,test_size=0.3,random_state=42,stratify=y)
print(x_train.shape,y_train.shape) # ((2205, 3), (2205,))
pipe.fit(x_train,y_train) # Error on this line
Error
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-79-a981c354b190> in <module>()
----> 1 pipe.fit(x_train,y_train)
7 frames
/usr/local/lib/python3.6/dist-packages/scipy/sparse/construct.py in bmat(blocks, format, dtype)
584 exp=brow_lengths[i],
585 got=A.shape[0]))
--> 586 raise ValueError(msg)
587
588 if bcol_lengths[j] == 0:
ValueError: blocks[0,:] has incompatible row dimensions. Got blocks[0,1].shape[0] == 2205, expected 1.
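Questions
- How does this incompatible row dimensions error arise?
- How can it be fixed?
Solution
Try passing the column to ohe as a list, and passing a plain string to cv: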
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer as cv
from sklearn.preprocessing import OneHotEncoder as ohe
from sklearn.compose import ColumnTransformer as ct
from sklearn.pipeline import make_pipeline as mp
from sklearn.tree import DecisionTreeClassifier as dtc
data = pd.DataFrame({'rating':np.random.randint(0,10,6),'variation':['a','b','c','a','b','c'],
'verified_reviews':['adnf asd','sdf dsa','das j s','asd jd s','sad jds a','sajd'],
'feedback':np.random.randint(0,2,6)})
transformer=ct(transformers=[('review_counts',cv(),'verified_reviews'),
('variation_dummies', ohe(),['variation'])],
remainder='passthrough')
pipe= mp(transformer, dtc(random_state=42))
x= data[['rating','variation','verified_reviews']].copy()
y= data.feedback
pipe.fit(x,y)
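According to the documentation, a column is specified as a string ("xxx") whenever the transformer expects a 1D array-like as input, and as a list of strings (["xxx"]) for transformers that expect 2D data.
This is also the likely source of the error. CountVectorizer expects a 1D iterable of documents. When it receives the one-column 2D selection ['verified_reviews'], it iterates over the column labels instead of the rows and so produces a single row, while OneHotEncoder produces 2205 rows. Stacking the two blocks side by side then fails with the message "Got blocks[0,1].shape[0] == 2205, expected 1".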
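The difference between the two column specifications can be seen outside the pipeline. The following is a minimal sketch on a made-up four-row frame (the name `toy` and its contents are assumptions for illustration, not the asker's data). It calls the two transformers directly on a 2D and a 1D selection and compares the number of rows each one returns.

```python
import pandas as pd
from scipy import sparse
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data, only used to illustrate the shapes involved.
toy = pd.DataFrame({
    "variation": ["a", "b", "c", "a"],
    "verified_reviews": ["good sound", "bad sound", "love it", "good value"],
})

# 2D selection (what ['verified_reviews'] hands to the transformer):
# iterating over a DataFrame yields its column labels, so CountVectorizer
# sees one "document" and returns one row.
counts_2d = CountVectorizer().fit_transform(toy[["verified_reviews"]])

# 1D selection (what 'verified_reviews' hands to the transformer):
# one document per row.
counts_1d = CountVectorizer().fit_transform(toy["verified_reviews"])

# OneHotEncoder needs 2D input and returns one row per sample.
dummies = OneHotEncoder().fit_transform(toy[["variation"]])

print("2D input to CountVectorizer:", counts_2d.shape)
print("1D input to CountVectorizer:", counts_1d.shape)
print("OneHotEncoder output:", dummies.shape)

# Stacking blocks with different row counts fails, as in the traceback.
try:
    sparse.hstack([counts_2d, dummies])
    stack_failed = False
except ValueError as exc:
    stack_failed = True
    print("hstack failed:", exc)
```

With the 1D selection, both blocks have one row per sample, so ColumnTransformer can concatenate them.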