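python - ValueError: Number of features of the model must match the input. Model n_features is 11 and input n_features is 2
Problem description
Running the following code in a Jupyter notebook raises a ValueError:
ValueError: Number of features of the model must match the input. Model n_features is 11 and input n_features is 2
How can I fix this?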
# Visualising the Training set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
alpha = 0.75, cmap = ListedColormap(('red', 'green')))
I get the following error:
ValueError Traceback (most recent call last)
<ipython-input-42-bc13e66e79fe> in <module>
4 X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
5 np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
----> 6 plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
7 alpha = 0.75, cmap = ListedColormap(('red', 'green')))
8 plt.xlim(X1.min(), X1.max())
~\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py in predict(self, X)
627 The predicted classes.
628 """
--> 629 proba = self.predict_proba(X)
630
631 if self.n_outputs_ == 1:
~\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py in predict_proba(self, X)
671 check_is_fitted(self)
672 # Check data
--> 673 X = self._validate_X_predict(X)
674
675 # Assign chunk of trees to jobs
~\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py in _validate_X_predict(self, X)
419 check_is_fitted(self)
420
--> 421 return self.estimators_[0]._validate_X_predict(X, check_input=True)
422
423 @property
~\anaconda3\lib\site-packages\sklearn\tree\_classes.py in _validate_X_predict(self, X, check_input)
394 n_features = X.shape[1]
395 if self.n_features_ != n_features:
--> 396 raise ValueError("Number of features of the model must "
397 "match the input. Model n_features is %s and "
398 "input n_features is %s "
ValueError: Number of features of the model must match the input. Model n_features is 11 and input n_features is 2
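Solution
I fixed your code according to my understanding of the problem, adding a few extra lines. The main issue is that you pass only columns 1 and 2 to predict, while the classifier was trained on 11 columns (1-11) and expects all of them. Columns 3-11 therefore have to be filled in somehow; at the very least you can fill them with zeros.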
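As a minimal sketch of the zero-filling option, assuming a model trained on 11 standardized features as in the question: the helper below (the name pad_grid is hypothetical, not part of scikit-learn) stacks the two mesh-grid columns with constant columns so that the array passed to predict has the expected width. Because the features were scaled with StandardScaler, a fill value of 0 corresponds to each feature's training mean.

```python
import numpy as np

def pad_grid(x1, x2, n_features, fill=0.0):
    """Return an (n_points, n_features) array: two grid columns plus constant fill columns."""
    # pad_grid is a hypothetical helper name used only for this illustration
    grid = np.column_stack([x1.ravel(), x2.ravel()])
    padding = np.full((grid.shape[0], n_features - 2), fill)
    return np.hstack([grid, padding])

# Small illustrative grid (coarse step so the example stays tiny)
X1, X2 = np.meshgrid(np.arange(-1.0, 1.0, 0.5), np.arange(-1.0, 1.0, 0.5))
grid_full = pad_grid(X1, X2, n_features=11)
print(grid_full.shape)  # (16, 11)
```

With the question's variables this could be used as classifier.predict(pad_grid(X1, X2, 11)).reshape(X1.shape).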
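In my solution I sorted the training set by its first column. Then, when building the mesh grid, I approximate the missing columns 3-11 by looking up a training row whose column 1 value is close to each X1 value in the grid. In other words, given only column 1, I try to find the best available approximation of columns 3-11 rather than filling them with zeros, although zero-filling would also work.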
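The lookup can be illustrated on a toy array. This is only a sketch: the three-row matrix and the grid values are made up for the example. Note that np.searchsorted returns the first sorted row whose column 1 value is at or above the grid value, so this is an approximation of "closest" rather than a true nearest-neighbour search.

```python
import numpy as np

# Hypothetical toy training matrix: column 0 is the plotted feature,
# column 2 stands in for the "other" features that must be filled in
X_train = np.array([
    [0.9, 0.0, 30.0],
    [0.1, 0.0, 10.0],
    [0.5, 0.0, 20.0],
])

# Sort rows by column 0 so that np.searchsorted can be applied to it
sX = X_train[np.argsort(X_train[:, 0])]

# Made-up grid values along the first axis
grid_x1 = np.array([0.0, 0.3, 0.5, 2.0])

# Index of the first sorted row with column 0 >= grid value, clipped to the last row
riX = np.minimum(sX.shape[0] - 1, np.searchsorted(sX[:, 0], grid_x1))
rX = sX[riX]

print(rX[:, 2])  # [10. 20. 20. 30.]
```

The rows must be taken from the same sorted array that was searched (sX here); otherwise the indices point at unrelated rows.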
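I also commented out the line #from sklearn.cross_validation import train_test_split
and replaced it with from sklearn.model_selection import train_test_split
because the first form only works with old scikit-learn releases; the submodule was renamed, so newer versions accept only the second. Pick whichever line matches your installed version.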
# Random Forest Classification
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('finalplacementdata3.csv')
X = dataset.iloc[:, range(1, 12)].values
y = dataset.iloc[:, 12].values
siX = np.lexsort((X[:, 1], X[:, 0]))
sX, sy = X[siX], y[siX]
# Splitting the dataset into the Training set and Test set
#from sklearn.cross_validation import train_test_split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Fitting Random Forest Classification to the Training set
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 0)
classifier.fit(X_train, y_train)
# Predicting the Test set results
y_pred = classifier.predict(X_test)
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
# Visualising the Training set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
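# Look up columns 3-11 from the scaled training rows, sorted by column 1,
# so the filled-in values are on the same scale the classifier was trained on
sX = X_train[np.argsort(X_train[:, 0])]
riX = np.minimum(sX.shape[0] - 1, np.searchsorted(sX[:, 0], X1.ravel()))
rX = sX[riX]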
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()] + list(rX[:, 2:].T)).T).reshape(X1.shape),
alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    # color (rather than c) avoids ambiguity when passing a single RGBA tuple
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                color = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Random Forest Classification (Training set)')
plt.xlabel('Quants')
plt.ylabel('CGPA')
plt.legend()
plt.show()
# Visualising the Test set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_test, y_test
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
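# Look up columns 3-11 from the scaled training rows, sorted by column 1,
# so the filled-in values are on the same scale the classifier was trained on
sX = X_train[np.argsort(X_train[:, 0])]
riX = np.minimum(sX.shape[0] - 1, np.searchsorted(sX[:, 0], X1.ravel()))
rX = sX[riX]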
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()] + list(rX[:, 2:].T)).T).reshape(X1.shape),
alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    # color (rather than c) avoids ambiguity when passing a single RGBA tuple
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                color = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Random Forest Classification (Test set)')
plt.xlabel('Quants')
plt.ylabel('CGPA')
plt.legend()
plt.show()