ValueError: Number of features of the model must match the input. Model n_features is 11 and input n_features is 2

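Problem description

I get a ValueError when running the code below in a Jupyter notebook:

ValueError: Number of features of the model must match the input. Model n_features is 11 and input n_features is 2

How can I fix this?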

# Visualising the Training set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))

The full traceback is:

ValueError                                Traceback (most recent call last)
<ipython-input-42-bc13e66e79fe> in <module>
      4 X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
      5                      np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
----> 6 plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
      7              alpha = 0.75, cmap = ListedColormap(('red', 'green')))
      8 plt.xlim(X1.min(), X1.max())

~\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py in predict(self, X)
    627             The predicted classes.
    628         """
--> 629         proba = self.predict_proba(X)
    630 
    631         if self.n_outputs_ == 1:

~\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py in predict_proba(self, X)
    671         check_is_fitted(self)
    672         # Check data
--> 673         X = self._validate_X_predict(X)
    674 
    675         # Assign chunk of trees to jobs

~\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py in _validate_X_predict(self, X)
    419         check_is_fitted(self)
    420 
--> 421         return self.estimators_[0]._validate_X_predict(X, check_input=True)
    422 
    423     @property

~\anaconda3\lib\site-packages\sklearn\tree\_classes.py in _validate_X_predict(self, X, check_input)
    394         n_features = X.shape[1]
    395         if self.n_features_ != n_features:
--> 396             raise ValueError("Number of features of the model must "
    397                              "match the input. Model n_features is %s and "
    398                              "input n_features is %s "

ValueError: Number of features of the model must match the input. Model n_features is 11 and input n_features is 2 

Full model code: https://github.com/anandsinha07/Placement-prediction-system-using-ML-algos/blob/master/PREDICTION-Random%20Forest%20Classification/random_forest_classification.py

Tags: python, numpy, machine-learning, jupyter-notebook, data-science

Solution


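I fixed your code the way I understood the problem, adding a few extra lines. The main issue is that you pass only columns 1 and 2 to predict, but the classifier was trained on 11 columns (1-11) and expects all of them. Columns 3-11 therefore have to be filled in somehow; at minimum you can fill them with zeros.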
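The zero-fill option mentioned above can be sketched as follows. This is a minimal illustration, not code from the original script: the helper name pad_grid_features and the tiny 3 x 2 grid are made up for the example, and only NumPy is assumed.

```python
import numpy as np

def pad_grid_features(x1, x2, n_features, fill=0.0):
    """Build an (n_points, n_features) array from two grid axes.

    Columns 0 and 1 hold the flattened grid; every other column is set to `fill`.
    (Hypothetical helper, named here for illustration only.)
    """
    x1 = np.asarray(x1, dtype=float).ravel()
    x2 = np.asarray(x2, dtype=float).ravel()
    grid = np.full((x1.size, n_features), fill, dtype=float)
    grid[:, 0] = x1
    grid[:, 1] = x2
    return grid

# Tiny 3 x 2 grid standing in for the real meshgrid
X1, X2 = np.meshgrid(np.arange(0.0, 3.0), np.arange(0.0, 2.0))
grid = pad_grid_features(X1, X2, n_features=11)
print(grid.shape)  # (6, 11)
```

With the real data, something like classifier.predict(pad_grid_features(X1, X2, 11)).reshape(X1.shape) would then have the 11 columns the model expects. Because the features are standardized with StandardScaler, a fill value of 0 corresponds to holding the other nine features at their training mean.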

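In my solution I sort the dataset by the first column. Then, when building the mesh grid, I approximate columns 3-11 for each grid point by looking up the row whose column-1 value is closest to the grid's X1 value. In other words, I try to find a reasonable approximation of columns 3-11 given column 1 alone, rather than filling them with zeros, which would also work.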
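The lookup idea can be illustrated on a toy array. This is a sketch with made-up numbers (the names table and queries are hypothetical), showing how np.searchsorted on the sorted first column picks a row to borrow the remaining columns from. Note that searchsorted returns the first row whose value is at or above the query, which is only an approximation of the nearest row.

```python
import numpy as np

# Hypothetical table: 4 rows, 3 features, deliberately not sorted by column 0
table = np.array([[4.0, 40.0, 400.0],
                  [1.0, 10.0, 100.0],
                  [7.0, 70.0, 700.0],
                  [2.0, 20.0, 200.0]])

# Sort rows by column 0 so that searchsorted can be used on that column
order = np.argsort(table[:, 0], kind="stable")
sorted_table = table[order]

# Column-0 values of the grid points we need extra features for
queries = np.array([0.5, 2.0, 3.0, 3.9, 9.0])

# Index of the first sorted row whose column 0 is >= the query
idx = np.searchsorted(sorted_table[:, 0], queries)
# Queries beyond the largest value get index len(table); clip to the last row
idx = np.minimum(idx, sorted_table.shape[0] - 1)

# Borrow the remaining columns from the SORTED table, since idx refers to it
borrowed = sorted_table[idx, 2:]
print(idx.tolist())               # [0, 1, 2, 2, 3]
print(borrowed.ravel().tolist())  # [100.0, 200.0, 400.0, 400.0, 700.0]
```

The indices returned by searchsorted are positions in the sorted array, so the borrowed rows must be taken from that same sorted array, and the query values must be on the same scale as the sorted column.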

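I also commented out the line #from sklearn.cross_validation import train_test_split and replaced it with from sklearn.model_selection import train_test_split, because the first form only works with old versions of scikit-learn; the submodule was renamed, and newer versions accept only the second form. Pick whichever variant matches your installed version.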
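If the same script has to run on both old and new installations, one possible approach (a sketch, assuming scikit-learn is installed in some version) is to try the new import first and fall back to the old one:

```python
# Assumes scikit-learn is installed; only the submodule name differs by version.
try:
    # Current location (scikit-learn 0.18 and later)
    from sklearn.model_selection import train_test_split
except ImportError:
    # Location used by older releases
    from sklearn.cross_validation import train_test_split

# Quick check on toy data: 8 samples with test_size=0.25 gives a 6/2 split
train, test = train_test_split(list(range(8)), test_size=0.25, random_state=0)
print(len(train), len(test))  # 6 2
```

On a recent installation only the first import is ever executed, so the fallback adds no cost.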

# Random Forest Classification

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('finalplacementdata3.csv')
X = dataset.iloc[:, range(1, 12)].values
y = dataset.iloc[:, 12].values

siX = np.lexsort((X[:, 1], X[:, 0]))
sX, sy = X[siX], y[siX]

# Splitting the dataset into the Training set and Test set
#from sklearn.cross_validation import train_test_split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

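# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# The plotting grid below is built from scaled data, so scale the sorted lookup array too
# (standardization preserves the sort order within each column)
sX = sc.transform(sX)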

# Fitting Random Forest Classification to the Training set
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 0)
classifier.fit(X_train, y_train)

# Predicting the Test set results
y_pred = classifier.predict(X_test)

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

# Visualising the Training set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
                     
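# For each grid point, find the first sorted row whose column 0 is >= X1 (clipped to the last row)
riX = np.minimum(sX.shape[0] - 1, np.searchsorted(sX[:, 0], X1.ravel()))
# riX indexes into the sorted array sX, so take the rows from sX
rX = sX[riX]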

plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()] + list(rX[:, 2:].T)).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Random Forest Classification (Training set)')
plt.xlabel('Quants')
plt.ylabel('CGPA')
plt.legend()
plt.show()

# Visualising the Test set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_test, y_test
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))

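# For each grid point, find the first sorted row whose column 0 is >= X1 (clipped to the last row)
riX = np.minimum(sX.shape[0] - 1, np.searchsorted(sX[:, 0], X1.ravel()))
# riX indexes into the sorted array sX, so take the rows from sX
rX = sX[riX]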

plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()] + list(rX[:, 2:].T)).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Random Forest Classification (Test set)')
plt.xlabel('Quants')
plt.ylabel('CGPA')
plt.legend()
plt.show()
