python - How to use the "test" dataset after cross-validation?

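Problem description

I am new to Python and I need help. I applied cross-validation on my training dataset (60% of my dataset), and now I am trying to find out how to test my classifier on the rest of my dataset (the test dataset, 40%). I used the following code: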

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('NvDataSet.csv', sep=';')
dataset = dataset.dropna()
print(dataset.info())
#dataset = pd.read_csv('Urban1.csv')
X = dataset.iloc[:, :-1].values  # every column except the last one
y = dataset.iloc[:, 76].values   # label column; X above assumes this is the last column

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.4, random_state = 0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Fitting SVM to the Training set
from sklearn.svm import SVC
classifier = SVC(kernel = 'rbf', random_state = 0)
classifier.fit(X_train, y_train)

# Predicting the Test set results
y_pred = classifier.predict(X_test)

# Computing the accuracy on the test set
from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, y_pred))

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

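Before cross-validation I observed an accuracy of 95.97% on the test set. After that, I applied the cross-validation function on my training dataset: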

from sklearn.model_selection import cross_val_score
accuracies = cross_val_score(estimator=classifier, X=X_train, y=y_train, cv=10)
print(accuracies.mean())

Cross-validation mean accuracy = 93.58%

What should I do now to test the classifier that I evaluated with cross-validation on the test dataset, X_test and y_test?

y_pred = classifier.predict(X_test)

This gives the same result as before I applied cross-validation: the accuracy is still 95.97%. Why does it not change?

Tags: python, machine-learning

Solution
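The unchanged test accuracy is expected. cross_val_score fits clones of the estimator on each fold and returns only the fold scores, so the classifier that was fitted earlier with fit() is left as it was. The sketch below illustrates this under some assumptions: it uses a synthetic dataset from make_classification in place of NvDataSet.csv, and it wraps the scaler and the SVM in a pipeline (the name `model` is chosen here, not taken from the question) so that each fold fits its own scaler. Treat it as one possible arrangement rather than the only correct one.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for NvDataSet.csv (assumption: the real data is not available here).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0
)

# Scaler and SVM in one pipeline, so scaling is re-fitted inside every CV fold.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", random_state=0))

# Fit once on the whole training set and score on the held-out test set.
model.fit(X_train, y_train)
acc_before = accuracy_score(y_test, model.predict(X_test))

# cross_val_score trains clones of `model` on the folds; `model` itself is not refitted.
cv_scores = cross_val_score(model, X_train, y_train, cv=10)

# Predicting again with the same fitted model gives the same test accuracy.
acc_after = accuracy_score(y_test, model.predict(X_test))

print("CV mean accuracy:", cv_scores.mean())
print("Test accuracy before CV:", acc_before)
print("Test accuracy after CV: ", acc_after)
```

In this arrangement the cross-validation score is an estimate computed only from the training data, which is useful for comparing kernels or hyperparameters. The test set is scored once at the end with the model fitted on the full training set, exactly as classifier.predict(X_test) does in the question.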

