How do I export a dataset from sklearn after applying a model?

Problem description

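I am using sklearn/Python for a machine learning course. I understand preprocessing, model selection, and running a model, but now that I have run my data through the model, I am not sure how to export the results.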

Here is my code:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

dataset = pd.read_csv('test_dataset.csv')
dataset.set_index('ID', inplace=True) # replace ID with identifier field
X = dataset.iloc[:, 0:-1].values #input variables
y = dataset.iloc[:, -1].values #output variable (to predict)

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
poly_reg = PolynomialFeatures(degree = 4)
X_poly = poly_reg.fit_transform(X_train)
regressor = LinearRegression()
regressor.fit(X_poly, y_train)

y_pred = regressor.predict(poly_reg.transform(X_test))
np.set_printoptions(precision=2)
print(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)),1))

Tags: python, machine-learning, scikit-learn

Solution


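Option 1: In this example, if you want a prediction for one specific record, you can do something like this. Note that ID was made the index with set_index, so the record is selected by index label rather than by column: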

record = dataset.loc[[1]]  # ID is the index after set_index, so select the row by label
some_data_for_predict = record.iloc[:, 0:-1].values
y_pred = regressor.predict(poly_reg.transform(some_data_for_predict))
print(f"actual: \n{record} \ny_pred: \n{y_pred}")

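Option 2: If data preprocessing is involved (for example handling missing data, applying an appropriate encoding, or feature scaling), you may end up with transformed or encoded data. In that case, if you want to see the actual values behind the transformed ones, you can use inverse_transform. Something like: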

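from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()  # any scaler that provides inverse_transform works here
X_normalized = scaler.fit_transform(X)
X_train_norm, X_test_norm, y_train, y_test = train_test_split(X_normalized, y, test_size=0.2, random_state=42)
regressor.fit(poly_reg.fit_transform(X_train_norm), y_train)  # refit on the scaled inputs
some_data = X_test_norm[:5]
print(regressor.predict(poly_reg.transform(some_data)))
print(scaler.inverse_transform(some_data))  # this gives back the original, unscaled data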
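Neither option above actually writes anything to disk, so here is a minimal sketch of one way to export the test rows together with their actual and predicted values. It is not the only approach. The synthetic data, the column names x1, x2 and target, the degree of 2, and the output file name predictions.csv are all assumptions made for illustration, standing in for test_dataset.csv. The key idea is to split DataFrames rather than .values, so that the ID index survives train_test_split and can be written out next to each prediction.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in for test_dataset.csv (assumption: an ID column,
# numeric inputs, and the target in the last column).
rng = np.random.default_rng(0)
dataset = pd.DataFrame({
    "ID": np.arange(1, 51),
    "x1": rng.normal(size=50),
    "x2": rng.normal(size=50),
})
dataset["target"] = 3.0 * dataset["x1"] - 2.0 * dataset["x2"] ** 2 + 1.0
dataset = dataset.set_index("ID")

# Split DataFrames instead of .values so the ID index survives the split.
X = dataset.iloc[:, :-1]
y = dataset.iloc[:, -1]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

poly_reg = PolynomialFeatures(degree=2)
regressor = LinearRegression()
regressor.fit(poly_reg.fit_transform(X_train), y_train)

# Put inputs, actual target and prediction side by side, keyed by ID.
results = X_test.copy()
results["actual"] = y_test
results["predicted"] = regressor.predict(poly_reg.transform(X_test))

# Hypothetical output file name; the index (ID) becomes the first column.
results.to_csv("predictions.csv")
```

If the inputs were scaled first, as in Option 2, scaler.inverse_transform could be applied to the test inputs before building results, so that the exported file holds the original values rather than the scaled ones.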

