Creating a dummy variable in a loop

Problem description

I am working with a dataset of 22 columns and 129 rows.

I am using a support vector machine (SVM) to predict my dependent variable.

To do this, I convert the variable into a dummy variable that takes the values 0 and 1:

df['dummy_medianrat'] = df['median_rating'].apply(lambda x: 1 if x < 13 else 0)

Now, my question is:

I would like to generate this dummy in a loop, for example:

df['dummy_medianrat'] = df['median_rating'].apply(lambda x: 1 if x < 12 else 0)
df['dummy_medianrat'] = df['median_rating'].apply(lambda x: 1 if x < 5 else 0)
df['dummy_medianrat'] = df['median_rating'].apply(lambda x: 1 if x < 8 else 0)

And so on. I want to test my variable with different cut-offs (<12, <5, <8) and let the SVM be trained and evaluated on each of them.
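As written, the three assignments above all target the same dummy_medianrat column, so each one overwrites the previous. A minimal sketch of doing this in a loop, assuming one separate column per cut-off is wanted, is shown here; the small example frame and the dummy_medianrat_<t> column names are hypothetical, for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for the real data: only 'median_rating' matters here.
df = pd.DataFrame({"median_rating": [3, 6, 9, 11, 12, 14, 20]})

thresholds = [5, 8, 12, 13]  # cut-offs taken from the question

for t in thresholds:
    # 1 if the rating is strictly below the cut-off, else 0.
    # Same result as .apply(lambda x: 1 if x < t else 0), but vectorized.
    # Each cut-off gets its own (hypothetically named) column, so nothing is overwritten.
    df[f"dummy_medianrat_{t}"] = (df["median_rating"] < t).astype(int)

print(df)
```

The comparison df["median_rating"] < t returns a boolean Series, and .astype(int) turns True/False into 1/0, so no per-row Python lambda is needed.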

Full code:

import pandas as pd  # pandas is used to load and manipulate data and for One-Hot Encoding
import numpy as np  # data manipulation
import matplotlib.pyplot as plt  # matplotlib is for drawing graphs
import matplotlib.colors as colors
from sklearn.utils import resample  # downsample the dataset
from sklearn.model_selection import train_test_split  # split data into training and testing sets
from sklearn import preprocessing  # scale and center data
from sklearn.svm import SVC  # this will make a support vector machine for classification
from sklearn.model_selection import GridSearchCV  # this will do cross validation
from sklearn.metrics import confusion_matrix  # this creates a confusion matrix
from sklearn.metrics import ConfusionMatrixDisplay  # draws a confusion matrix (plot_confusion_matrix was removed in scikit-learn 1.2)
from sklearn.decomposition import PCA  # to perform PCA to plot the data
from sklearn import svm, datasets

datafile = r'C:\Users\gpont\PycharmProjects\pythonProject2\data\Map\databaseCDP0.csv'

df = pd.read_csv(datafile, skiprows=0, sep=';')

df['dummy_medianrat'] = df['median_rating'].apply(lambda x: 1 if x < 13 else 0)

# Splitting data in two datasets

df_lowr = df[df['dummy_medianrat'] == 1]
df_higr = df[df['dummy_medianrat'] == 0]

df_downsample = pd.concat([df_lowr, df_higr])
len(df_downsample)

X = df_downsample.drop('dummy_medianrat', axis=1).copy()
X.head()

y = df_downsample['dummy_medianrat'].copy()
y.head()

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42,
                                                    test_size=0.25)
scaler = preprocessing.StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)


X_train.shape

X_test.shape

# Build a preliminary support vector machine
# We don't need to scale y_train because it is 0/1 (binary classification)

clf_svm = SVC(random_state=42)
clf_svm.fit(X_train_scaled, y_train)

titles_options = [("Confusion matrix, without normalization", None),
                  ("Normalized confusion matrix", 'true')]
for title, normalize in titles_options:
    # class 0 = median_rating >= 13, class 1 = median_rating < 13
    disp = ConfusionMatrixDisplay.from_estimator(clf_svm, X_test_scaled, y_test,
                                                 display_labels=["median_rating >= 13", "median_rating < 13"],
                                                 cmap=plt.cm.Blues,
                                                 normalize=normalize)
    disp.ax_.set_title(title)

    print(title)
    print(disp.confusion_matrix)

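After creating several dummies with different cut-offs, I want to generate two confusion matrices (normalized and non-normalized) for each dummy created in the loop.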
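One way to get both confusion matrices for every cut-off is to move the whole split, scale, fit and evaluate sequence inside the loop. The sketch below is an illustration under stated assumptions, not the asker's exact setup: it builds a small synthetic frame in place of databaseCDP0.csv (the feat_* columns are hypothetical), assumes all remaining columns are numeric, and drops median_rating from the features because the target is computed directly from it. The matrices come from sklearn.metrics.confusion_matrix, whose normalize='true' option divides each row by the number of true samples in that class.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical synthetic stand-in for databaseCDP0.csv: 129 rows,
# a 'median_rating' column and three numeric feature columns.
rng = np.random.default_rng(0)
n = 129
df = pd.DataFrame({"median_rating": rng.integers(1, 21, size=n)})
for i in range(3):
    df[f"feat_{i}"] = df["median_rating"] * (i + 1) * 0.1 + rng.normal(size=n)

thresholds = [5, 8, 12, 13]
results = {}  # cut-off -> (raw confusion matrix, row-normalized confusion matrix)

for t in thresholds:
    # Binary target for this cut-off: 1 if median_rating < t, else 0
    y = (df["median_rating"] < t).astype(int)
    # Assumption: median_rating is excluded from the features, since the
    # target is derived from it (keeping it would leak the answer).
    X = df.drop(columns=["median_rating"])

    # stratify=y keeps both classes in the training and test splits
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=42, stratify=y)

    scaler = StandardScaler().fit(X_train)  # fit the scaler on the training split only
    clf = SVC(random_state=42).fit(scaler.transform(X_train), y_train)
    y_pred = clf.predict(scaler.transform(X_test))

    # labels=[0, 1] guarantees a 2x2 matrix even if one class is never predicted
    cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
    cm_norm = confusion_matrix(y_test, y_pred, labels=[0, 1], normalize="true")
    results[t] = (cm, cm_norm)

    print(f"median_rating < {t}")
    print("Confusion matrix, without normalization")
    print(cm)
    print("Normalized confusion matrix")
    print(cm_norm)
```

To draw the plots as in the question, each matrix could be passed to ConfusionMatrixDisplay(confusion_matrix=cm).plot(cmap=plt.cm.Blues) inside the loop. Storing the matrices in a dict keyed by cut-off also makes it easy to compare the cut-offs afterwards.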

Tags: python, scikit-learn, svm
