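python - Creating a dummy variable in a loop
Problem description
I am working with a dataset of 22 columns and 129 rows.
I am using a support vector machine to predict my dependent variable.
To do this, I converted the variable into a dummy variable that takes the values 0 and 1: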
df['dummy_medianrat'] = df['median_rating'].apply(lambda x: 1 if x < 13 else 0)
Now, my question is:
I want to generate this dummy in a loop, for example:
df['dummy_medianrat'] = df['median_rating'].apply(lambda x: 1 if x < 12 else 0)
df['dummy_medianrat'] = df['median_rating'].apply(lambda x: 1 if x < 5 else 0)
df['dummy_medianrat'] = df['median_rating'].apply(lambda x: 1 if x < 8 else 0)
and so on. I want to test my variable with different cutoffs (<12, <5, <8) and have the SVM evaluated on each of them.
Full code:
import pandas as pd # pandas is used to load and manipulate data and for One-Hot Encoding
import numpy as np # data manipulation
import matplotlib.pyplot as plt # matplotlib is for drawing graphs
import matplotlib.colors as colors
from sklearn.utils import resample # downsample the dataset
from sklearn.model_selection import train_test_split # split data into training and testing sets
from sklearn import preprocessing # scale and center data
from sklearn.svm import SVC # this will make a support vector machine for classification
from sklearn.model_selection import GridSearchCV # this will do cross validation
from sklearn.metrics import confusion_matrix # this creates a confusion matrix
from sklearn.metrics import plot_confusion_matrix # draws a confusion matrix
from sklearn.decomposition import PCA # to perform PCA to plot the data
from sklearn import svm, datasets
datafile = (r'C:\Users\gpont\PycharmProjects\pythonProject2\data\Map\databaseCDP0.csv')
df = pd.read_csv(datafile, skiprows = 0, sep=';')
df['dummy_medianrat'] = df['median_rating'].apply(lambda x: 1 if x < 13 else 0)
#Splitting data in two datasets
df_lowr = df[df['dummy_medianrat'] == 1]
df_higr = df[df['dummy_medianrat'] == 0]
df_downsample = pd.concat([df_lowr, df_higr])
len(df_downsample)
X = df_downsample.drop('dummy_medianrat', axis=1).copy()
X.head()
y = df_downsample['dummy_medianrat'].copy()
y.head()
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42,
                                                    test_size=0.25)
scaler = preprocessing.StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
X_train.shape
X_test.shape
#Build A Preliminary Support Vector Machine
#We don't need to scale y_train because it is 0/1 (binary classification)
clf_svm = SVC(random_state=42)
clf_svm.fit(X_train_scaled, y_train)
titles_options = [("Confusion matrix, without normalization", None),
                  ("Normalized confusion matrix", 'true')]
for title, normalize in titles_options:
    disp = plot_confusion_matrix(clf_svm, X_test_scaled, y_test,
                                 display_labels=["Did not default", "Defaulted"],
                                 cmap=plt.cm.Blues,
                                 normalize=normalize)
    disp.ax_.set_title(title)
    print(title)
    print(disp.confusion_matrix)
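After creating several dummies with different cutoff values, I want to produce two confusion matrices (normalized and non-normalized) for each dummy created in the loop.
Solution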
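One possible approach, offered as a minimal sketch rather than a definitive answer: loop over the cutoffs, rebuild the 0/1 target for each one, refit the scaler and the SVC, and store both confusion matrices per cutoff. Several things here are assumptions and not part of the question: the helper name `evaluate_thresholds`, the synthetic stand-in data with columns `feature_a` and `feature_b` (the real CSV is not available), the `stratify=y` argument (added so both classes appear in the test split, and it will raise an error if a cutoff leaves a class with fewer than two rows), and dropping `median_rating` from the features. That last change is deliberate: the dummy is computed from `median_rating`, so leaving it in `X` would leak the target into the model.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def evaluate_thresholds(df, source_col, thresholds, random_state=42):
    """Fit one SVC per cutoff and return raw and normalized confusion matrices."""
    # The dummy is derived from source_col, so it must not be a feature.
    X = df.drop(columns=[source_col])
    results = {}
    for t in thresholds:
        # Vectorized equivalent of .apply(lambda x: 1 if x < t else 0)
        y = (df[source_col] < t).astype(int)
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, random_state=random_state, test_size=0.25, stratify=y
        )
        # Fit the scaler on the training split only, then apply it to both.
        scaler = StandardScaler().fit(X_train)
        clf = SVC(random_state=random_state)
        clf.fit(scaler.transform(X_train), y_train)
        y_pred = clf.predict(scaler.transform(X_test))
        results[t] = {
            "raw": confusion_matrix(y_test, y_pred, labels=[0, 1]),
            "normalized": confusion_matrix(
                y_test, y_pred, labels=[0, 1], normalize="true"
            ),
        }
    return results


# Synthetic stand-in for the real CSV (129 rows), for demonstration only.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "median_rating": rng.integers(1, 21, size=129),
    "feature_a": rng.normal(size=129),
    "feature_b": rng.normal(size=129),
})

results = evaluate_thresholds(demo, "median_rating", thresholds=[5, 8, 12, 13])
for t, cms in results.items():
    print(f"median_rating < {t}")
    print("Confusion matrix, without normalization")
    print(cms["raw"])
    print("Normalized confusion matrix")
    print(cms["normalized"])
```

With the real data, `demo` would be replaced by the DataFrame read from the CSV. To draw the matrices instead of printing them, `ConfusionMatrixDisplay.from_predictions` from `sklearn.metrics` could be called inside the loop; `plot_confusion_matrix`, used in the question, was removed in scikit-learn 1.2.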