python - Confused about how to use label-encoded values together with the original values
Problem description
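Hi, I'm working on an ML project whose dataset contains both numeric and alphabetic (categorical) values. I used sklearn's LabelEncoder() to convert the alphabetic values to numbers, and that worked, but I can't get all of the required values into the "X" and "y" variables. Here is my code: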
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn import preprocessing
from sklearn.metrics import accuracy_score
data = pd.read_csv('data-set.csv')
num_val = preprocessing.LabelEncoder()
gender = num_val.fit_transform(list(data['gender']))
ever_married = num_val.fit_transform(list(data['ever_married']))
work_type = num_val.fit_transform(list(data['work_type']))
Residence_type = num_val.fit_transform(list(data['Residence_type']))
smoking_status = num_val.fit_transform(list(data['smoking_status']))
predict = "stroke"
X = list(zip(gender,ever_married,work_type,Residence_type,smoking_status))
y = data['stroke']
X_train,X_test,y_train,y_test = train_test_split(X, y, test_size=0.1)
model = SVC()
model.fit(X_train, y_train)
pred = model.predict(X_test)
acc = accuracy_score(y_test, pred)
print(acc)
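The dataset I'm using is here.
How do I put all the values together in the "X" variable: the columns I converted and the numeric columns from the dataset that were left unchanged? Please help.
Solution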
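Use pandas apply with a function that contains the same code (transform in the example below), but call it on the list of columns you want to convert in the original DataFrame (data). Next, drop the target column (stroke in this particular dataset) from the DataFrame to create the X variable. You also have to fill the NaN values in bmi with something relevant to your analysis; otherwise the fit function will raise a ValueError.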
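The steps just described (encode the categorical columns with apply, drop the target, fill the NaN values) can be checked on a small in-memory DataFrame instead of the CSV file. This is only a sketch: the rows below are made up for illustration, the median fill is just one possible choice, and this transform passes index=series.index so the encoded values stay aligned with the original rows.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.svm import SVC


def transform(series):
    # Fit a fresh LabelEncoder for each column; keep the original index so rows stay aligned
    num_val = preprocessing.LabelEncoder()
    return pd.Series(num_val.fit_transform(series), index=series.index)


# Hypothetical toy data with the same kinds of columns as the stroke dataset
data = pd.DataFrame({
    "gender": ["Male", "Female", "Male", "Female", "Female", "Male"],
    "age": [67.0, 61.0, 80.0, 49.0, 79.0, 35.0],
    "work_type": ["Private", "Self-employed", "Private", "Private", "Self-employed", "Govt_job"],
    "bmi": [36.6, np.nan, 32.5, 34.4, 24.0, 22.1],
    "stroke": [1, 1, 1, 0, 0, 0],
})

t_list = ["gender", "work_type"]
data[t_list] = data[t_list].apply(transform)  # only the listed columns are replaced

X = data.drop(columns=["stroke"])  # encoded columns and untouched numeric columns together
X = X.fillna(X.median())           # SVC cannot handle NaN, so fill bmi first
y = data["stroke"]

model = SVC()
model.fit(X, y)
print(X)
```

Here "age" and "bmi" keep their original numeric values while "gender" and "work_type" become integer codes, so X contains every feature column.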
...
data = pd.read_csv('healthcare-dataset-stroke-data.csv')
print(data.head())
def transform(series):
    # Encode one categorical column; keep the original index so rows stay aligned
    num_val = preprocessing.LabelEncoder()
    np_array = num_val.fit_transform(list(series))
    return pd.Series(np_array, index=series.index)
t_list = ["gender","ever_married","work_type","Residence_type","smoking_status"]
data[t_list] = data[t_list].apply(transform)
print(data.head())
predict = "stroke"
X = data.drop(columns=['stroke'])
# fill "bmi" NaN values with something relevant to your analysis
X = X.fillna(X.median())
y = data['stroke']
X_train,X_test,y_train,y_test = train_test_split(X, y, test_size=0.1)
...
Original DataFrame
id gender age ... work_type Residence_type avg_glucose_level bmi smoking_status stroke
0 9046 Male 67.0 ... Private Urban 228.69 36.6 formerly smoked 1
1 51676 Female 61.0 ... Self-employed Rural 202.21 NaN never smoked 1
2 31112 Male 80.0 ... Private Rural 105.92 32.5 never smoked 1
3 60182 Female 49.0 ... Private Urban 171.23 34.4 smokes 1
4 1665 Female 79.0 ... Self-employed Rural 174.12 24.0 never smoked 1
Transformed DataFrame
id gender age ... work_type Residence_type avg_glucose_level bmi smoking_status stroke
0 9046 1 67.0 ... 2 1 228.69 36.6 1 1
1 51676 0 61.0 ... 3 0 202.21 NaN 2 1
2 31112 1 80.0 ... 2 0 105.92 32.5 2 1
3 60182 0 49.0 ... 2 1 171.23 34.4 3 1
4 1665 0 79.0 ... 3 0 174.12 24.0 2 1
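The transform helper discards each fitted LabelEncoder, so the integer codes in the transformed table can no longer be turned back into labels. A minimal sketch, assuming you want to keep that mapping, is to store one encoder per column in a dict (the names encoders and mapping are illustrative). classes_ and inverse_transform are standard LabelEncoder members; the toy rows are hypothetical, with an extra "Unknown" row added so the codes line up with the table above.

```python
import pandas as pd
from sklearn import preprocessing

# Hypothetical toy rows mirroring the first rows of the table, plus one "Unknown" row
data = pd.DataFrame({
    "gender": ["Male", "Female", "Male", "Female", "Female", "Male"],
    "smoking_status": ["formerly smoked", "never smoked", "never smoked",
                       "smokes", "never smoked", "Unknown"],
})

encoders = {}  # one fitted LabelEncoder per column, kept for later decoding
for col in ["gender", "smoking_status"]:
    enc = preprocessing.LabelEncoder()
    data[col] = enc.fit_transform(data[col])
    encoders[col] = enc

# classes_ holds the original labels in code order: code i corresponds to classes_[i]
mapping = {col: dict(enumerate(enc.classes_)) for col, enc in encoders.items()}
print(mapping)

# Turn encoded values back into the original strings
decoded = encoders["smoking_status"].inverse_transform(data["smoking_status"])
print(list(decoded))
```

Codes are assigned in sorted label order, so they depend on which categories occur in the column. Note also that the scikit-learn documentation describes LabelEncoder as intended for target values (y); OrdinalEncoder is its counterpart for feature columns.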