python - Imputing with Sklearn changes "numeric" columns to "object" (in addition to filling missing data)
Problem description
Before imputation, `X_train` has numeric columns: `numeric_cols = [col for col in X_train.columns if X_train[col].dtype in ['int64', 'float64']]` returns a non-empty list.
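The list comprehension only matches the exact dtype names `'int64'` and `'float64'`. A minimal alternative, assuming a small stand-in `X_train` shaped like the one in the answer, is pandas' `DataFrame.select_dtypes(include="number")`, which also catches other numeric dtypes such as `int32` or `float32`:

```python
import numpy as np
import pandas as pd

# Stand-in for the question's X_train (assumed shape: one string column, two numeric ones)
X_train = pd.DataFrame({
    "c1": ["dddd", "dddd", "dddd"],
    "c2": [2.0, np.nan, 5.0],
    "c3": [3, 6, 9],
})

# "number" matches every NumPy numeric dtype, not only int64/float64
numeric_cols = X_train.select_dtypes(include="number").columns.tolist()
print(numeric_cols)  # ['c2', 'c3']
```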
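After imputation, the new DataFrame `imputed_X_train_missing` no longer has any numeric columns: every column is now `object`. This is a potential problem when fitting `XGBRegressor`.
Here is my code: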
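XGBoost generally expects numeric (or, with categorical support enabled, `category`) columns, so it helps to see exactly which columns changed dtype. The helper below, `changed_dtypes`, is hypothetical and not part of pandas or scikit-learn; it simply compares the dtypes of two frames column by column:

```python
import pandas as pd


def changed_dtypes(before: pd.DataFrame, after: pd.DataFrame) -> dict:
    """Hypothetical helper: map each shared column to (old dtype, new dtype) when they differ."""
    return {
        col: (before[col].dtype, after[col].dtype)
        for col in before.columns
        if col in after.columns and before[col].dtype != after[col].dtype
    }


before = pd.DataFrame({"a": [1, 2], "b": [0.5, 1.5]})
after = before.astype(object)  # simulates the all-object frame seen after imputation
print(changed_dtypes(before, after))
```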
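```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
# Assumed: X_train_missing is created like X_valid_missing (that line is not shown)
X_train_missing = X_train.copy()
X_valid_missing = X_valid.copy()
my_imputer = SimpleImputer(missing_values=np.nan, strategy='most_frequent')
my_imputer.fit(X_train_missing)
# transform() returns a NumPy array, so the column names have to be restored
imputed_X_train_missing = pd.DataFrame(my_imputer.transform(X_train_missing))
imputed_X_valid_missing = pd.DataFrame(my_imputer.transform(X_valid_missing))
imputed_X_train_missing.columns = X_train_missing.columns
imputed_X_valid_missing.columns = X_valid_missing.columns
```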
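Solution
The imputer is the cause whenever at least one column has dtype `object`. By default `SimpleImputer.transform` returns a single NumPy array, and a NumPy array holds one dtype for all of its elements, so mixing strings with numbers produces an `object` array. Wrapping that array in `pd.DataFrame` then gives `object` for every column: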
```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

X_train = [['dddd', 2, 3], ['dddd', np.nan, 6], ['dddd', 5, 9]]
X_test = [[np.nan, 2, 3], ['dddd', np.nan, 6], ['dddd', np.nan, 9]]
col_names = ['c1', 'c2', 'c3']
df_x_train = pd.DataFrame(X_train, columns=col_names)
df_x_test = pd.DataFrame(X_test, columns=col_names)

# info() prints its summary itself (and returns None), so print() is not needed
df_x_train.info()
```

```
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   c1      3 non-null      object
 1   c2      2 non-null      float64
 2   c3      3 non-null      int64
```

```python
imp = SimpleImputer(missing_values=np.nan, strategy='most_frequent')
imp.fit(df_x_train)
# transform() returns one NumPy array with a single dtype shared by all columns
imputed_x_train = pd.DataFrame(imp.transform(df_x_train))
imputed_x_train.dtypes
```

Now all the columns are `object` (the labels 0, 1, 2 appear because no column names were passed to the new DataFrame):

```
0    object
1    object
2    object
dtype: object
```
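One way to get the numeric dtypes back, sketched here under the assumption that the training frame's dtypes are the ones you want to keep, is to rebuild the DataFrame with the original column names and index and then cast each column back with `DataFrame.astype` and a `{column: dtype}` dict. The helper name `impute_keep_dtypes` is hypothetical. The cast only works when no missing values remain after imputation, since an `int64` column cannot hold NaN. Imputing numeric and categorical columns with separate imputers (for example via `sklearn.compose.ColumnTransformer`) is another option, not shown here.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

col_names = ['c1', 'c2', 'c3']
df_x_train = pd.DataFrame(
    [['dddd', 2, 3], ['dddd', np.nan, 6], ['dddd', 5, 9]], columns=col_names
)
df_x_test = pd.DataFrame(
    [[np.nan, 2, 3], ['dddd', np.nan, 6], ['dddd', np.nan, 9]], columns=col_names
)

imp = SimpleImputer(missing_values=np.nan, strategy='most_frequent')
imp.fit(df_x_train)


def impute_keep_dtypes(imputer, df, dtypes):
    """Hypothetical helper: impute df, keep its labels, cast columns back to dtypes."""
    filled = pd.DataFrame(imputer.transform(df), columns=df.columns, index=df.index)
    return filled.astype(dtypes)


# Use the training dtypes for both sets so train and test stay consistent
train_dtypes = df_x_train.dtypes.to_dict()
imputed_x_train = impute_keep_dtypes(imp, df_x_train, train_dtypes)
imputed_x_test = impute_keep_dtypes(imp, df_x_test, train_dtypes)

print(imputed_x_train.dtypes)
print(imputed_x_test.dtypes)
```

With pandas 2.x both frames report `object`, `float64` and `int64` for `c1`, `c2` and `c3`, matching the training frame. Casting back is safe here because a `most_frequent` imputer fitted on a numeric column only ever fills it with values taken from that same column.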