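python - Custom wrapper class for scikit-learn's IterativeImputer, for use with cross_val_score()
Problem description
Scikit-learn's IterativeImputer can impute missing values in a round-robin fashion. To compare its performance against other conventional regressors, one could build a simple pipeline and get scoring metrics from cross_val_score. The problem is that IterativeImputer has no "predict" method, as the error shows: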
AttributeError: 'IterativeImputer' object has no attribute 'predict'
Here is a minimal example of the attempted implementation:
# import libraries
import pandas as pd
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
# define scaler, model and pipeline
scaler = StandardScaler() # use any scaler
imputer = IterativeImputer() # with any estimator, default = BayesianRidge()
pipeline = Pipeline(steps=[('s', scaler), ('i', imputer)])
train, test = df.values, df['A'].values
scores = cross_val_score(pipeline, train, test, cv=10, scoring='r2')
print(scores)
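What are the possible solutions? If a custom wrapper is needed, how should it be written so that it includes a "predict" method?
Solution
cross_val_score needs a pipeline that ends with a model (a step that has predict).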
from sklearn.linear_model import BayesianRidge
scaler = StandardScaler()
imputer = IterativeImputer()
model = BayesianRidge() # any model
pipeline = Pipeline(steps=[('s', scaler), ('i', imputer), ('m', model)])
cross_val_score makes no sense without a model: the scoring metric (here r2) compares the model's predictions with y.
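To illustrate why the final step matters, here is a minimal sketch that checks whether each pipeline exposes predict. It assumes a reasonably recent scikit-learn, where Pipeline only offers predict when its last step has one; the names no_model and with_model are made up for this example.

```python
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Pipeline from the question: the last step is a transformer (no predict).
no_model = Pipeline(steps=[('s', StandardScaler()), ('i', IterativeImputer())])

# Pipeline from the answer: the last step is a regressor (has predict).
with_model = Pipeline(steps=[
    ('s', StandardScaler()),
    ('i', IterativeImputer()),
    ('m', BayesianRidge()),
])

# Pipeline delegates predict to its final step, so hasattr reflects that step.
has_predict_without_model = hasattr(no_model, 'predict')
has_predict_with_model = hasattr(with_model, 'predict')
print(has_predict_without_model, has_predict_with_model)
```

Only the second pipeline should report a predict method, which is what the r2 scorer in cross_val_score relies on.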
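I also see another problem, related to the train, test values you pass to cross_val_score.
They should be called X, y rather than train, test. Those are only names, so that is not very important; what matters is what you assign to the variables.
The problem is that X should not contain y, but you use train = df.values, so you create an X that includes y (column 'A').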
df_train = pd.DataFrame({
'X': range(20),
'y': range(20),
})
X_train = df_train[ ['X'] ] # it needs inner `[]` to create DataFrame, not Series
y_train = df_train[ 'y' ] # it has to be single column (Series)
scores = cross_val_score(pipeline, X_train, y_train, cv=10, scoring='r2')
(By the way: you don't have to use .values)
The same with more columns:
df_train = pd.DataFrame({
'A': range(20),
'B': range(20),
'y': range(20),
})
X_train = df_train[ ['A', 'B'] ]
y_train = df_train[ 'y' ]
Minimal working code, but with fake data (so the scores are meaningless):
# import libraries
import pandas as pd
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.linear_model import BayesianRidge
df_train = pd.DataFrame({
'A': range(100), # fake data
'B': range(100), # fake data
'y': range(100), # fake data
})
df_test = pd.DataFrame({
'A': range(20), # fake data
'B': range(20), # fake data
'y': range(20), # fake data
})
# define scaler, model and pipeline
scaler = StandardScaler()
imputer = IterativeImputer()
model = BayesianRidge()
pipeline = Pipeline(steps=[('s', scaler), ('i', imputer), ('m', model)])
X_train = df_train[ ['A', 'B'] ] # it needs inner `[]` to create DataFrame, not Series
y_train = df_train[ 'y' ] # it has to be single column (Series)
scores = cross_val_score(pipeline, X_train, y_train, cv=10, scoring='r2')
print(scores)
X_test = df_test[['A', 'B']]
y_test = df_test['y']
scores = cross_val_score(pipeline, X_test, y_test, cv=10, scoring='r2')
print(scores)
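The fake data above contains no missing values, so the imputer has nothing to do. As a rough sketch of the same pipeline on data that does have gaps, the following builds a synthetic dataset (the columns, coefficients, noise levels and the 10% missing rate are all assumptions chosen for illustration) and blanks out some feature cells before scoring.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 200

# Synthetic, correlated features and a target that depends on both (assumed for illustration).
A = rng.normal(size=n)
B = 0.5 * A + rng.normal(scale=0.5, size=n)
y = pd.Series(2.0 * A - B + rng.normal(scale=0.1, size=n), name='y')

# Features only: X must not contain y.
X = pd.DataFrame({'A': A, 'B': B})

# Replace roughly 10% of the feature cells with NaN.
missing = rng.random(X.shape) < 0.1
X = X.mask(missing)

pipeline = Pipeline(steps=[
    ('s', StandardScaler()),    # ignores NaN when fitting, keeps NaN when transforming
    ('i', IterativeImputer()),  # fills the NaN cells
    ('m', BayesianRidge()),     # final step provides predict
])

scores = cross_val_score(pipeline, X, y, cv=5, scoring='r2')
print(scores)
```

Because the imputer sits inside the pipeline, it is refit on the training part of each fold, so the held-out rows do not influence the imputation.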