How do I get the parameter names for TargetEncoder in a grid search?

Question

I have the following setup:

preprocess = make_column_transformer(
    (SimpleImputer(strategy='constant',fill_value = 0),numeric_cols),
    (ce.TargetEncoder(),['country'])
    )

pipeline = make_pipeline(preprocess,XGBClassifier())

pipeline[0].get_params().keys()

dict_keys(['n_jobs', 'remainder', 'sparse_threshold', 'transformer_weights', 'transformers', 'verbose', 'simpleimputer', 'targetencoder', 'simpleimputer__add_indicator', 'simpleimputer__copy', 'simpleimputer__fill_value', 'simpleimputer__missing_values', 'simpleimputer__strategy', 'simpleimputer__verbose', 'targetencoder__cols', 'targetencoder__drop_invariant', 'targetencoder__handle_missing', 'targetencoder__handle_unknown', 'targetencoder__min_samples_leaf', 'targetencoder__return_df', 'targetencoder__smoothing', 'targetencoder__verbose'])

I then want to grid search over the smoothing factor:

param_grid = {
    'xgbclassifier__learning_rate': [0.01, 0.005, 0.001],
    'targetencoder__smoothing': [1, 10, 30, 50],
}

pipeline = make_pipeline(preprocess,XGBClassifier())

# Initialize the grid search model
clf = GridSearchCV(pipeline, param_grid=param_grid, scoring='neg_mean_squared_error',
                   verbose=1, iid=True,
                   refit=True, cv=3)
clf.fit(X_train,y_train)

But I get this error:

ValueError: Invalid parameter transformer_targetencoder for estimator Pipeline(steps=[('columntransformer', ColumnTransformer(transformers ...

How can I access the smoothing parameter?

Tags: python, scikit-learn, pipeline, grid-search, encoder

Answer


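With your example, the key is columntransformer__targetencoder__smoothing: the encoder sits inside the column transformer, which is itself a step of the pipeline, so both names are prefixed. To reproduce the pipeline, I first build an example dataset and define the columns: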

import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.pipeline import make_pipeline
from sklearn.impute import SimpleImputer
import category_encoders as ce
from xgboost import XGBClassifier
from sklearn.model_selection import GridSearchCV

X_train = pd.DataFrame({
    'x1': np.random.normal(0, 1, 50),
    'x2': np.random.normal(0, 1, 50),
    'country': np.random.choice(['A', 'B', 'C'], 50),
})
y_train = np.random.binomial(1, 0.5, 50)

numeric_cols = ['x1','x2']

preprocess = make_column_transformer(
    (SimpleImputer(strategy='constant',fill_value = 0),numeric_cols),
    (ce.TargetEncoder(),['country'])
    )

pipeline = make_pipeline(preprocess,XGBClassifier())

You should look at the keys one level up, on the pipeline itself rather than on pipeline[0]:

pipeline.get_params().keys()
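To see why the shorter key is rejected, here is a minimal sketch that uses only scikit-learn. OneHotEncoder and LogisticRegression are stand-ins I am assuming in place of ce.TargetEncoder and XGBClassifier, and the drop parameter stands in for smoothing; the naming rule should be the same for any estimator.

```python
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Stand-in estimators (assumption): OneHotEncoder for ce.TargetEncoder,
# LogisticRegression for XGBClassifier.
preprocess = make_column_transformer(
    (SimpleImputer(strategy='constant', fill_value=0), ['x1', 'x2']),
    (OneHotEncoder(), ['country']),
)
pipe = make_pipeline(preprocess, LogisticRegression())

# Seen from the column transformer alone, the key has a single prefix ...
inner_key_found = 'onehotencoder__drop' in preprocess.get_params()
# ... seen from the pipeline, the step name is prepended as well.
outer_key_found = 'columntransformer__onehotencoder__drop' in pipe.get_params()
print(inner_key_found, outer_key_found)

# Using the inner key on the pipeline fails the same way as in the question.
try:
    pipe.set_params(onehotencoder__drop='first')
    short_name_ok = True
except ValueError as err:
    short_name_ok = False
    print(err)
```

GridSearchCV sets each candidate through set_params on the estimator it is given, so the keys in param_grid have to be valid for the whole pipeline, not for one of its steps.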

Then set up the grid, making sure the smoothing values are floats (see this question):

param_grid = {
    'columntransformer__targetencoder__smoothing': [1.0, 10.0],
    'xgbclassifier__learning_rate': [0.01, 0.001],
}

pipeline = make_pipeline(preprocess,XGBClassifier())

clf = GridSearchCV(pipeline, param_grid=param_grid, scoring='neg_mean_squared_error',
                   verbose=1, refit=True, cv=3)
clf.fit(X_train,y_train)

This should work.
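If category_encoders or xgboost is not installed, the same pattern can be checked end to end with scikit-learn alone. This is only a sketch under assumptions: OneHotEncoder and LogisticRegression replace the original estimators, the grid values are arbitrary, and the data is synthetic. After fitting, best_params_ reports the winning values under the same fully prefixed keys.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
X = pd.DataFrame({
    'x1': rng.normal(0, 1, 60),
    'x2': rng.normal(0, 1, 60),
    # Cycled deterministically so every CV fold sees all three categories.
    'country': np.tile(['A', 'B', 'C'], 20),
})
y = np.tile([0, 1], 30)  # balanced synthetic labels

# Stand-in estimators (assumption), not the ones from the question.
preprocess = make_column_transformer(
    (SimpleImputer(strategy='constant', fill_value=0), ['x1', 'x2']),
    (OneHotEncoder(), ['country']),
)
pipe = make_pipeline(preprocess, LogisticRegression())

# Every key starts with a pipeline step name; nested parts are joined by '__'.
param_grid = {
    'columntransformer__onehotencoder__drop': [None, 'first'],
    'logisticregression__C': [0.1, 1.0],
}

clf = GridSearchCV(pipe, param_grid=param_grid, cv=3)
clf.fit(X, y)
print(clf.best_params_)
```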

