How to perform grid search on multiple ML models

Problem description

Normally we use GridSearchCV to grid-search the hyperparameters of one particular model, for example:

    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import GridSearchCV

    model_ada = AdaBoostClassifier()
    params_ada = {'n_estimators': [10, 20, 30, 50, 100, 500, 1000], 'learning_rate': [0.5, 1, 2, 5, 10]}
    grid_ada = GridSearchCV(estimator=model_ada, param_grid=params_ada, scoring='accuracy',
                            cv=5, verbose=1, n_jobs=-1)
    grid_ada.fit(X_train, y_train)
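For reference, here is a self-contained version of the single-model search above. It is a sketch that assumes synthetic data from `make_classification` in place of the question's `X_train`/`y_train`, and it uses a much smaller grid so it finishes quickly:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data stands in for the question's X_train / y_train (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A deliberately small grid: 2 x 2 = 4 combinations, each evaluated with 5-fold CV.
params_ada = {"n_estimators": [10, 50], "learning_rate": [0.5, 1.0]}
grid_ada = GridSearchCV(
    estimator=AdaBoostClassifier(random_state=0),
    param_grid=params_ada,
    scoring="accuracy",
    cv=5,
)
grid_ada.fit(X_train, y_train)

print(grid_ada.best_params_)           # best combination found by cross-validation
print(round(grid_ada.best_score_, 3))  # mean CV accuracy of that combination
```

`best_params_` and `best_score_` describe only this one model; comparing several models is what the rest of the question is about.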

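Is there any technique or function that lets us run a grid search over the ML models themselves? For example, I would like to do something like this: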

    models = {'model_gbm': GradientBoostingClassifier(), 'model_rf': RandomForestClassifier(),
              'model_dt': DecisionTreeClassifier(), 'model_svm': SVC(),
              'model_ada': AdaBoostClassifier()}
    params_gbm = {'learning_rate': [0.1, 0.2, 0.3, 0.4], 'n_estimators': [50, 100, 500, 1000, 2000]}
    params_rf = {'n_estimators': [50, 100, 500, 1000, 2000]}
    params_dt = {'splitter': ['best', 'random'], 'max_depth': [1, 5, 10, 50, 100]}
    params_svm = {'C': [1, 2, 5, 10, 50, 100, 500], 'kernel': ['rbf', 'poly', 'sigmoid', 'linear']}
    params_ada = {'n_estimators': [10, 20, 30, 50, 100, 500, 1000], 'learning_rate': [0.5, 1, 2, 5, 10]}
    params = {'params_gbm': params_gbm, 'params_rf': params_rf, 'params_dt': params_dt,
              'params_svm': params_svm, 'params_ada': params_ada}
    grid_ml = "that function"(models = models, params = params)
    grid_ml.fit(X_train, y_train)

where "that function" is the function I need in order to perform this kind of operation.

Tags: python, machine-learning, scikit-learn, gridsearchcv

Solution


I ran into a similar problem and couldn't find a predefined package or method that does this, so I wrote my own function:

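    from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                                  RandomForestClassifier)
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    def Algo_search(models, params, X_train, y_train, X_test, y_test):
        """Grid-search each model, then return the one that scores best on the test set."""
        max_score = float('-inf')  # so the first model is always recorded
        max_model = None
        max_model_params = None

        for name, model in models.items():
            # params must use the same keys as models (e.g. 'model_gbm')
            gs = GridSearchCV(estimator=model, param_grid=params[name])
            gs.fit(X_train, y_train)
            # Compare the tuned models on the held-out test set
            score = gs.score(X_test, y_test)

            if score > max_score:
                max_score = score
                max_model = gs.best_estimator_
                max_model_params = gs.best_params_

        return max_score, max_model, max_model_params

    # Models and their grids; X_train, y_train, X_test, y_test are assumed to exist already
    models = {'model_gbm': GradientBoostingClassifier(), 'model_rf': RandomForestClassifier(),
              'model_dt': DecisionTreeClassifier(), 'model_svm': SVC(),
              'model_ada': AdaBoostClassifier()}
    params_gbm = {'learning_rate': [0.1, 0.2, 0.3, 0.4], 'n_estimators': [50, 100, 500, 1000, 2000]}
    params_rf = {'n_estimators': [50, 100, 500, 1000, 2000]}
    params_dt = {'splitter': ['best', 'random'], 'max_depth': [1, 5, 10, 50, 100]}
    params_svm = {'C': [1, 2, 5, 10, 50, 100, 500], 'kernel': ['rbf', 'poly', 'sigmoid', 'linear']}
    params_ada = {'n_estimators': [10, 20, 30, 50, 100, 500, 1000], 'learning_rate': [0.5, 1, 2, 5, 10]}
    params = {'model_gbm': params_gbm, 'model_rf': params_rf, 'model_dt': params_dt,
              'model_svm': params_svm, 'model_ada': params_ada}
    max_score, max_model, max_model_params = Algo_search(models, params,
                                                         X_train, y_train, X_test, y_test)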
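GridSearchCV can also search over the models themselves without a custom loop: wrap the estimator in a `Pipeline`, treat the final step as a hyperparameter, and pass `param_grid` as a list of dicts, one per model. This is an alternative technique, not the function above. The sketch assumes synthetic data, a subset of the models and much smaller grids than above; the step names `scale` and `clf` are arbitrary. Unlike `Algo_search`, it picks the winner by cross-validated score and keeps the test set for a final check:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (assumption) in place of the real training/test sets.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "clf" is a placeholder step; each grid below swaps in a different model.
pipe = Pipeline([("scale", StandardScaler()), ("clf", DecisionTreeClassifier())])

# A list of dicts: GridSearchCV explores each dict separately, so each
# model is only combined with its own hyperparameters (3 + 2 + 4 = 9 candidates).
param_grid = [
    {"clf": [DecisionTreeClassifier(random_state=0)],
     "clf__max_depth": [1, 5, 10]},
    {"clf": [RandomForestClassifier(random_state=0)],
     "clf__n_estimators": [50, 100]},
    {"clf": [SVC()],
     "clf__C": [1, 10], "clf__kernel": ["rbf", "linear"]},
]

search = GridSearchCV(pipe, param_grid, scoring="accuracy", cv=5)
search.fit(X_train, y_train)

# The winning model (already refit on all of X_train) and its settings.
best_model = search.best_estimator_.named_steps["clf"]
print(type(best_model).__name__, search.best_params_)
print("CV accuracy:", round(search.best_score_, 3))
print("Test accuracy:", round(search.score(X_test, y_test), 3))
```

Adding `n_jobs=-1` to `GridSearchCV` parallelises the search as in the question; it is left out here only to keep the sketch minimal.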
