python-3.x - Cross-validation with an imblearn Pipeline and GridSearchCV
Question
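I am trying to use the Pipeline class from imblearn together with GridSearchCV to find the best parameters for classifying an imbalanced dataset. Following the answer mentioned here, I want to leave the validation set untouched and resample only the training set, which is what imblearn's Pipeline appears to do. However, I get an error when implementing the accepted solution. Please let me know what I am doing wrong. Below is my implementation: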
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV

def imb_pipeline(clf, X, y, params):
    # SMOTE is applied only when the pipeline is fitted, i.e. to the training folds
    model = Pipeline([
        ('sampling', SMOTE()),
        ('classification', clf)
    ])

    score = {'AUC': 'roc_auc',
             'RECALL': 'recall',
             'PRECISION': 'precision',
             'F1': 'f1'}

    # With several scorers, refit names the one used for best_params_ and best_score_
    gcv = GridSearchCV(estimator=model, param_grid=params, cv=5, scoring=score, n_jobs=12, refit='F1',
                       return_train_score=True)
    gcv.fit(X, y)

    return gcv

for param, classifier in zip(params, classifiers):
    print("Working on {}...".format(classifier[0]))
    clf = imb_pipeline(classifier[1], X_scaled, y, param)
    print("Best parameter for {} is {}".format(classifier[0], clf.best_params_))
    print("Best `F1` for {} is {}".format(classifier[0], clf.best_score_))
    print('-'*50)
    print('\n')
params:
[{'penalty': ('l1', 'l2'), 'C': (0.01, 0.1, 1.0, 10)},
{'n_neighbors': (10, 15, 25)},
{'n_estimators': (80, 100, 150, 200), 'min_samples_split': (5, 7, 10, 20)}]
classifiers:
[('Logistic Regression',
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, l1_ratio=None, max_iter=100,
multi_class='warn', n_jobs=None, penalty='l2',
random_state=None, solver='warn', tol=0.0001, verbose=0,
warm_start=False)),
('KNearestNeighbors',
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
metric_params=None, n_jobs=None, n_neighbors=5, p=2,
weights='uniform')),
('Gradient Boosting Classifier',
GradientBoostingClassifier(criterion='friedman_mse', init=None,
learning_rate=0.1, loss='deviance', max_depth=3,
max_features=None, max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100,
n_iter_no_change=None, presort='auto',
random_state=None, subsample=1.0, tol=0.0001,
validation_fraction=0.1, verbose=0,
warm_start=False))]
Error:
ValueError: Invalid parameter C for estimator Pipeline(memory=None,
steps=[('sampling',
SMOTE(k_neighbors=5, kind='deprecated',
m_neighbors='deprecated', n_jobs=1,
out_step='deprecated', random_state=None, ratio=None,
sampling_strategy='auto', svm_estimator='deprecated')),
('classification',
LogisticRegression(C=1.0, class_weight=None, dual=False,
fit_intercept=True, intercept_scaling=1,
l1_ratio=None, max_iter=100,
multi_class='warn', n_jobs=None,
penalty='l2', random_state=None,
solver='warn', tol=0.0001, verbose=0,
warm_start=False))],
verbose=False). Check the list of available parameters with `estimator.get_params().keys()`. """
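Answer
See this example of how to use parameters with a pipeline: https://scikit-learn.org/stable/auto_examples/compose/plot_compare_reduction.html#sphx-glr-auto-examples-compose-plot-compare-reduction-py
Whenever you use a pipeline, you have to pass parameters in a form that tells the pipeline which parameter belongs to which step in its list. To do that, it uses the names you gave the steps when you created the pipeline.
In your code, for example: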
model = Pipeline([
    ('sampling', SMOTE()),
    ('classification', clf)
])
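To pass a parameter p1 to SMOTE, you would use sampling__p1 as the parameter name, not p1.
You used "classification" as the name for your clf, so prepend that name to every parameter that should go to clf.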
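As a minimal sketch of that naming rule, the snippet below builds a two-step pipeline and checks which parameter names it accepts. It assumes only scikit-learn is installed, so StandardScaler is used as a stand-in for SMOTE; that substitution is an assumption made for illustration and is not part of the original code.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: StandardScaler stands in for SMOTE so that this sketch
# needs scikit-learn only. The step names match the question's pipeline.
model = Pipeline([
    ("sampling", StandardScaler()),
    ("classification", LogisticRegression()),
])

# Every name that set_params and GridSearchCV will accept for this pipeline.
valid_names = sorted(model.get_params().keys())
print([name for name in valid_names if name.startswith("classification__")])

# A bare parameter name is rejected by the pipeline...
bare_name_error = None
try:
    model.set_params(C=10)
except ValueError as exc:
    bare_name_error = str(exc)
print(bare_name_error is not None)

# ...while the step-qualified name reaches the classifier.
model.set_params(classification__C=10)
print(model.named_steps["classification"].C)
```

The keys of a param_grid passed to GridSearchCV are resolved the same way, which is why the bare name C triggers the ValueError shown in the question.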
Try:
[{'classification__penalty': ('l1', 'l2'), 'classification__C': (0.01, 0.1, 1.0, 10)},
{'classification__n_neighbors': (10, 15, 25)},
{'classification__n_estimators': (80, 100, 150, 200), 'classification__min_samples_split': (5, 7, 10, 20)}]
Make sure there are two underscores between the step name and the parameter name.
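If the grids are already written with bare names, one way to avoid retyping them is a small helper that adds the step prefix. This is only a sketch: prefix_grid and pipeline_params are hypothetical names introduced here and are not part of scikit-learn or imblearn.

```python
def prefix_grid(step_name, grid):
    """Return a copy of grid whose keys read '<step_name>__<parameter>'."""
    return {f"{step_name}__{name}": values for name, values in grid.items()}


# The grids from the question, written with bare parameter names.
params = [
    {"penalty": ("l1", "l2"), "C": (0.01, 0.1, 1.0, 10)},
    {"n_neighbors": (10, 15, 25)},
    {"n_estimators": (80, 100, 150, 200), "min_samples_split": (5, 7, 10, 20)},
]

# Qualify every key with the name of the classifier step in the pipeline.
pipeline_params = [prefix_grid("classification", grid) for grid in params]

for grid in pipeline_params:
    print(sorted(grid))
```

The resulting pipeline_params list can be zipped with classifiers in place of params, so each grid is passed to imb_pipeline with names the pipeline recognizes.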