python - ValueError: bad input shape () when using 'roc_auc' with GridSearchCV
Problem description
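I'm getting a strange error when using the 'roc_auc' scorer with GridSearchCV. The error does not occur when I use 'accuracy' instead. Looking at the stack trace, it appears that the y_score passed to roc_curve is None, which triggers this error from column_or_1d. I tested this by calling column_or_1d directly with None as input and easily reproduced the error.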
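The direct call described above can be sketched roughly as follows. This is a minimal reproduction, assuming column_or_1d is imported from sklearn.utils.validation (the module named in the traceback further down). The exact wording of the message may differ between scikit-learn versions, so it is only printed here.

```python
from sklearn.utils.validation import column_or_1d

# Passing None mimics a scorer that received no scores at all.
# None has shape (), which is neither 1-D nor a single column,
# so column_or_1d rejects it with a ValueError.
try:
    column_or_1d(None)
    raised = False
    message = ""
except ValueError as exc:
    raised = True
    message = str(exc)

print(raised)
print(message)
```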
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler, MaxAbsScaler, MinMaxScaler
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import GradientBoostingClassifier
from pipelinehelper.pipelinehelper import PipelineHelper

pipe = Pipeline([
    ('scaler', PipelineHelper([
        ('std', StandardScaler()),
        ('abs', MaxAbsScaler()),
        ('minmax', MinMaxScaler()),
        ('pca', PCA(svd_solver='full', whiten=True)),
    ])),
    ('classifier', PipelineHelper([
        ('knn', KNeighborsClassifier(weights='distance')),
        ('gbc', GradientBoostingClassifier())
    ])),
])

params = {
    'scaler__selected_model': pipe.named_steps['scaler'].generate({
        'std__with_mean': [True, False],
        'std__with_std': [True, False],
        'pca__n_components': [0.5, 0.75, 0.9, 0.99],
    }),
    'classifier__selected_model': pipe.named_steps['classifier'].generate({
        'knn__n_neighbors': [1, 3, 5, 7, 10],  # , 30, 50, 70, 90, 110, 130, 150, 170, 190],
        'gbc__learning_rate': [0.1, 0.5, 1.0],
        'gbc__subsample': [0.5, 1.0],
    })
}

grid = GridSearchCV(pipe, params, scoring='roc_auc', n_jobs=1, verbose=1, cv=5)
grid.fit(X, y)
Some debugging information:
>>> X.shape
... (13885, 23)
>>> y.shape
... (13885,)
>>> X
... array([[ 0. , 0. , 0. , ..., 7.14285714,
0.9 , 35.4644354 ],
[ 0. , 0. , 0. , ..., 2.11442806,
1.2 , 54.99027913],
[ 1. , 0. , 0. , ..., 2.64959194,
0.7 , 70.07380534],
...,
[ 1. , 0. , 0. , ..., 4.375 ,
0.5 , 91.85932945],
[ 1. , 0. , 0. , ..., 3.75 ,
0.9 , 68.62436682],
[ 0. , 0. , 1. , ..., 3.01587302,
4.1 , 57.25781074]])
>>> y
... array([0, 0, 0, ..., 0, 0, 1])
>>> y.mean()
... 0.11278357940223263
>>> sklearn.__version__
'0.20.3'
I get the error:
python3.7/site-packages/sklearn/metrics/ranking.py in roc_curve(y_true, y_score, pos_label, sample_weight, drop_intermediate)
616 """
617 fps, tps, thresholds = _binary_clf_curve(
--> 618 y_true, y_score, pos_label=pos_label, sample_weight=sample_weight)
619
620 # Attempt to drop thresholds corresponding to points in between and
python3.7/site-packages/sklearn/metrics/ranking.py in _binary_clf_curve(y_true, y_score, pos_label, sample_weight)
399 check_consistent_length(y_true, y_score, sample_weight)
400 y_true = column_or_1d(y_true)
--> 401 y_score = column_or_1d(y_score)
402 assert_all_finite(y_true)
403 assert_all_finite(y_score)
python3.7/site-packages/sklearn/utils/validation.py in column_or_1d(y, warn)
795 return np.ravel(y)
796
--> 797 raise ValueError("bad input shape {0}".format(shape))
798
799
ValueError: bad input shape ()
I tested further with the following generated data and got exactly the same error:
from sklearn.datasets import make_classification
X_test, y_test = make_classification(100, 23)
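I switched to a pipeline that does not use PipelineHelper and the error did not occur, so I'm assuming this is strictly an issue with PipelineHelper? Before I go ahead and file a bug report with that project, I'd like to know whether anyone has ideas on how to work around this.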
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', GradientBoostingClassifier()),
])

params = {
    'scaler__with_mean': [True, False],
    'scaler__with_std': [True, False],
    'classifier__learning_rate': [0.1, 0.5, 1.0],
    'classifier__subsample': [0.5, 1.0],
}
PS: I'm using PipelineHelper from https://github.com/bmurauer/pipelinehelper
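A note on why 'accuracy' can work while 'roc_auc' fails: 'accuracy' only needs predict, whereas 'roc_auc' scores the continuous output of decision_function or predict_proba. If a wrapper exposes one of those methods but returns nothing from it, the scorer ends up with None. The sketch below is one way to check for that. The helper find_none_outputs and both toy classes are hypothetical names made up for illustration; BrokenWrapper is not PipelineHelper's actual code, and I have not verified that this is the cause there.

```python
def find_none_outputs(estimator, X):
    """Return the names of prediction methods that exist but return None."""
    suspects = []
    for name in ('predict', 'decision_function', 'predict_proba'):
        # getattr with a default also covers methods that raise AttributeError
        method = getattr(estimator, name, None)
        if method is None:
            continue
        if method(X) is None:
            suspects.append(name)
    return suspects


class ToyModel:
    """Hypothetical stand-in for a fitted classifier."""

    def predict(self, X):
        return [0 for _ in X]

    def predict_proba(self, X):
        return [[1.0, 0.0] for _ in X]


class BrokenWrapper:
    """Hypothetical wrapper with a forgotten return statement."""

    def __init__(self, model):
        self.model = model

    def predict(self, X):
        return self.model.predict(X)

    def predict_proba(self, X):
        # Bug for illustration: the result is dropped, so None is returned.
        self.model.predict_proba(X)


suspects = find_none_outputs(BrokenWrapper(ToyModel()), [[0.0], [1.0]])
print(suspects)  # ['predict_proba']
```

Running the same helper on the fitted pipeline, for example find_none_outputs(pipe, X[:5]) after pipe.fit(X, y), should show whether any of these methods hands back None.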
Solution
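I went ahead and filed a bug report with the project and switched to an alternative solution found here. As the sklearn maintainers pointed out on Twitter, I could also easily use the built-in sklearn tools and write my own code to loop over all the options. Either way, my recommended solution is to not use PipelineHelper, since it does not appear to be feature complete.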
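The maintainers' suggestion is not spelled out here, so what follows is only one possible reading of "built-in sklearn tools": GridSearchCV accepts a list of parameter dicts, and whole pipeline steps can be swapped by listing estimators as parameter values. This is a sketch on generated data with a deliberately small grid, not the setup from the question. The names X_demo and y_demo and the reduced n_estimators are assumptions made to keep it quick to run.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Small synthetic binary problem (assumption: stands in for the real X, y).
X_demo, y_demo = make_classification(n_samples=200, n_features=23, random_state=0)

# The initial steps are placeholders; the grid below replaces them.
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', KNeighborsClassifier()),
])

# Each dict is searched separately, so parameters only meet the
# estimator they belong to.
param_grid = [
    {
        'scaler': [StandardScaler(), MinMaxScaler()],
        'classifier': [KNeighborsClassifier(weights='distance')],
        'classifier__n_neighbors': [3, 5],
    },
    {
        'scaler': [StandardScaler()],
        'classifier': [GradientBoostingClassifier(n_estimators=20, random_state=0)],
        'classifier__learning_rate': [0.1, 0.5],
    },
]

grid = GridSearchCV(pipe, param_grid, scoring='roc_auc', cv=3, n_jobs=1)
grid.fit(X_demo, y_demo)

print(grid.best_params_)
print(grid.best_score_)
```

Because every candidate here is a plain sklearn estimator, the 'roc_auc' scorer gets its scores from the estimator's own decision_function or predict_proba and no wrapper class is involved.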