首页 > 解决方案 > 从 sklearn 尝试“LeaveOneGroupOut”时收到 python 异常

问题描述

我是 Scikit-Learn 包的新手,正在尝试使用LeaveOneGroupOut交叉验证来完成简单的分类任务。
我使用了以下代码,我根据scikit-learn.org网站[此链接]上的文档采用了该代码:

from sklearn.model_selection import LeaveOneGroupOut
from sklearn.model_selection import cross_val_score
from sklearn import svm    

X = Selected_Dataset[:,:-1]
y = Selected_Labels
groups = Selected_SubjIDs

clf = svm.SVC(kernel='linear', C=1)    
cv = LeaveOneGroupOut()
cv.get_n_splits(X, y, groups=groups)

cross_val_score(clf, X, y, cv=cv)

但是这段代码会产生以下异常:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-27b53a67db71> in <module>
     14 
     15 
---> 16 cross_val_score(clf, X, y, cv=cv)
     17 
     18 

~/miniconda3/lib/python3.6/site-packages/sklearn/model_selection/_validation.py in cross_val_score(estimator, X, y, groups, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch)
    340                                 n_jobs=n_jobs, verbose=verbose,
    341                                 fit_params=fit_params,
--> 342                                 pre_dispatch=pre_dispatch)
    343     return cv_results['test_score']
    344 

~/miniconda3/lib/python3.6/site-packages/sklearn/model_selection/_validation.py in cross_validate(estimator, X, y, groups, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch, return_train_score)
    204             fit_params, return_train_score=return_train_score,
    205             return_times=True)
--> 206         for train, test in cv.split(X, y, groups))
    207 
    208     if return_train_score:

~/miniconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in __call__(self, iterable)
    777             # was dispatched. In particular this covers the edge
    778             # case of Parallel used with an exhausted iterator.
--> 779             while self.dispatch_one_batch(iterator):
    780                 self._iterating = True
    781             else:

~/miniconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in dispatch_one_batch(self, iterator)
    618 
    619         with self._lock:
--> 620             tasks = BatchedCalls(itertools.islice(iterator, batch_size))
    621             if len(tasks) == 0:
    622                 # No more tasks available in the iterator: tell caller to stop.

~/miniconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in __init__(self, iterator_slice)
    125 
    126     def __init__(self, iterator_slice):
--> 127         self.items = list(iterator_slice)
    128         self._size = len(self.items)
    129 

~/miniconda3/lib/python3.6/site-packages/sklearn/model_selection/_validation.py in <genexpr>(.0)
    200                         pre_dispatch=pre_dispatch)
    201     scores = parallel(
--> 202         delayed(_fit_and_score)(
    203             clone(estimator), X, y, scorers, train, test, verbose, None,
    204             fit_params, return_train_score=return_train_score,

~/miniconda3/lib/python3.6/site-packages/sklearn/model_selection/_split.py in split(self, X, y, groups)
     93         X, y, groups = indexable(X, y, groups)
     94         indices = np.arange(_num_samples(X))
---> 95         for test_index in self._iter_test_masks(X, y, groups):
     96             train_index = indices[np.logical_not(test_index)]
     97             test_index = indices[test_index]

~/miniconda3/lib/python3.6/site-packages/sklearn/model_selection/_split.py in _iter_test_masks(self, X, y, groups)
    822     def _iter_test_masks(self, X, y, groups):
    823         if groups is None:
--> 824             raise ValueError("The 'groups' parameter should not be None.")
    825         # We make a copy of groups to avoid side-effects during iteration
    826         groups = check_array(groups, copy=True, ensure_2d=False, dtype=None)

ValueError: The 'groups' parameter should not be None.

我发现这两个相关的 Bug 在2016年和2017 年被报告。

有什么办法吗?

标签: pythonpython-3.xexceptionscikit-learncross-validation

解决方案


你必须使用

cross_val_score(clf, X, y, cv=cv, groups=groups)

你可以删除get_n_splits.

工作示例

from sklearn.model_selection import LeaveOneGroupOut
from sklearn.model_selection import cross_val_score
from sklearn import svm    
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import Normalizer

#load the data
breast_cancer = load_breast_cancer()

X = breast_cancer.data
y = breast_cancer.target
groups = np.random.binomial(1,0.5,size=len(X))

clf = svm.SVC(kernel='linear', C=1)    
cv = LeaveOneGroupOut()


cross_val_score(clf, X, y, cv=cv,groups=groups)

推荐阅读