GridSearchCV and RandomizedSearchCV (sklearn): TypeError: __call__() missing 1 required positional argument: 'y_true'

Problem description

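I am trying to use GridSearchCV and RandomizedSearchCV from sklearn to find the best parameters for two unsupervised learning algorithms (used for novelty detection): OneClassSVM and LocalOutlierFactor.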

Below is the function I wrote (modified for this example):

from time import time

import numpy as np
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV


def gridsearch(clf, param_dist_rand, param_grid_exhaustive, X):

    def report(results, n_top=3):
        # Print the n_top best-ranked parameter settings.
        for i in range(1, n_top + 1):
            candidates = np.flatnonzero(results['rank_test_score'] == i)
            for candidate in candidates:
                print("Model with rank: {0}".format(i))
                print("Mean validation score: {0:.3f} (std: {1:.3f})".format(
                    results['mean_test_score'][candidate],
                    results['std_test_score'][candidate]))
                print("Parameters: {0}".format(results['params'][candidate]))
                print("")

    # Randomized search over sampled parameter settings.
    n_iter_search = 20
    random_search = RandomizedSearchCV(
        clf, param_distributions=param_dist_rand, n_iter=n_iter_search,
        cv=5, error_score=np.nan, scoring='accuracy')

    start = time()
    random_search.fit(X)  # no y is passed: the data is unlabeled
    print("RandomizedSearchCV took %.2f seconds for %d candidates"
          " parameter settings." % ((time() - start), n_iter_search))
    report(random_search.cv_results_)

    # Exhaustive search over the full grid.
    grid_search = GridSearchCV(
        clf, param_grid=param_grid_exhaustive,
        cv=5, error_score=np.nan, scoring='accuracy')
    start = time()
    grid_search.fit(X)

    print("GridSearchCV took %.2f seconds for %d candidate parameter settings."
          % (time() - start, len(grid_search.cv_results_['params'])))
    report(grid_search.cv_results_)

To test the function above, I have the following code:

import scipy.stats
from sklearn import svm
from sklearn.model_selection import train_test_split
from sklearn.neighbors import LocalOutlierFactor

# all_data is the unlabeled feature matrix (its loading is not shown here).
X, W = train_test_split(all_data, test_size=0.2, random_state=42)

clf_lof = LocalOutlierFactor(novelty=True, contamination='auto')
lof_param_dist_rand = {'n_neighbors': np.arange(6, 101, 1),
                       'leaf_size': np.arange(30, 101, 10)}
lof_param_grid_exhaustive = {'n_neighbors': np.arange(6, 101, 1),
                             'leaf_size': np.arange(30, 101, 10)}
gridsearch(clf=clf_lof, param_dist_rand=lof_param_dist_rand,
           param_grid_exhaustive=lof_param_grid_exhaustive, X=X)

clf_svm = svm.OneClassSVM()
svm_param_dist_rand = {'nu': np.arange(0, 1.1, 0.01),
                       'kernel': ['rbf', 'linear', 'poly', 'sigmoid'],
                       'degree': np.arange(0, 7, 1),
                       'gamma': scipy.stats.expon(scale=.1)}
svm_param_grid_exhaustive = {'nu': np.arange(0, 1.1, 0.01),
                             'kernel': ['rbf', 'linear', 'poly', 'sigmoid'],
                             'degree': np.arange(0, 7, 1),
                             'gamma': [0.25]}  # grid values must be lists or arrays
gridsearch(clf=clf_svm, param_dist_rand=svm_param_dist_rand,
           param_grid_exhaustive=svm_param_grid_exhaustive, X=X)

Initially, I did not set the scoring parameter for either search method, and I got this error:

TypeError: If no scoring is specified, the estimator passed should have a 'score' method.
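
This first message arises because neither estimator defines a `score` method, which the search falls back on when `scoring` is left unset. A minimal check, assuming a reasonably recent scikit-learn (the `has_score` dictionary is only an illustrative name):

```python
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

# Record, per estimator class, whether a default score() method exists.
has_score = {}
for est in (OneClassSVM(), LocalOutlierFactor(novelty=True)):
    name = type(est).__name__
    has_score[name] = hasattr(est, "score")
    print(name, "has score():", has_score[name])
    print(name, "has predict():", hasattr(est, "predict"))
```

Both estimators do expose `predict`, so a scoring function built on predictions remains possible.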

Then I added scoring='accuracy', because I want to use test accuracy to judge the performance of the different model configurations. Now I get this error:

TypeError: __call__() missing 1 required positional argument: 'y_true'

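I have no labels, because I only have data for one class and no data for the counter class, so I don't know how to resolve this error. I also looked at what was suggested in this question, but it did not help. Any help would be greatly appreciated.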

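Edit: Following @FChm's suggestion to provide sample data, the sample .csv data file can be found here. A short description of the file: it consists of four feature columns (generated by PCA) that I feed into the model.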

Tags: python-3.x, scikit-learn, grid-search

Solution
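
The built-in 'accuracy' scorer compares predictions against true labels, and `fit(X)` supplies none, which is why `y_true` is reported missing. One possible way around this, offered as a sketch rather than a verified answer, is to pass a callable as `scoring`: scikit-learn accepts a function with the signature `scorer(estimator, X, y)`. The `inlier_fraction` function and the synthetic `X_demo` array below are hypothetical stand-ins, and the metric itself (the share of held-out rows predicted as inliers) is an assumption that only makes sense if the training data contains essentially no outliers.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import OneClassSVM


def inlier_fraction(estimator, X, y=None):
    """Hypothetical label-free scorer: share of held-out rows predicted as inliers (+1)."""
    return float(np.mean(estimator.predict(X) == 1))


# Synthetic stand-in for the four PCA feature columns described in the question.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 4))

search = GridSearchCV(
    OneClassSVM(kernel="rbf", gamma=0.25),
    param_grid={"nu": [0.05, 0.1, 0.2]},
    scoring=inlier_fraction,  # a callable scorer, so no y_true is required
    cv=5,
    error_score=np.nan,
)
search.fit(X_demo)  # no labels are passed

print(search.best_params_)
print(search.cv_results_["mean_test_score"])
```

Note that this metric rewards models that accept almost everything, so it tends to favor permissive settings such as a small `nu`. If even a few known outliers are available, a labeled validation set would give a more meaningful score.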

