python - GridSearchCV with scoring = neg_log_loss
Problem description
I am trying to run GridSearchCV over a specified parameter grid, scoring with negative log loss:
grid = GridSearchCV(spec_pipeline, param_grid = spec_params, scoring = 'neg_log_loss', cv = logo, verbose = 10)
grid.fit(X, y_true, groups = names)
ValueError: y_true contains only one label (1.0). Please provide the true labels explicitly through the labels argument.
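The same code works fine with accuracy scoring. I found that log loss needs the labels to be specified explicitly, which works well when calling sklearn.metrics directly: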
y_labels = np.unique(y_true)
y_labels
array([0., 1., 2.])
metrics.log_loss(y_true, y_pred, labels = y_labels )
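As a minimal sketch of why the labels argument matters, the snippet below uses made-up values (not the data from this question): a fold in which every true label is class 1, scored against three-column probabilities. Without labels, log_loss is expected to raise the error quoted above; with labels it can map each column to a class.

```python
import numpy as np
from sklearn.metrics import log_loss

# Hypothetical held-out fold: only class 1.0 appears in the true labels,
# while the classifier outputs probabilities for three classes.
y_fold = np.array([1.0, 1.0, 1.0])
proba = np.array([
    [0.2, 0.7, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.6, 0.1],
])

try:
    # log_loss cannot infer the full class set from a single-label y_true.
    log_loss(y_fold, proba)
except ValueError as exc:
    print("without labels:", exc)

# Passing the full label set tells log_loss which column belongs to which class.
loss = log_loss(y_fold, proba, labels=[0.0, 1.0, 2.0])
print("with labels:", loss)
```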
So I tried:
grid.fit(order_inner_x, y_inner, groups = names_inner, labels = y_labels)
ValueError: not enough values to unpack (expected 2, got 1)
I have tried many variations of the above, and also created my own scorer:
LogLoss = make_scorer(log_loss, greater_is_better=False, needs_proba=True)
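But everything I try ends in one of the two errors above. Clearly I am missing something, so any help is much appreciated.
Update:
I made a small mistake above: this is a three-class problem, not the binary one I originally implied.
I tried Ben's suggestion (thanks!):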
LogLoss = metrics.make_scorer(metrics.log_loss, greater_is_better=False, needs_proba=True, labels=[0, 1, 2])
grid = GridSearchCV(spec_pipeline, param_grid = spec_params, scoring = LogLoss, cv = logo, verbose = 10)
grid.fit(order_inner_x, y_inner, groups=names_inner)
I now get a different error, so hopefully I am one step closer. Here is the full traceback:
ValueError Traceback (most recent call last)
<ipython-input-164-43d9f1633dc9> in <module>
2
3 grid = GridSearchCV(spec_pipeline, param_grid = spec_params, scoring = LogLoss, cv = logo, verbose = 10)
----> 4 grid.fit(order_inner_x, y_inner, groups=names_inner)
~/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_search.py in fit(self, X, y, groups, **fit_params)
720 return results_container[0]
721
--> 722 self._run_search(evaluate_candidates)
723
724 results = results_container[0]
~/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_search.py in _run_search(self, evaluate_candidates)
1189 def _run_search(self, evaluate_candidates):
1190 """Search all candidates in param_grid"""
-> 1191 evaluate_candidates(ParameterGrid(self.param_grid))
1192
1193
~/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_search.py in evaluate_candidates(candidate_params)
709 for parameters, (train, test)
710 in product(candidate_params,
--> 711 cv.split(X, y, groups)))
712
713 all_candidate_params.extend(candidate_params)
~/anaconda3/lib/python3.7/site-packages/sklearn/externals/joblib/parallel.py in __call__(self, iterable)
918 self._iterating = self._original_iterator is not None
919
--> 920 while self.dispatch_one_batch(iterator):
921 pass
922
~/anaconda3/lib/python3.7/site-packages/sklearn/externals/joblib/parallel.py in dispatch_one_batch(self, iterator)
757 return False
758 else:
--> 759 self._dispatch(tasks)
760 return True
761
~/anaconda3/lib/python3.7/site-packages/sklearn/externals/joblib/parallel.py in _dispatch(self, batch)
714 with self._lock:
715 job_idx = len(self._jobs)
--> 716 job = self._backend.apply_async(batch, callback=cb)
717 # A job can complete so quickly than its callback is
718 # called before we get here, causing self._jobs to
~/anaconda3/lib/python3.7/site-packages/sklearn/externals/joblib/_parallel_backends.py in apply_async(self, func, callback)
180 def apply_async(self, func, callback=None):
181 """Schedule a func to be run"""
--> 182 result = ImmediateResult(func)
183 if callback:
184 callback(result)
~/anaconda3/lib/python3.7/site-packages/sklearn/externals/joblib/_parallel_backends.py in __init__(self, batch)
547 # Don't delay the application, to avoid keeping the input
548 # arguments in memory
--> 549 self.results = batch()
550
551 def get(self):
~/anaconda3/lib/python3.7/site-packages/sklearn/externals/joblib/parallel.py in __call__(self)
223 with parallel_backend(self._backend, n_jobs=self._n_jobs):
224 return [func(*args, **kwargs)
--> 225 for func, args, kwargs in self.items]
226
227 def __len__(self):
~/anaconda3/lib/python3.7/site-packages/sklearn/externals/joblib/parallel.py in <listcomp>(.0)
223 with parallel_backend(self._backend, n_jobs=self._n_jobs):
224 return [func(*args, **kwargs)
--> 225 for func, args, kwargs in self.items]
226
227 def __len__(self):
~/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_validation.py in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters, return_n_test_samples, return_times, return_estimator, error_score)
566 fit_time = time.time() - start_time
567 # _score will return dict if is_multimetric is True
--> 568 test_scores = _score(estimator, X_test, y_test, scorer, is_multimetric)
569 score_time = time.time() - start_time - fit_time
570 if return_train_score:
~/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_validation.py in _score(estimator, X_test, y_test, scorer, is_multimetric)
603 """
604 if is_multimetric:
--> 605 return _multimetric_score(estimator, X_test, y_test, scorer)
606 else:
607 if y_test is None:
~/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_validation.py in _multimetric_score(estimator, X_test, y_test, scorers)
633 score = scorer(estimator, X_test)
634 else:
--> 635 score = scorer(estimator, X_test, y_test)
636
637 if hasattr(score, 'item'):
~/anaconda3/lib/python3.7/site-packages/sklearn/metrics/scorer.py in __call__(self, clf, X, y, sample_weight)
133 ' but need classifier with two'
134 ' classes for {} scoring'.format(
--> 135 y_pred.shape, self._score_func.__name__))
136 if sample_weight is not None:
137 return self._sign * self._score_func(y, y_pred,
ValueError: got predict_proba of shape (200, 3), but need classifier with two classes for log_loss scoring
Solution
You are most of the way there: the labels need to be passed to the metric. In this attempt:
grid.fit(order_inner_x, y_inner, groups = names_inner, labels = y_labels)
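you pass the labels, but to the grid search's fit method rather than to the scorer itself.
make_scorer accepts additional keyword arguments and forwards them to the metric function, so this should work: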
from sklearn.metrics import log_loss, make_scorer
LogLoss = make_scorer(log_loss, greater_is_better=False, needs_proba=True, labels=[0, 1, 2])
grid = GridSearchCV(spec_pipeline, param_grid = spec_params, scoring = LogLoss, cv = logo, verbose = 10)
grid.fit(X, y_true, groups = names)
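For the three-class case in the update, the traceback suggests that the scikit-learn version in use treats a held-out group containing only two classes as a binary problem inside the make_scorer wrapper, and then rejects the three-column predict_proba output. One possible workaround, sketched here under assumptions, is to skip make_scorer and pass a plain callable with the signature (estimator, X, y) as scoring. The function name neg_log_loss_all_classes, the synthetic data, the LogisticRegression model and the parameter grid are all hypothetical stand-ins for spec_pipeline and spec_params.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import GridSearchCV, LeaveOneGroupOut


def neg_log_loss_all_classes(estimator, X, y):
    """Negative log loss, using every class the fitted estimator knows about."""
    proba = estimator.predict_proba(X)
    # estimator.classes_ lists the classes seen during fit, in column order,
    # so a test fold that lacks some classes can still be scored.
    return -log_loss(y, proba, labels=estimator.classes_)


# Synthetic three-class data (an assumption, not the data from the question).
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 60)                 # 60 samples per class
X = rng.normal(size=(180, 4)) + y[:, None]   # class-dependent shift
groups = np.repeat(np.arange(6), 30)         # each group holds a single class

grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0]},
    scoring=neg_log_loss_all_classes,        # callable instead of make_scorer
    cv=LeaveOneGroupOut(),
)
grid.fit(X, y, groups=groups)
print(grid.best_params_, grid.best_score_)
```

Because the callable returns the negated loss, higher is still better, which matches the convention of the built-in 'neg_log_loss' scorer. This only helps when every training split contains all classes; if a class is missing from training, the probability columns and the labels no longer line up.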