python - “使用 Python 和 H2O 进行机器学习”手册中的 RandomizedSearchCV 示例不起作用
问题描述
我有点困惑,因为我没有从“使用 Python 和 H2O 进行机器学习”手动工作(第 36 页)中得到最后一个示例。
这是代码:
import h2o
h2o.init()
from h2o.estimators.gbm import H2OGradientBoostingEstimator
from h2o.transforms.preprocessing import H2OScaler
from h2o.cross_validation import H2OKFold
from h2o.model.regression import h2o_r2_score
from sklearn.pipeline import Pipeline
from sklearn.model_selection import RandomizedSearchCV
from sklearn.metrics.scorer import make_scorer
h2o.__PROGRESS_BAR__=False
h2o.no_progress()
iris_data_path = "http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris.csv"# load demonstration data1819In [5]:
iris_df = h2o.import_file(path=iris_data_path)
params = {"standardize__center": [True, False],
"standardize__scale": [True, False],
"gbm__ntrees": [10,20],
"gbm__max_depth": [1,2,3],
"gbm__learn_rate": [0.1,0.2]}
custom_cv = H2OKFold(iris_df, n_folds=5, seed=42)
pipeline = Pipeline([("standardize", H2OScaler()),
("gbm", H2OGradientBoostingEstimator(distribution="gaussian"))])
random_search = RandomizedSearchCV(pipeline, params, n_iter=5, scoring=make_scorer(h2o_r2_score),
cv=custom_cv, random_state=42, n_jobs=1)
random_search.fit(iris_df[1:], iris_df[0])
它返回错误ValueError: No valid specification of the columns。只允许使用所有整数或所有字符串或布尔掩码的标量、列表或切片。
完整的终端消息:
Traceback (most recent call last):
File "untitled-Copy1.py", line 34, in <module>
random_search.fit(iris_df[1:], iris_df[0])
File "/department/jupyter-dev/anaconda3/envs/python36/lib/python3.6/site-packages/sklearn/model_selection/_search.py", line 710, in fit
self._run_search(evaluate_candidates)
File "/department/jupyter-dev/anaconda3/envs/python36/lib/python3.6/site-packages/sklearn/model_selection/_search.py", line 1484, in _run_search
random_state=self.random_state))
File "/department/jupyter-dev/anaconda3/envs/python36/lib/python3.6/site-packages/sklearn/model_selection/_search.py", line 689, in evaluate_candidates
cv.split(X, y, groups)))
File "/department/jupyter-dev/anaconda3/envs/python36/lib/python3.6/site-packages/joblib/parallel.py", line 1004, in __call__
if self.dispatch_one_batch(iterator):
File "/department/jupyter-dev/anaconda3/envs/python36/lib/python3.6/site-packages/joblib/parallel.py", line 835, in dispatch_one_batch
self._dispatch(tasks)
File "/department/jupyter-dev/anaconda3/envs/python36/lib/python3.6/site-packages/joblib/parallel.py", line 754, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/department/jupyter-dev/anaconda3/envs/python36/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 209, in apply_async
result = ImmediateResult(func)
File "/department/jupyter-dev/anaconda3/envs/python36/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 590, in __init__
self.results = batch()
File "/department/jupyter-dev/anaconda3/envs/python36/lib/python3.6/site-packages/joblib/parallel.py", line 256, in __call__
for func, args, kwargs in self.items]
File "/department/jupyter-dev/anaconda3/envs/python36/lib/python3.6/site-packages/joblib/parallel.py", line 256, in <listcomp>
for func, args, kwargs in self.items]
File "/department/jupyter-dev/anaconda3/envs/python36/lib/python3.6/site-packages/sklearn/model_selection/_validation.py", line 508, in _fit_and_score
X_train, y_train = _safe_split(estimator, X, y, train)
File "/department/jupyter-dev/anaconda3/envs/python36/lib/python3.6/site-packages/sklearn/utils/metaestimators.py", line 201, in _safe_split
X_subset = _safe_indexing(X, indices)
File "/department/jupyter-dev/anaconda3/envs/python36/lib/python3.6/site-packages/sklearn/utils/__init__.py", line 390, in _safe_indexing
indices_dtype = _determine_key_type(indices)
File "/department/jupyter-dev/anaconda3/envs/python36/lib/python3.6/site-packages/sklearn/utils/__init__.py", line 288, in _determine_key_type
raise ValueError(err_msg)
ValueError: No valid specification of the columns. Only a scalar, list or slice of all integers or all strings, or boolean mask is allowed
Closing connection _sid_b8c1 at exit
H2O session _sid_b8c1 closed.
我正在使用带有 sklearn 0.22.1 和 h2o 3.28.0.3 的 python 3.6.10。
我究竟做错了什么?任何帮助表示赞赏!
祝你有美好的一天 :)
解决方案
推荐阅读
- mysql - 如何从多个数据库中调用信息并将其显示在单个页面上?
- scala - 我可以使用 memoization 来缓存来自 spark 作业的 hbase 读写数据吗?
- python - python 3中具有多个输入的函数
- swiftui - SwiftUI 不使用 UIViewRepresentable 和 ObservableObject 刷新我的自定义 UIView
- c# - 如何仅在 Tesseract C# 中捕获数字
- ti-basic - 试图显示矩阵编辑器的位置
- linkedin - LinkedIn 访问基本资料
- java - 如果增加迭代次数,顺序流比并行流快
- json - 我需要使用 powershell 从 Json 文件中获取特定值
- php - 使用 zend_call_function() 时的段错误