python - Grid search ValueError: Invalid parameter classifier for estimator
Question
I am trying to use a random forest with grid search, but I get this error:
ValueError: Invalid parameter classifier for estimator Pipeline(steps=[('tfidf_vectorizer', TfidfVectorizer()),
('rf_classifier', RandomForestClassifier())]).
Check the list of available parameters with `estimator.get_params().keys()`.
import numpy as np # linear algebra
import pandas as pd
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
from sklearn import pipeline,ensemble,preprocessing,feature_extraction,metrics
train=pd.read_json('cleaned_data1')
#split dataset into X , Y
X=train.iloc[:,0]
Y=train.iloc[:,2]
estimators=pipeline.Pipeline([
('tfidf_vectorizer', feature_extraction.text.TfidfVectorizer(lowercase=True)),
('rf_classifier', ensemble.RandomForestClassifier())
])
print(estimators.get_params().keys())
params = {"classifier__max_depth": [3, None],
"classifier__max_features": [1, 3, 10],
"classifier__min_samples_split": [1, 3, 10],
"classifier__min_samples_leaf": [1, 3, 10],
# "bootstrap": [True, False],
"classifier__criterion": ["gini", "entropy"]}
X_train,X_test,y_train,y_test=train_test_split(X,Y, test_size=0.2)
rf_classifier=GridSearchCV(estimators,params, cv=10 , n_jobs=-1 ,scoring='accuracy',iid=True)
rf_classifier.fit(X_train,y_train)
y_pred=rf_classifier.predict(X_test)
metrics.confusion_matrix(y_test,y_pred)
print(metrics.accuracy_score(y_test,y_pred))
I also tried using these parameters instead:
param_grid = {
'n_estimators': [200, 500],
'max_features': ['auto', 'sqrt', 'log2'],
'max_depth' : [4,5,6,7,8],
'criterion' :['gini', 'entropy']
}
but I still get the same error.
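Answer
Make sure the parameter names in your grid use the same step names you gave when building the pipeline. Your pipeline names the classifier step 'rf_classifier', so the grid keys need the prefix rf_classifier__ (for example rf_classifier__max_depth) rather than classifier__. Unprefixed keys such as max_depth fail as well, because the Pipeline itself has no parameter with that name.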
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
# Define a pipeline to search for the best combination of PCA truncation
# and classifier regularization.
pca = PCA()
# set the tolerance to a large value to make the example faster
logistic = LogisticRegression(max_iter=10000, tol=0.1)
pipe = Pipeline(steps=[('pca', pca), ('logistic', logistic)])
X_digits, y_digits = datasets.load_digits(return_X_y=True)
# Parameters of pipelines can be set using '__' separated parameter names:
param_grid = {
'pca__n_components': [5, 15, 30, 45, 64],
'logistic__C': np.logspace(-4, 4, 4),
}
search = GridSearchCV(pipe, param_grid, n_jobs=-1)
search.fit(X_digits, y_digits)
print("Best parameter (CV score=%0.3f):" % search.best_score_)
print(search.best_params_)
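In this example, the LogisticRegression model is named 'logistic', so its parameters are addressed as logistic__C. Also note that min_samples_split = 1 is not a valid value for RandomForestClassifier and will raise an error; an integer value must be at least 2.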
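Applied to the pipeline from the question, the same naming rule might look like the sketch below. The tiny in-memory corpus, the reduced grid, n_estimators=10, and cv=2 are assumptions made only so the snippet runs on its own; they stand in for the 'cleaned_data1' file and the original settings. The iid argument is left out because recent scikit-learn releases no longer accept it.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus standing in for the asker's 'cleaned_data1' file.
texts = ["good movie", "great film", "nice plot", "loved it",
         "bad movie", "awful film", "poor plot", "hated it"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Same step names as in the question.
pipe = Pipeline([
    ("tfidf_vectorizer", TfidfVectorizer(lowercase=True)),
    ("rf_classifier", RandomForestClassifier(n_estimators=10, random_state=0)),
])

# Grid keys follow '<step name>__<parameter name>'; the step is 'rf_classifier'.
param_grid = {
    "rf_classifier__max_depth": [3, None],
    "rf_classifier__min_samples_split": [2, 3],  # an integer here must be >= 2
    "rf_classifier__criterion": ["gini", "entropy"],
}

# Sanity check before searching: every grid key should be a known parameter.
valid_keys = set(pipe.get_params().keys())
unknown = [key for key in param_grid if key not in valid_keys]
print("Unknown grid keys:", unknown)

search = GridSearchCV(pipe, param_grid, cv=2, scoring="accuracy")
search.fit(texts, labels)
print(search.best_params_)
```

Checking the grid keys against pipe.get_params().keys() before fitting is the check the error message itself suggests, and it catches a mistyped prefix without waiting for the search to start.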