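python - Stratified cross-validation with StratifiedKFold after cross_validation.StratifiedKFold was deprecated
Problem description
I am following some example scripts from 3 years ago and came across a function definition that uses a deprecated function (cross_validation.StratifiedKFold).
This is the original code snippet from 3 years ago: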
def stratified_cv(X, y, clf_class, shuffle=True, n_folds=10, **kwargs):
    stratified_k_fold = cross_validation.StratifiedKFold(y, n_folds=n_folds, shuffle=shuffle)
    y_pred = y.copy()
    # ii -> train
    # jj -> test indices
    for ii, jj in stratified_k_fold:
        X_train, X_test = X[ii], X[jj]
        y_train = y[ii]
        clf = clf_class(**kwargs)
        clf.fit(X_train,y_train)
        y_pred[jj] = clf.predict(X_test)
    return y_pred
I have tried to update it by following the documentation for sklearn.model_selection.StratifiedKFold (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html), and this is what I have so far:
## Attempt to modernize with StratifiedKFold from sklearn.model_selection
def stratified_cv(X, y, clf_class, shuffle=True, n_splits=10, **kwargs):
    stratified_k_fold = StratifiedKFold(n_splits=n_splits)
    y_pred = y.copy()
    # ii -> train
    # jj -> test indices
    for ii, jj in stratified_k_fold:
        X_train, X_test = X[ii], X[jj]
        y_train = y[ii]
        clf = clf_class(**kwargs)
        clf.fit(X_train,y_train)
        y_pred[jj] = clf.predict(X_test)
    return y_pred
I then tried to run the following block and got the error shown after it:
print('Gradient Boosting Classifier: {:.2f}'.format(metrics.accuracy_score(y, stratified_cv(X, y, ensemble.GradientBoostingClassifier))))
print('Support vector machine(SVM): {:.2f}'.format(metrics.accuracy_score(y, stratified_cv(X, y, svm.SVC))))
print('Random Forest Classifier: {:.2f}'.format(metrics.accuracy_score(y, stratified_cv(X, y, ensemble.RandomForestClassifier))))
print('K Nearest Neighbor Classifier: {:.2f}'.format(metrics.accuracy_score(y, stratified_cv(X, y, neighbors.KNeighborsClassifier))))
print('Logistic Regression: {:.2f}'.format(metrics.accuracy_score(y, stratified_cv(X, y, linear_model.LogisticRegression))))
Error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-122-a61be22f8ca9> in <module>
----> 1 print('Gradient Boosting Classifier: {:.2f}'.format(metrics.accuracy_score(y, stratified_cv(X, y, ensemble.GradientBoostingClassifier))))
2 print('Support vector machine(SVM): {:.2f}'.format(metrics.accuracy_score(y, stratified_cv(X, y, svm.SVC))))
3 print('Random Forest Classifier: {:.2f}'.format(metrics.accuracy_score(y, stratified_cv(X, y, ensemble.RandomForestClassifier))))
4 print('K Nearest Neighbor Classifier: {:.2f}'.format(metrics.accuracy_score(y, stratified_cv(X, y, neighbors.KNeighborsClassifier))))
5 print('Logistic Regression: {:.2f}'.format(metrics.accuracy_score(y, stratified_cv(X, y, linear_model.LogisticRegression))))
<ipython-input-121-e373d74b2cca> in stratified_cv(X, y, clf_class, shuffle, n_splits, **kwargs)
5 # ii -> train
6 # jj -> test indices
----> 7 for ii, jj in stratified_k_fold:
8 X_train, X_test = X[ii], X[jj]
9 y_train = y[ii]
TypeError: 'StratifiedKFold' object is not iterable
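Solution
A StratifiedKFold object from sklearn.model_selection only stores the split configuration, so it cannot be iterated directly. You need to call its split(X, y) method to generate the train and test indices. Without changing your code much, the following should work: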
from sklearn import datasets
from sklearn import metrics
from sklearn import svm
from sklearn.model_selection import StratifiedKFold

iris = datasets.load_iris()
X = iris.data
y = iris.target

def stratified_cv(X, y, clf_class, shuffle=True, n_splits=10, **kwargs):
    # The splitter holds only its settings; the data is passed to split()
    stratified_k_fold = StratifiedKFold(n_splits=n_splits, shuffle=shuffle)
    y_pred = y.copy()
    # ii -> train indices, jj -> test indices
    for ii, jj in stratified_k_fold.split(X, y):
        X_train, X_test = X[ii], X[jj]
        y_train = y[ii]
        clf = clf_class(**kwargs)
        clf.fit(X_train, y_train)
        y_pred[jj] = clf.predict(X_test)
    return y_pred

print('Support vector machine (SVM): {:.2f}'.format(metrics.accuracy_score(y, stratified_cv(X, y, svm.SVC))))
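If you do not need the explicit loop, the same out-of-fold predictions can probably be obtained with cross_val_predict from sklearn.model_selection. The snippet below is a minimal sketch rather than a drop-in replacement: it assumes the iris data from the answer, and the random_state=0 seed is an arbitrary choice added here to make the shuffled folds reproducible.

```python
from sklearn import datasets, metrics, svm
from sklearn.model_selection import StratifiedKFold, cross_val_predict

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Assumption: a fixed seed so the shuffled folds are the same on every run
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Each sample is predicted by a model fitted on the folds that exclude it
y_pred = cross_val_predict(svm.SVC(), X, y, cv=cv)

accuracy = metrics.accuracy_score(y, y_pred)
print('Support vector machine (SVM): {:.2f}'.format(accuracy))
```

Note that cross_val_predict takes an estimator instance (svm.SVC()) rather than a class, so any keyword arguments go into the constructor instead of being forwarded through **kwargs.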