python - Scikit-learn decision tree is not deterministic
Problem description
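I am running recursive feature elimination with cross-validation (RFECV) to find the optimal number of features. I will later compare different hyperparameters and different methods for handling imbalanced data, so I want the selected features to be deterministic. That is why I use a decision tree. However, every time I run the code below I get a different optimal number of features. I always use a constant random state, so I cannot understand why the results differ between runs.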
```python
from sklearn import metrics, model_selection
from sklearn.feature_selection import RFECV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

RANDOM_ST = 123

def featureSelection(train, train_labels, test, test_labels):
    # Use kNN to illustrate the effectiveness of feature selection
    clf = KNeighborsClassifier()
    # Train the classifier
    clf = clf.fit(train, train_labels['gname_code'])
    # Predict the class for unseen examples
    preds = clf.predict(test)
    # Initial accuracy (accuracy_score takes y_true first, then y_pred)
    score = metrics.accuracy_score(test_labels['gname_code'], preds)
    print('Initial Result', score)
    # Decision tree for feature selection.
    # RF is probably a better way to do feature selection, but I want it to be deterministic
    # for comparing imbalanced-data methods later, so use a decision tree instead.
    # estimator = RandomForestClassifier(n_estimators=100, max_depth=2, random_state=RANDOM_ST)
    estimator = DecisionTreeClassifier(random_state=RANDOM_ST)
    # Custom CV so I can seed it with a random state => results are comparable between options later
    rskv = model_selection.RepeatedStratifiedKFold(n_splits=5, n_repeats=5, random_state=RANDOM_ST)
    # Greedy feature selection
    rfecv = RFECV(estimator, cv=rskv, n_jobs=-1)
    rfecv.fit(train, train_labels['gname_code'])
    # Optimal number of features
    print('Optimal no. of features is: ', rfecv.n_features_)
    # Drop the uninformative features
    train = train.iloc[:, rfecv.support_]
    test = test.iloc[:, rfecv.support_]
    # Test again now
    clf = KNeighborsClassifier()
    clf = clf.fit(train, train_labels['gname_code'])
    preds = clf.predict(test)
    score = metrics.accuracy_score(test_labels['gname_code'], preds)
    print('Result after feature selection: ', score)
    return train, train_labels, test, test_labels
```
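One way to narrow this down is to check whether the seeded RFECV setup is reproducible on its own. The following is a minimal sketch, assuming synthetic data from `make_classification` in place of the real `train`/`train_labels` (which are not shown), only 2 repeats instead of 5 to keep it fast, and `n_jobs=1`; `run_rfecv` is a hypothetical helper, not part of the original code.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.tree import DecisionTreeClassifier

RANDOM_ST = 123

# Synthetic stand-in for the real data (assumption: the original dataset is not available)
X, y = make_classification(n_samples=300, n_features=12, n_informative=4,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

def run_rfecv(X, y):
    """Hypothetical helper: RFECV with a seeded tree and a seeded repeated stratified CV."""
    estimator = DecisionTreeClassifier(random_state=RANDOM_ST)
    cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=RANDOM_ST)
    # n_jobs=1 keeps the sketch simple; each fold clones the estimator with the same
    # integer random_state, so n_jobs=-1 is not expected to change the outcome.
    rfecv = RFECV(estimator, cv=cv, n_jobs=1)
    rfecv.fit(X, y)
    return rfecv.n_features_, rfecv.support_

# Two independent runs on exactly the same input
n1, support1 = run_rfecv(X, y)
n2, support2 = run_rfecv(X, y)
print(n1, n2, np.array_equal(support1, support2))
```

If both runs agree, RFECV with a fixed `random_state` on both the tree and `RepeatedStratifiedKFold` is behaving deterministically for identical input, which shifts suspicion to whatever produces `train` before `featureSelection` is called.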
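A plausible but unconfirmed cause is that the inputs themselves change between runs, for example if `train` comes from a split or shuffle without a fixed seed. Both row order and column order matter: row order decides which samples land in each `RepeatedStratifiedKFold` fold, and scikit-learn permutes the features at every tree split, so when several splits give the same improvement, reordering the columns can change which one wins even with a fixed `random_state`. A minimal sketch for checking this is to fingerprint the inputs on every run; `fingerprint` is a hypothetical helper built on `pandas.util.hash_pandas_object` and `hashlib` (Python's built-in `hash()` is avoided because it is salted per process).

```python
import hashlib

import pandas as pd

def fingerprint(df: pd.DataFrame) -> str:
    """Hypothetical helper: a stable hash of a DataFrame's column order, index, and values."""
    h = hashlib.sha256()
    # Column names in order, so a reordered DataFrame gets a different fingerprint
    h.update(",".join(map(str, df.columns)).encode())
    # One uint64 per row (index included); hashing the sequence keeps it row-order sensitive
    h.update(pd.util.hash_pandas_object(df, index=True).to_numpy().tobytes())
    return h.hexdigest()

df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 0.1, 0.9]})
print(fingerprint(df) == fingerprint(df.copy()))    # same content, same order
print(fingerprint(df) == fingerprint(df.iloc[::-1]))  # rows reversed
```

For example, printing `fingerprint(train)` and `fingerprint(test)` as the first step of `featureSelection` and comparing the output of two runs shows whether the data reaching RFECV actually changed.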