Random forest algorithm gives an accuracy of 0.0

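Problem description

I am working on a machine learning project in a Jupyter notebook. I am using a random forest with GridSearchCV. The code runs without errors, but I get Accuracy = 0.0.

When I try a decision tree instead, the accuracy = 99.99.

How can I fix this?

Input: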

# Train the random forest, tuning hyperparameters with a grid search
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier

rfc = RandomForestClassifier(random_state=42)

param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [5, 10, 20],
    'min_samples_leaf': [1, 2, 3, 4, 5, 10, 20],
}
CV_rfc = GridSearchCV(estimator=rfc, param_grid=param_grid, cv=5)
CV_rfc.fit(X_train, y_train)
CV_rfc.best_params_

# Fit a second forest with explicitly specified parameters
rfc1 = RandomForestClassifier(random_state=42, n_estimators=50, max_depth=5, criterion='gini')
rfc1.fit(X_train, y_train)

This produces the output:

RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=5, max_features='auto', max_leaf_nodes=None,
            min_impurity_split=1e-07, min_samples_leaf=1,
            min_samples_split=2, min_weight_fraction_leaf=0.0,
            n_estimators=50, n_jobs=1, oob_score=False, random_state=42,
            verbose=0, warm_start=False)
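
The second forest in the input hard-codes n_estimators=50 and max_depth=5 instead of reading them from the search result. A minimal sketch of reusing the refitted best model directly, assuming synthetic data from make_classification in place of the question's X_train and y_train (which are not shown) and a smaller grid so that it runs quickly:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data; the question's real X_train / y_train are not shown.
X, y = make_classification(n_samples=300, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# A deliberately small grid (an assumption for speed, not the question's grid).
param_grid = {"n_estimators": [10, 20], "max_depth": [3, 5]}

search = GridSearchCV(
    RandomForestClassifier(random_state=42), param_grid=param_grid, cv=3
)
search.fit(X_train, y_train)

# With the default refit=True, best_estimator_ has already been refitted
# on the whole training set using best_params_.
best_model = search.best_estimator_
test_accuracy = best_model.score(X_test, y_test)
print("Best parameters:", search.best_params_)
print("Test accuracy:  ", test_accuracy)
```

This only removes the risk of the hand-typed parameters drifting away from best_params_. It would not by itself explain a test accuracy of exactly 0.0.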

Input:

from sklearn.metrics import accuracy_score

pred = rfc1.predict(X_test)

print("Accuracy for Random Forest on CV data: ", accuracy_score(y_test, pred))

Output:

Accuracy for Random Forest on CV data:  0.0

Input:

'''
Compute confusion matrix and print classification report.
'''
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score, f1_score

# Score the model
Ntest    = len(y_test)
Ntestpos = len([val for val in y_test if val])   # number of non-zero test labels
NullAcc  = float(Ntest - Ntestpos) / Ntest       # share of test labels equal to 0
print("Mean accuracy on Training set: %s" % rfc1.score(X_train, y_train))
print("Mean accuracy on Test set:     %s" % rfc1.score(X_test, y_test))
print("Null accuracy on Test set:     %s" % NullAcc)
print(" ")

y_true = y_test
y_pred = rfc1.predict(X_test)
weighted_f1 = f1_score(y_true, y_pred, average='weighted')
cm = confusion_matrix(y_true, y_pred)
# Only the top-left 2x2 corner is printed, although the target has many classes
print("Confusion matrix:\ntn=%6d  fp=%6d\nfn=%6d  tp=%6d" % (cm[0][0], cm[0][1], cm[1][0], cm[1][1]))
print("\nDetailed classification report: \n%s" % classification_report(y_true, y_pred))

Output:

Mean accuracy on Training set: 1.0

Mean accuracy on Test set:     0.0

Null accuracy on Test set:     0.0

along with this warning:

UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples.
  'precision', 'predicted', average, warn_for)

Confusion matrix:
tn=     0  fp=     0
fn=1745395  tp=     0

Detailed classification report: 
             precision    recall  f1-score   support

          0       0.00      0.00      0.00         0
          1       0.00      0.00      0.00   1745395
          2       0.00      0.00      0.00    143264
          3       0.00      0.00      0.00     75044
          4       0.00      0.00      0.00     46700
          5       0.00      0.00      0.00     31568
          6       0.00      0.00      0.00     22966
          7       0.00      0.00      0.00     16903
          8       0.00      0.00      0.00     13188
          9       0.00      0.00      0.00     10160
                 .
                 .
                 .
        119       0.00      0.00      0.00         2
        123       0.00      0.00      0.00         2
        124       0.00      0.00      0.00         1
        141       0.00      0.00      0.00         1
        165       0.00      0.00      0.00         1

avg / total       0.00      0.00      0.00   2148603
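
The report shows a support of 0 for class 0 and no correct prediction for any class that does occur in the test set, while the training accuracy is 1.0. One situation that produces this pattern is a training target whose labels do not overlap with the test target. Whether that is the cause here cannot be confirmed from the question alone. A minimal sketch of the check on small made-up arrays (the helper names `compare_label_sets` and `majority_baseline_accuracy`, and the demo data, are hypothetical):

```python
import numpy as np


def compare_label_sets(y_train, y_test):
    """Return labels seen only in training, only in testing, and in both."""
    train_labels = set(np.unique(y_train).tolist())
    test_labels = set(np.unique(y_test).tolist())
    return {
        "train_only": sorted(train_labels - test_labels),
        "test_only": sorted(test_labels - train_labels),
        "shared": sorted(train_labels & test_labels),
    }


def majority_baseline_accuracy(y_test):
    """Accuracy obtained by always predicting the most frequent test label."""
    _, counts = np.unique(y_test, return_counts=True)
    return counts.max() / counts.sum()


# Made-up data: the training target is constant and never occurs in the test target.
y_train_demo = np.zeros(8, dtype=int)
y_test_demo = np.array([1, 1, 1, 2, 2, 3])

overlap = compare_label_sets(y_train_demo, y_test_demo)
baseline = majority_baseline_accuracy(y_test_demo)
print(overlap)
print("Majority-class baseline:", baseline)
```

An empty "shared" list would mean the model can never predict a label that appears in the test set. Note also that NullAcc in the question measures the share of test labels equal to 0, which is 0.0 here. The majority-class share from the report would be 1745395 / 2148603, about 0.81.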

Tags: algorithm, machine-learning, scikit-learn, jupyter-notebook, random-forest

Solution
