Classification metrics can't handle a mix of binary and unknown targets: how do I ignore the unknown targets and consider only the integers?

Problem description

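I am working on a classification problem in which I try to predict the first column, "gold", of my input file from the values of the remaining columns in the same file. My input file has the following format: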

gold, callersAtLeast1T, CalleesAtLeast1T, ...

T,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

N,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

N,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

N,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

I am making predictions from probabilities, and if my conditions are not met I choose not to predict. In other words, if neither (probs[i][0]>=0.8) & (probs[i][1]<0.8) nor (probs[i][0]<0.8) & (probs[i][1]>=0.8) holds, I leave y_pred[i] as None and do not assign it 0 or 1. Because of the line print('confusion matrix\n',confusion_matrix(y_test,y_pred)), I get the error ValueError: Classification metrics can't handle a mix of binary and unknown targets. The reason is that y_pred mixes integers with None: its contents are 0, 1 and None. I want to ignore every case where y_pred is None and run the calculation print('confusion matrix\n',confusion_matrix(y_test,y_pred)) only for the cases where y_pred is 1 or 0.

import pandas as pd
import numpy as np
dataset = pd.read_csv('data1extended.txt', sep=',')
# Convert T into 1 and N into 0 (category codes follow sorted order: N -> 0, T -> 1)
dataset['gold'] = dataset['gold'].astype('category').cat.codes

print(dataset.head())
row_count, column_count = dataset.shape
X = dataset.iloc[:, 1:column_count].values
y = dataset.iloc[:, 0].values

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Feature Scaling
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

from sklearn.ensemble import RandomForestClassifier

regressor = RandomForestClassifier(n_estimators=200, random_state=0)
regressor.fit(X_train, y_train)

probs = regressor.predict_proba(X_test)

from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
# Predict only when exactly one class reaches the 0.8 threshold; otherwise leave None
y_pred = [None] * len(y_test)
for i in range(len(probs)):
    if probs[i][0] >= 0.8 and probs[i][1] < 0.8:
        y_pred[i] = 0
    elif probs[i][0] < 0.8 and probs[i][1] >= 0.8:
        y_pred[i] = 1
    print(y_pred[i])
    print("Probabilities=%s, Predicted=%s" % (probs[i], y_pred[i]))
print(y_pred)
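# The next line raises: ValueError: Classification metrics can't handle a mix of binary and unknown targets
print('confusion matrix\n', confusion_matrix(y_test, y_pred))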
print('classification report\n', classification_report(y_test,y_pred))
print('accuracy score', accuracy_score(y_test, y_pred))
from sklearn import metrics

print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))
print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))
print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))

Tags: python, compiler-errors, random-forest, prediction

Solution
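One possible approach, sketched here rather than offered as the definitive fix, is to build a boolean mask of the positions where a prediction was actually made and apply it to both y_test and y_pred before calling any metric. The helper name drop_abstentions and the small toy arrays are hypothetical stand-ins introduced for illustration; in the question's script the real y_test and y_pred would be passed in their place. The labels=[0, 1] and zero_division=0 arguments are an assumption to keep the output shape stable if one class happens to be filtered out entirely.

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix


def drop_abstentions(y_true, y_pred):
    """Return (y_true, y_pred) restricted to rows where y_pred is not None.

    Hypothetical helper, not part of scikit-learn.
    """
    y_true = np.asarray(y_true)
    # Boolean mask: True where a prediction (0 or 1) was actually made.
    keep = np.array([p is not None for p in y_pred], dtype=bool)
    # Cast the surviving predictions to int so the array is a plain binary target.
    y_pred_kept = np.array([p for p in y_pred if p is not None], dtype=int)
    return y_true[keep], y_pred_kept


# Toy stand-ins for y_test and y_pred from the question (illustrative values only).
y_test = np.array([1, 0, 0, 1, 0, 1])
y_pred = [1, None, 0, 0, None, 1]

y_test_kept, y_pred_kept = drop_abstentions(y_test, y_pred)

cm = confusion_matrix(y_test_kept, y_pred_kept, labels=[0, 1])
acc = accuracy_score(y_test_kept, y_pred_kept)
# Fraction of test samples for which the model was confident enough to predict.
coverage = len(y_pred_kept) / len(y_pred)

print('confusion matrix\n', cm)
print('classification report\n',
      classification_report(y_test_kept, y_pred_kept, labels=[0, 1], zero_division=0))
print('accuracy score', acc)
print('coverage', coverage)
```

The same filtered pair can be passed to the other metrics in the question, such as mean_absolute_error. Note that scores computed this way describe only the samples the model chose to predict, so it is probably worth reporting the coverage alongside them.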

