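python - Classification metrics can't handle a mix of binary and unknown targets: how can I ignore the unknown targets and only consider the integers?
Problem description
I am working on a classification problem in which I try to predict the first column, "gold", of my input file from the values of the remaining columns in the same file. My input file has the following format: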
gold, callersAtLeast1T, CalleesAtLeast1T, ...
T,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
N,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
N,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
N,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
I make predictions from class probabilities, and I choose not to predict at all when my condition is not met. In other words, if neither
(probs[i][0]>=0.8) & (probs[i][1]<0.8)
nor
(probs[i][0]<0.8) & (probs[i][1]>=0.8)
is satisfied, I leave y_pred[i] equal to None and do not assign it 0 or 1.
Because of the line
print('confusion matrix\n',confusion_matrix(y_test,y_pred))
I get this error:
ValueError: Classification metrics can't handle a mix of binary and unknown targets
The reason is that y_pred mixes integers with None values: its contents are 0, 1 and None. I want to ignore every case where y_pred is None and compute the confusion matrix only over the cases where y_pred is 1 or 0.
import pandas as pd
import numpy as np
dataset = pd.read_csv( 'data1extended.txt', sep= ',')
#convert T into 1 and N into 0
dataset['gold'] = dataset['gold'].astype('category').cat.codes
print(dataset.head())
row_count, column_count = dataset.shape
X = dataset.iloc[:, 1:column_count].values
y = dataset.iloc[:, 0].values
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
from sklearn.ensemble import RandomForestClassifier
regressor = RandomForestClassifier(n_estimators=200, random_state=0)
regressor.fit(X_train, y_train)
probs = regressor.predict_proba(X_test)
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
i=0
y_pred=[None]*len(y_test)
for i in range(len(probs)):
    #print('i==> ',i)
    if (probs[i][0]>=0.8) & (probs[i][1]<0.8):
        y_pred[i]=0
    elif (probs[i][0]<0.8) & (probs[i][1]>=0.8):
        y_pred[i]=1
    print(y_pred[i])
    print("Probabilities=%s, Predicted=%s" % (probs[i], y_pred[i]))
print(y_pred)
print('confusion matrix\n',confusion_matrix(y_test,y_pred))
print('classification report\n', classification_report(y_test,y_pred))
print('accuracy score', accuracy_score(y_test, y_pred))
from sklearn import metrics
print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))
print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))
print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
Solution
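One possible approach, sketched here under assumptions rather than taken from an accepted answer, is to drop every position where y_pred is None from both y_test and y_pred before calling any metric. The helper name drop_abstentions and the toy data below are hypothetical and exist only for illustration. The sketch uses plain Python, so it accepts lists as well as NumPy arrays.

```python
def drop_abstentions(y_true, y_pred):
    """Keep only the positions where a prediction was actually made.

    Returns the filtered true labels, the filtered predictions, and the
    fraction of samples that received a prediction (the coverage).
    """
    # Pair each true label with its prediction and discard the None cases.
    kept = [(t, p) for t, p in zip(y_true, y_pred) if p is not None]
    y_true_kept = [int(t) for t, _ in kept]
    y_pred_kept = [int(p) for _, p in kept]
    coverage = len(kept) / len(y_pred) if len(y_pred) else 0.0
    return y_true_kept, y_pred_kept, coverage


# Toy data, made up for illustration (not from the question's dataset).
y_test_demo = [1, 0, 1, 0, 1]
y_pred_demo = [1, None, 0, 0, None]

y_true_kept, y_pred_kept, coverage = drop_abstentions(y_test_demo, y_pred_demo)
print(y_true_kept)  # [1, 1, 0]
print(y_pred_kept)  # [1, 0, 0]
print(coverage)     # 0.6
```

With the variables from the question, calling drop_abstentions(y_test, y_pred) and then confusion_matrix(y_true_kept, y_pred_kept, labels=[0, 1]) should avoid the ValueError, because both arguments then contain only integers. The same filtered lists can be passed to classification_report, accuracy_score and the error metrics. The scores then describe only the samples where the model committed to a prediction, so it is probably worth reporting the coverage alongside them.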