Getting Precision and Recall of 0 with accuracy around 98%

Problem description

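I am working with the DNNClassifier from TensorFlow Estimators. The dataset is JM1 (defect prediction).

Training examples used: rows 0:8163 (defects-free: 6056, defects: 2106)

Validation examples used: rows 8163:9796 (defects-free: 1634, defects: 0)

The remaining examples are used for testing. There are 10885 examples in total.

The evaluation metrics I get on the validation set are: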

 'accuracy': 0.97917944,
 'accuracy_baseline': 1.0,
 'auc': 1.0,
 'auc_precision_recall': 0.0,
 'average_loss': 0.27983573,
 'label/mean': 0.0,
 'loss': 35.151672,
 'precision': 0.0,
 'prediction/mean': 0.22930107,
 'recall': 0.0,
 'global_step': 332261

I think my precision and recall are 0 because the dataset is imbalanced.
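
The arithmetic behind these metrics can be sketched as follows. This is only an illustration under stated assumptions: the slice 8163:9796 is taken to hold 1633 rows, all of them defect-free (consistent with 'label/mean': 0.0), and the count of 34 false positives is chosen because it reproduces the reported accuracy; it is not taken from the actual run. The helper `binary_metrics` is hypothetical, and it assumes the convention that a ratio with a zero denominator is reported as 0.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return (accuracy, precision, recall); undefined ratios are reported as 0.0."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# Assumed counts (not from the original run): 1633 validation rows, no positives,
# 34 rows wrongly predicted as defects.
accuracy, precision, recall = binary_metrics(tp=0, fp=34, tn=1599, fn=0)
print(accuracy, precision, recall)
```

With no positive labels in the validation slice, the true-positive count is 0 by construction, so precision and recall cannot be anything other than 0 no matter how well the model is trained. A validation set that contains both classes would be needed before these two metrics say anything about the model.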

My code is attached below. Can anyone suggest how to handle the dataset imbalance, or point out what in my code causes this?

import tensorflow as tf
import numpy as np
import pandas as pd
import os
import shutil

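dataset = pd.read_csv('jm_missing_removed.csv')
dataset = dataset.iloc[:,0:22]

# Row ranges as described above: 0:8163 for training, 8163:9796 for validation
df_train = dataset.iloc[0:8163]
df_eval = dataset.iloc[8163:9796]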

CSV_COLUMNS = ['loc','vg','evg','ivg','n','v','l','d','i','e','b','t','lOCode','lOComment','lOBlank','locCodeAndComment','uniq_Op','uniq_Opnd','total_Op','total_Opnd','branchCount','defects'
]

FEATURES = CSV_COLUMNS[0:len(CSV_COLUMNS) - 1]
LABEL = CSV_COLUMNS[21]

def make_feature_cols():
     input_columns = [tf.feature_column.numeric_column(k) for k in FEATURES]
     return input_columns

feature_columns = make_feature_cols()
feature_columns

tf.logging.set_verbosity(tf.logging.INFO)

# To save the trained model
OUTDIR = './logs/breastCancer_trained'
shutil.rmtree(OUTDIR, ignore_errors = True) 

myopt = tf.train.FtrlOptimizer(learning_rate = 0.01)

model = tf.estimator.DNNClassifier(feature_columns = make_feature_cols(), 
                                   model_dir = OUTDIR, hidden_units=[10, 10], 
                                   n_classes=2, optimizer = myopt,
                                   activation_fn = tf.nn.relu)

def make_input_fn(df, num_epochs):
  # Pass only the feature columns as x so the label is not also fed as an input
  return tf.estimator.inputs.pandas_input_fn(
    x = df[FEATURES],
    y = df[LABEL],
    num_epochs = num_epochs,
    shuffle = True,
    num_threads = 1
  )

model.train(input_fn = make_input_fn(df_train, num_epochs = 10))

ev = model.evaluate(input_fn = make_input_fn(df_eval, num_epochs = 1))

Any simpler solution would be appreciated.

Tags: python, tensorflow, tensorflow-estimator

Solution


Using K-fold cross-validation together with ADASYN oversampling should give better results.
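
A minimal sketch of the idea, using only the standard library. It swaps in plain random oversampling as a stand-in for ADASYN (in practice ADASYN is available as `ADASYN().fit_resample(X, y)` in the imbalanced-learn package). The helpers `stratified_kfold` and `oversample_minority` and the label list are hypothetical and are not the JM1 data. The point it illustrates is that every fold keeps both classes, and that oversampling is applied to the training indices only, never to the held-out fold.

```python
import random
from collections import defaultdict

def stratified_kfold(labels, k, seed=0):
    """Yield (train_idx, test_idx) pairs that keep the class ratio in each fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Deal each class's rows round-robin so every fold gets its share.
        for pos, i in enumerate(idxs):
            folds[pos % k].append(i)
    for f in range(k):
        test_idx = sorted(folds[f])
        train_idx = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train_idx, test_idx

def oversample_minority(train_idx, labels, seed=0):
    """Duplicate random minority rows until all classes have the same size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in train_idx:
        by_class[labels[i]].append(i)
    target = max(len(v) for v in by_class.values())
    balanced = []
    for idxs in by_class.values():
        balanced.extend(idxs)
        balanced.extend(rng.choice(idxs) for _ in range(target - len(idxs)))
    return balanced

# Hypothetical labels: 800 defect-free rows (0) and 200 defective rows (1).
labels = [0] * 800 + [1] * 200
splits = list(stratified_kfold(labels, k=5))
for train_idx, test_idx in splits:
    balanced_idx = oversample_minority(train_idx, labels)
    n_pos_test = sum(labels[i] for i in test_idx)
    n_pos_train = sum(labels[i] for i in balanced_idx)
    print(len(test_idx), n_pos_test, len(balanced_idx), n_pos_train)
```

Each held-out fold here contains positive rows, so precision and recall are well defined when evaluating on it. The balanced index list would then be used to select training rows, for example with `df.iloc[balanced_idx]`, before calling `model.train`.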

