Loop that replaces one DataFrame column with zeros on each iteration

Problem description

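I want to perform a sensitivity analysis on a classification model in Python.

So I want to check how the absence of each column affects the metrics. I have prepared a function that returns the metrics for the original test set.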

from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

def score_metrics(model,
                  X_test,
                  y_test):

    y_pred = model.predict(X_test)  # predicted values from the original dataset

    cm_orig = confusion_matrix(y_test, y_pred)

    tp = cm_orig[1, 1]
    fp = cm_orig[0, 1]
    fn = cm_orig[1, 0]
    tn = cm_orig[0, 0]

    score_orig_precision = precision_score(y_test, y_pred)
    score_orig_accuracy = accuracy_score(y_test, y_pred)
    score_orig_recall = recall_score(y_test, y_pred)
    score_orig_specificity = tn/(tn+fp)
    score_orig_F1 = f1_score(y_test, y_pred)

    results = {'Feature': 'original',
               'Precision': score_orig_precision,
               'Accuracy': score_orig_accuracy,
               'Recall': score_orig_recall,
               'Specificity': score_orig_specificity,
               'F1 score': score_orig_F1}

    return results

I want to do the same thing, but with one column of X_test replaced by 0 on each iteration.

For example, if this were X_test:

    A   B   C   D   E
    5   7   11  12  6
   11   32  11  13  6

I want to check the metrics for each of these:

    A   B   C   D   E
    0   7  11  12   6
    0  32  11  13   6

    A   B   C   D   E
    5   0  11  12   6
   11   0  11  13   6

    A   B   C   D   E
    5   7   0   12  6
   11   32  0   13  6

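And so on. My question is how to edit the code above (or any other suggestion) to achieve this. Later I would like to have the results in a Pandas DataFrame, but getting them as dictionaries is enough for me for now.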

Tags: python, pandas, dataframe, loops

Solution


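So, judging from your example, you don't actually want to drop the column, only to set its values to 0 during each iteration. In that case you can use: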

for c in df.columns:
    newDF = df.copy(deep=True)
    newDF[c] = 0
    # Here you operate on the new DataFrame for this iteration
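
To check that this pattern leaves the original frame untouched, here is a minimal sketch using the example X_test from the question. The helper name `zeroed_variants` is made up for illustration, and it uses `DataFrame.assign` instead of copy-then-assign; both return a new frame. It assumes the column names are strings.

```python
import pandas as pd

# Example data taken from the question
X_test = pd.DataFrame({"A": [5, 11], "B": [7, 32], "C": [11, 11],
                       "D": [12, 13], "E": [6, 6]})


def zeroed_variants(df):
    """Yield (column_name, new_frame) pairs, one per column set to 0.

    Hypothetical helper, not part of pandas.
    """
    for c in df.columns:
        # assign() returns a new DataFrame, so df itself is never modified
        yield c, df.assign(**{c: 0})


variants = dict(zeroed_variants(X_test))
print(variants["B"])   # column B is all zeros, other columns unchanged
print(X_test)          # the original frame still holds its values
```

Each variant can then be passed to `model.predict` in place of X_test.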

One option for integrating this into your existing code:

from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

def getting_results(y_pred, y_test, feature='original'):
    cm_orig = confusion_matrix(y_test, y_pred)

    tp = cm_orig[1, 1]
    fp = cm_orig[0, 1]
    fn = cm_orig[1, 0]
    tn = cm_orig[0, 0]

    score_orig_precision = precision_score(y_test, y_pred)
    score_orig_accuracy = accuracy_score(y_test, y_pred)
    score_orig_recall = recall_score(y_test, y_pred)
    score_orig_specificity = tn/(tn+fp)
    score_orig_F1 = f1_score(y_test, y_pred)

    results = {'Feature': feature,  # zeroed column name, or 'original'
               'Precision': score_orig_precision,
               'Accuracy': score_orig_accuracy,
               'Recall': score_orig_recall,
               'Specificity': score_orig_specificity,
               'F1 score': score_orig_F1}

    return results


def score_metrics(model, X_test, y_test):
    # Baseline metrics on the unmodified test set
    all_results = [getting_results(model.predict(X_test), y_test)]

    for c in X_test.columns:
        newX_test = X_test.copy(deep=True)
        newX_test[c] = 0  # zero out one column per iteration

        y_pred = model.predict(newX_test)  # predictions with column c zeroed
        all_results.append(getting_results(y_pred, y_test, feature=c))

    return all_results  # one dictionary per variant
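
Since the question mentions wanting the results in a Pandas DataFrame later, here is a rough end-to-end sketch of how that could look. Everything outside the metric calls is an assumption made for illustration: the synthetic data from `make_classification`, the `LogisticRegression` model, and the helper names `metrics_row` and `sensitivity_table`. It also assumes binary labels 0 and 1.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split


def metrics_row(feature, y_true, y_pred):
    """Return one dict of metrics, labelled with the zeroed feature."""
    # labels=[0, 1] keeps the matrix 2x2 even if one class is never predicted
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "Feature": feature,
        "Precision": precision_score(y_true, y_pred, zero_division=0),
        "Accuracy": accuracy_score(y_true, y_pred),
        "Recall": recall_score(y_true, y_pred, zero_division=0),
        "Specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        "F1 score": f1_score(y_true, y_pred, zero_division=0),
    }


def sensitivity_table(model, X_test, y_test):
    """Metrics for the original test set plus one row per zeroed column."""
    rows = [metrics_row("original", y_test, model.predict(X_test))]
    for c in X_test.columns:
        X_zeroed = X_test.copy()
        X_zeroed[c] = 0  # replace a single column with zeros
        rows.append(metrics_row(c, y_test, model.predict(X_zeroed)))
    return pd.DataFrame(rows)


# Synthetic stand-in data; replace with your own model and test set
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X = pd.DataFrame(X, columns=list("ABCDE"))
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

table = sensitivity_table(model, X_test, y_test)
print(table)
```

Building the DataFrame from a list of dictionaries keeps one row per zeroed column, so the rows can be compared against the 'original' row directly.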
