python-3.x - Long running time in pandas

Problem description

I am writing a program that builds a confusion matrix by hand. It has to loop over more than 10K iterations.

from tqdm import tqdm

df_a = df_a.sort_values('proba')
tpr_lst = []
fpr_lst = []
# df_a['proba'] contains 10K points; each point is taken in turn as the
# threshold that decides whether y_pred is 0 or 1, in order to plot an ROC curve.
for i in tqdm(df_a['proba']):
    def y_pred_auc(x):
        if x < i:
            return 0
        else:
            return 1
    df_a['y_pred_auc'] = df_a['proba'].map(y_pred_auc)
    # confusion_matrix is a row-wise helper (not shown here) that is assumed
    # to label each row as 'TP', 'FP', 'TN' or 'FN'.
    df_a['con_mat_label_auc'] = df_a[['y', 'y_pred_auc']].apply(confusion_matrix, axis=1)
    # Count matches with .sum(); len() of a boolean Series is always the row count.
    tp_count = (df_a['con_mat_label_auc'] == 'TP').sum()
    fp_count = (df_a['con_mat_label_auc'] == 'FP').sum()
    tn_count = (df_a['con_mat_label_auc'] == 'TN').sum()
    fn_count = (df_a['con_mat_label_auc'] == 'FN').sum()

    tpr_auc = tp_count / (tp_count + fn_count)
    fpr_auc = fp_count / (tn_count + fp_count)

    tpr_lst.append(tpr_auc)
    fpr_lst.append(fpr_auc)

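Even on a c4 AWS SageMaker instance, this code takes about an hour. Is there any way to optimize it, or can anyone suggest a faster AWS SageMaker instance? I also tried Colab, and it was even slower there.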

Tags: python-3.x, pandas, amazon-web-services

Solution
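The loop in the question runs a Python-level `map` and a row-wise `apply` over all 10K rows for each of the 10K thresholds, which is on the order of 10^8 Python calls. A bigger instance will not help much with that. Below is a minimal sketch of one way to get the same counts in a single pass, assuming `y` holds 0/1 labels, both classes are present, and a point is predicted 1 when `proba >= threshold` (the rule used in the question). The function name `roc_points` is hypothetical, and this is a cumulative-sum approach swapped in for the loop, not the asker's method.

```python
import numpy as np

def roc_points(y_true, proba):
    """Return (fpr, tpr) with every proba value used as a threshold, ascending."""
    y_true = np.asarray(y_true)
    proba = np.asarray(proba, dtype=float)

    # Sort once by score, as the question does with sort_values('proba').
    order = np.argsort(proba, kind="stable")
    y_sorted = y_true[order]
    p_sorted = proba[order]

    # pos_below[k] / neg_below[k]: positives / negatives among the first k sorted rows.
    pos_below = np.concatenate(([0], np.cumsum(y_sorted == 1)))
    neg_below = np.concatenate(([0], np.cumsum(y_sorted == 0)))

    # For threshold t = p_sorted[i], rows before this index have proba < t,
    # so they are predicted 0. side="left" keeps tied scores on the predicted-1 side.
    first_ge = np.searchsorted(p_sorted, p_sorted, side="left")

    total_pos = pos_below[-1]
    total_neg = neg_below[-1]

    fn = pos_below[first_ge]   # positives predicted 0
    tn = neg_below[first_ge]   # negatives predicted 0
    tp = total_pos - fn        # positives predicted 1
    fp = total_neg - tn        # negatives predicted 1

    tpr = tp / total_pos
    fpr = fp / total_neg
    return fpr, tpr
```

With the frame from the question this would be called as `fpr_lst, tpr_lst = roc_points(df_a['y'], df_a['proba'])`. If scikit-learn is available, `sklearn.metrics.roc_curve(y_true, y_score)` also returns `fpr`, `tpr` and the thresholds, and is probably the simpler choice when the manual confusion matrix is not a requirement.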