Cross-validation on the StatsModels API

Problem description

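I have the following code to compute scores for a logistic regression. With scikit-learn, the confusion matrix contained only FP and TN, so I switched to statsmodels.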


import pandas as pd
import statsmodels.api as sm
from sklearn.model_selection import train_test_split

X = df.iloc[:, :-3]
y = df['Direction']

def confusion_matrix(act, pred):
    # Threshold the predicted probabilities at 0.5.
    # The actual values are assumed to be coded 0 (Down) / 1 (Up).
    predtrans = ['Up' if i > 0.5 else 'Down' for i in pred]
    actuals = ['Up' if i > 0 else 'Down' for i in act]
    return pd.crosstab(pd.Series(actuals),
                       pd.Series(predtrans),
                       rownames=['Actual'],
                       colnames=['Predicted'])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Fit on the training split only, then predict on the held-out split
# so that the predictions line up with y_test.
model = sm.Logit(y_train, X_train)
result = model.fit()
prediction = result.predict(X_test)

df_cm = confusion_matrix(y_test, prediction)
df_cm


The confusion matrix works, and I was able to compute the following scores:

# Rows are actual classes, columns are predicted; 'Down' is the positive class here.
tp, fn = df_cm.loc['Down', 'Down'], df_cm.loc['Down', 'Up']
fp, tn = df_cm.loc['Up', 'Down'], df_cm.loc['Up', 'Up']
Accuracy = (tp + tn) / len(X_test)
Precision = tp / (tp + fp)
Recall = tp / (tp + fn)
F_Measure = (2 * Precision * Recall) / (Precision + Recall)

logreg_scores = {"Model": ["LogReg"],
                 "Accuracy": [Accuracy],
                 "Precision": [Precision],
                 "Recall": [Recall],
                 "F1": [F_Measure]}

df_scores_logreg = pd.DataFrame(logreg_scores)
df_scores_logreg

Now I want to apply cross-validation. Using scikit-learn gives results that do not look right:


from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

Accuracy_cross_val = cross_val_score(LogisticRegression(), X, y, scoring='accuracy', cv=10)
Precision_cross_val = cross_val_score(LogisticRegression(), X, y, scoring='precision', cv=10)
Recall_cross_val = cross_val_score(LogisticRegression(), X, y, scoring='recall', cv=10)
F1_cross_val = cross_val_score(LogisticRegression(), X, y, scoring='f1', cv=10)


LogReg_cv_scores = {"Model": ["LogReg_CV"],
              "Accuracy": [Accuracy_cross_val.mean()],
              "Precision": [Precision_cross_val.mean()],
              "Recall": [Recall_cross_val.mean()],
              "F1": [F1_cross_val.mean()]}

df_scores_LogReg_cv = pd.DataFrame(LogReg_cv_scores)
df_scores_LogReg_cv

How can I apply cross-validation?

Tags: logistic-regression, statsmodels, cross-validation, confusion-matrix

Solution


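If staying within scikit-learn is acceptable, two defaults may explain part of the gap to statsmodels: `LogisticRegression` applies an L2 penalty (`C=1.0`) and fits an intercept, whereas `sm.Logit` does neither unless told to. The sketch below, under those assumptions, uses a very large `C` to make the penalty negligible, disables the intercept, and collects all four metrics in one pass with `cross_validate`. The synthetic `X_demo` / `y_demo` data are placeholders for `X` and `y` from the question.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in for the question's data (assumption): binary target coded 0/1.
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.normal(size=(500, 3)), columns=["x1", "x2", "x3"])
p_up = 1 / (1 + np.exp(-(0.8 * X_demo["x1"] - 0.5 * X_demo["x2"])))
y_demo = pd.Series((rng.random(500) < p_up).astype(int), name="Direction")

# A large C makes the default L2 penalty negligible; fit_intercept=False mirrors
# an sm.Logit model fitted without a constant column.
clf = LogisticRegression(C=1e9, fit_intercept=False, max_iter=1000)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
metrics = ["accuracy", "precision", "recall", "f1"]

# One cross-validation run scores every metric on the same folds.
cv_results = cross_validate(clf, X_demo, y_demo, scoring=metrics, cv=cv)

sk_cv_means = {m: cv_results[f"test_{m}"].mean() for m in metrics}
print(pd.DataFrame([{"Model": "LogReg_CV", **sk_cv_means}]))
```

Scoring all metrics on the same shuffled, stratified folds keeps them comparable with each other. The `precision`, `recall`, and `f1` scorers treat label 1 as the positive class, so they only match hand-computed values that use the same positive class.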