Pipeline preprocessing step turns model scores into NaN

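Problem description

I am trying to evaluate the scores of several models, and I have built a pipeline that preprocesses the data and fits each model. My code is: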

    # Imports are assumed; `selector` is taken to be make_column_selector.
    from numpy import mean
    from sklearn.compose import ColumnTransformer, make_column_selector as selector
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.impute import SimpleImputer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold, cross_validate
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler

    # X (a DataFrame of features) and y (a binary target) are loaded elsewhere.

    # Numeric columns: fill missing values with the median, then standardize.
    numeric_transformer = Pipeline(steps=[
        ('imputer', SimpleImputer(strategy='median')),
        ('scaler', StandardScaler())])

    # Non-numeric columns: label-encode, then one-hot encode.
    categorical_transformer = Pipeline(steps=[
        ('encoder', LabelEncoder()),
        ('onehot', OneHotEncoder(handle_unknown='ignore'))])

    # Route each column to a transformer based on its dtype.
    preprocessor = ColumnTransformer(transformers=[
        ('num', numeric_transformer, selector(dtype_include="number")),
        ('cat', categorical_transformer, selector(dtype_exclude="number"))
    ])

    scoring = {'acc': 'accuracy',
               'log_loss': 'neg_log_loss',
               'recall': 'recall',
               'auc': 'roc_auc',
               'f1': 'f1'}

    # define models
    models, names = [], []

    # LR
    models.append(LogisticRegression(solver='liblinear'))
    names.append('LR')

    # Lasso
    models.append(LogisticRegression(solver='liblinear', penalty='l1'))
    names.append('LA')

    # RF
    models.append(RandomForestClassifier(random_state=42))
    names.append('RF')

    for name, model in zip(names, models):
        # evaluate the model and store results
        cv = StratifiedKFold(n_splits=5)
        pipeline = Pipeline([('prep', preprocessor), ('model', model)])
        scores = cross_validate(pipeline, X, y, scoring=scoring,
                                cv=cv, return_train_score=True, n_jobs=-1)
        accuracy = list(scores['test_acc'])
        log_loss = list(scores['test_log_loss'])
        recall = list(scores['test_recall'])
        auc = list(scores['test_auc'])
        f1 = list(scores['test_f1'])
        # summarize and print
        print('{} accuracy:{} logloss:{}, recall:{}, auc:{}, f1:{}'.format(
            name, mean(accuracy), mean(log_loss), mean(recall), mean(auc), mean(f1)))

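When I remove the preprocessing step from the pipeline, the code works fine, but when I include it, every score printed at the end is "nan" (even though the models seem to be fitting, since fit times are recorded). Any idea what is going on here?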

Tags: python, scikit-learn, pipeline

Solution
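A likely cause, offered as a sketch rather than a confirmed diagnosis: `LabelEncoder` is meant for the target `y`, and its `fit_transform` accepts a single argument. Inside a `Pipeline` it is called with both `X` and `y`, so it raises a `TypeError` on every fold. `cross_validate` defaults to `error_score=np.nan`, which records a failed fit as NaN instead of stopping (newer scikit-learn releases may raise instead when every fit fails). That would explain why the scores are all "nan" while timing entries are still present. `OneHotEncoder` handles string categories directly, so the `LabelEncoder` step can simply be dropped. The example below uses hypothetical toy data in place of the asker's `X` and `y`, and passes `error_score="raise"` so that any remaining failure surfaces as an exception.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_selector as selector
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data standing in for the asker's X and y.
rng = np.random.default_rng(0)
n = 100
X = pd.DataFrame({
    "age": rng.normal(40, 10, n),
    "income": rng.normal(50, 15, n),
    "city": rng.choice(["a", "b", "c"], n),
})
y = (X["age"] + rng.normal(0, 5, n) > 40).astype(int)

numeric_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
])

# No LabelEncoder here: OneHotEncoder accepts string categories directly.
categorical_transformer = Pipeline(steps=[
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])

preprocessor = ColumnTransformer(transformers=[
    ("num", numeric_transformer, selector(dtype_include="number")),
    ("cat", categorical_transformer, selector(dtype_exclude="number")),
])

pipeline = Pipeline([
    ("prep", preprocessor),
    ("model", LogisticRegression(solver="liblinear")),
])

# error_score="raise" turns a silent NaN into a visible exception.
scores = cross_validate(
    pipeline, X, y,
    scoring={"acc": "accuracy", "auc": "roc_auc"},
    cv=StratifiedKFold(n_splits=5),
    error_score="raise",
)
mean_acc = scores["test_acc"].mean()
mean_auc = scores["test_auc"].mean()
print(f"accuracy: {mean_acc:.3f}  auc: {mean_auc:.3f}")
```

If the categorical columns contain missing values, a `SimpleImputer(strategy="most_frequent")` step placed before the one-hot encoder is one option. Once the pipeline runs cleanly, `error_score="raise"` can be removed again.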

