Error when predicting with logistic regression

Problem description

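I am currently working on a project that predicts a binary output. I took my code from http://hamelg.blogspot.com/2015/11/python-for-data-analysis-part-28.html and https://towardsdatascience.com/pca-using-python-scikit-learn-e653f8989e60. The goal is to use the training dataset "train_dataset02_process.csv" to predict, for each observation in the test dataset "test_dataset.csv", whether the outcome is yes or no. The training dataset has 18123 observations and the test dataset has 9679.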

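As the code shows, several variables affect the outcome. I use PCA to combine these variables and then logistic regression to predict the outcome. However, the last line of code does not run and keeps raising this error: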

ValueError: X has 43 features per sample; expecting 13

Here is my code:

import numpy as np
import pandas as pd
df_train = pd.read_csv('train_dataset02_process.csv')
df1 = pd.read_csv("test_dataset.csv")

from sklearn.preprocessing import StandardScaler

#Variables affecting outcome
LEVEL_T1 = df_train['LEVEL_T1']
LEVEL_T2 = df_train['LEVEL_T2']
LEVEL_T3 = df_train['LEVEL_T3']
LEVEL_T4 = df_train['LEVEL_T4']
LEVEL_T5 = df_train['LEVEL_T5']
LEVEL_T6 = df_train['LEVEL_T6']
LEVEL_T7 = df_train['LEVEL_T7']
PRESSURE_J280 = df_train['PRESSURE_J280']
PRESSURE_J269 = df_train['PRESSURE_J269']
PRESSURE_J300 = df_train['PRESSURE_J300']
PRESSURE_J256 = df_train['PRESSURE_J256']
PRESSURE_J289 = df_train['PRESSURE_J289']
PRESSURE_J415 = df_train['PRESSURE_J415']
PRESSURE_J302 = df_train['PRESSURE_J302']
PRESSURE_J306 = df_train['PRESSURE_J306']
PRESSURE_J307 = df_train['PRESSURE_J307']
PRESSURE_J317 = df_train['PRESSURE_J317']
PRESSURE_J14 = df_train['PRESSURE_J14']
PRESSURE_J422 = df_train['PRESSURE_J422']
FLOW_PU1 = df_train['FLOW_PU1']
FLOW_PU2 = df_train['FLOW_PU2']
FLOW_PU3 = df_train['FLOW_PU3']
FLOW_PU4 = df_train['FLOW_PU4']
FLOW_PU5 = df_train['FLOW_PU5']
FLOW_PU6 = df_train['FLOW_PU6']
FLOW_PU7 = df_train['FLOW_PU7']
FLOW_PU8 = df_train['FLOW_PU8']
FLOW_PU9 = df_train['FLOW_PU9']
FLOW_PU10 = df_train['FLOW_PU10']
FLOW_PU11 = df_train['FLOW_PU11']
FLOW_V2 = df_train['FLOW_V2']
STATUS_PU1 = df_train['STATUS_PU1']
STATUS_PU2 = df_train['STATUS_PU2']
STATUS_PU3 = df_train['STATUS_PU3']
STATUS_PU4 = df_train['STATUS_PU4']
STATUS_PU5 = df_train['STATUS_PU5']
STATUS_PU6 = df_train['STATUS_PU6']
STATUS_PU7 = df_train['STATUS_PU7']
STATUS_PU8 = df_train['STATUS_PU8']
STATUS_PU9 = df_train['STATUS_PU9']
STATUS_PU10 = df_train['STATUS_PU10']
STATUS_PU11 = df_train['STATUS_PU11']
STATUS_V2 = df_train['STATUS_V2']
features = pd.DataFrame([LEVEL_T1,LEVEL_T2,LEVEL_T3,LEVEL_T4,LEVEL_T5,LEVEL_T6,LEVEL_T7,PRESSURE_J280,PRESSURE_J269,PRESSURE_J300,  PRESSURE_J256,PRESSURE_J289,PRESSURE_J415,PRESSURE_J302,PRESSURE_J306,PRESSURE_J307,PRESSURE_J317,PRESSURE_J14,PRESSURE_J422,FLOW_PU1,FLOW_PU2,FLOW_PU3,FLOW_PU4,FLOW_PU5,FLOW_PU6,FLOW_PU7,FLOW_PU8,FLOW_PU9,FLOW_PU10,FLOW_PU11,FLOW_V2,STATUS_PU1,STATUS_PU2,STATUS_PU3,STATUS_PU4,STATUS_PU5,STATUS_PU6,STATUS_PU7,STATUS_PU8,STATUS_PU9,STATUS_PU10,STATUS_PU11,STATUS_V2])

scaler = StandardScaler()

scaler.fit(df_train)
train_img = scaler.transform(df_train)

from sklearn.decomposition import PCA
# Make an instance of the Model
pca = PCA(.95)
pca.fit(train_img)
train_img = pca.transform(train_img)

from sklearn import linear_model
from sklearn import preprocessing
log_model = linear_model.LogisticRegression()

from sklearn.linear_model import LogisticRegression
logisticRegr = LogisticRegression(solver = 'lbfgs')
logisticRegr.fit(train_img, df_train['ATT_FLAG'])

test_preds = logisticRegr.predict(df1[0:18122])

Can anyone give me any insight into how to fix this? Thanks!

Tags: python

Solution
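The error message suggests that the classifier was fitted on the 13 principal components that PCA kept, while `predict` received the raw test frame with 43 columns. A likely fix is to pass the test data through the same fitted scaler and PCA before predicting, for example by wrapping all three steps in a scikit-learn pipeline. The following is a minimal sketch under stated assumptions, not a drop-in replacement: it uses small synthetic frames in place of the two CSV files, a shortened `feature_cols` list, and a `df_test` name that does not appear in the question. It also keeps `ATT_FLAG` out of the scaled features, which the code in the question does not do.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for the two CSV files (assumption: the real files
# share the same feature columns; only 5 of them are used here for brevity).
rng = np.random.default_rng(0)
feature_cols = ["LEVEL_T1", "LEVEL_T2", "PRESSURE_J280", "FLOW_PU1", "STATUS_PU1"]
df_train = pd.DataFrame(rng.normal(size=(200, 5)), columns=feature_cols)
df_train["ATT_FLAG"] = (df_train["LEVEL_T1"] + df_train["FLOW_PU1"] > 0).astype(int)
df_test = pd.DataFrame(rng.normal(size=(80, 5)), columns=feature_cols)

# Fit scaler, PCA, and classifier on the training features only;
# the target column is kept out of the scaler and PCA.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95),
    LogisticRegression(solver="lbfgs"),
)
model.fit(df_train[feature_cols], df_train["ATT_FLAG"])

# predict() pushes the test rows through the same fitted scaler and PCA
# before the classifier sees them, so the feature counts match.
test_preds = model.predict(df_test[feature_cols])
print(test_preds.shape)
```

With the real data, `feature_cols` would presumably be the 43 column names listed in the question, and the row slice `df1[0:18122]` should not be needed, since the test set has only 9679 rows.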

