How do I use a logistic regression model to make predictions on another file in Python?

Question

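I built a logistic regression model on train.csv, and it makes predictions on that same data. How can I use the same model to make predictions on test.csv? Sorry, I'm very new to Python.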

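Here is a screenshot of the last few commands and their results on train.csv. When I try the same thing on test.csv, the last statement raises the following error: https://pasteboard.co/K4p4aZA.jpg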

test.csv, train.csv and the Anaconda notebook are available here: https://drive.google.com/drive/u/1/folders/1U6TcJz8fp7FqbxpUcqRmAU-HSL42VN-S

import pandas as pd
import numpy as np

df_test=pd.read_csv("test.csv")
df_train=pd.read_csv("train.csv")

# many lines omitted here; see the notebook on Google Drive for details
# the last few statements are below

from sklearn.linear_model import LogisticRegression

lr=LogisticRegression()

lr.fit(X_train,y_train)

#prediction
df_result=pd.DataFrame(y_train)
df_result['predicted']=lr.predict_proba(X_train)[:,1]

Tags: python, pandas, logistic-regression

Answer


I haven't read your actual code on Google Drive, but in general you can do something like this.

import pandas as pd
import numpy as np

df_test=pd.read_csv("test.csv")
df_train=pd.read_csv("train.csv")

# I would assume df_test and df_train have exactly the same structure, as it should be.

# You would have some process to clean up your input data
# Abstracted into a function below
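def fun(df: pd.DataFrame):
  # Do your cleaning and stuff here, identically for both files.
  # "target" is a placeholder label column name; replace it with your own.
  x = df.drop(columns=["target"], errors="ignore")
  y = df.get("target")  # None if the file has no label column
  return x, y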

x_train, y_train = fun(df_train)


from sklearn.linear_model import LogisticRegression
lr=LogisticRegression()

lr.fit(x_train, y_train)

#prediction
df_result=pd.DataFrame(y_train)
df_result['predicted']=lr.predict_proba(x_train)[:,1]

# Now you can do exactly the same thing on your `df_test`
x_test, y_test = fun(df_test)
test_result = lr.predict_proba(x_test)[:,1]
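
The prediction on the test set only works if x_test ends up with the same feature columns, in the same order, as x_train. I can't see the error in your screenshot, so this may or may not be your problem, but a mismatch here is a common cause of failures at predict time. Here is a minimal sketch of one way to enforce the match with pandas, assuming the features were one-hot encoded with pd.get_dummies. The tiny frames and the column names "age" and "city" are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data; the real columns come from train.csv / test.csv.
train = pd.DataFrame({"age": [22, 35, 58], "city": ["A", "B", "C"]})
test = pd.DataFrame({"age": [41, 19], "city": ["B", "D"]})

# Encoding each file separately can produce different column sets.
x_train = pd.get_dummies(train, columns=["city"])
x_test = pd.get_dummies(test, columns=["city"])

# Reorder the test columns to match training, add missing ones as 0,
# and drop categories that were never seen during training.
x_test = x_test.reindex(columns=x_train.columns, fill_value=0)

print(list(x_test.columns))
```

After the reindex, x_test can be passed to lr.predict_proba without a feature-count mismatch.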

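I'd suggest reading up on sklearn.pipeline.Pipeline, which chains the preprocessing steps and the actual model into a single estimator, so the same transformations are applied to the training data and to any new data.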
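
As a rough sketch of what that could look like: the data below is randomly generated as a stand-in for your cleaned train.csv and test.csv features, and StandardScaler is only an example preprocessing step, not necessarily what your notebook needs.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-ins for the cleaned train.csv / test.csv features.
rng = np.random.default_rng(0)
cols = ["f1", "f2", "f3"]
x_train = pd.DataFrame(rng.normal(size=(100, 3)), columns=cols)
y_train = (x_train["f1"] + x_train["f2"] > 0).astype(int)
x_test = pd.DataFrame(rng.normal(size=(20, 3)), columns=cols)

# The pipeline fits the scaler and the model together on the training data.
model = Pipeline([
    ("scale", StandardScaler()),
    ("lr", LogisticRegression()),
])
model.fit(x_train, y_train)

# At predict time the scaler fitted on the training data is reused as-is,
# so the test set gets the same preprocessing without being refitted.
test_proba = model.predict_proba(x_test)[:, 1]
print(test_proba.shape)
```

Because the fitted pipeline is a single object, you call fit once on the training data and predict_proba on any file with the same columns.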

