Why does my linear regression model give me errors when all my inputs are integers?

Problem description

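I want to try every regression algorithm on my dataset and pick the best one. I decided to start with linear regression, but I get some errors. I tried scaling the data, but that produced another error.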

Here is my code:

import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

train_df = pd.read_csv('train.csv', index_col='ID')
train_df.head()
target = 'Result'

X = train_df.drop(target, axis=1)
y = train_df[target]

# Trying to scale and get even worse error
#ss = StandardScaler()
#df_scaled = pd.DataFrame(ss.fit_transform(train_df),columns = train_df.columns)
#X = df_scaled.drop(target, axis=1)
#y = df_scaled[target]

model = LogisticRegression() 
model.fit(X, y) 

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=10000,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=None, solver='lbfgs', tol=0.0001, verbose=10,
                   warm_start=False)
                   

print(X.iloc[10])
print(model.predict([X.iloc[10]]))
print(y[10])

Here is the warning, followed by the printed output:

ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
A     0
B   -19
C   -19
D   -19
E     0
F   -19
Name: 10, dtype: int64
[0]
-19

Here is a sample of the dataset:

ID,A,B,C,D,E,F,Result
0,-18,18,18,-2,-12,-3,-19
1,-19,-8,0,18,18,1,0
2,0,-11,18,0,-19,18,18
3,18,-15,-12,18,-11,-4,-17
4,-17,18,-11,-17,-18,-19,18
5,18,-14,-19,-14,-15,-19,18
6,18,-17,18,18,18,-2,-1
7,-1,-11,0,18,18,18,18
8,18,-19,-18,-19,-19,18,18
9,18,18,0,0,18,18,0
10,0,-19,-19,-19,0,-19,-19
11,-19,0,-19,18,-19,-19,-6
12,-6,18,0,0,0,18,-15
13,-15,-19,-6,-19,-19,0,0
14,0,-15,0,18,18,-19,18
15,18,-19,18,-8,18,-2,-4
16,-4,-4,18,-19,18,18,18
17,18,0,18,-4,-10,0,18
18,18,0,18,18,18,18,-19

What am I doing wrong?

Tags: machine-learning, keras, scikit-learn, linear-regression

Solution


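You are using LogisticRegression, which despite its name is a classifier: it models a categorical dependent variable, so every distinct value of Result is treated as a separate class.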

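That is not necessarily wrong, since it may be what you intended, but it means you need enough data for each class and enough iterations for the model to converge (the ConvergenceWarning in your output says it has not).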
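If classification really is the goal, a minimal sketch of the scaling suggested by the warning could look like the following. It scales only the features and leaves the target untouched. The commented-out attempt in the question scales the whole frame, including Result, which turns the integer labels into floats; that is a likely cause of the second error, though the question does not show it. The inlined CSV (the first rows of the sample data) and the `max_iter=1000` value are assumptions made for illustration.

```python
import io

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# First rows of the sample data from the question, inlined so the sketch
# runs without train.csv (an assumption for illustration).
CSV = """ID,A,B,C,D,E,F,Result
0,-18,18,18,-2,-12,-3,-19
1,-19,-8,0,18,18,1,0
2,0,-11,18,0,-19,18,18
3,18,-15,-12,18,-11,-4,-17
4,-17,18,-11,-17,-18,-19,18
5,18,-14,-19,-14,-15,-19,18
6,18,-17,18,18,18,-2,-1
7,-1,-11,0,18,18,18,18
8,18,-19,-18,-19,-19,18,18
9,18,18,0,0,18,18,0
10,0,-19,-19,-19,0,-19,-19
"""

train_df = pd.read_csv(io.StringIO(CSV), index_col="ID")
target = "Result"
X = train_df.drop(target, axis=1)
y = train_df[target]  # left unscaled: these values are class labels

# The scaler is fitted on the features only; the pipeline applies the same
# scaling again at prediction time.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X, y)

# predict expects 2-D input, so select row 10 as a one-row DataFrame.
prediction = clf.predict(X.loc[[10]])
print(prediction, y.loc[10])
```

With only a handful of rows per class, any accuracy measured this way says little; the sketch only shows where the scaling step belongs.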

However, I suspect you meant to use LinearRegression from the sklearn library, which is for a continuous dependent variable.
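A minimal sketch of that switch, assuming Result is meant to be a continuous target. The sample rows from the question are inlined here so the example runs without train.csv; everything else follows the question's own code.

```python
import io

import pandas as pd
from sklearn.linear_model import LinearRegression

# Sample rows from the question, inlined so the sketch runs without train.csv
# (an assumption for illustration).
CSV = """ID,A,B,C,D,E,F,Result
0,-18,18,18,-2,-12,-3,-19
1,-19,-8,0,18,18,1,0
2,0,-11,18,0,-19,18,18
3,18,-15,-12,18,-11,-4,-17
4,-17,18,-11,-17,-18,-19,18
5,18,-14,-19,-14,-15,-19,18
6,18,-17,18,18,18,-2,-1
7,-1,-11,0,18,18,18,18
8,18,-19,-18,-19,-19,18,18
9,18,18,0,0,18,18,0
10,0,-19,-19,-19,0,-19,-19
11,-19,0,-19,18,-19,-19,-6
12,-6,18,0,0,0,18,-15
13,-15,-19,-6,-19,-19,0,0
14,0,-15,0,18,18,-19,18
15,18,-19,18,-8,18,-2,-4
16,-4,-4,18,-19,18,18,18
17,18,0,18,-4,-10,0,18
18,18,0,18,18,18,18,-19
"""

train_df = pd.read_csv(io.StringIO(CSV), index_col="ID")
target = "Result"
X = train_df.drop(target, axis=1)
y = train_df[target]

# Ordinary least squares has a closed-form solution, so there is no
# iteration limit to hit and no ConvergenceWarning.
model = LinearRegression()
model.fit(X, y)

# predict expects 2-D input, so select row 10 as a one-row DataFrame.
prediction = model.predict(X.loc[[10]])
print(prediction, y.loc[10])
```

The prediction is a float rather than one of the observed integer values, which is the expected behavior for a regression model.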

