Error: Reshape your data using array

Problem description

I am learning linear regression with scikit-learn, but I keep getting this error:

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
%matplotlib inline

mydf = pd.read_csv("Salary_Data.csv")

X = np.array(mydf["YearsExperience"])
Y = np.array(mydf["Salary"])

xtrain, xtest, ytrain, ytest = train_test_split(X, Y, test_size=0.2)

lr = LinearRegression()
lr.fit(xtrain,ytrain)  ##HERE AN ERROR ARISES

The error is:

ValueError: Expected 2D array, got 1D array instead:
array=[ 4.   2.2  2.9  8.2 10.5  3.   4.9  1.5  5.1  4.  10.3  4.1  3.2  2.
  9.6  6.   7.9  7.1  3.2  8.7  6.8  9.5  3.9  1.1].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

Can you help me?

Tags: python, pandas, scikit-learn, linear-regression

Solution


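The error message already names the fix: reshape the feature array with array.reshape(-1, 1). It is the features (xtrain) that must be 2D. The target (ytrain) can stay 1D:

lr.fit(xtrain.reshape(-1, 1), ytrain)

Apply the same reshape to xtest before calling lr.predict or lr.score.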
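As a minimal end-to-end sketch of fitting with a 2D feature array: the snippet below reshapes X once, before the split, so both xtrain and xtest come out 2D. Salary_Data.csv is not available here, so the data is a synthetic stand-in (the numbers and the random_state value are assumptions for illustration only).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for Salary_Data.csv (made-up values, for illustration only).
rng = np.random.default_rng(0)
years = rng.uniform(1.0, 10.5, size=30)
salary = 25000.0 + 9500.0 * years + rng.normal(0.0, 3000.0, size=30)

X = years.reshape(-1, 1)  # features must be 2D: shape (30, 1)
Y = salary                # the target may stay 1D: shape (30,)

# Splitting after the reshape keeps both xtrain and xtest 2D.
xtrain, xtest, ytrain, ytest = train_test_split(
    X, Y, test_size=0.2, random_state=0
)

lr = LinearRegression()
lr.fit(xtrain, ytrain)            # xtrain has shape (24, 1), so no ValueError
predictions = lr.predict(xtest)   # xtest has shape (6, 1)
```

Reshaping before train_test_split avoids having to remember a second reshape on xtest later.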

scikit-learn expects the feature matrix as a 2D array of shape (n_samples, n_features), like this:

array([[0],
       [1],
       [2],
       [3]])

not as a 1D array like this:

array([0, 1, 2, 3])
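The difference between the two layouts shows up in the shape attribute. The sketch below checks it with NumPy, and also shows that selecting a column with double brackets in pandas gives a 2D result directly. The small DataFrame is hypothetical and only reuses the column name from the question.

```python
import numpy as np
import pandas as pd

flat = np.array([0, 1, 2, 3])
column = flat.reshape(-1, 1)  # -1 lets NumPy infer the number of rows

print(flat.shape)    # (4,)
print(column.shape)  # (4, 1)

# Hypothetical frame reusing the question's column name.
df = pd.DataFrame({"YearsExperience": [1.1, 2.0, 3.2]})

as_series = df["YearsExperience"].to_numpy()   # single brackets: 1D, shape (3,)
as_frame = df[["YearsExperience"]].to_numpy()  # double brackets: 2D, shape (3, 1)
```

With the double-bracket form, the array built from the DataFrame is already in the layout scikit-learn expects, so no reshape is needed.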

That's all there is to it.

