ValueError: Found input variables with inconsistent numbers of samples: [11097, 1233]

Problem description

I'm trying to build a trading website and use sklearn to identify trading patterns in Python 3.6. I fetch the data from the website like this:

import requests

buyer = "FORM54"
getBuyer = requests.get("https://url.com/buyer=%s/" % buyer)

Then I use pandas to extract the data I need:

from io import StringIO
import pandas as pd
data = pd.read_json(StringIO(getBuyer.text))
data = data[["strike_price", "underlying_price", "notional_amount", "quantity"]]
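To check the parsing step without hitting the site, here is a minimal sketch that feeds `pd.read_json` a hard-coded string. The payload shape (a JSON array of records) and every value in it are assumptions, since the real response is not shown; `orient="records"` is passed explicitly to state that assumption.

```python
from io import StringIO

import pandas as pd

# Hypothetical response body: a JSON array of trade records with made-up values.
# The extra "side" column stands in for fields the model does not use.
sample_text = """[
  {"strike_price": 100.0, "underlying_price": 98.5, "notional_amount": 5000.0, "quantity": 50, "side": "buy"},
  {"strike_price": 105.0, "underlying_price": 99.1, "notional_amount": 7350.0, "quantity": 70, "side": "sell"}
]"""

# Wrapping the text in StringIO makes read_json treat it as a file-like object;
# recent pandas versions deprecate passing a literal JSON string directly.
data = pd.read_json(StringIO(sample_text), orient="records")

# Keep only the numeric columns used for the model.
data = data[["strike_price", "underlying_price", "notional_amount", "quantity"]]
print(data.shape)
```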

And I try to predict the trade quantity like this:

import numpy as np
import sklearn.model_selection

predict = "quantity"
X = np.array(data.drop(columns=[predict]))
y = np.array(data[predict])

x_train, y_train, x_test, y_test = sklearn.model_selection.train_test_split(X, y, test_size=0.1)

But I get this error:

Traceback (most recent call last):
  File "C:/Users/HP Omen/PycharmProjects/untitled2/main.py", line 31, in <module>
    model.fit(x_train, y_train)
  File "C:\Users\HP Omen\PycharmProjects\untitled2\venv\lib\site-packages\sklearn\neighbors\_base.py", line 1130, in fit
    X, y = check_X_y(X, y, "csr", multi_output=True)
  File "C:\Users\HP Omen\PycharmProjects\untitled2\venv\lib\site-packages\sklearn\utils\validation.py", line 765, in check_X_y
    check_consistent_length(X, y)
  File "C:\Users\HP Omen\PycharmProjects\untitled2\venv\lib\site-packages\sklearn\utils\validation.py", line 212, in check_consistent_length
    " samples: %r" % [int(l) for l in lengths])
ValueError: Found input variables with inconsistent numbers of samples: [11097, 1233]
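The check that fails is `check_consistent_length`, which requires every input to have the same number of rows. A minimal sketch that triggers the same error directly, using dummy zero arrays sized to match the numbers in the message:

```python
import numpy as np
from sklearn.utils.validation import check_consistent_length

# Dummy arrays whose first dimensions differ, like the ones passed to fit().
X_part = np.zeros((11097, 3))
y_part = np.zeros(1233)

message = None
try:
    check_consistent_length(X_part, y_part)
except ValueError as exc:
    # Same validation that model.fit runs via check_X_y.
    message = str(exc)

print(message)
```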

Here are the shapes of X and y:

>>> X.shape
(12330, 3)
>>> y.shape
(12330,)

However, the numbers in the error change when I change buyer.
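Note that 11097 and 1233 are 90% and 10% of the 12330 rows, which is why they shift when a different buyer returns a different number of rows. This sketch reproduces the question's unpacking order on dummy data; the helper `mismatched_lengths` is hypothetical and only reports the lengths that `model.fit(x_train, y_train)` would receive.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def mismatched_lengths(n_rows):
    """Hypothetical helper: apply the question's unpacking order to n_rows of
    dummy data and return the lengths passed to fit(x_train, y_train)."""
    X = np.zeros((n_rows, 3))
    y = np.zeros(n_rows)
    # Same order as in the question; the second value is really the X test split.
    x_train, y_train, x_test, y_test = train_test_split(X, y, test_size=0.1)
    return len(x_train), len(y_train)


for n in (12330, 5000):
    print(n, mismatched_lengths(n))
```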

Tags: python, pandas, scikit-learn, python-3.6

Solution


Try:

x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, test_size=0.1)

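and see whether that solves your problem. train_test_split returns each input array as a (train, test) pair, in the order the arrays were passed, so the result is X_train, X_test, y_train, y_test. With the original unpacking, x_train receives the 90% training portion of X (11097 rows) while y_train receives the 10% test portion of X (1233 rows), which is exactly the mismatch check_consistent_length reports.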
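A minimal end-to-end sketch of the corrected order, assuming random synthetic data in place of the site's response and assuming a `KNeighborsRegressor` as the model (the traceback only shows that a `sklearn.neighbors` estimator was used):

```python
import numpy as np
import sklearn.model_selection
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for the three feature columns and the "quantity" target.
rng = np.random.default_rng(0)
X = rng.random((12330, 3))
y = rng.integers(1, 100, size=12330)

# Outputs come back as (train, test) per input: X_train, X_test, y_train, y_test.
x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, test_size=0.1, random_state=42
)

# Assumed model choice; any sklearn regressor would accept the same arrays.
model = KNeighborsRegressor(n_neighbors=5)
model.fit(x_train, y_train)
score = model.score(x_test, y_test)

print(x_train.shape, y_train.shape, x_test.shape, y_test.shape)
```

Passing `random_state` only makes the split reproducible; the shapes are the same with or without it.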

