I get an error when implementing a rainfall prediction classification model with logistic regression

Problem description

My code:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split

dataset = pd.read_csv('weatherAUS.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer.fit(X[:, 1:15])
X[:, 1:15] = imputer.transform(X[:, 1:15])
    
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [0])], remainder='passthrough')
X = np.array(ct.fit_transform(X))
    
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)

The error I get:

TypeError                                 Traceback (most recent call last)
<ipython-input-16-c8b4cceb3113> in <module>()
      1 from sklearn.model_selection import train_test_split
----> 2 X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)

4 frames
/usr/local/lib/python3.7/dist-packages/sklearn/utils/validation.py in _num_samples(x)
    150         if len(x.shape) == 0:
    151             raise TypeError("Singleton array %r cannot be considered"
--> 152                             " a valid collection." % x)
    153         # Check that shape is returning an integer or default to len
    154         # Dask dataframes may not return numeric shape[0] value

TypeError: Singleton array array(<145460x63 sparse matrix of type '<class 'numpy.float64'>'
    with 1961771 stored elements in Compressed Sparse Row format>,
      dtype=object) cannot be considered a valid collection.

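The dataset is fairly large (about 140,000 rows, 17 columns). The first column is the location in Australia, so I have to one-hot encode it. There are also many blank cells, so I had to clean the data by imputing the missing values.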

My dataset is linked here.

Tags: python, machine-learning, scikit-learn, classification

Solution
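
Judging from the traceback, the likely cause is the np.array(...) wrapper around ct.fit_transform(X). The message shows an array of dtype=object holding a single sparse matrix, which suggests that ColumnTransformer returned a SciPy sparse matrix and np.array() wrapped it into a 0-dimensional array instead of converting it. train_test_split then has nothing it can count as rows. One possible fix, sketched below on a small made-up array (the location names and values are placeholders, not the real weatherAUS data), is to pass sparse_threshold=0 so that ColumnTransformer returns a dense array directly:

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split

# Toy stand-in for the real data (hypothetical values): column 0 is a
# location name, column 1 is a numeric feature.
locations = ['Albury', 'Sydney', 'Perth', 'Darwin',
             'Hobart', 'Cairns', 'Moree', 'Uluru']
X = np.array([[loc, float(i)] for i, loc in enumerate(locations)],
             dtype=object)
y = np.array(['No', 'Yes'] * 4)

# sparse_threshold=0 asks ColumnTransformer to always return a dense array,
# so no sparse matrix is left for np.array() to wrap into a 0-d object array.
ct = ColumnTransformer(
    transformers=[('encoder', OneHotEncoder(), [0])],
    remainder='passthrough',
    sparse_threshold=0,
)
X_dense = ct.fit_transform(X)

# random_state is fixed here only to make the sketch reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X_dense, y, test_size=0.25, random_state=0
)
print(X_dense.shape, X_train.shape, X_test.shape)
```

Two alternatives should work as well: drop the np.array() wrapper and pass the sparse matrix straight to train_test_split, which accepts SciPy sparse matrices, or call .toarray() on the result of ct.fit_transform(X). Keeping the matrix sparse is probably lighter on memory for a dataset of this size.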

