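python - Getting a ValueError from a linear regression model
Problem description
I'm working through Kaggle's Titanic competition. When I try to apply a linear regression model in my code and check its accuracy score, I get the following error in PyCharm: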
Traceback (most recent call last):
File "C:/Users/security/Downloads/AP/Titanic-Kaggle/TItanic-Kaggle.py", line 27, in <module>
accuracy = linReg.score(x_text, y_test)
File "C:\Users\security\Anaconda3\envs\TItanic-Kaggle.py\lib\site-packages\sklearn\base.py", line 330, in score
return r2_score(y, self.predict(X), sample_weight=sample_weight,
File "C:\Users\security\Anaconda3\envs\TItanic-Kaggle.py\lib\site-packages\sklearn\linear_model\base.py", line 213, in predict
return self._decision_function(X)
File "C:\Users\security\Anaconda3\envs\TItanic-Kaggle.py\lib\site-packages\sklearn\linear_model\base.py", line 196, in _decision_function
X = check_array(X, accept_sparse=['csr', 'csc', 'coo'])
File "C:\Users\security\Anaconda3\envs\TItanic-Kaggle.py\lib\site-packages\sklearn\utils\validation.py", line 582, in check_array
context))
ValueError: Found array with 0 sample(s) (shape=(0, 4)) while a minimum of 1 is required.
Here is my code so far:
import pandas as pd
from sklearn.linear_model import LinearRegression
train = pd.read_csv("https://raw.githubusercontent.com/oo92/Titanic-Kaggle/master/train.csv")
test = pd.read_csv("https://raw.githubusercontent.com/oo92/Titanic-Kaggle/master/test.csv")
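# Encode the categorical columns numerically (map/replace return a new Series, so assign the result back)
train['Sex'] = train['Sex'].map({'female': 0, 'male': 1})
train['Embarked'] = train['Embarked'].map({'C': 1, 'Q': 2, 'S': 3})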
linReg = LinearRegression()
# Fill missing values in Age feature with each sex’s median value of Age
train['Age'] = train['Age'].fillna(train.groupby('Sex')['Age'].transform("median"))
data = train[['Pclass', 'SibSp', 'Parch', 'Fare', 'Age']]
# Split the rows into those with a known Age (train) and those with a missing Age (test)
x_train = data[data['Age'].notnull()].drop(columns='Age')
y_train = data[data['Age'].notnull()]['Age']
x_text = data[data['Age'].isnull()].drop(columns='Age')
y_test = data[data['Age'].isnull()]['Age']
# Training the machine learning algorithm
linReg.fit(x_train, y_train)
# Check the model's score (for LinearRegression, score() returns R^2, not an accuracy)
accuracy = linReg.score(x_text, y_test)
print(accuracy*100, '%')
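The traceback can be reproduced on a tiny frame. This is a minimal sketch that assumes made-up values in place of train.csv (the rows below are hypothetical). Once Age has been filled with the per-sex median, no NaN remains, so the isnull() split selects zero rows, giving the shape (0, 4) from the error message, and scikit-learn refuses to score an empty array.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in for train.csv: only the columns the question uses.
train = pd.DataFrame({
    "Sex": ["female", "male", "male", "female", "male"],
    "Pclass": [1, 3, 2, 3, 1],
    "SibSp": [0, 1, 0, 0, 2],
    "Parch": [0, 0, 1, 0, 0],
    "Fare": [71.3, 7.25, 13.0, 8.05, 52.0],
    "Age": [38.0, np.nan, 27.0, np.nan, 54.0],
})

# Same imputation as the question: every missing Age gets its sex's median.
train["Age"] = train["Age"].fillna(train.groupby("Sex")["Age"].transform("median"))

data = train[["Pclass", "SibSp", "Parch", "Fare", "Age"]]
x_train = data[data["Age"].notnull()].drop(columns="Age")
y_train = data[data["Age"].notnull()]["Age"]
x_text = data[data["Age"].isnull()].drop(columns="Age")  # empty: no NaN left
y_test = data[data["Age"].isnull()]["Age"]

lin_reg = LinearRegression().fit(x_train, y_train)

error_message = None
try:
    lin_reg.score(x_text, y_test)  # predict() rejects an array with 0 rows
except ValueError as err:
    error_message = str(err)

print(x_text.shape, error_message)
```

In other words, the fillna call runs before the split, so by the time the missing-Age rows are selected there are none left.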
Solution
Try replacing the two x_text / y_test lines with the following; it gets rid of the error:
x_text = data[data['Age'] != None].drop(columns='Age')
y_test = data[data['Age'] != None]['Age']
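This should help. data['Age'] != None evaluates to True for every row (pandas returns True for != None even where the value is NaN), so the test set is no longer empty. Note, however, that it then contains every row, including all the rows the model was trained on, so the printed score measures fit on the training data rather than on unseen data.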
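If the goal is to actually impute Age with a regression model rather than only silence the error, one possible approach (an alternative, not the method given in this answer) is to split on missingness before filling anything, score the model on held-out rows whose Age is known (rows with a missing Age have no true value to score against), and only then predict the missing ages. This is a minimal sketch that assumes a synthetic DataFrame in place of train.csv; the column names follow the question, but the generated data, the 40 missing values, and the 25% hold-out are arbitrary illustrative choices.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data standing in for train.csv.
rng = np.random.default_rng(0)
n = 200
train = pd.DataFrame({
    "Pclass": rng.integers(1, 4, n),
    "SibSp": rng.integers(0, 4, n),
    "Parch": rng.integers(0, 3, n),
    "Fare": rng.uniform(5, 100, n).round(2),
})
# Make Age depend on Pclass plus noise, then blank out 40 values.
train["Age"] = (50 - 8 * train["Pclass"] + rng.normal(0, 5, n)).round(1)
train.loc[rng.choice(n, size=40, replace=False), "Age"] = np.nan

features = ["Pclass", "SibSp", "Parch", "Fare"]
known = train[train["Age"].notnull()]   # rows we can learn from and score on
missing = train[train["Age"].isnull()]  # rows whose Age we want to predict

# Hold out part of the *known* rows so the score is measured on unseen data.
x_tr, x_val, y_tr, y_val = train_test_split(
    known[features], known["Age"], test_size=0.25, random_state=0
)
lin_reg = LinearRegression().fit(x_tr, y_tr)
r2 = lin_reg.score(x_val, y_val)  # coefficient of determination R^2
print(f"R^2 on held-out rows: {r2:.3f}")

# Refit on all known rows, then fill only the missing ages with predictions.
lin_reg.fit(known[features], known["Age"])
train.loc[missing.index, "Age"] = lin_reg.predict(missing[features])
print("Missing ages left:", train["Age"].isnull().sum())
```

The per-sex median fill from the question could still serve as a fallback, but it should run after this step (or not at all), otherwise the missing-Age split is empty again.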