python-3.x - ValueError: Found input variables with inconsistent numbers of samples: [1, 700]
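Problem
I am fitting a regression model (scikit-learn's LogisticRegression) on the Titanic survivor-prediction data from Kaggle. I am trying to predict the Survived column, but I still get this error even after reshaping Y.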
from sklearn.linear_model import LogisticRegression
from csv import reader
import numpy as np
file = open('train.csv', "r")
lines = reader(file)
X = list(lines)
#Deleting unnecessary features
X=np.delete(X, (0), axis=0)
X=np.delete(X, (0), axis=1)
X=np.delete(X, (2), axis=1)
X=np.delete(X, (3), axis=1)
X=np.delete(X, (5), axis=1)
X=np.delete(X, (5), axis=1)
X=np.delete(X, (5), axis=1)
X=np.delete(X, (5), axis=1)
#Converting males to 1 and females to 0
for i in range(891):
    if X[i][2] == 'male':
        X[i][2] = 1
    else:
        X[i][2] = 0
Y=X.T[0]
#Converting strings to float
X1 = X.astype(np.float)
Y1 = Y.astype(np.float)
Xw=X1.reshape(-1,1)
split = 700
train,test = Xw[:split,:],Xw[split:,:]
Ytrain,Ytest = Y1[:split],Y1[:split]
logisticRegr = LogisticRegression()
logisticRegr.fit(train.T, Ytrain)
logisticRegr.predict(test[0].T.reshape(1,-1))
score = logisticRegr.score(test.T, Ytest)
Solution
I strongly recommend getting familiar with pandas, a library for data handling. You could try something like this:
# imports
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
import pandas as pd

df = pd.read_csv('train.csv')
# encode the gender column as 0/1; in Kaggle's train.csv this column is named 'Sex'
df['Sex'] = df['Sex'].map({'male': 0, 'female': 1})
# same features the question keeps; the target 'Survived' must not be part of X
features = ['Pclass', 'Sex', 'SibSp', 'Parch']
# 700 rows for training, the rest for testing, with the class balance preserved
trainX, testX, trainY, testY = train_test_split(df[features], df['Survived'], train_size=700, stratify=df['Survived'])
logisticRegr = LogisticRegression()
logisticRegr.fit(trainX, trainY)
preds = logisticRegr.predict(testX)
score = metrics.accuracy_score(testY, preds)
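As for where [1, 700] comes from: as far as I can tell from the question's code, reshape(-1,1) flattens the whole array into a single column, so after slicing and transposing, fit receives 1 sample with 700 values but 700 labels. Here is a minimal sketch of that, using a zero-filled stand-in array (an assumption, not the real data) with the 891 x 5 shape the question's array should have after the deletions:

```python
import numpy as np

# Synthetic stand-in for the question's array after the deletions:
# 891 rows, 5 columns (Survived, Pclass, Sex, SibSp, Parch).
X1 = np.zeros((891, 5))
Y1 = X1[:, 0]  # first column is the target, as in the question (X.T[0])

split = 700

# What the question does: reshape(-1, 1) flattens everything into one column.
Xw = X1.reshape(-1, 1)
train = Xw[:split, :]
bad_shape = train.T.shape        # 1 row (sample) with 700 columns
n_labels = Y1[:split].shape[0]   # 700 labels, hence "[1, 700]"

# Keeping rows as samples instead: drop the target column and slice rows.
features = X1[:, 1:]
train_ok, test_ok = features[:split], features[split:]
Ytrain, Ytest = Y1[:split], Y1[split:]  # the question uses Y1[:split] for both

print(Xw.shape, bad_shape, n_labels)
print(train_ok.shape, Ytrain.shape, test_ok.shape, Ytest.shape)
```

With the row-wise slices, train_ok and Ytrain have the same number of rows, which is what fit checks. Also note that np.float was removed in NumPy 1.24, so on a recent NumPy the astype(np.float) calls in the question need to become astype(float).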