python - Error: "Found input variables with inconsistent numbers of samples: [5114, 3409]"
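Problem description
I want to follow these steps:
- Load the data
- Divide it into label and feature sets
- Normalize the data
- Split it into test and training sets
- Apply oversampling (SMOTE)
Is this the correct order of steps, or am I doing something wrong? I keep getting an error saying "Found input variables with inconsistent numbers of samples: [5114, 3409]".
The error occurs on this line: X_train,Y_train = smote.fit_sample(X_train,Y_train)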
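import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE
#data loading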
dataset = pd.read_csv('data.csv')
#view data and check for null values
print(dataset.isnull().values.any())
print(dataset.shape)
# Dividing dataset into label and feature sets
X = dataset.drop('Bankrupt?', axis = 1) # Features
Y = dataset['Bankrupt?'] # Labels
print(type(X))
print(type(Y))
print(X.shape)
print(Y.shape)
# Normalizing numerical features so that each feature has mean 0 and variance 1
feature_scaler = StandardScaler()
X_scaled = feature_scaler.fit_transform(X)
# Dividing dataset into training and test sets
X_train, X_test, Y_train, Y_test = train_test_split( X_scaled, Y, test_size = 0.5, random_state = 100)
print(X_train.shape)
print(X_test.shape)
X = dataset.iloc[:,1:].values
y = dataset.iloc[:,0].values.reshape(-1, 1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
# Implementing Oversampling to balance the dataset;
print("Number of observations in each class before oversampling (training data): \n", pd.Series(Y_train).value_counts())
smote = SMOTE(random_state = 101)
X_train,Y_train = smote.fit_sample(X_train,Y_train)
print("Number of observations in each class after oversampling (training data): \n", pd.Series(Y_train).value_counts())
Solution