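python - My training data and labels have different numpy array shapes, and it breaks my training
Problem description
I am working with an image database and trying to convert it to numpy arrays, which I then use as input to a cGAN. I have tried several versions of the code, and all of them give me dimension problems. I don't know what to do.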
import os
import pickle
import random

import cv2
import matplotlib.pyplot as plt
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical  # import path assumed; not shown in the question

training_data = []
IMG_SIZE = 32
datadir = 'drive/My Drive/dummyDS'
CATEGORIES = ['HTC-1-M7', 'IPhone-4s', 'iPhone-6', 'LG-Nexus-5x',
              'Motorola-Droid-Max', 'Motorola-Nexus-6', 'Motorola-X',
              'Samsung-Galaxy-Note3', 'Samsung-Galaxy-S4', 'Sony-Nex-7']

def create_training_data():
    for category in CATEGORIES:
        path = os.path.join(datadir, category)
        class_num = CATEGORIES.index(category)  # integer label for this folder
        for img in os.listdir(path):
            img_array = cv2.imread(os.path.join(path, img))
            new_array = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE))
            training_data.append([new_array, class_num])
    # Preview the last image read (indentation assumed; it was lost in the post)
    plt.imshow(img_array, cmap="gray")
    plt.imshow(new_array, cmap="gray")
    plt.show()

create_training_data()

X = []
y = []
random.shuffle(training_data)
for features, label in training_data:
    X.append(features)
    y.append(label)

X = np.array(X).reshape(-1, IMG_SIZE, IMG_SIZE, 3)
with open("X.pickle", "wb") as pickle_out:
    pickle.dump(X, pickle_out)

y = np.array(y)
with open("y.pickle", "wb") as pickle_out:
    pickle.dump(y, pickle_out)

y = to_categorical(y)  # y is one-hot encoded from this point on
# saving the y_labels_one_hot array as a .npy file
np.save('y_labels_one_hot.npy', y)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=2./11)
X_train.shape is (32, 32, 32, 3), while y_train.shape is (32, 4, 2).
Now, during training, I get:
real_labels = to_categorical(Y_train[i*batch_size:(i+1)*batch_size].reshape(-1, 1), num_classes=10)
d_loss_real = discriminator.train_on_batch(x=[X_batch, real_labels],
                                           y=real * (1 - smooth))
ValueError: All input arrays (x) should have the same number of samples. Got array shapes: [(32, 32, 32, 3), (256, 10)]
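Solution
tensorflow.keras.preprocessing.image.ImageDataGenerator.flow_from_directory
should simplify your task.
It does almost everything your code does, in a simpler way, including splitting the data.
The code below shows how to use it, with a comment explaining each line: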
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Assumed values; the original answer does not define these names
img_height, img_width = 32, 32  # matches IMG_SIZE in the question
batch_size = 32
nb_epochs = 10

train_datagen = ImageDataGenerator(rescale=1./255,         # Scales every pixel value to [0, 1]
                                   validation_split=0.2)   # Reserves 20% of the data for validation

train_generator = train_datagen.flow_from_directory(
    datadir,                              # Traverses all subfolders (one per category) in this dir
    target_size=(img_height, img_width),  # Resizes every image to this size
    batch_size=batch_size,                # Yields batches of `batch_size` samples
    class_mode='categorical',             # Yields one-hot encoded labels
    shuffle=True,                         # Shuffles the data
    subset='training')                    # Uses the 80% training portion

# There is no separate validation directory and the whole dataset should be
# partitioned, so the same "train_datagen" is reused here
validation_generator = train_datagen.flow_from_directory(
    datadir,                              # Must be the same dir as training for the split to work
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='categorical',
    shuffle=True,                         # Shuffles the data
    subset='validation')                  # Uses the 20% validation portion

# Then train the model; `model` is assumed to be an already compiled Keras model
model.fit(
    train_generator,
    steps_per_epoch=train_generator.samples // batch_size,
    validation_data=validation_generator,
    validation_steps=validation_generator.samples // batch_size,
    epochs=nb_epochs)
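A small note on the `samples // batch_size` arithmetic used for `steps_per_epoch` and `validation_steps`: floor division drops the final partial batch, and it evaluates to 0 when a subset holds fewer samples than `batch_size`, which can easily happen with a small dummy dataset. The sketch below only illustrates that arithmetic; `steps_for` and the sample counts are hypothetical, not part of Keras or of the question.

```python
import math

def steps_for(samples: int, batch_size: int, keep_partial: bool = False) -> int:
    """Number of batches per epoch (hypothetical helper, not a Keras function)."""
    if keep_partial:
        # Round up so the last, smaller batch is also counted
        return math.ceil(samples / batch_size)
    # Floor division, as in the answer: the last partial batch is skipped
    return samples // batch_size

batch_size = 32
train_samples = 820   # hypothetical count, stands in for train_generator.samples
tiny_val_samples = 7  # hypothetical count for a very small validation subset

print(steps_for(train_samples, batch_size))                          # full batches only
print(steps_for(train_samples, batch_size, keep_partial=True))       # includes the partial batch
print(steps_for(tiny_val_samples, batch_size))                       # floor division gives 0 here
print(steps_for(tiny_val_samples, batch_size, keep_partial=True))
```

If either value comes out as 0, a smaller `batch_size` or the rounded-up count is probably what you want.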
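Hopefully this resolves the shape mismatch, because every batch the generator yields contains the same number of features and labels. If this approach still produces an error, please share more details.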
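For reference, the shapes reported in the question are consistent with the labels having been one-hot encoded more than once. This is an assumption, since the full training loop is not shown. The NumPy-only sketch below reproduces the numbers: `one_hot` is a hypothetical stand-in that mimics how `to_categorical` treats array shapes, and the 4 classes are inferred from `y_train.shape == (32, 4, 2)`.

```python
import numpy as np

def one_hot(labels, num_classes):
    """NumPy stand-in for to_categorical (hypothetical helper, shapes only)."""
    labels = np.asarray(labels, dtype=int)
    # Like to_categorical, drop a trailing axis of length 1 before encoding
    if labels.ndim > 1 and labels.shape[-1] == 1:
        labels = labels.reshape(labels.shape[:-1])
    # Indexing the identity matrix appends one axis of size num_classes
    return np.eye(num_classes)[labels]

batch_size = 32
y_int = np.random.randint(0, 4, size=batch_size)  # integer class ids, shape (32,)

y_once = one_hot(y_int, 4)    # shape (32, 4): a normal one-hot encoding
y_twice = one_hot(y_once, 2)  # shape (32, 4, 2): the y_train shape from the question

# Flattening the doubly encoded labels gives 32 * 4 * 2 = 256 rows,
# so the encoded result no longer lines up with the 32 images in X_batch
real_labels_bad = one_hot(y_twice.reshape(-1, 1), 10)  # shape (256, 10)

# Encoding the integer labels exactly once keeps one row per image
real_labels_ok = one_hot(y_int.reshape(-1, 1), 10)     # shape (32, 10)

print(y_twice.shape, real_labels_bad.shape, real_labels_ok.shape)
```

If this is indeed the cause, keeping `y` as integer class ids until the training loop and calling `to_categorical` only once there should give labels of shape `(batch_size, 10)`.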
Happy learning!