My training data and labels have different numpy array shapes, and it breaks my training

Problem description

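I have an image dataset that I am trying to convert to numpy arrays, which I then feed into a cGAN. I have tried several versions of the code, and all of them give me dimension problems. I don't know what to do.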

import os
import pickle
import random

import cv2
import matplotlib.pyplot as plt
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical

training_data = []
IMG_SIZE = 32
datadir = 'drive/My Drive/dummyDS'
CATEGORIES = ['HTC-1-M7', 'IPhone-4s', 'iPhone-6', 'LG-Nexus-5x',
              'Motorola-Droid-Max', 'Motorola-Nexus-6', 'Motorola-X',
              'Samsung-Galaxy-Note3', 'Samsung-Galaxy-S4', 'Sony-Nex-7']

def create_training_data():
    # Load every image in each category folder and pair it with its class index
    for category in CATEGORIES:
        path = os.path.join(datadir, category)
        class_num = CATEGORIES.index(category)
        for img in os.listdir(path):
            img_array = cv2.imread(os.path.join(path, img))
            new_array = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE))
            training_data.append([new_array, class_num])
            plt.imshow(img_array, cmap="gray")
            plt.imshow(new_array, cmap="gray")
            plt.show()

create_training_data()
X = []
y = []
random.shuffle(training_data)

for features, label in training_data:
    X.append(features)
    y.append(label)

X = np.array(X).reshape(-1, IMG_SIZE, IMG_SIZE, 3)
with open("X.pickle", "wb") as pickle_out:
    pickle.dump(X, pickle_out)

y = np.array(y)
with open("y.pickle", "wb") as pickle_out:
    pickle.dump(y, pickle_out)

# One-hot encode the integer labels
y = to_categorical(y)

# saving the y_labels_one_hot array as a .npy file
np.save('y_labels_one_hot.npy', y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=2./11)

X_train.shape = (32, 32, 32, 3), while y_train.shape = (32, 4, 2)

Now, during training, I get:

real_labels = to_categorical(Y_train[i*batch_size:(i+1)*batch_size].reshape(-1, 1), num_classes=10)
d_loss_real = discriminator.train_on_batch(x=[X_batch, real_labels],
                                           y=real * (1 - smooth))
ValueError: All input arrays (x) should have the same number of samples. Got array shapes: [(32, 32, 32, 3), (256, 10)]

Tags: python, arrays, numpy, tensorflow

Solution


tensorflow.keras.preprocessing.image.ImageDataGenerator.flow_from_directory should simplify your task.

It does almost everything your code does, in a simpler way, including splitting the data.

The code below demonstrates how to use it, with a comment explaining each line:

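from tensorflow.keras.preprocessing.image import ImageDataGenerator

# datadir, img_height, img_width, batch_size, nb_epochs and model are assumed to be defined elsewhere
train_datagen = ImageDataGenerator(rescale=1./255, # Normalizes every pixel value
    validation_split=0.2) # Setting Validation Data as 20% of Total Data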

train_generator = train_datagen.flow_from_directory(
    datadir, # Traverses through all the Sub Folders (Category) inside this dir
    target_size=(img_height, img_width), # Sets the Image Size
    batch_size=batch_size, # Generates batches of `batch_size`
    class_mode='categorical', # Will Consider Labels as Categorical
    shuffle = True, # Shuffles the Data
    subset='training') # Considers 80% as training data

# Since we don't have separate directory for Validation Data and since we want the Total Data to be Partitioned, we should use "train_datagen"
validation_generator = train_datagen.flow_from_directory(
    datadir , # Should use the Same Dir as Training for Splitting
    target_size=(img_height, img_width), 
    batch_size=batch_size,
    class_mode='categorical',
    shuffle = True, # Shuffles the Data
    subset='validation') # Considers 20% as Validation data

# Then you can train the model using the code mentioned below
model.fit(
    train_generator,
    steps_per_epoch = train_generator.samples // batch_size,
    validation_data = validation_generator, 
    validation_steps = validation_generator.samples // batch_size,
    epochs = nb_epochs)

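Hopefully this resolves your mismatched shapes, since the generator ensures that the features and labels in every batch have the same number of samples. If this approach leads to an error, please share more information.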
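Separately from the generator approach, the shapes in the question's error message can be traced with NumPy alone. The sketch below is only an illustration under stated assumptions, not the asker's actual data: it uses a dummy label array with the reported shape (32, 4, 2), a batch size of 32 inferred from the error, and a hypothetical helper `one_hot` standing in for `to_categorical`. Flattening such an array with `reshape(-1, 1)` yields 256 rows, which matches the (256, 10) in the error, whereas keeping one integer class index per image and encoding it once gives one label row per image.

```python
import numpy as np

NUM_CLASSES = 10   # ten categories, as in the question
BATCH_SIZE = 32    # assumed from the X batch shape in the error message

def one_hot(labels, num_classes):
    """Hypothetical stand-in for to_categorical: one row per integer label."""
    labels = np.asarray(labels, dtype=int).reshape(-1)
    return np.eye(num_classes, dtype="float32")[labels]

# Dummy labels with the shape reported in the question (values are made up).
y_reported = np.zeros((BATCH_SIZE, 4, 2), dtype=int)
flattened = y_reported.reshape(-1, 1)      # every element becomes its own "sample"
bad_labels = one_hot(flattened, NUM_CLASSES)
print(flattened.shape, bad_labels.shape)   # (256, 1) (256, 10)

# One integer class index per image, encoded exactly once.
rng = np.random.default_rng(0)
y_int = rng.integers(0, NUM_CLASSES, size=BATCH_SIZE)
X_batch = np.zeros((BATCH_SIZE, 32, 32, 3), dtype="float32")
real_labels = one_hot(y_int, NUM_CLASSES)

# Cheap guard to run before every train_on_batch call.
assert X_batch.shape[0] == real_labels.shape[0]
print(X_batch.shape, real_labels.shape)    # (32, 32, 32, 3) (32, 10)
```

In the question's pipeline this would likely mean passing the integer `y` to `train_test_split` and applying `to_categorical` only once, inside the training loop, so that each batch slice holds one label per image.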

Happy learning!

