Imblearn balanced_batch_generator - can't run my CNN model

Problem description

Currently, I'm working on my first Convolutional Neural Network for a project in university. I have to create a model that can recognize if a cable has a defect by only using images of "good" and "defect" cables in Google Colab.

My dataset is unbalanced; I have more images of 'good' cables than 'defect' ones. That's why I used the imblearn library and its function 'balanced_batch_generator', so that I could oversample the minority class. To work with the generator, I needed to reshape my X from 4 dimensions to 2. Now when I want to run the balanced batches in my model, I get an error due to the 2D shape, and I don't know how to reshape the batches inside the generator to get the model running.

I also tried adding a 'Flatten' layer as the first layer, with an explicit input_shape, to handle the flat batches coming from the batch generator, but then I can't compile/build the model because it expects more dimensions than I can provide.

Here is my code:

# Model architecture
from keras import models, layers, optimizers

model = models.Sequential()
# Number of filters starts at 32 and increases with each conv layer (standard practice)
model.add(layers.SeparableConv2D(32, (3, 3), activation='relu',
                                 input_shape=(150, 150, 3)))
# MaxPooling (2, 2) halves the image size each time (standard practice)
model.add(layers.MaxPooling2D((2, 2)))

# input_shape is only needed on the first layer of a Sequential model
model.add(layers.SeparableConv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))

model.add(layers.SeparableConv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))

model.add(layers.SeparableConv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))

model.add(layers.Flatten())

model.add(layers.Dense(512, activation='relu'))

model.add(layers.Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer=optimizers.RMSprop(lr=1e-4),  # lr = learning rate (1e-4)
              metrics=['acc'])

# Loading the images into the balanced_batch_generator
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(150, 150),
    batch_size=9,
    class_mode='binary'
)

from imblearn.keras import balanced_batch_generator
from imblearn.over_sampling import RandomOverSampler

# Take the first batch of images and labels from the directory generator
for X, y in train_generator:
    break

# Flatten each image into one row: (n_samples, 150, 150, 3) -> (n_samples, 67500)
X = X.reshape(X.shape[0], -1)

training_generator, steps_per_epoch = balanced_batch_generator(
    X, y,
    sampler=RandomOverSampler(sampling_strategy='minority'),
    batch_size=8, keep_sparse=True, random_state=42)

callback_history = model.fit_generator(generator=training_generator,
                                       steps_per_epoch=steps_per_epoch,
                                       epochs=100, verbose=0)

Error message:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-159-c85c3a625f12> in <module>()
      1 callback_history = model.fit_generator(generator=training_generator, steps_per_epoch = steps_per_epoch,
----> 2                                       epochs=100, verbose=0)

5 frames
/usr/local/lib/python3.6/dist-packages/keras/engine/training_utils.py in standardize_input_data(data, names, shapes, check_batch_axis, exception_prefix)
    133                         ': expected ' + names[i] + ' to have ' +
    134                         str(len(shape)) + ' dimensions, but got array '
--> 135                         'with shape ' + str(data_shape))
    136                 if not check_batch_axis:
    137                     data_shape = data_shape[1:]

ValueError: Error when checking input: expected separable_conv2d_51_input to have 4 dimensions, but got array with shape (8, 67500)

Can anybody help me? Or give me a hint?

Tags: python, tensorflow, keras, conv-neural-network, imblearn

Solution


I was working on a very similar problem and found a way to reshape the samples coming out of the training generator:

training_generator, steps_per_epoch = balanced_batch_generator(X, y, sampler=RandomOverSampler(sampling_strategy='minority'), batch_size=8, keep_sparse = True, random_state=42)

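import numpy as np

# og_dim_1, og_dim_2, ... stand for the original per-sample dimensions,
# e.g. (150, 150, 3) for the images in the question
my_generator = ((np.reshape(X, (-1, og_dim_1, og_dim_2...)), y) for (X,y) in training_generator)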

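This basically creates a new generator that reshapes your training samples back to their original dimensions. The first entry of the new shape passed to np.reshape needs to be -1, because you may get a variable batch size at the end of an epoch.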
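As a minimal sketch of the wrapper described above, the following uses NumPy only, so it can be run without Keras or imblearn. `fake_flat_batches` and `reshape_batches` are hypothetical names introduced here for illustration; `fake_flat_batches` merely stands in for the generator returned by `balanced_batch_generator`, and the (150, 150, 3) sample shape is assumed from the `input_shape` in the question.

```python
import numpy as np

IMG_SHAPE = (150, 150, 3)  # per-sample shape, assumed from the question's input_shape


def fake_flat_batches(batch_sizes, n_features, seed=0):
    """Hypothetical stand-in for the imblearn generator: yields (X, y) with 2-D X."""
    rng = np.random.default_rng(seed)
    for size in batch_sizes:
        X = rng.random((size, n_features))      # flattened images, one row per sample
        y = rng.integers(0, 2, size=size)       # binary labels
        yield X, y


def reshape_batches(flat_generator, sample_shape):
    """Yield (X, y) with X reshaped from (batch, features) to (batch, *sample_shape)."""
    for X, y in flat_generator:
        # -1 lets NumPy infer the batch size, so a smaller final batch still works.
        yield np.reshape(X, (-1, *sample_shape)), y


n_features = int(np.prod(IMG_SHAPE))  # 150 * 150 * 3 = 67500, as in the error message
flat = fake_flat_batches([8, 8, 5], n_features)
shapes = [X.shape for X, _ in reshape_batches(flat, IMG_SHAPE)]
print(shapes)
```

With the real generator, `reshape_batches(training_generator, IMG_SHAPE)` would presumably play the same role as `my_generator`. A generator function is used here instead of a generator expression only because it leaves room for a comment on the reshape step.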

Then pass this new generator to your model:

callback_history = model.fit_generator(generator=my_generator, steps_per_epoch = steps_per_epoch,epochs=100, verbose=0)

Hope this helps!

