Different Keras augmentations for training and validation

Problem description

I am running data augmentation for image classification with Keras, as follows:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Define Parameters
parameters = {"img_width" : 224,
              "img_height": 224,
              "epochs": 50,
              "batch_size" : 15}

# Define Generators  
train_datagen = ImageDataGenerator(
    rescale = 1. / 255,
    shear_range = 0.2,
    zoom_range = 0.2,
    horizontal_flip = True,
    validation_split = 0.06)

test_datagen = ImageDataGenerator(
    rescale=1/255)

# Define Flows from directories
train_generator = train_datagen.flow_from_directory(
    directory = train_data_dir,
    target_size=(parameters["img_width"], parameters["img_height"]),
    batch_size = parameters["batch_size"],
    class_mode= "categorical", 
    subset = "training", 
    color_mode = "rgb",
    seed = 42)

validation_generator = train_datagen.flow_from_directory(
    directory = train_data_dir,
    target_size = (parameters["img_width"], parameters["img_height"]),
    batch_size = parameters["batch_size"],
    class_mode='categorical',
    subset = "validation",
    color_mode = "rgb",
    seed = 42)

testing_generator = test_datagen.flow_from_dataframe(
        dataframe = testing_df, 
        x_col="path", y_col="label", 
        class_mode="raw", 
        target_size= (parameters["img_width"], parameters["img_height"]), 
        shuffle = False,
        batch_size= parameters["batch_size"])

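For training, validation and testing respectively, this code outputs:

Found 4911 images belonging to 69 classes.
Found 282 images belonging to 69 classes.
Found 421 validated image filenames.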

However, if I use test_datagen instead of train_datagen for the validation data:

# Changed line: test_datagen instead of train_datagen
validation_generator = test_datagen.flow_from_directory(
    directory = train_data_dir,
    target_size = (parameters["img_width"], parameters["img_height"]),
    batch_size = parameters["batch_size"],
    class_mode='categorical',
    subset = "validation",
    color_mode = "rgb",
    seed = 42)

I get the output: Found 0 images belonging to 69 classes.

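How can I fix this? In short, I want to validate on images like the ones the model will actually run on, which is why I use test_datagen, since it only rescales the pixel values.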

PS: train_data_dir is a folder containing 69 subfolders, one per class, each holding the images of that class.

Tags: python, keras

Solution


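I don't think you should point the training and validation generators at the same directory. The reason you get 0 images is that subset = "validation" only selects files through the generator's validation_split: test_datagen was created without one, so it defaults to 0 and the validation subset of every class folder is empty.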
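To see why the count drops to 0, here is a simplified, pure-Python model of how, as far as I know, Keras' flow_from_directory picks the subset inside each class folder: the validation subset is the first validation_split fraction of the sorted filenames and the training subset is the rest. select_subset and the 100 example filenames are hypothetical; this is a sketch of the selection rule, not the Keras code itself.

```python
from typing import List, Sequence


def select_subset(filenames: Sequence[str], validation_split: float,
                  subset: str) -> List[str]:
    """Simplified model (assumption) of Keras' per-class subset selection."""
    files = sorted(filenames)
    n = len(files)
    if subset == "validation":
        start, stop = 0, int(validation_split * n)  # first fraction of files
    elif subset == "training":
        start, stop = int(validation_split * n), n  # everything after it
    else:
        raise ValueError("subset must be 'training' or 'validation'")
    return files[start:stop]


# one hypothetical class folder with 100 images
class_files = [f"img_{i:03d}.jpg" for i in range(100)]

# train_datagen in the question uses validation_split=0.06
train = select_subset(class_files, 0.06, "training")
val = select_subset(class_files, 0.06, "validation")
print(len(train), len(val))  # 94 6

# test_datagen has no validation_split, i.e. 0.0: an empty slice
print(len(select_subset(class_files, 0.0, "validation")))  # 0
```

Because the selection depends only on the sorted filenames and the split value, giving test_datagen the same validation_split=0.06 should also select the same, non-overlapping validation files without augmentation; separate directories, as below, avoid relying on that coupling.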

Try pointing to a dedicated validation directory instead, for example:

validation_generator = test_datagen.flow_from_directory(
    directory = validation_data_dir,  # changed: a separate validation folder
    target_size = (parameters["img_width"], parameters["img_height"]),
    batch_size = parameters["batch_size"],
    class_mode='categorical',
    # no subset here: test_datagen has no validation_split
    color_mode = "rgb",
    seed = 42)

The directories should be laid out like this:

train/
    69 folders
validation/
    69 folders
test/ 
    69 folders
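Since train_data_dir currently holds all 69 class folders, one way to produce such a layout is to copy a share of each class into train/, validation/ and test/. The helper below is a hypothetical sketch using only the standard library: it shuffles each class with a fixed seed, copies train_frac and val_frac of the files into train/ and validation/, and puts the remainder into test/. The 0.8/0.1 defaults are my own assumption.

```python
import random
import shutil
from pathlib import Path


def split_class_folders(src_dir, dst_dir, train_frac=0.8, val_frac=0.1, seed=42):
    """Copy each class folder of src_dir into dst_dir/{train,validation,test}/<class>.

    Hypothetical helper: files are shuffled per class with a fixed seed,
    train_frac and val_frac of them go to train/ and validation/, and the
    remainder goes to test/. The source folder is left untouched.
    """
    if not (0 < train_frac and 0 <= val_frac and train_frac + val_frac <= 1):
        raise ValueError("need 0 < train_frac, 0 <= val_frac, sum <= 1")
    src, dst = Path(src_dir), Path(dst_dir)
    rng = random.Random(seed)
    for class_dir in sorted(p for p in src.iterdir() if p.is_dir()):
        files = sorted(f for f in class_dir.iterdir() if f.is_file())
        rng.shuffle(files)
        n_train = int(train_frac * len(files))
        n_val = int(val_frac * len(files))
        splits = {
            "train": files[:n_train],
            "validation": files[n_train:n_train + n_val],
            "test": files[n_train + n_val:],
        }
        for split, split_files in splits.items():
            out_dir = dst / split / class_dir.name  # e.g. dst/train/<class>
            out_dir.mkdir(parents=True, exist_ok=True)
            for f in split_files:
                shutil.copy2(f, out_dir / f.name)
```

For example, split_class_folders(train_data_dir, "data_split") would create data_split/train, data_split/validation and data_split/test ("data_split" is a placeholder name). If, as in the question, the test set already lives elsewhere (testing_df), choosing train_frac + val_frac = 1 leaves the test/ class folders empty.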

For example, this is the setup I usually use:

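from pathlib import Path
from tensorflow.keras.preprocessing.image import ImageDataGenerator

cwd = Path.cwd()
train_data_dir = str(cwd / 'augmented' / 'train')
validation_data_dir = str(cwd / 'augmented' / 'validation')

# sizes taken from the question's parameters
img_width, img_height = 224, 224
batch_size = 15
epochs = 50

# augmentation is applied to the training data only
train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

# validation data is only rescaled
test_datagen = ImageDataGenerator(rescale=1. / 255)

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_height, img_width),  # Keras expects (height, width)
    batch_size=batch_size,
    class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='binary')

# `model` is your compiled Keras model; in TF 2.x, fit() accepts generators
# directly (fit_generator is deprecated)
history = model.fit(
    train_generator,
    steps_per_epoch=train_generator.samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=validation_generator.samples // batch_size)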
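A quick check of what the integer division in steps_per_epoch and validation_steps means, using the image counts Keras reported in the question and its batch size of 15: with //, the final partial batch is never requested in an epoch, whereas len(generator) for a Keras directory iterator rounds up. This is plain arithmetic; only the two sample counts come from the question.

```python
import math

batch_size = 15  # from the question's parameters
samples = {"training": 4911, "validation": 282}  # counts reported in the question

steps = {}
for name, n in samples.items():
    floor_steps = n // batch_size            # drops the final partial batch
    ceil_steps = math.ceil(n / batch_size)   # same as len(generator)
    left_out = n - floor_steps * batch_size  # images not requested per epoch
    steps[name] = (floor_steps, ceil_steps, left_out)
    print(name, floor_steps, ceil_steps, left_out)
# training 327 328 6
# validation 18 19 12
```

For validation in particular, using len(validation_generator) (the rounded-up value) makes every epoch evaluate all validation images.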

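To put augmented images into the separate directories, you can do something like the following. Be warned that it gets a bit tedious, so I would suggest building a loop over your list of classes. My example was binary classification (0 or 1): I took the "original" 0 images and augmented them into the train, validation and test folders, then ran the script again for the 1 images. You have many more classes, so looping over a list is the way to go.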

import os
from pathlib import Path
from tensorflow.keras.preprocessing.image import (
    ImageDataGenerator, img_to_array, load_img)

cwd = Path.cwd()

# rescaling is disabled so that the saved images can be viewed
datagen = ImageDataGenerator(
        rotation_range=40,
        width_shift_range=0.2,
        height_shift_range=0.2,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True,
        fill_mode='nearest')

# a PIL image: path + filename of the image to augment
img = load_img(r'path_to_single_image_to_be_augmented')
# a NumPy array with shape (height, width, 3) (channels_last)
x = img_to_array(img)
# add a batch axis: shape (1, height, width, 3)
x = x.reshape((1,) + x.shape)


def save_augmented(x, out_dir, n_images, prefix):
    """Save n_images randomly transformed copies of x into out_dir."""
    os.makedirs(out_dir, exist_ok=True)  # save_to_dir must already exist
    # .flow() yields batches of randomly transformed images and writes
    # each one to save_to_dir
    for i, _ in enumerate(datagen.flow(x, batch_size=1,
                                       save_to_dir=out_dir,
                                       save_prefix=prefix,
                                       save_format='jpeg'), start=1):
        if i >= n_images:  # change the amount of augmented data here
            break  # otherwise the generator would loop indefinitely


# class "0"; remember to change the class folder and prefix for class "1"
save_augmented(x, str(cwd / 'augmented' / 'test' / '0'), 111, '0')
save_augmented(x, str(cwd / 'augmented' / 'train' / '0'), 281, '0')
save_augmented(x, str(cwd / 'augmented' / 'validation' / '0'), 281, '0')
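The loop over classes suggested above could look roughly like this. It is a sketch: augment_all_classes and its augment_one callback are hypothetical names. The callback receives a class name, an output folder and an image count, and is where the datagen.flow(..., save_to_dir=..., save_prefix=...) loop from the code above would go, applied to an image of that class. Keeping the Keras call in the callback lets the folder bookkeeping be written and checked with the standard library alone.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Tuple


def augment_all_classes(
    class_names: Iterable[str],
    counts: Dict[str, int],
    out_root: str,
    augment_one: Callable[[str, str, int], None],
) -> List[Tuple[str, str, int]]:
    """Run augment_one(class_name, out_dir, n) for every class and split.

    counts maps a split name (e.g. "train") to the number of augmented
    images to write for each class. Returns the (class, split, n) jobs run.
    """
    jobs = []
    for cls in class_names:
        for split, n in counts.items():
            out_dir = Path(out_root) / split / cls  # e.g. augmented/train/0
            out_dir.mkdir(parents=True, exist_ok=True)  # save_to_dir must exist
            augment_one(cls, str(out_dir), n)
            jobs.append((cls, split, n))
    return jobs
```

With the question's 69 classes, class_names would be the subfolder names of train_data_dir, and counts something like {"train": 280, "validation": 110}; those numbers are only placeholders.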
