Keras - Creating a "generator" to train and fit images in chunks of data

Problem description


I am having trouble creating a "generator" to train on a large amount of data: 100K images, about 400 GB of data.

My pipeline looks like this (a sketch follows the list):

1. Load a batch of images into memory (m images)
2. Preprocess the batch of images
3. Augment the batch of images
4. Infer and update the model using the batch of images
5. Clear memory and start again
6. Continue until the epoch is complete, then usually shuffle the data (or do it during the epoch, a number of examples ahead)
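As a rough sketch, this pipeline can be written as a single Python generator that loops forever and reshuffles at the start of each pass. Here chunked_training_generator and load_and_preprocess are placeholder names, and the target-column slice is an assumption about the dataframe layout described below:

def chunked_training_generator(df, batchsize, load_and_preprocess):
    # Yields (X, y) chunks indefinitely so one generator can span many epochs.
    while True:                              # step 6: keep going across epochs
        shuffled = df.sample(frac=1)         # step 6: reshuffle once per pass
        for start in range(0, len(shuffled), batchsize):
            chunk = shuffled.iloc[start:start + batchsize]
            X = load_and_preprocess(chunk['image_path'].tolist())   # steps 1-3
            y = chunk.iloc[:, 1:].to_numpy('float32')               # assumes targets follow the path column
            yield X, y                       # step 4 consumes this; the local
                                             # arrays are dropped on the next loop (step 5)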

A dataframe contains a variable "image_path" with the path to each image in the folder; the targets are one-hot-encoded (OHE) variables 0, 1, 2, 3, e.g.:

image_path           0 1 2 3
../images/img1.png   0 0 1 0
../images/img1.png   1 0 1 0
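For concreteness, a frame with this layout could be built like so (the paths and label values here are invented for illustration):

import pandas as pd

train = pd.DataFrame({
    'image_path': ['../images/img1.png', '../images/img2.png'],
    0: [0, 1],
    1: [0, 0],
    2: [1, 1],
    3: [0, 0],
})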

I have built 3 functions:

A function that preprocesses the images:

import numpy as np

def preprocess_hflip(size, train_img_names, test=False):
    # Load every image in the list and stack them into a single array.
    x_train = []
    for i in train_img_names:
        x_train.append(open_multilayer_image_aug_hflip(i, size, test))

    x_train = np.array(x_train)
    x_train = x_train.astype('float32')
    x_train /= 255  # scale pixel values to [0, 1]
    return x_train

A second function that loads an image and resizes it:

import cv2
import matplotlib.pyplot as plt

def open_multilayer_image_aug_hflip(path, SIZE, test=False):
    fullpath = path
    if test:
        img = plt.imread(fullpath)
        img = cv2.resize(img, SIZE)
        ni = np.zeros((SIZE[0], SIZE[1], 3), 'uint8')
        ni[..., 0] = img * 255  # put the grayscale image into the first channel
    else:
        if fullpath.endswith('.png'):  # was fullpath[-3:] == '.png', which can never match a 4-character suffix
            img = plt.imread(fullpath)  # reading from "image_path" on the dataframe
            img = cv2.resize(img, SIZE)  # was cv2.resize(red, SIZE): 'red' is undefined
            ni = np.zeros((SIZE[0], SIZE[1], 3), 'uint8')
            ni[..., 0] = img * 255
    return ni  # note: ni is unbound (NameError) when a non-test path is not a .png

Usage: X_train = preprocess_hflip((128, 128), train['image_path'][:100].values)
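With the loader above, X_train here is a float32 array of shape (100, 128, 128, 3), with values scaled to [0, 1].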

The third function, the "generator":

def generate_arrays_from_file(df, batchsize):
    inputs = []
    targets = []
    batchcount = 0
    totbatch = 0
    #while True:
    for k, v in df.iterrows():
        inputs.append(v['image_path'])   # collect paths for the current chunk
        targets.append(v[3:])            # collect the OHE targets
        batchcount += 1
        if batchcount >= batchsize:
            totbatch = batchcount + totbatch
            print("Total files processed:", totbatch)
            X = np.array(inputs)
            y = np.array(targets, dtype='float32')
            yield (X, y)
            inputs = []
            targets = []
            batchcount = 0
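Note that, with the while True: line commented out, this generator makes exactly one pass over the dataframe: after len(df) // batchsize batches it is exhausted, which is exactly what fit() runs into at the end of the first epoch. A minimal, self-contained demonstration of that behavior (dummy paths and a simplified frame, same structure):

import pandas as pd

df = pd.DataFrame({'image_path': ['a.png', 'b.png', 'c.png', 'd.png'],
                   'target': [0, 1, 0, 1]})

def one_pass_generator(df, batchsize):
    inputs, targets = [], []
    for k, v in df.iterrows():               # a single pass: no while True around it
        inputs.append(v['image_path'])
        targets.append(v['target'])
        if len(inputs) >= batchsize:
            yield inputs, targets
            inputs, targets = [], []

gen = one_pass_generator(df, 2)
print(next(gen))                             # first batch
print(next(gen))                             # second and last batch
try:
    next(gen)
except StopIteration:
    print("exhausted after one pass")        # what fit() hits after epoch 1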

I have run a lot of tests, but I cannot get the expected result. The closest I have come to the pipeline is:

1. Load a batch of images into memory (m images) - OK
2. Preprocess the batch of images - OK
3. Augment the batch of images - OK
4. Infer and update the model using the batch of images - OK
5. Clear memory and start again - OK
6. Continue until the epoch is complete, then usually shuffle the data (or do it during the epoch, a number of examples ahead) - Fail

On my first attempt I can only train the first epoch; when the first epoch ends, training ends as well. I need this training to loop, using chunks of "batch size", until the full number of epochs is complete, 10 in this case.

from keras.preprocessing.image import ImageDataGenerator

monGenerateur = generate_arrays_from_file(train[:40], 8)
for X, y in monGenerateur:                # one iteration per 8-image chunk
    datagen = ImageDataGenerator()
    train_generator = datagen.flow(preprocess_hflip((128, 128), X),  # X comes from the generator
                               y,
                               batch_size=64,
                               shuffle=True,
                               seed=42)

    valid_generator = datagen.flow(preprocess_hflip((128, 128), test['image_path'].values, test=True),  # was preprocess_hflip(1, (128,128), ..., 'test'): one argument too many
                               test.iloc[:, 3:22].values)

    # Note: a full 10-epoch fit runs on every chunk; batch_size and shuffle
    # are ignored by fit() when x is a generator.
    hist_1 = model.fit(train_generator,
          batch_size=64,
          epochs=10,
          steps_per_epoch=10,
          validation_data=next(valid_generator),   # a single (X, y) batch
          validation_steps=10,
          shuffle=True,
          callbacks=[rlr, ckp, es],
          verbose=2
             )

Below is a capture of the training output:

Total files processed: 8
Epoch 1/10
10/10 - 2s - loss: 0.2214 - BinaryCrossentropy: 0.2201 - precision: 0.0000e+00 - recall: 0.0000e+00 - binary_accuracy: 0.9474 - AUC: 0.5677 - val_loss: 0.2184 - val_BinaryCrossentropy: 0.2173 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00 - val_binary_accuracy: 0.9474 - val_AUC: 0.6311
Total files processed: 16
Epoch 1/10
10/10 - 2s - loss: 0.2113 - BinaryCrossentropy: 0.2103 - precision: 0.0000e+00 - recall: 0.0000e+00 - binary_accuracy: 0.9474 - AUC: 0.7266 - val_loss: 0.2094 - val_BinaryCrossentropy: 0.2080 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00 - val_binary_accuracy: 0.9474 - val_AUC: 0.6311
Total files processed: 24

The problem is clear: it just ran 5 iterations of 8 files over the output of "monGenerateur", restarting the fit each time. I also ran another test, calling the preprocessing inside the generator:

datagen = ImageDataGenerator()

def generate_arrays_from_file(df, batchsize):
    inputs = []
    targets = []
    batchcount = 0
    totbatch = 0
    #while True:
    for k, v in df.iterrows():
        inputs.append(v['image_path'])
        targets.append(v[3:])
        batchcount += 1
        print("BatchCount ", batchcount)
        if batchcount >= batchsize:
            totbatch = batchcount + totbatch
            print("BatchCount > batchSize", batchcount, batchsize)
            print("Total files processed:", totbatch)
            X = preprocess_hflip((128, 128), inputs)   # preprocessing now happens inside the generator
            y = np.array(targets, dtype='float32')
            print(X.shape, y.shape)
            yield (X, y)
            inputs = []
            targets = []
            batchcount = 0

But the result was that it iterates over the images and only starts training once the total batch size is reached, which is not what I expect: I want it to train on each chunk of data and keep training until all the epochs are done.

Any help is appreciated.

Tags: keras, generator, image-segmentation

Solution
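The behavior described above follows from two things: the generator makes only one pass over the dataframe, and model.fit is called once per chunk, so every 8-image chunk gets its own full 10-epoch run. A plain Python generator passed to fit() must yield batches indefinitely; fit() stops as soon as it is exhausted. A minimal sketch of the usual fix, restoring the commented-out while True: and calling fit() once, reusing preprocess_hflip, model, the callbacks, and valid_generator from the question (the per-epoch reshuffle is an assumption):

import numpy as np

batchsize = 8
steps = len(train) // batchsize          # chunks that make up one epoch

def generate_arrays_from_file(df, batchsize):
    while True:                          # loop forever so fit() never runs dry
        df = df.sample(frac=1)           # reshuffle at the start of each epoch
        inputs, targets = [], []
        for k, v in df.iterrows():
            inputs.append(v['image_path'])
            targets.append(v[3:])
            if len(inputs) >= batchsize:
                X = preprocess_hflip((128, 128), inputs)   # load + preprocess one chunk
                y = np.array(targets, dtype='float32')
                yield X, y               # hand the chunk to fit()
                inputs, targets = [], [] # drop the chunk before loading the next one

# One fit() call now drives all 10 epochs; steps_per_epoch tells Keras
# how many chunks make up one epoch.
hist = model.fit(generate_arrays_from_file(train, batchsize),
                 epochs=10,
                 steps_per_epoch=steps,
                 validation_data=valid_generator,
                 validation_steps=10,
                 callbacks=[rlr, ckp, es],
                 verbose=2)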

