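tensorflow - Combining two data generators to train a CNN
Problem description
I am trying to train a model on a dataset that I have split into two parts. For each part I create a different ImageDataGenerator using keras and tensorflow.
My question is how to combine the data from my two generators to train the model. I don't want to use each one separately.
Thanks, everyone.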
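Solution
You have split all your data across two different directories, and now you want to train the model on the data from both of them.
There are two ways to do this: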
1. The flow_from_directory method of the Keras ImageDataGenerator has a follow_links parameter, which you can use here. Create a separate directory with the class structure you need, and inside it create symbolic links to your original data directories. In the layout below, you would use the Data directory as the main input directory.

.
├── Directory1/
│   ├── Class1/
│   └── Class2/
├── Directory2/
│   ├── Class1/
│   └── Class2/
└── Data/
    ├── Class1/
    │   ├── symlink_to_Directory1_Class1
    │   └── symlink_to_Directory2_Class1
    └── Class2/
        ├── symlink_to_Directory1_Class2
        └── symlink_to_Directory2_Class2
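The symlink layout could be built with a short script. This is only a sketch: the helper name link_class_dirs and the link naming scheme (source directory name, underscore, class name) are assumptions made for illustration, not part of the original answer.

```python
import os


def link_class_dirs(source_dirs, merged_dir):
    """Create merged_dir/<class>/<source>_<class> symlinks (hypothetical helper).

    Every class subdirectory found in each source directory is linked into
    the matching class directory under merged_dir.
    """
    created = []
    for source in source_dirs:
        source = os.path.abspath(source)
        source_name = os.path.basename(source)
        for class_name in sorted(os.listdir(source)):
            class_path = os.path.join(source, class_name)
            if not os.path.isdir(class_path):
                continue  # skip stray files next to the class directories
            target_parent = os.path.join(merged_dir, class_name)
            os.makedirs(target_parent, exist_ok=True)
            link_path = os.path.join(target_parent, f"{source_name}_{class_name}")
            if not os.path.islink(link_path):
                os.symlink(class_path, link_path, target_is_directory=True)
            created.append(link_path)
    return created
```

Calling link_class_dirs(["Directory1", "Directory2"], "Data") would produce the Data tree shown above, after which Data can be passed to flow_from_directory with follow_links=True. On Windows, creating symlinks may require elevated privileges.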
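2. Create two separate ImageDataGenerators for the two directories, then merge them into one. In this case, the batch size of each sub-generator must be set in proportion to the number of images in its directory. The batch size of a sub-generator is:

$$ b = \frac{B \times n}{\sum n} $$

where
b = batch size of a given sub-generator
B = desired batch size of the merged generator
n = number of images in that sub-generator's directory
the sum of n = total number of images in all directories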
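As a worked example of the proportional split, the sketch below computes the sub-generator batch sizes for any number of directories. The function name split_batch_size is hypothetical, and giving the rounding remainder to the last sub-generator is one possible choice so that the sizes always add up to B.

```python
def split_batch_size(total_batch_size, image_counts):
    """Split B across sub-generators in proportion to their image counts.

    Implements b = B * n / sum(n), rounded down, for every sub-generator
    except the last, which receives whatever is left so the sizes sum to B.
    """
    total_images = sum(image_counts)
    if total_images == 0:
        raise ValueError("at least one directory must contain images")
    sizes = [total_batch_size * n // total_images for n in image_counts[:-1]]
    sizes.append(total_batch_size - sum(sizes))
    return sizes


# 3000 and 1000 images with a merged batch size of 32
print(split_batch_size(32, [3000, 1000]))  # [24, 8]
```

Note that a very small directory can end up with a batch size of 0 after rounding down, so it is worth checking the result before building the generators.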
See the code below:
from keras.preprocessing.image import ImageDataGenerator
from keras.utils import Sequence
import matplotlib.pyplot as plt
import numpy as np
import os


class MergedGenerators(Sequence):

    def __init__(self, batch_size, generators=None, sub_batch_size=None):
        # Avoid mutable default arguments; fall back to empty lists.
        self.generators = generators if generators is not None else []
        self.sub_batch_size = sub_batch_size if sub_batch_size is not None else []
        self.batch_size = batch_size

    def __len__(self):
        # Number of merged batches: total images served by all
        # sub-generators divided by the merged batch size.
        return int(
            sum([(len(self.generators[idx]) * self.sub_batch_size[idx])
                 for idx in range(len(self.sub_batch_size))]) /
            self.batch_size)

    def __getitem__(self, index):
        """Getting items from the generators and packing them"""
        X_batch = []
        Y_batch = []
        for generator in self.generators:
            # Wrap the index so shorter sub-generators start over.
            if generator.class_mode is None:
                x1 = generator[index % len(generator)]
                X_batch = [*X_batch, *x1]
            else:
                x1, y1 = generator[index % len(generator)]
                X_batch = [*X_batch, *x1]
                Y_batch = [*Y_batch, *y1]

        if self.generators[0].class_mode is None:
            return np.array(X_batch)
        return np.array(X_batch), np.array(Y_batch)


def build_datagenerator(dir1=None, dir2=None, batch_size=32):
    n_images_in_dir1 = sum([len(files) for r, d, files in os.walk(dir1)])
    n_images_in_dir2 = sum([len(files) for r, d, files in os.walk(dir2)])

    # The two generators need different batch sizes because the two
    # directories do not hold the same number of images; each generator
    # gets a share of the batch proportional to its image count.
    generator1_batch_size = int((n_images_in_dir1 * batch_size) /
                                (n_images_in_dir1 + n_images_in_dir2))
    generator2_batch_size = batch_size - generator1_batch_size

    generator1 = ImageDataGenerator(
        rescale=1. / 255,
        shear_range=0.2,
        zoom_range=0.2,
        rotation_range=5.,
        horizontal_flip=True,
    )

    # generator2 has different image augmentation attributes than generator1
    generator2 = ImageDataGenerator(
        rescale=1. / 255,
        zoom_range=0.2,
        horizontal_flip=False,
    )

    generator1 = generator1.flow_from_directory(
        dir1,
        target_size=(128, 128),
        color_mode='rgb',
        class_mode=None,
        batch_size=generator1_batch_size,
        shuffle=True,
        seed=42,
        interpolation="bicubic",
    )

    generator2 = generator2.flow_from_directory(
        dir2,
        target_size=(128, 128),
        color_mode='rgb',
        class_mode=None,
        batch_size=generator2_batch_size,
        shuffle=True,
        seed=42,
        interpolation="bicubic",
    )

    return MergedGenerators(
        batch_size,
        generators=[generator1, generator2],
        sub_batch_size=[generator1_batch_size, generator2_batch_size])


def test_datagen(batch_size=32):
    datagen = build_datagenerator(dir1="./asdf", dir2="./asdf2", batch_size=batch_size)

    print("Datagenerator length (Batch count):", len(datagen))

    # Display only the first merged batch.
    for batch_count, image_batch in enumerate(datagen):
        if batch_count == 1:
            break

        print("Images: ", image_batch.shape)

        plt.figure(figsize=(10, 10))
        for i in range(image_batch.shape[0]):
            plt.subplot(1, batch_size, i + 1)
            plt.imshow(image_batch[i], interpolation='nearest')
            plt.axis('off')
            plt.tight_layout()
        plt.show()


test_datagen(4)
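To make the index wrap-around in the merged __getitem__ easier to follow, here is a framework-free sketch of the same idea. ListBatches is a hypothetical stand-in for a Keras directory iterator and merged_batch mirrors only the concatenation step; neither name comes from the original answer.

```python
class ListBatches:
    """Hypothetical stand-in for a Keras iterator: fixed-size batches of a list."""

    def __init__(self, items, batch_size):
        self.items = items
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches, rounding up so a partial last batch is counted.
        return -(-len(self.items) // self.batch_size)

    def __getitem__(self, index):
        start = index * self.batch_size
        return self.items[start:start + self.batch_size]


def merged_batch(generators, index):
    """Concatenate one batch from every sub-generator, wrapping each index."""
    batch = []
    for generator in generators:
        batch.extend(generator[index % len(generator)])
    return batch


a = ListBatches(["a0", "a1", "a2", "a3", "a4", "a5"], batch_size=3)  # 2 batches
b = ListBatches(["b0", "b1", "b2"], batch_size=1)                    # 3 batches
print(merged_batch([a, b], 2))  # ['a0', 'a1', 'a2', 'b2']
```

When the sub-batch sizes are proportional to the directory sizes, the sub-generators have roughly the same number of batches, so this wrap-around repeats few images within an epoch.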