keras - Keras - Creating a "generator" to train and fit images in chunks of data
Problem description
I am trying to create a "generator" to train on a large amount of data. I have 100K images, roughly 400 GB of data.
My pipeline looks like this:
1.Load A Batch of Images Into Memory (m images)
2.Preprocess Batch of Images
3.Augment Batch of Images
4.Infer and Update Model Using Batch of Images
5.Clear Memory and Start Again
6.Continues Until the Epoch is Complete and Then Usually the Data Is Shuffled (Or It's Done During The Epoch s Number of Examples Ahead)
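The six steps above can be sketched as a single Python generator. This is only an illustration under assumptions: `batch_generator` and `load_fn` are hypothetical names that do not appear in the question, and `load_fn` stands in for whatever loads, preprocesses and augments one image. The point it shows is that the outer `while True` keeps the generator alive across epochs, and the data is reshuffled at the start of each pass.

```python
import random


def batch_generator(samples, batch_size, load_fn, seed=None):
    """Yield (inputs, targets) batches forever, reshuffling before each pass.

    samples : list of (image_path, target) pairs
    load_fn : hypothetical callable that loads/preprocesses/augments one path
    """
    rng = random.Random(seed)
    order = list(range(len(samples)))
    while True:                       # step 6: keep going, one pass per epoch
        rng.shuffle(order)            # reshuffle before every pass
        for start in range(0, len(order), batch_size):
            idx = order[start:start + batch_size]
            inputs = [load_fn(samples[i][0]) for i in idx]   # steps 1-3
            targets = [samples[i][1] for i in idx]
            yield inputs, targets     # step 4 is done by whoever consumes the batch
            # step 5: the previous batch is dropped when the lists are rebuilt


if __name__ == "__main__":
    demo = [("img%d.png" % i, [i % 2]) for i in range(10)]
    gen = batch_generator(demo, batch_size=4, load_fn=lambda p: p, seed=0)
    for _ in range(6):                # two full passes over 10 samples
        x, y = next(gen)
        print(len(x), len(y))
```

Only one batch of loaded images is held at a time, which is the property that matters for a dataset that does not fit in memory.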
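I have a dataframe with a column "image_path" that holds the path of each image in the folder; the targets are one-hot-encoded (OHE) columns 0, 1, 2, 3, for example: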
image_path 0 1 2 3
../images/img1.png 0 0 1 0
../images/img1.png 1 0 1 0
I have built 3 functions.
The first one preprocesses the images:
import numpy as np

def preprocess_hflip(size, train_img_names, test=False):
    # Load every image of the batch, then scale pixel values to [0, 1]
    x_train = []
    for i in train_img_names:
        x_train.append(open_multilayer_image_aug_hflip(i, size, test))
    x_train = np.array(x_train)
    #print("x_train_shape : ",x_train.shape)
    x_train = x_train.astype('float32')
    x_train /= 255
    return x_train
The second function loads and resizes an image:
import cv2
import matplotlib.pyplot as plt

def open_multilayer_image_aug_hflip(path, SIZE, test=False):
    #fullpath = root_train_directory+path
    #print(path)
    fullpath = path
    if test:
        img = plt.imread(fullpath)
        img = cv2.resize(img, SIZE)
        ni = np.zeros((SIZE[0],SIZE[1],3), 'uint8')
        ni[..., 0] = img*255
        #print(img)
    else :
        # only .png files are handled in training mode
        if fullpath.endswith('.png'):
            img = plt.imread(fullpath) ## reading from "image_path" on dataframe
            img = cv2.resize(img, SIZE)
            ni = np.zeros((SIZE[0],SIZE[1],3), 'uint8')
            ni[..., 0] = img*255
    return ni
Usage:
X_train = preprocess_hflip((128,128), train['image_path'][:100].values)
The third function is the "generator":
def generate_arrays_from_file(df, batchsize):
    #batchsize = 2
    inputs = []
    targets = []
    batchcount = 0
    totbatch = 0
    #while True:
    for k,v in df.iterrows():
        #print(k,v)
        #print(v['image_path'], v[3:])
        inputs.append(v['image_path'])
        targets.append(v[3:])
        batchcount += 1
        #print("BatchCount ", batchcount)
        if batchcount >= batchsize:
            totbatch = batchcount + totbatch
            #print("BatchCount > batchSize", batchcount, batchsize)
            print("Total files traités :", totbatch)
            X = np.array(inputs)
            y = np.array(targets, dtype='float32')
            #print(y)
            yield (X, y)
            inputs = []
            targets = []
            batchcount = 0
I have run many tests but cannot get the expected result. The closest I have come to the pipeline is:
1.Load A Batch of Images Into Memory (m images) - OK
2.Preprocess Batch of Images - OK
3.Augment Batch of Images - OK
4.Infer and Update Model Using Batch of Images - OK
5.Clear Memory and Start Again - OK
6.Continues Until the Epoch is Complete and Then Usually the Data Is Shuffled (Or It's Done During The Epoch s Number of Examples Ahead) - Fail
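I can only train the first epoch on the first pass: when the first epoch finishes, training ends as well. I need the training to keep looping over chunks of "batch size" images until all epochs are complete, 10 in this case.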
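To make the numbers concrete, here is a small sketch of the batch arithmetic, assuming the 40 images, batch size 8 and 10 epochs used in the test that follows. The helpers `one_pass` and `endless` are hypothetical and only yield index lists, not images. A generator that walks the data once can supply only one epoch's worth of batches, while one wrapped in `while True` can supply all of them.

```python
import math
from itertools import islice

n_images, batch_size, epochs = 40, 8, 10
steps_per_epoch = math.ceil(n_images / batch_size)   # batches in one epoch
total_batches = steps_per_epoch * epochs             # batches needed overall


def one_pass(n, batch_size):
    """Walk the data once, like a generator without `while True`."""
    for start in range(0, n, batch_size):
        yield list(range(start, min(start + batch_size, n)))


def endless(n, batch_size):
    """Restart the single pass forever so later epochs still find data."""
    while True:
        yield from one_pass(n, batch_size)


finite_count = len(list(islice(one_pass(n_images, batch_size), total_batches)))
endless_count = len(list(islice(endless(n_images, batch_size), total_batches)))
print(steps_per_epoch, total_batches, finite_count, endless_count)
# prints: 5 50 5 50
```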
from keras.preprocessing.image import ImageDataGenerator

monGenerateur = generate_arrays_from_file(train[:40], 8)
for X,y in monGenerateur:
    #print(X, y.shape, type(y))
    datagen = ImageDataGenerator()
    train_generator = datagen.flow(preprocess_hflip((128,128), X), # X comes from the generator
                                   y,
                                   batch_size=64,
                                   shuffle=True,
                                   seed=42)
    # preprocess_hflip expects (size, image_names, test)
    valid_generator = datagen.flow(preprocess_hflip((128,128), test['image_path'].values, test=True),
                                   test.iloc[:,3:22].values)
    #for i in range(10):
    hist_1 = model.fit(train_generator,
                       batch_size = 64,
                       epochs=10,
                       steps_per_epoch = 10,
                       validation_data = next(valid_generator),
                       validation_steps=10,
                       shuffle=True,
                       callbacks = [rlr,ckp,es],
                       verbose = 2
                       )
Below is the output of the training run:
Total files traités : 8
Epoch 1/10
10/10 - 2s - loss: 0.2214 - BinaryCrossentropy: 0.2201 - precision: 0.0000e+00 - recall: 0.0000e+00 - binary_accuracy: 0.9474 - AUC: 0.5677 - val_loss: 0.2184 - val_BinaryCrossentropy: 0.2173 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00 - val_binary_accuracy: 0.9474 - val_AUC: 0.6311
Total files traités : 16
Epoch 1/10
10/10 - 2s - loss: 0.2113 - BinaryCrossentropy: 0.2103 - precision: 0.0000e+00 - recall: 0.0000e+00 - binary_accuracy: 0.9474 - AUC: 0.7266 - val_loss: 0.2094 - val_BinaryCrossentropy: 0.2080 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00 - val_binary_accuracy: 0.9474 - val_AUC: 0.6311
Total files traités : 24
The problem is clear: it simply makes 5 iterations of 8 images over the output of "monGenerateur". I also ran another test in which the preprocessing is called inside the generator:
datagen = ImageDataGenerator()
def generate_arrays_from_file(df, batchsize):
    #batchsize = 2
    inputs = []
    targets = []
    batchcount = 0
    totbatch = 0
    #while True:
    for k,v in df.iterrows():
        #print(k,v)
        #print(v['image_path'], v[3:])
        inputs.append(v['image_path'])
        targets.append(v[3:])
        batchcount += 1
        print("BatchCount ", batchcount)
        if batchcount >= batchsize:
            totbatch = batchcount + totbatch
            print("BatchCount > batchSize", batchcount, batchsize)
            print("Total files traités :", totbatch)
            X = preprocess_hflip((128,128), inputs)
            #X = np.array(inputs)
            y = np.array(targets, dtype='float32')
            #y = np.asarray(targets)
            print(X.shape, y.shape)
            yield (X, y)
            inputs = []
            targets = []
            batchcount = 0
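But the result is that it iterates over the images and only starts training once the total batch size is reached, which is not what I expected. I want it to train on each chunk of data and keep training until all epochs are complete.
Any help is appreciated.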
Solution
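One possible direction, sketched here under assumptions rather than as a verified answer: instead of calling `model.fit` once per chunk, expose the chunks through an object that reports how many batches make up an epoch and returns batch `i` on request. Keras provides `keras.utils.Sequence` for this, built around `__len__`, `__getitem__` and an optional `on_epoch_end`. The class below is a plain-Python stand-in that mirrors that interface without importing Keras; `ChunkedImageSequence` and `load_fn` are hypothetical names, and `load_fn` stands in for a per-image loader such as the `open_multilayer_image_aug_hflip` function from the question.

```python
import math
import random


class ChunkedImageSequence:
    """Plain-Python stand-in mirroring the keras.utils.Sequence interface.

    In real use one would presumably subclass keras.utils.Sequence and
    return NumPy arrays from __getitem__; lists are used here so the
    sketch runs without any third-party dependency.
    """

    def __init__(self, paths, targets, batch_size, load_fn, seed=None):
        self.paths = list(paths)
        self.targets = list(targets)
        self.batch_size = batch_size
        self.load_fn = load_fn              # hypothetical per-image loader
        self.rng = random.Random(seed)
        self.order = list(range(len(self.paths)))

    def __len__(self):
        # Number of batches in one epoch
        return math.ceil(len(self.paths) / self.batch_size)

    def __getitem__(self, i):
        # Load only the images that belong to batch i
        idx = self.order[i * self.batch_size:(i + 1) * self.batch_size]
        X = [self.load_fn(self.paths[j]) for j in idx]
        y = [self.targets[j] for j in idx]
        return X, y

    def on_epoch_end(self):
        # Reshuffle so the next epoch sees the batches in a new order
        self.rng.shuffle(self.order)


if __name__ == "__main__":
    paths = ["../images/img%d.png" % i for i in range(40)]
    targets = [[0, 0, 1, 0]] * 40
    seq = ChunkedImageSequence(paths, targets, batch_size=8, load_fn=lambda p: p)
    print(len(seq), len(seq[0][0]))
```

With an object of this shape, a single `model.fit(seq, epochs=10)` call would be expected to iterate over every batch in each epoch, so only one chunk of images is in memory at a time and no outer Python loop around `fit` is needed.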