How to write a batch generator that iterates over the entire dataset

Problem description

I want to create a batch generator for training purposes. Unfortunately, I don't know how to make the generator move on to the next time steps after it finishes a batch. That is, if one batch has processed, say, [0 1 2 3 4], then the next batch must process [5 6 7 8 9] of the whole training set, say [0 1 2 ... 100].

I also want one epoch to correspond to one full pass over the training set, so the batch generator must wrap around to the beginning of the training set.

From keras.io, I read that if steps_per_epoch=None (the default), it is "equal to the number of samples in your dataset divided by the batch size, or 1 if that cannot be determined".

import numpy as np

def batch_generator(batch_size, sequence_length):
    """
    Generator function for creating batches of training-data.
    """

    # Infinite loop.
    while True:
        # Allocate a new array for the batch of input-signals.
        x_shape = (batch_size, sequence_length, num_x_signals)
        x_batch = np.zeros(shape=x_shape, dtype=np.float16)

        # Allocate a new array for the batch of output-signals.
        y_shape = (batch_size, sequence_length, num_y_signals)
        y_batch = np.zeros(shape=y_shape, dtype=np.float16)

        # Fill the batch with random sequences of data.
        for i in range(batch_size):

            # Copy the sequences of data starting at this index.
            x_batch[i] = x_train_scaled[:sequence_length]
            y_batch[i] = y_train_scaled[:sequence_length]

        x_batch_1 = x_batch[:, :, 0:5]
        x_batch_2 = x_batch[:, :, 5:12]
        yield ([x_batch_1, x_batch_2], y_batch)

batch_size = 32
sequence_length = 24 * 7 

generator = batch_generator(batch_size=batch_size,
                            sequence_length=sequence_length)
%%time
model.fit_generator(generator=generator,
                    epochs=10,
                    steps_per_epoch=None,
                    validation_data=validation_data,
                    callbacks=callbacks)

Tags: python, machine-learning, keras, data-generation

Solution


If you can use TensorFlow, the `tf.data` API's batch-then-repeat pattern does exactly what you describe.
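A minimal sketch of that pattern (assuming TensorFlow 2.x with eager execution; the toy array here is made up for illustration, not the asker's data):

```python
import numpy as np
import tensorflow as tf

x = np.arange(10)  # stand-in for a training set [0 1 2 ... 9]

# batch(5) slices the dataset into sequential, non-overlapping batches;
# repeat() restarts from the beginning after each full pass.
dataset = tf.data.Dataset.from_tensor_slices(x)
dataset = dataset.batch(5)
dataset = dataset.repeat()

for batch in dataset.take(3):
    print(batch.numpy())
# first batch is [0 1 2 3 4], second is [5 6 7 8 9],
# and the third wraps around to [0 1 2 3 4] again
```

Passing such a dataset directly to `model.fit` also removes the need to hand-roll the wrap-around logic.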

Otherwise, the question "Keras - How are batches and epochs used in fit_generator()?" looks relevant.
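If you stay with a plain Python generator, one way to get the behavior the question asks for is to keep an offset that advances by `sequence_length` on every copy and wraps to zero at the end of the data. This is a sketch under assumed shapes (`x` and `y` are 2-D arrays of shape `(num_samples, num_signals)`), not the original answer's code:

```python
import numpy as np

def sequential_batch_generator(x, y, batch_size, sequence_length):
    """Yield sequential, non-overlapping windows of the training set,
    wrapping around to the start once the data is exhausted."""
    offset = 0
    num_samples = len(x)
    while True:
        x_batch = np.zeros((batch_size, sequence_length, x.shape[1]), dtype=x.dtype)
        y_batch = np.zeros((batch_size, sequence_length, y.shape[1]), dtype=y.dtype)
        for i in range(batch_size):
            # Wrap to the start when the next window would run past the end.
            if offset + sequence_length > num_samples:
                offset = 0
            x_batch[i] = x[offset:offset + sequence_length]
            y_batch[i] = y[offset:offset + sequence_length]
            offset += sequence_length  # advance to the next time steps
        yield x_batch, y_batch
```

With this shape, one epoch covers the whole set if you pass `steps_per_epoch = num_samples // (batch_size * sequence_length)` to `fit_generator`, since Keras cannot infer a length from a generator.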

