Optimising GPU use for Keras model training

Problem description

I'm training a Keras model. During the training, I'm only utilising between 5 and 20% of my CUDA cores and an equally small proportion of my NVIDIA RTX 2070 memory. Model training is pretty slow currently and I would really like to take advantage of as many of my available CUDA cores as possible to speed this up!

nvidia-smi dmon  # (during model training)

# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    45    49     -     9     6     0     0  6801  1605

What parameters should I look to tune in order to increase CUDA core utilisation with the aim of training the same model faster?

Here's a simplified example of my current image generation and training steps (I can elaborate / edit, if required, but I currently believe these are the key steps for the purpose of the question):

train_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    r'./input_training_examples',
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary'
)
validation_generator = test_datagen.flow_from_directory(
    r'./input_validation_examples',
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary'
)

history = model.fit(
    train_generator,
    steps_per_epoch=128, epochs=30,
    validation_data=validation_generator, validation_steps=50,
)

Hardware: NVIDIA RTX 2070 GPU

Platform: Linux 5.4.0-29-generic #33-Ubuntu x86_64, NVIDIA driver 440.64, CUDA 10.2, Tensorflow 2.2.0-rc3

Tags: tensorflow, keras, nvidia

Solution


GPU utilisation is a tricky business; too many factors are involved.

The obvious first thing to try: increase the batch size.
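As a minimal sketch against the question's own generator (batch_size=128 is an assumption; raise it until the RTX 2070 gets close to its memory limit, then back off if you hit out-of-memory errors):

train_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    r'./input_training_examples',
    target_size=(150, 150),
    batch_size=128,   # was 32; larger batches keep more CUDA cores busy per step
    class_mode='binary'
)

If you want to keep the number of images per epoch constant, reduce steps_per_epoch in proportion (128 steps of 32 images is roughly 32 steps of 128 images).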

But that alone does not guarantee maximum utilisation; your I/O may be slow, so the data generator becomes the bottleneck.

If you have enough RAM, you can try loading the full dataset as NumPy arrays.
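A minimal sketch, assuming the whole rescaled dataset fits in host RAM; the helper generator_to_arrays below is hypothetical (not part of Keras). It drains the existing DirectoryIterator once and then fits on in-memory arrays, so no per-batch disk reads happen during training:

import numpy as np

def generator_to_arrays(directory_iterator):
    # Hypothetical helper: pull every batch out of a Keras DirectoryIterator
    # and stack them into two NumPy arrays (images, labels).
    xs, ys = [], []
    for _ in range(len(directory_iterator)):   # one pass over all batches
        x_batch, y_batch = next(directory_iterator)
        xs.append(x_batch)
        ys.append(y_batch)
    return np.concatenate(xs), np.concatenate(ys)

x_train, y_train = generator_to_arrays(train_generator)
x_val, y_val = generator_to_arrays(validation_generator)

history = model.fit(
    x_train, y_train,
    batch_size=128,                 # assumption: tune to your GPU memory
    epochs=30,
    validation_data=(x_val, y_val),
)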

You can try increasing the number of workers in a multiprocessing scheme:

model.fit(..., use_multiprocessing=True, workers=8)
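Put together with the fit call from the question, a sketch might look like this (workers=8 and max_queue_size=16 are assumptions; match workers roughly to your CPU core count and enlarge the queue until the GPU stops waiting for data):

history = model.fit(
    train_generator,
    steps_per_epoch=128, epochs=30,
    validation_data=validation_generator, validation_steps=50,
    workers=8,                  # parallel processes preparing batches
    use_multiprocessing=True,
    max_queue_size=16,          # batches buffered ahead of the GPU
)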

Finally, it depends on your model: if the model is too light and not deep enough, utilisation will stay low, and there is no standard way to improve that further.

