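tensorflow - TensorFlow training speed with multiple GPUs

Problem description

I have a question about the speed of training a new TensorFlow model. I assumed that training on multiple GPUs would speed things up significantly, but I found that this is not the case. After several tests locally and on Google Cloud, I am running out of ideas for how to improve the speed significantly. Perhaps someone has a hint on how to speed up training. At the moment I am training on just over 10,000 images at an image size of 628 x 628.

My local environment: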

absl-py==0.11.0
astor==0.8.1
cycler==0.10.0
gast==0.4.0
grpcio==1.34.0
h5py==2.10.0
imageai==2.1.5
importlib-metadata==2.1.1
Keras==2.2.4
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.2
kiwisolver==1.1.0
Markdown==3.2.2
matplotlib==3.0.3
mock==3.0.5
numpy==1.18.5
opencv-python==4.2.0.32
Pillow==7.2.0
protobuf==3.14.0
pyparsing==2.4.7
python-dateutil==2.8.1
PyYAML==5.3.1
scipy==1.4.1
six==1.15.0
tensorboard==1.12.2
tensorflow-estimator==1.13.0
tensorflow-gpu==1.12.0
termcolor==1.1.0
Werkzeug==1.0.1
zipp==1.2.0

Ryzen 5 3600, Nvidia 1060 (6 GB), 50 GB RAM

My Google Cloud environment:

Everything runs in a Docker container

absl-py==0.11.0
astor==0.8.1
cycler==0.10.0
gast==0.4.0
grpcio==1.34.0
h5py==2.10.0
imageai==2.1.5
importlib-metadata==2.1.1
Keras==2.2.4
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.2
kiwisolver==1.1.0
Markdown==3.2.2
matplotlib==3.0.3
mock==3.0.5
numpy==1.18.5
opencv-python==4.2.0.32
Pillow==7.2.0
protobuf==3.14.0
pyparsing==2.4.7
python-dateutil==2.8.1
PyYAML==5.3.1
scipy==1.4.1
six==1.15.0
tensorboard==1.12.2
tensorflow-estimator==1.13.0
tensorflow-gpu==1.12.0
termcolor==1.1.0
Werkzeug==1.0.1
zipp==1.2.0

16 vCPUs, 60 GB RAM, 4x NVIDIA Tesla T4

My test results for the time required per epoch:

1x Nvidia 1060 with a batch size of 4 = 2.97 hours
1x Tesla T4 with a batch size of 12 = 1.19 hours
2x Tesla T4 with a batch size of 12 = 3.37 hours
2x Tesla T4 with a batch size of 24 = 3.37 hours
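
To make these numbers easier to compare, the epoch times can be converted into throughput. The following is only a rough sketch: it assumes exactly 10,000 images per epoch (the post only says "over 10,000"), and the name `images_per_second` is made up for this illustration.

```python
# Assumption: exactly 10,000 images per epoch; the real count is slightly higher.
IMAGES_PER_EPOCH = 10_000

# (configuration, hours per epoch) copied from the measurements above.
runs = [
    ("1x Nvidia 1060, batch 4", 2.97),
    ("1x Tesla T4, batch 12", 1.19),
    ("2x Tesla T4, batch 12", 3.37),
    ("2x Tesla T4, batch 24", 3.37),
]


def images_per_second(hours_per_epoch, images=IMAGES_PER_EPOCH):
    """Convert an epoch time in hours into images processed per second."""
    return images / (hours_per_epoch * 3600.0)


# Use the single-T4 run as the reference point for relative speed.
baseline = images_per_second(1.19)

for label, hours in runs:
    rate = images_per_second(hours)
    print(f"{label}: {rate:.2f} images/s, {rate / baseline:.2f}x of one T4")
```

Under that assumption, one T4 processes roughly 2.3 images per second, while both two-GPU runs drop to about 0.8 images per second, which is even below the single 1060.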

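Why does training with two Tesla T4s take longer than training with just one, and why does the training time not improve with a larger batch size? I would appreciate any suggestions.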
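
The batch-size part of the question can be looked at by converting epoch times into time per training step. This is a hedged sketch, not a diagnosis: it again assumes exactly 10,000 images per epoch, and it assumes the batch size in each measurement is the global batch, split evenly across GPUs. The post does not say how the GPUs are actually used, and the helper names are hypothetical.

```python
import math

# Assumption: exactly 10,000 images per epoch (the post says "over 10,000").
IMAGES_PER_EPOCH = 10_000


def steps_per_epoch(global_batch, images=IMAGES_PER_EPOCH):
    """Number of optimizer steps needed to see every image once."""
    return math.ceil(images / global_batch)


def per_gpu_batch(global_batch, num_gpus):
    """Assumption: the global batch is split evenly across the GPUs."""
    return global_batch // num_gpus


def seconds_per_step(hours_per_epoch, global_batch):
    """Average wall-clock time of one training step."""
    return hours_per_epoch * 3600.0 / steps_per_epoch(global_batch)


# (number of GPUs, global batch size, hours per epoch) from the measurements above.
t4_runs = [(1, 12, 1.19), (2, 12, 3.37), (2, 24, 3.37)]

for num_gpus, batch, hours in t4_runs:
    print(
        f"{num_gpus} GPU(s), batch {batch}: "
        f"{steps_per_epoch(batch)} steps, "
        f"{per_gpu_batch(batch, num_gpus)} images per GPU per step, "
        f"{seconds_per_step(hours, batch):.1f} s per step"
    )
```

With these assumptions, doubling the batch from 12 to 24 on two GPUs halves the number of steps but roughly doubles the time per step, so the epoch time stays the same. One possible reading is that the time per image is fixed by something other than GPU compute, such as data loading or synchronization between devices, but the measurements alone do not prove this.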

Tags: tensorflow, time, gpu
