首页 > 解决方案 > TensorFlow 自动混合精度 fp16 比官方 resnet 上的 fp32 慢

问题描述

我正在尝试使用来自https://github.com/tensorflow/models/blob/master/official/resnet/estimator_benchmark.py#L191的官方 ResNet 模型基准来试验tensorflow-gpu==1.14.0rc0. 我在 2080 Ti、驱动程序 410.78、CUDA 10、Ubuntu 上运行。

我进行了以下更改,以帮助确保比较快速且一目了然:

我在日志中看到了这一点,这对我来说意味着 AMP 处于活动状态:

2019-06-03 16:08:40.976829: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1767] Running auto_mixed_precision graph optimizer
2019-06-03 16:08:40.977057: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1241] No whitelist ops found, nothing to do
2019-06-03 16:08:40.985402: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1241] No whitelist ops found, nothing to do
2019-06-03 16:08:40.986858: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1241] No whitelist ops found, nothing to do
2019-06-03 16:08:40.987745: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1241] No whitelist ops found, nothing to do
2019-06-03 16:08:40.996781: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1241] No whitelist ops found, nothing to do
2019-06-03 16:08:41.001948: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1241] No whitelist ops found, nothing to do
2019-06-03 16:08:41.003208: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1241] No whitelist ops found, nothing to do
2019-06-03 16:08:41.004589: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1241] No whitelist ops found, nothing to do
2019-06-03 16:08:41.005981: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1241] No whitelist ops found, nothing to do
2019-06-03 16:08:41.511761: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1767] Running auto_mixed_precision graph optimizer
2019-06-03 16:08:41.527751: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1723] Converted 529/2910 nodes to float16 precision using 3 cast(s) to float16 (excluding Const and Variable casts)

但实际运行时间较慢:

fp32(青色)运行时间小于所有 fp16 运行时间。

我该怎么做才能看到性能改进?

标签: tensorflowhalf-precision-float

解决方案


推荐阅读