tensorflow - Slow loading time - EfficientDet D2
Problem description
I am loading the TensorFlow 2 version of EfficientDet D2 ( http://download.tensorflow.org/models/object_detection/tf2/20200711/efficientdet_d2_coco17_tpu-32.tar.gz ) on a Jetson AGX Xavier.
I run the following script:
#!/usr/bin/python3
import tensorflow as tf
import time
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as viz_utils
PATH_TO_SAVED_MODEL = "./efficientdet_d2_coco17_tpu-32/saved_model/"
print('Loading model...')
start_time = time.time()
# Load saved model and build the detection function
detect_fn = tf.saved_model.load(PATH_TO_SAVED_MODEL)
end_time = time.time()
elapsed_time = end_time - start_time
print('Done! Took {} seconds'.format(elapsed_time))
However, loading takes more than 13 minutes. Here is the output of the command:
./test.py
2021-07-04 10:58:58.074413: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2021-07-04 10:59:05.375568: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
Loading model...
2021-07-04 11:00:54.337115: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-07-04 11:00:54.342226: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-07-04 11:00:54.347726: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:908] ARM64 does not support NUMA - returning NUMA node zero
2021-07-04 11:00:54.347959: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: Xavier computeCapability: 7.2
coreClock: 1.377GHz coreCount: 8 deviceMemorySize: 31.17GiB deviceMemoryBandwidth: 82.08GiB/s
2021-07-04 11:00:54.348037: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2021-07-04 11:00:54.353788: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2021-07-04 11:00:54.354040: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.10
2021-07-04 11:00:54.358471: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-07-04 11:00:54.359514: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-07-04 11:00:54.364904: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-07-04 11:00:54.369140: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2021-07-04 11:00:54.369861: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-07-04 11:00:54.370262: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:908] ARM64 does not support NUMA - returning NUMA node zero
2021-07-04 11:00:54.370843: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:908] ARM64 does not support NUMA - returning NUMA node zero
2021-07-04 11:00:54.371060: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-07-04 11:00:54.375404: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:908] ARM64 does not support NUMA - returning NUMA node zero
2021-07-04 11:00:54.375623: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: Xavier computeCapability: 7.2
coreClock: 1.377GHz coreCount: 8 deviceMemorySize: 31.17GiB deviceMemoryBandwidth: 82.08GiB/s
2021-07-04 11:00:54.375714: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2021-07-04 11:00:54.375823: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2021-07-04 11:00:54.375908: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.10
2021-07-04 11:00:54.376011: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-07-04 11:00:54.376090: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-07-04 11:00:54.376167: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-07-04 11:00:54.376287: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2021-07-04 11:00:54.376369: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-07-04 11:00:54.376673: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:908] ARM64 does not support NUMA - returning NUMA node zero
2021-07-04 11:00:54.376972: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:908] ARM64 does not support NUMA - returning NUMA node zero
2021-07-04 11:00:54.377093: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-07-04 11:05:01.847060: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-04 11:05:01.847174: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
2021-07-04 11:05:01.847226: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
2021-07-04 11:05:01.847710: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:908] ARM64 does not support NUMA - returning NUMA node zero
2021-07-04 11:05:01.848589: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:908] ARM64 does not support NUMA - returning NUMA node zero
2021-07-04 11:05:01.848911: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:908] ARM64 does not support NUMA - returning NUMA node zero
2021-07-04 11:05:01.849096: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 19271 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
2021-07-04 11:05:01.850298: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
Done! Took 793.8719098567963 seconds
Given the Xavier's computing power, shouldn't I expect better performance? Does anyone know what might be causing this?
Thanks for any help or comments!
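Solution
The time is spent not only on loading the model but also on initializing the device. The problem may lie with the driver. To check this, try initializing a much smaller model, or a toy example such as a+b=c. I expect it will take a similar amount of time.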
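The toy-example check can be sketched roughly as follows. This is only a minimal sketch: the helper names `time_stages` and `tensorflow_stages` are made up for illustration, and it assumes that the first eager op (a + b) is what makes TF create the GPU device, so that the later `tf.saved_model.load` call is timed on its own.

```python
import time


def time_stages(stages):
    """Run (name, callable) pairs in order and return [(name, seconds), ...]."""
    timings = []
    for name, fn in stages:
        start = time.perf_counter()
        fn()
        timings.append((name, time.perf_counter() - start))
    return timings


def tensorflow_stages(model_dir):
    """Hypothetical helper: split the run into import, toy op, and model load."""
    state = {}

    def import_tf():
        # Imported lazily so that the import itself is timed as a stage.
        import tensorflow as tf
        state["tf"] = tf

    def toy_op():
        tf = state["tf"]
        # Assumption: the first op triggers device initialization.
        total = tf.constant(1.0) + tf.constant(2.0)
        total.numpy()  # fetch the value so the op has really run

    def load_model():
        state["model"] = state["tf"].saved_model.load(model_dir)

    return [
        ("import tensorflow", import_tf),
        ("toy op a + b", toy_op),
        ("tf.saved_model.load", load_model),
    ]


def report(timings):
    """Print one line per stage."""
    for name, seconds in timings:
        print("{:<24s} {:8.2f} s".format(name, seconds))


# On the device, something like:
# report(time_stages(tensorflow_stages("./efficientdet_d2_coco17_tpu-32/saved_model/")))
```

If the toy op already takes minutes, the delay is in device initialization rather than in EfficientDet itself.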
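Also, compute capability has nothing to do with loading a model. Loading depends more on the driver and on TF's memory management. The actual construction of the model in memory is probably done on the CPU, even when a GPU or other accelerator is used (just a guess).
In my experience with CUDA and TF, initialization took 5 minutes with one combination of CUDA, TF, and GPU driver versions. On the same hardware (8x 1080 Ti GPUs), a different combination of CUDA and TF versions took less than 30 seconds.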
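When comparing two setups like this, it may help to see where the TF log actually stalls. The sketch below is not part of any TF tooling; `largest_gaps` is a hypothetical helper that assumes every log line starts with a timestamp in the format shown in the question. The sample lines are copied from the output above.

```python
import re
from datetime import datetime

# Timestamp prefix as it appears in the log, e.g. "2021-07-04 11:00:54.377093: I ..."
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6}): ")


def largest_gaps(lines, top=3):
    """Return the `top` largest pauses as (seconds, line that ended the pause)."""
    stamped = []
    for line in lines:
        match = TIMESTAMP.match(line)
        if match:  # lines without a timestamp (e.g. script prints) are skipped
            when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
            stamped.append((when, line.rstrip()))
    gaps = [
        ((later - earlier).total_seconds(), later_line)
        for (earlier, _), (later, later_line) in zip(stamped, stamped[1:])
    ]
    gaps.sort(key=lambda gap: gap[0], reverse=True)
    return gaps[:top]


SAMPLE = """\
2021-07-04 10:59:05.375568: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
Loading model...
2021-07-04 11:00:54.337115: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-07-04 11:00:54.377093: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-07-04 11:05:01.847060: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
"""

for seconds, line in largest_gaps(SAMPLE.splitlines(), top=2):
    print("{:10.3f} s before: {}".format(seconds, line[:100]))
```

On the log in the question, the longest pause (about 247 seconds) falls between "Adding visible gpu devices: 0" and "Device interconnect StreamExecutor", that is, inside GPU device setup rather than in any step that mentions the model.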