TensorFlow Object Detection trains on the CPU even though the script detects and uses the GPU at startup

Problem description

I ran into a problem while following this tutorial: https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/install.html

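I can run the training script on the CPU, but I cannot get it to work with my GPU at all. Specifically, model_main_tf2.py (from the "Training Custom Object Detector" section of the tutorial) is the script giving me trouble. I also added the line tf.debugging.set_log_device_placement(True) to the script, hoping to get more detailed logs.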
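For reference, the kind of check described above can be sketched as a standalone script, assuming TensorFlow 2.x. The function name gpu_report and the 256x256 test matrix are made up for illustration and are not part of the tutorial or of model_main_tf2.py. The import is guarded only so the snippet still runs on a machine without TensorFlow.

```python
def gpu_report():
    """Return what TensorFlow sees, or a stub if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return {"tensorflow": None, "gpus": [], "matmul_device": None}

    # Enable placement logging early so that the ops run below are logged.
    tf.debugging.set_log_device_placement(True)

    # Physical GPUs visible to this TensorFlow build (empty list if none).
    gpus = tf.config.list_physical_devices("GPU")

    # Run one small op and record the device it was actually placed on.
    x = tf.random.uniform((256, 256))
    y = tf.matmul(x, x)

    return {
        "tensorflow": tf.__version__,
        "gpus": [g.name for g in gpus],
        "matmul_device": y.device,
    }


if __name__ == "__main__":
    print(gpu_report())
```

If "gpus" is non-empty and "matmul_device" ends in GPU:0, the install itself can place compute ops on the GPU, which would point the investigation toward the training script or its input pipeline rather than the driver or CUDA setup.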

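The strange part is that it detects the GPU at the start of execution and (as far as I understand) uses it for some tasks, but at some point it switches to the CPU without any error. That part of the log is shown later in this post.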

System, hardware, and software specs:
OS: Ubuntu 20.04
GPU: GTX 1660 Ti Mobile
nvidia-smi output: | NVIDIA-SMI 470.57.02 Driver Version: 470.57.02 CUDA Version: 11.4 |
tensorflow-gpu version: 2.5.0
tensorflow version: 2.5.0
Python version: 3.9.5

Log at the point where the script switches to the CPU:

2021-07-26 17:00:34.027672: I tensorflow/core/common_runtime/eager/execute.cc:733] Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:0
2021-07-26 17:00:34.028022: I tensorflow/core/common_runtime/eager/execute.cc:733] Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
2021-07-26 17:00:34.028148: I tensorflow/core/common_runtime/eager/execute.cc:733] Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0
2021-07-26 17:00:34.028357: I tensorflow/core/common_runtime/eager/execute.cc:733] Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:0
2021-07-26 17:00:34.028418: I tensorflow/core/common_runtime/eager/execute.cc:733] Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
2021-07-26 17:00:34.028506: I tensorflow/core/common_runtime/eager/execute.cc:733] Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0
2021-07-26 17:00:34.028683: I tensorflow/core/common_runtime/eager/execute.cc:733] Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:0
2021-07-26 17:00:34.028741: I tensorflow/core/common_runtime/eager/execute.cc:733] Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
2021-07-26 17:00:34.028817: I tensorflow/core/common_runtime/eager/execute.cc:733] Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0
WARNING:tensorflow:From /home/[user]/miniconda3/envs/tensorflowGPU4/lib/python3.9/site-packages/object_detection/model_lib_v2.py:557: StrategyBase.experimental_distribute_datasets_from_function (from tensorflow.python.distribute.distribute_lib) is deprecated and will be removed in a future version.
Instructions for updating:
rename to distribute_datasets_from_function
W0726 17:00:34.090738 140198244909888 deprecation.py:330] From /home/[user]/miniconda3/envs/tensorflowGPU4/lib/python3.9/site-packages/object_detection/model_lib_v2.py:557: StrategyBase.experimental_distribute_datasets_from_function (from tensorflow.python.distribute.distribute_lib) is deprecated and will be removed in a future version.
Instructions for updating:
rename to distribute_datasets_from_function
2021-07-26 17:00:34.091483: I tensorflow/core/common_runtime/eager/execute.cc:733] Executing op HashTableV2 in device /job:localhost/replica:0/task:0/device:CPU:0
2021-07-26 17:00:34.091597: I tensorflow/core/common_runtime/eager/execute.cc:733] Executing op LookupTableImportV2 in device /job:localhost/replica:0/task:0/device:CPU:0
2021-07-26 17:00:34.091910: I tensorflow/core/common_runtime/eager/execute.cc:733] Executing op HashTableV2 in device /job:localhost/replica:0/task:0/device:CPU:0
2021-07-26 17:00:34.091969: I tensorflow/core/common_runtime/eager/execute.cc:733] Executing op LookupTableImportV2 in device /job:localhost/replica:0/task:0/device:CPU:0
2021-07-26 17:00:34.092269: I tensorflow/core/common_runtime/eager/execute.cc:733] Executing op HashTableV2 in device /job:localhost/replica:0/task:0/device:CPU:0
2021-07-26 17:00:34.092327: I tensorflow/core/common_runtime/eager/execute.cc:733] Executing op LookupTableImportV2 in device /job:localhost/replica:0/task:0/device:CPU:0
2021-07-26 17:00:34.092609: I tensorflow/core/common_runtime/eager/execute.cc:733] Executing op HashTableV2 in device /job:localhost/replica:0/task:0/device:CPU:0
2021-07-26 17:00:34.092666: I tensorflow/core/common_runtime/eager/execute.cc:733] Executing op LookupTableImportV2 in device /job:localhost/replica:0/task:0/device:CPU:0
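
One way to see what actually moved to the CPU is to tally the placement lines by device and op. The following is a minimal sketch using only the standard library. The regular expression is inferred from the log format shown above, and the name tally_placements is hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the "Executing op X in device .../device:GPU:0" lines.
PLACEMENT = re.compile(r"Executing op (\w+) in device \S*/device:(\w+:\d+)")


def tally_placements(lines):
    """Count (device, op) pairs found in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = PLACEMENT.search(line)
        if match:
            op, device = match.groups()
            counts[(device, op)] += 1
    return counts


# Three lines copied from the log excerpt above.
sample = [
    "2021-07-26 17:00:34.028817: I tensorflow/core/common_runtime/eager/execute.cc:733] Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0",
    "2021-07-26 17:00:34.091483: I tensorflow/core/common_runtime/eager/execute.cc:733] Executing op HashTableV2 in device /job:localhost/replica:0/task:0/device:CPU:0",
    "2021-07-26 17:00:34.091597: I tensorflow/core/common_runtime/eager/execute.cc:733] Executing op LookupTableImportV2 in device /job:localhost/replica:0/task:0/device:CPU:0",
]

counts = tally_placements(sample)
for (device, op), n in sorted(counts.items()):
    print(device, op, n)
```

To run this on a full log, pass an open file object instead of the sample list. In the excerpt above the only ops on CPU:0 are HashTableV2 and LookupTableImportV2, which are lookup-table ops. These may belong to the input pipeline rather than the model, so on their own they might not show that the training step itself left the GPU.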

Any help would be greatly appreciated. At this point I have scoured the internet for anyone with the same problem, with no luck :/

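Edit:
Just an update on my problem. I tried the exact same setup on an older computer with a GTX 1060, using the same OS, graphics driver, CUDA version, and so on, and I still get the same behavior. So my guess is that this is either a configuration problem or a misunderstanding on my part somewhere?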

Tags: python, tensorflow, object-detection-api, resnet

Solution

