python - TensorFlow Object Detection trains on the CPU, even though it detects and uses the GPU at the start of the script
Problem description
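I ran into a problem while following this tutorial: https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/install.html
I can run the training script on the CPU device, but I cannot get it to work with my GPU at all. Specifically, it is model_main_tf2.py (under the "Training Custom Object Detector" section of the tutorial) that is giving me trouble. I also added the line "tf.debugging.set_log_device_placement(True)" to the script, hoping to get more comprehensive logs.
The strange thing is that it detects the GPU at the start of execution and (as far as I understand) uses the GPU for some tasks, but at some point it switches to the CPU without any error. That part of the log is shown later in this post.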
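For reference, a minimal sketch of the kind of check described above: it turns on device placement logging and reports whether TensorFlow can see a GPU. The function name gpu_report is hypothetical and not part of the tutorial or of model_main_tf2.py; the sketch also assumes TensorFlow 2.x and simply returns None when TensorFlow is not installed.

```python
def gpu_report():
    """Collect basic GPU diagnostics; return None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None

    # Log the device every op is placed on (the same call used in the question).
    tf.debugging.set_log_device_placement(True)

    return {
        # Installed TensorFlow version string, e.g. "2.5.0".
        "tf_version": tf.__version__,
        # True if this TensorFlow build was compiled with CUDA support.
        "built_with_cuda": tf.test.is_built_with_cuda(),
        # Names of the GPUs TensorFlow can see; an empty list means none.
        "gpus": [device.name for device in tf.config.list_physical_devices("GPU")],
    }


print(gpu_report())
```

An empty "gpus" list would point to a driver or CUDA setup problem, whereas a non-empty list means the GPU is visible and the question becomes which ops are actually placed on it.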
Some system, hardware, and software specs:
OS: Ubuntu 20.04
GPU: GTX 1660 Ti Mobile
nvidia-smi output: | NVIDIA-SMI 470.57.02 Driver Version: 470.57.02 CUDA Version: 11.4 |
tensorflow-gpu version: 2.5.0
tensorflow version: 2.5.0
Python version: 3.9.5
Log from the point where the script switches to using the CPU:
2021-07-26 17:00:34.027672: I tensorflow/core/common_runtime/eager/execute.cc:733] Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:0
2021-07-26 17:00:34.028022: I tensorflow/core/common_runtime/eager/execute.cc:733] Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
2021-07-26 17:00:34.028148: I tensorflow/core/common_runtime/eager/execute.cc:733] Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0
2021-07-26 17:00:34.028357: I tensorflow/core/common_runtime/eager/execute.cc:733] Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:0
2021-07-26 17:00:34.028418: I tensorflow/core/common_runtime/eager/execute.cc:733] Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
2021-07-26 17:00:34.028506: I tensorflow/core/common_runtime/eager/execute.cc:733] Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0
2021-07-26 17:00:34.028683: I tensorflow/core/common_runtime/eager/execute.cc:733] Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:0
2021-07-26 17:00:34.028741: I tensorflow/core/common_runtime/eager/execute.cc:733] Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
2021-07-26 17:00:34.028817: I tensorflow/core/common_runtime/eager/execute.cc:733] Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0
WARNING:tensorflow:From /home/[user]/miniconda3/envs/tensorflowGPU4/lib/python3.9/site-packages/object_detection/model_lib_v2.py:557: StrategyBase.experimental_distribute_datasets_from_function (from tensorflow.python.distribute.distribute_lib) is deprecated and will be removed in a future version.
Instructions for updating:
rename to distribute_datasets_from_function
W0726 17:00:34.090738 140198244909888 deprecation.py:330] From /home/[user]/miniconda3/envs/tensorflowGPU4/lib/python3.9/site-packages/object_detection/model_lib_v2.py:557: StrategyBase.experimental_distribute_datasets_from_function (from tensorflow.python.distribute.distribute_lib) is deprecated and will be removed in a future version.
Instructions for updating:
rename to distribute_datasets_from_function
2021-07-26 17:00:34.091483: I tensorflow/core/common_runtime/eager/execute.cc:733] Executing op HashTableV2 in device /job:localhost/replica:0/task:0/device:CPU:0
2021-07-26 17:00:34.091597: I tensorflow/core/common_runtime/eager/execute.cc:733] Executing op LookupTableImportV2 in device /job:localhost/replica:0/task:0/device:CPU:0
2021-07-26 17:00:34.091910: I tensorflow/core/common_runtime/eager/execute.cc:733] Executing op HashTableV2 in device /job:localhost/replica:0/task:0/device:CPU:0
2021-07-26 17:00:34.091969: I tensorflow/core/common_runtime/eager/execute.cc:733] Executing op LookupTableImportV2 in device /job:localhost/replica:0/task:0/device:CPU:0
2021-07-26 17:00:34.092269: I tensorflow/core/common_runtime/eager/execute.cc:733] Executing op HashTableV2 in device /job:localhost/replica:0/task:0/device:CPU:0
2021-07-26 17:00:34.092327: I tensorflow/core/common_runtime/eager/execute.cc:733] Executing op LookupTableImportV2 in device /job:localhost/replica:0/task:0/device:CPU:0
2021-07-26 17:00:34.092609: I tensorflow/core/common_runtime/eager/execute.cc:733] Executing op HashTableV2 in device /job:localhost/replica:0/task:0/device:CPU:0
2021-07-26 17:00:34.092666: I tensorflow/core/common_runtime/eager/execute.cc:733] Executing op LookupTableImportV2 in device /job:localhost/replica:0/task:0/device:CPU:0
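A log like the one above gets long quickly, so it can help to summarise which ops land on which device. The following is a minimal sketch, assuming the "Executing op ... in device ..." line format shown above; the names summarize_placement and sample_log are hypothetical. Such a summary may help tell apart lookup-table ops like HashTableV2, which appear on the CPU in this log, from the ops that do the actual training work.

```python
import re
from collections import Counter, defaultdict

# Matches lines such as:
#   ... Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:0
PLACEMENT_RE = re.compile(r"Executing op (\S+) in device \S*/device:(\w+:\d+)")


def summarize_placement(log_text):
    """Return {device: Counter({op_name: count})} for a device placement log."""
    summary = defaultdict(Counter)
    for line in log_text.splitlines():
        match = PLACEMENT_RE.search(line)
        if match:
            op_name, device = match.groups()
            summary[device][op_name] += 1
    return dict(summary)


# Three lines copied from the log in this post.
sample_log = """\
2021-07-26 17:00:34.028817: I tensorflow/core/common_runtime/eager/execute.cc:733] Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0
2021-07-26 17:00:34.091483: I tensorflow/core/common_runtime/eager/execute.cc:733] Executing op HashTableV2 in device /job:localhost/replica:0/task:0/device:CPU:0
2021-07-26 17:00:34.091597: I tensorflow/core/common_runtime/eager/execute.cc:733] Executing op LookupTableImportV2 in device /job:localhost/replica:0/task:0/device:CPU:0
"""

for device, ops in summarize_placement(sample_log).items():
    print(device, dict(ops))
```

Running it over the full training log (for example, after redirecting stderr to a file) would show whether compute-heavy ops such as convolutions appear under GPU:0 or CPU:0.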
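If anyone can help, it would be greatly appreciated. At this point I have scoured pretty much the whole internet for anyone with the same problem, without any luck :/
Edit:
Just an update on my problem. I tried the exact same setup on an older computer with a GTX 1060, using the same OS, graphics driver, CUDA version, and so on, and I still get the same behaviour. So I would guess this is just a configuration problem, or a misunderstanding somewhere?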
Solution