Not found: TF GPU device with id 0 was not registered, followed by a segmentation fault

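Problem description

Could you help me debug this problem? I have tried building with several versions, but I have not been able to resolve it.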

My configuration:

Hardware: MacBook Pro 13,3 with an eGPU (NVIDIA GTX 1080)

Software:
macOS 10.13.6
NVIDIA Web Driver 387.10.10.10.40.105
CUDA Driver 396.148
CUDA Toolkit 9.1
cuDNN 7.0.5
Python 2.7
NCCL 2.1.15
Xcode 9.2

>>> import tensorflow as tf
>>> tf.test.is_gpu_available()
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:859] OS X does not support NUMA - returning NUMA node zero
tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties: 
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:46:00.0
totalMemory: 8.00GiB freeMemory: 2.32GiB
tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 
tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N 
tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/device:GPU:0 with 2025 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:46:00.0, compute capability: 6.1)
True

When I try to actually run something, it ends with the following error message and a segmentation fault:

Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:859] OS X does not support NUMA - returning NUMA node zero
tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties: 
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:46:00.0
totalMemory: 8.00GiB freeMemory: 3.39GiB
tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 
tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N 
tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3118 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:46:00.0, compute capability: 6.1)
E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
Segmentation fault: 11
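
To compare runs like the two above, the key numbers (reported free memory, memory reserved for the TensorFlow device, and any error lines) can be pulled out of the log with a short script. This is only a minimal sketch: the function name `summarize` and the assumption that error lines start with "E " are illustrative, based solely on the log format shown here.

```python
import re

# Excerpt copied from the second log above.
LOG = """\
totalMemory: 8.00GiB freeMemory: 3.39GiB
tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3118 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:46:00.0, compute capability: 6.1)
E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
"""


def summarize(log):
    """Collect free memory, reserved device memory, and error messages."""
    summary = {"free_gib": None, "device_mb": None, "errors": []}
    for line in log.splitlines():
        match = re.search(r"freeMemory: ([\d.]+)GiB", line)
        if match:
            summary["free_gib"] = float(match.group(1))
        match = re.search(r"with (\d+) MB memory", line)
        if match:
            summary["device_mb"] = int(match.group(1))
        # Assumption: error-level lines start with "E ", as in the log above.
        if line.startswith("E "):
            # Keep only the message after the "file:line]" prefix.
            summary["errors"].append(line.split("] ", 1)[-1])
    return summary


print(summarize(LOG))
```

Running the same function over the first log would make it easy to see that the device is created in both cases and that only the second run reports the "not registered" error.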

In another program, I tried reducing per_process_gpu_memory_fraction and the batch size; in that case it crashes after the first batch with the same error.
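
For reference, a memory-fraction setting of this kind is usually passed to the session through `tf.GPUOptions` in TensorFlow 1.x. The sketch below is not the code that was actually run: the helper names `gpu_option_kwargs` and `make_session` and the value 0.3 are hypothetical, and it assumes a TensorFlow 1.x build where `tf.GPUOptions`, `tf.ConfigProto`, and `tf.Session` are available.

```python
def gpu_option_kwargs(memory_fraction, allow_growth=False):
    """Keyword arguments for tf.GPUOptions (hypothetical helper)."""
    return {
        "per_process_gpu_memory_fraction": memory_fraction,
        "allow_growth": allow_growth,
    }


def make_session(memory_fraction=0.3):
    """Create a session whose GPU memory use is capped by the given fraction."""
    # Imported lazily so the helper above can be inspected without TensorFlow.
    # Assumption: TensorFlow 1.x API.
    import tensorflow as tf

    gpu_options = tf.GPUOptions(**gpu_option_kwargs(memory_fraction))
    config = tf.ConfigProto(gpu_options=gpu_options)
    return tf.Session(config=config)
```

A call such as `sess = make_session(0.3)` would then replace a plain `tf.Session()`; the batch size is a separate, model-specific setting and is not shown here.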

Tags: python, tensorflow

Solution
