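python - Not found: TF GPU device with id 0 was not registered, followed by a segmentation fault
Problem description
Can you help me debug this issue? I have tried building with several versions, but I cannot resolve it.
My configuration:
Hardware: MacBook Pro 13,3 with an NVIDIA 1080 eGPU
Software: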
macOS 10.13.6
NVIDIA Web Driver 387.10.10.10.40.105
CUDA Driver 396.148
CUDA 9.1 Toolkit
cuDNN 7.0.5
Python 2.7
NCCL 2.1.15
Xcode 9.2
>>> import tensorflow as tf
>>> tf.test.is_gpu_available()
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:859] OS X does not support NUMA - returning NUMA node zero
tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:46:00.0
totalMemory: 8.00GiB freeMemory: 2.32GiB
tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/device:GPU:0 with 2025 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:46:00.0, compute capability: 6.1)
True
When I try to run something, I get the following output, which ends in an error message and a segmentation fault:
Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:859] OS X does not support NUMA - returning NUMA node zero
tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:46:00.0
totalMemory: 8.00GiB freeMemory: 3.39GiB
tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3118 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:46:00.0, compute capability: 6.1)
E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
Segmentation fault: 11
In another program I tried reducing per_process_gpu_memory_fraction and the batch size. With those changes it crashes after the first batch with the same error.
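For reference, this is roughly how the memory fraction is usually set. It is a minimal sketch rather than the exact code from the failing program, and it is not a fix for the crash. The helper name `make_session_config` and the value 0.4 are assumptions chosen for illustration. The `tf1` alias is only there so the same snippet works on older 1.x releases, where these names sit at the top level, and on releases that provide `tf.compat.v1`.

```python
import tensorflow as tf

# Releases that ship tf.compat.v1 expose the 1.x API there; older 1.x
# releases expose the same names directly on the top-level module.
tf1 = getattr(tf.compat, "v1", tf)


def make_session_config(memory_fraction=0.4):
    """Build a session config that caps the share of GPU memory TensorFlow may claim.

    memory_fraction is a hypothetical value; tune it to the free memory
    reported in the device log.
    """
    gpu_options = tf1.GPUOptions(per_process_gpu_memory_fraction=memory_fraction)
    return tf1.ConfigProto(gpu_options=gpu_options)


config = make_session_config(0.4)
print(config.gpu_options.per_process_gpu_memory_fraction)
```

The resulting `config` is then passed when the session is created, for example `tf.Session(config=config)` in TensorFlow 1.x.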