python - No GPU device name when using TensorFlow
Problem description
I am trying to use my GPU with TensorFlow 2.4.0, but TensorFlow does not seem to find it.
System specs:
TensorFlow version: 2.4.0
Nvidia driver: 460.39, CUDA 11.2
CUDA version: 11.1
Ubuntu 18.04
gcc 版本:7.4.0
Python 3.6
GeForce RTX 2080 Ti
Added to .bashrc:
export PATH=/usr/local/cuda-11.1/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH
When I run the following code in a Jupyter notebook (or from the command line), I get the output below:
import os
os.environ['TF_XLA_FLAGS'] = '--tf_xla_enable_xla_devices'
import tensorflow as tf
print("Tensorflow version: ", tf.__version__)
import keras
print("Keras verion: ", keras.__version__)
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
print("GPUs: ", len(tf.config.experimental.list_physical_devices('GPU')))
print(tf.test.is_built_with_cuda())
print(tf.test.is_gpu_available())
Tensorflow version: 2.4.0
Keras verion: 2.3.0
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 17371587508386671680
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 14652116595346898424
physical_device_desc: "device: XLA_CPU device"
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 16411682041600468605
physical_device_desc: "device: XLA_GPU device"
]
This is what is printed when it runs in the command window:
2021-02-21 14:00:44.733163: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-02-21 14:00:44.734101: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-02-21 14:00:44.760683: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-21 14:00:44.761566: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:42:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.545GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
2021-02-21 14:00:44.761595: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-02-21 14:00:44.764263: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-02-21 14:00:44.764327: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-02-21 14:00:44.765316: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-02-21 14:00:44.765556: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-02-21 14:00:44.765754: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-11.1/lib64
2021-02-21 14:00:44.766385: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-02-21 14:00:44.766524: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-02-21 14:00:44.766537: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-02-21 14:00:44.835456: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-02-21 14:00:44.835494: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
2021-02-21 14:00:44.835506: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
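The warning in the log above names the library that could not be loaded, and the "Skipping registering GPU devices..." line follows directly from it. As a minimal sketch, assuming the log has been saved to a string, the failing library names can be pulled out with a regular expression. The function name `find_missing_libraries` is hypothetical and not part of TensorFlow; the sample text is a shortened copy of two lines from the log.

```python
import re

# Matches the dso_loader warning emitted when a shared library cannot be opened.
_MISSING_LIB = re.compile(r"Could not load dynamic library '([^']+)'")


def find_missing_libraries(log_text):
    """Return library names the log reports as failing to load (ordered, no duplicates)."""
    seen = []
    for name in _MISSING_LIB.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


# Shortened copy of two lines from the log above.
sample = (
    "2021-02-21 14:00:44.765556: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] "
    "Successfully opened dynamic library libcurand.so.10\n"
    "2021-02-21 14:00:44.765754: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] "
    "Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: "
    "cannot open shared object file: No such file or directory\n"
)

print(find_missing_libraries(sample))  # ['libcusolver.so.10']
```

For this log the only name reported is libcusolver.so.10, which suggests the GPU itself is detected and a single CUDA library is what blocks device registration.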
So it looks like TF sees the GPU but does not use it? I am not sure what the problem is or why I cannot use the GPU. If I try to set up a session with
with tf.device('/device:XLA_GPU:0'):
I get the following error:
InvalidArgumentError: Cannot assign a device for operation add: {{node add}} was explicitly assigned to /device:XLA_GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0 ]. Make sure the device specification refers to a valid device.
However, if I use the CPU, it works.
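One way to narrow this down without TensorFlow is to ask the dynamic loader directly whether it can open the library named in the warning, and to list which versions of that library actually sit in the LD_LIBRARY_PATH directories. The following is a diagnostic sketch only, using the standard library; the helper names `can_load` and `versions_on_path` are hypothetical, and the assumption that a differently versioned libcusolver is installed under /usr/local/cuda-11.1/lib64 is a guess based on the log, not something the post confirms.

```python
import ctypes
import glob
import os


def can_load(name):
    """Return True if the dynamic loader can open the shared library `name`."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        return False


def versions_on_path(stem, search_path=None):
    """List files named like `<stem>.so*` in each directory of an LD_LIBRARY_PATH-style string."""
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    found = []
    for directory in search_path.split(":"):
        if directory:  # skip empty entries produced by leading or trailing colons
            found.extend(sorted(glob.glob(os.path.join(directory, stem + ".so*"))))
    return found


if __name__ == "__main__":
    # The library name is taken from the warning in the log.
    print("libcusolver.so.10 loadable:", can_load("libcusolver.so.10"))
    print("libcusolver files found:", versions_on_path("libcusolver"))
```

If the first line prints False while the second lists only files with another version suffix, the installed CUDA toolkit and the CUDA version this TensorFlow build expects probably do not match. The tested build configurations published for TensorFlow list 2.4.0 against CUDA 11.0, so checking the installed toolkit against that table would be a reasonable next step.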
Solution