No GPU device name when using TensorFlow

Problem description

I am trying to use my GPU with TensorFlow 2.4.0, but TensorFlow does not seem to find it.

System specs:
TensorFlow version: 2.4.0
Nvidia driver: 460.39, CUDA 11.2
CUDA version: 11.1
Ubuntu 18.04
gcc version: 7.4.0
Python 3.6
GeForce RTX 2080 Ti

Added to .bashrc:

export PATH=/usr/local/cuda-11.1/bin${PATH:+:${PATH}}  
export LD_LIBRARY_PATH=/usr/local/cuda-11.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}  
export LD_LIBRARY_PATH=/usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH  
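The exports above determine which directories the dynamic loader searches for CUDA libraries. As a rough sketch, the following lists the files in those directories that match a library name pattern, which makes it easy to see which versions are actually installed. The helper name `find_libs` and the `libcusolver.so*` pattern are illustrative choices, not part of the original post.

```python
import glob
import os


def find_libs(pattern, search_path=None):
    """Return sorted paths matching `pattern` in each directory of a search path.

    `search_path` is a string of directories joined by os.pathsep
    (":" on Linux); it defaults to the current LD_LIBRARY_PATH.
    """
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    matches = []
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing separator
        matches.extend(glob.glob(os.path.join(directory, pattern)))
    return sorted(matches)


if __name__ == "__main__":
    # Hypothetical pattern: list every libcusolver variant on the search path.
    for path in find_libs("libcusolver.so*"):
        print(path)
```

Run it from the same shell that launches Python or Jupyter, so that it sees the same LD_LIBRARY_PATH.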
 

When I run the following code in a Jupyter notebook (or in a command window), I get the output below:

import os  
os.environ['TF_XLA_FLAGS'] = '--tf_xla_enable_xla_devices'  
import tensorflow as tf  
print("Tensorflow version: ", tf.__version__)  
import keras  
print("Keras verion: ", keras.__version__)  
from tensorflow.python.client import device_lib  
print(device_lib.list_local_devices())  
print("GPUs: ", len(tf.config.experimental.list_physical_devices('GPU')))  
print(tf.test.is_built_with_cuda())  
print(tf.test.is_gpu_available()) 




Tensorflow version:  2.4.0  
Keras verion:  2.3.0  
[name: "/device:CPU:0"  
device_type: "CPU"  
memory_limit: 268435456  
locality {  
}  
incarnation: 17371587508386671680  
, name: "/device:XLA_CPU:0"  
device_type: "XLA_CPU"  
memory_limit: 17179869184  
locality {  
}  
incarnation: 14652116595346898424  
physical_device_desc: "device: XLA_CPU device"  
, name: "/device:XLA_GPU:0"  
device_type: "XLA_GPU"  
memory_limit: 17179869184  
locality {  
}  
incarnation: 16411682041600468605  
physical_device_desc: "device: XLA_GPU device"  
]  

Here is what is printed when running in the command window:

2021-02-21 14:00:44.733163: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set  
2021-02-21 14:00:44.734101: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1  
2021-02-21 14:00:44.760683: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero  
2021-02-21 14:00:44.761566: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:   
pciBusID: 0000:42:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.545GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s  
2021-02-21 14:00:44.761595: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-02-21 14:00:44.764263: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11  
2021-02-21 14:00:44.764327: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11  
2021-02-21 14:00:44.765316: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10  
2021-02-21 14:00:44.765556: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10  
2021-02-21 14:00:44.765754: W tensorflow/stream_executor/platform/default/dso_loader.cc:60]   Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-11.1/lib64
2021-02-21 14:00:44.766385: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11  
2021-02-21 14:00:44.766524: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8  
2021-02-21 14:00:44.766537: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at   https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.  
Skipping registering GPU devices...  
2021-02-21 14:00:44.835456: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:  
2021-02-21 14:00:44.835494: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      0   
2021-02-21 14:00:44.835506: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0:   N   
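
A log like the one above is easier to read once the library names are separated into those that loaded and those that did not. The following is a minimal sketch that does this with two regular expressions matched against the message wording seen in the log; the function name `summarize_dso_log` is hypothetical, and the sample text is an abbreviated copy of two of the log lines.

```python
import re

# Patterns follow the wording of the dso_loader messages in the log above.
LOADED = re.compile(r"Successfully opened dynamic library (\S+)")
FAILED = re.compile(r"Could not load dynamic library '([^']+)'")


def summarize_dso_log(log_text):
    """Return (loaded, failed) lists of library names found in `log_text`."""
    loaded = LOADED.findall(log_text)
    failed = FAILED.findall(log_text)
    return loaded, failed


if __name__ == "__main__":
    # Abbreviated sample; paste the full log text here instead.
    sample = (
        "I ... Successfully opened dynamic library libcublas.so.11\n"
        "W ... Could not load dynamic library 'libcusolver.so.10'; dlerror: ...\n"
    )
    loaded, failed = summarize_dso_log(sample)
    print("loaded:", loaded)
    print("failed:", failed)
```

Applied to the full log, the only name in the failed list would be libcusolver.so.10, which is the library the warning before "Skipping registering GPU devices..." refers to.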

So it looks like TF sees the GPU but is not using it? I am not sure what the problem is or why I cannot use the GPU. If I try to set up a session using

with tf.device('/device:XLA_GPU:0'):

I get the following error:
InvalidArgumentError: Cannot assign a device for operation add: {{node add}} was explicitly assigned to /device:XLA_GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0 ]. Make sure the device specification refers to a valid device.

However, if I use the CPU, it works.

Tags: python, tensorflow, gpu

Solution
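
The only library the log reports as failing to load is libcusolver.so.10, and the warning that follows says GPU registration is skipped when some GPU libraries cannot be opened. One way to check this outside TensorFlow is to ask the dynamic loader for each library named in the log. This is a diagnostic sketch under that reading of the log, not a confirmed fix; the helper name `can_load` is hypothetical, and the library list is copied from the log.

```python
import ctypes

# Library names exactly as they appear in the log in the question.
LIBS = [
    "libcuda.so.1",
    "libcudart.so.11.0",
    "libcublas.so.11",
    "libcublasLt.so.11",
    "libcufft.so.10",
    "libcurand.so.10",
    "libcusolver.so.10",
    "libcusparse.so.11",
    "libcudnn.so.8",
]


def can_load(name):
    """Return True if the dynamic loader can open `name`, False otherwise."""
    try:
        ctypes.CDLL(name)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    for lib in LIBS:
        status = "ok" if can_load(lib) else "MISSING"
        print(f"{lib}: {status}")
```

If libcusolver.so.10 is reported as missing while a differently versioned libcusolver exists under /usr/local/cuda-11.1/lib64, that would suggest a mismatch between the installed CUDA toolkit and the CUDA version this TensorFlow build expects. The install guide linked in the log (https://www.tensorflow.org/install/gpu) lists the versions each release was built against.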

