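tensorflow - TensorFlow is not detecting my GPU. What should I do (May 2021)?
Problem description
TF version: 2.4.1 CUDA version: 11.1
tf.test.is_gpu_available() -- returns --> FALSE
tf.test.is_built_with_cuda() -- returns --> TRUE
I tried downgrading TF to 2.4.0, but that did not help.
I also tried: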
$ pip uninstall tensorflow
$ pip install tensorflow-gpu
But nothing seems to have any effect; TF simply does not detect my GPU.
Edit 1:
Output of nvcc --version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0
Output of nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.73.01 Driver Version: 460.73.01 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce RTX 3090 Off | 00000000:01:00.0 Off | N/A |
| 30% 35C P8 23W / 300W | 23MiB / 24268MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 GeForce RTX 3090 Off | 00000000:43:00.0 Off | N/A |
| 30% 40C P8 27W / 300W | 5MiB / 24268MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 GeForce RTX 3090 Off | 00000000:81:00.0 Off | N/A |
| 64% 63C P2 179W / 300W | 24043MiB / 24268MiB | 59% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 2362 G /usr/lib/xorg/Xorg 9MiB |
| 0 N/A N/A 2564 G /usr/bin/gnome-shell 12MiB |
| 1 N/A N/A 2362 G /usr/lib/xorg/Xorg 4MiB |
| 2 N/A N/A 2362 G /usr/lib/xorg/Xorg 4MiB |
| 2 N/A N/A 14304 C python3 24035MiB |
+-----------------------------------------------------------------------------+
When running tf.test.is_gpu_available(), I get the following warning:
WARNING:tensorflow:From Spell_correction.py:35: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
2021-05-07 21:46:21.855460: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-05-07 21:46:21.856690: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:43:00.0 name: GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s
2021-05-07 21:46:21.856716: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-05-07 21:46:21.856735: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-05-07 21:46:21.856747: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-05-07 21:46:21.856759: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-05-07 21:46:21.856771: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-05-07 21:46:21.856829: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.1/lib64
2021-05-07 21:46:21.856846: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-05-07 21:46:21.856856: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-05-07 21:46:21.856863: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-05-07 21:46:21.942589: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-05-07 21:46:21.942626: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
2021-05-07 21:46:21.942633: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
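The line in the log above that matters is the `Could not load dynamic library` warning, because TensorFlow skips GPU registration as soon as one required library fails to open. As a minimal sketch (the helper name `find_missing_libraries` is hypothetical, not part of TensorFlow), the missing library names can be pulled out of a captured log like this:

```python
import re

# Pattern for the warning that TensorFlow's dso_loader prints when a library fails to load.
_MISSING_LIB = re.compile(r"Could not load dynamic library '([^']+)'")


def find_missing_libraries(log_text):
    """Return the library names the log reports as not loadable, without duplicates."""
    missing = []
    for name in _MISSING_LIB.findall(log_text):
        if name not in missing:
            missing.append(name)
    return missing


# Two lines copied from the log in the question (timestamps removed).
sample = (
    "I tensorflow/stream_executor/platform/default/dso_loader.cc:49] "
    "Successfully opened dynamic library libcurand.so.10\n"
    "W tensorflow/stream_executor/platform/default/dso_loader.cc:60] "
    "Could not load dynamic library 'libcusolver.so.10'; dlerror: "
    "libcusolver.so.10: cannot open shared object file: No such file or directory\n"
)

print(find_missing_libraries(sample))
```

For the log in the question this reports only `libcusolver.so.10`, so that is the one library to look for under the directories listed in `LD_LIBRARY_PATH`.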
Another observation:
PyTorch detects the GPU, while TF does not.
torch.cuda.is_available() --> TRUE tf.test.is_gpu_available() --> FALSE
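The deprecation warning in the log recommends `tf.config.list_physical_devices('GPU')` in place of `tf.test.is_gpu_available()`. A small sketch of that check follows; the wrapper `visible_gpus` is a hypothetical helper, and returning an empty list when TensorFlow cannot be imported is an assumption made here so that the snippet runs anywhere:

```python
def visible_gpus():
    """Return the names of the GPUs TensorFlow can see, or [] if TensorFlow is not importable."""
    try:
        import tensorflow as tf
    except ImportError:
        # Assumption for this sketch: treat a missing TensorFlow install as "no GPUs visible".
        return []
    return [device.name for device in tf.config.list_physical_devices("GPU")]


print(visible_gpus())
```

An empty list here, while `torch.cuda.is_available()` returns True, suggests that the driver works and that the problem lies in the CUDA libraries TensorFlow tries to load.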
Solution
If you are using Ubuntu 20.04, I suggest following the steps here. I ran into the same problem recently.
You have
NVIDIA-SMI 460.73.01 Driver Version: 460.73.01 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce RTX 3090 Off | 00000000:01:00.0 Off | N/A |
| 30% 35C P8 23W / 300W | 23MiB / 24268MiB | 0% Default |
| | | N/A
Try installing the latest NVIDIA driver (465) and CUDA 11.3. In my case, nvidia-smi shows the following:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.19.01 Driver Version: 465.19.01 CUDA Version: 11.3 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
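When comparing the two nvidia-smi headers, it helps to read the driver and CUDA versions programmatically. Note that the "CUDA Version" field in this header is the highest CUDA version the driver supports, not necessarily the toolkit that `nvcc` reports. The following is a rough sketch; `parse_smi_header` is a hypothetical helper that only parses text in the header format shown above:

```python
import re

# Matches the first row of the nvidia-smi banner, tolerating variable spacing.
_HEADER = re.compile(
    r"NVIDIA-SMI\s+[\d.]+\s+"
    r"Driver Version:\s+(?P<driver>[\d.]+)\s+"
    r"CUDA Version:\s+(?P<cuda>[\d.]+)"
)


def parse_smi_header(text):
    """Return {'driver': ..., 'cuda': ...} from nvidia-smi output, or None if no header is found."""
    match = _HEADER.search(text)
    if match is None:
        return None
    return {"driver": match.group("driver"), "cuda": match.group("cuda")}


# Header lines copied from the question and from the answer.
before = "| NVIDIA-SMI 460.73.01 Driver Version: 460.73.01 CUDA Version: 11.2 |"
after = "| NVIDIA-SMI 465.19.01 Driver Version: 465.19.01 CUDA Version: 11.3 |"

print(parse_smi_header(before))
print(parse_smi_header(after))
```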
What I did:
(1) I completely uninstalled NVIDIA and CUDA (see here). Be careful with this step.
sudo apt-get remove --purge '^nvidia-.*'
sudo apt-get install ubuntu-desktop
sudo rm /etc/X11/xorg.conf
echo 'nouveau' | sudo tee -a /etc/modules
(2) I downloaded the NVIDIA driver as a .run file and simply ran sudo bash NVIDIA*.run
(3) I downloaded cuDNN and did the following, as described here:
tar -xzvf cudnn-11.3-*.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
Also check your .bashrc file, as described here:
cd ~
gedit .bashrc
or nano .bashrc
# Add this at the end:
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export PATH=/usr/local/cuda-11.3/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.3/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.3/targets/x86_64-linux/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
Then run pip install tensorflow-gpu==2.4.1
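After reinstalling, one way to confirm that the `LD_LIBRARY_PATH` changes took effect is to ask the dynamic loader directly, before involving TensorFlow at all. This is a sketch rather than an official check: `can_load` is a hypothetical helper, and the library names are the ones from the TensorFlow log in the question, so they may differ for other TensorFlow versions.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True


# Library names taken from the TensorFlow log in the question.
for name in ("libcudart.so.11.0", "libcusolver.so.10", "libcudnn.so.8"):
    print(name, "OK" if can_load(name) else "MISSING")
```

Run it from a new shell (or after `source ~/.bashrc`), because `LD_LIBRARY_PATH` is read when the process starts. Any library reported as MISSING here will also fail to load inside TensorFlow.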