tensorflow - Tensorflow GPU CUDA 无法加载动态库“libcufft.so.10”;错误
问题描述
我担心这会被标记为重复,但我发现有libcudart
或libcublas
没有的例子libcufft
(这是我的问题)。
我安装了 TensorFlow,我想使用 GPU。因此,我在此链接上运行脚本。
运行 TensorFlow 训练网络时,我收到以下消息:
2021-09-23 11:19:22.158959: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-23 11:19:22.162563: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory
2021-09-23 11:19:22.162651: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcurand.so.10'; dlerror: libcurand.so.10: cannot open shared object file: No such file or directory
2021-09-23 11:19:22.162730: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory
2021-09-23 11:19:22.162806: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory
2021-09-23 11:19:22.162989: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1835] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-09-23 11:19:22.163345: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
使用tf.config.list_physical_devices()
我得到:
2021-09-23 11:30:18.327648: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-23 11:30:18.329447: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/extras/CUPTI/lib64
2021-09-23 11:30:18.329510: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcurand.so.10'; dlerror: libcurand.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/extras/CUPTI/lib64
2021-09-23 11:30:18.329573: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/extras/CUPTI/lib64
2021-09-23 11:30:18.329687: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/extras/CUPTI/lib64
2021-09-23 11:30:18.329814: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1835] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]
我有一个名为/usr/local/cuda-11.0
但并不cuda
孤单的文件夹,我也没有extras
文件夹。确实它说的是 Ubuntu 18.04,而我有 Ubuntu 20.04。
如果我尝试按照这里sudo apt install nvidia-cuda-toolkit
的建议运行,我会得到:
Reading package lists... Done
Building dependency tree
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:
The following packages have unmet dependencies:
nvidia-cuda-toolkit : Depends: nvidia-cuda-dev (= 10.1.243-3) but it is not going to be installed
Recommends: nsight-compute (= 10.1.243-3)
Recommends: nsight-systems (= 10.1.243-3)
E: Unable to correct problems, you have held broken packages.
的输出whereis cuda
为cuda:
(空)。
的输出nvidia-smi
:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03 Driver Version: 460.91.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce RTX 208... Off | 00000000:01:00.0 On | N/A |
| 0% 40C P8 31W / 300W | 626MiB / 11016MiB | 15% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1141 G /usr/lib/xorg/Xorg 59MiB |
| 0 N/A N/A 1749 G /usr/lib/xorg/Xorg 315MiB |
| 0 N/A N/A 1886 G /usr/bin/gnome-shell 59MiB |
| 0 N/A N/A 1907 G ...mviewer/tv_bin/TeamViewer 2MiB |
| 0 N/A N/A 2463 G ...ble-features=SpareRendere 4MiB |
| 0 N/A N/A 3825 G ...AAAAAAAAA= --shared-files 105MiB |
| 0 N/A N/A 4682 G .../debug.log --shared-files 36MiB |
| 0 N/A N/A 20600 G ...AAAAAAAAA= --shared-files 24MiB |
+-----------------------------------------------------------------------------+
我害怕安装东西来解决它并以典型的 20 个版本的 CUDA 相互冲突来完成。
解决方案
所以我按照评论中的建议做了,并以非常激进的方式卸载了所有内容:
sudo apt clean
sudo apt update
sudo apt purge cuda
sudo apt purge nvidia-*
sudo apt autoremove
然后我按照说明安装:
- CUDA
- CUDA Toolkit(虽然我觉得是一样的,只是加了一个命令
sudo apt-get install nvidia-gds
,我什至不知道有没有必要) - CUDNN
现在它似乎正在工作。
推荐阅读
- excel - 无法将选项卡移动到所需位置
- java - 哪种设计模式用于创建继承对象?
- r - 在 R 或任何其他语言中更改随机元素的时间格式
- react-native - Mobx 不更新商店。每次操作调用后需要手动刷新
- android - 应用内更新提供 UPDATE_NOT_AVAILABLE 而在 Playstore 上我可以看到更新按钮
- node.js - 无法查看酱实验室测试结果
- swift - SwiftUI:UIAlertController ActionSheet 全屏
- linux - Error installing Docker `https://download.docker.com/linux/ubuntu focal Release 404 Not Found`
- c++ - 我如何在我的代码中使用 cin 的 argc 和 argv insted,命令行参数的新手,我想了解如何做到这一点
- r - 是否可以将自定义形状(来自 png)添加到 ggplot 图例?