How do I install CUDA 10.1 on Ubuntu Linux?

Problem description

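I am trying to get TensorFlow working with CUDA 10.1, but every time I install any driver (any version), it goes ahead and installs CUDA 11, which the TensorFlow build I am using does not support. I have tried installing both the driver and CUDA from the .deb packages. I have also tried installing the latest driver and then installing CUDA 10.1 from the local .run file, telling the CUDA installer not to install the driver. That did install CUDA 10.1 in my /usr/local folder, but when I run nvidia-smi, it still reports CUDA 11 every time.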

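I did a lot of research and saw it mentioned that the version shown by nvidia-smi is the newest CUDA runtime the driver supports, and does not necessarily reflect the CUDA libraries that are actually installed?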
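To see which toolkits are actually on disk, independently of what nvidia-smi prints, one could list the versioned directories under /usr/local. The following is only a minimal sketch: the function name installed_cuda_toolkits is hypothetical, and it assumes the installer used the cuda-<version> directory layout described in this question.

```python
import glob
import os


def installed_cuda_toolkits(root="/usr/local"):
    """Return sorted version strings for directories named cuda-<version> under root."""
    versions = []
    for path in glob.glob(os.path.join(root, "cuda-*")):
        # Only directories count; stray files that match the pattern are ignored.
        if os.path.isdir(path):
            versions.append(os.path.basename(path)[len("cuda-"):])
    return sorted(versions)
```

Calling installed_cuda_toolkits() on a machine set up as described here would be expected to include "10.1" in its result, whatever version nvidia-smi reports for the driver.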

So CUDA 10.1 should be installed (under /usr/local), and I tried running a test command in TensorFlow: tf.config.list_physical_devices('GPU'). But this produces the following errors:

2020-09-30 17:36:38.765577: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2020-09-30 17:36:38.765604: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
/home/robbe/Desktop/usiigaci-optimized/venv/lib/python3.7/site-packages/pandas/compat/__init__.py:120: UserWarning: Could not import the lzma module. Your installed Python is incomplete. Attempting to use lzma compression will result in a RuntimeError.
  warnings.warn(msg)
2020-09-30 17:36:40.493592: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-09-30 17:36:40.522334: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-30 17:36:40.522943: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 1050 Ti computeCapability: 6.1
coreClock: 1.455GHz coreCount: 6 deviceMemorySize: 3.94GiB deviceMemoryBandwidth: 104.43GiB/s
2020-09-30 17:36:40.523063: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2020-09-30 17:36:40.583631: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-09-30 17:36:40.583961: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory
2020-09-30 17:36:40.584167: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcurand.so.10'; dlerror: libcurand.so.10: cannot open shared object file: No such file or directory
2020-09-30 17:36:40.584358: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory
2020-09-30 17:36:40.584543: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcusparse.so.10'; dlerror: libcusparse.so.10: cannot open shared object file: No such file or directory
2020-09-30 17:36:40.704140: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-09-30 17:36:40.704203: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1753] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...

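So clearly it cannot find the right CUDA 10.1 shared libraries, even though they do exist under /usr/local/cuda-10.1. There are also executables under /usr/bin (including the nvidia-smi that reports CUDA 11), and I assume these override the 10.1 directory under /usr/local?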
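A quick way to check which of these libraries the dynamic loader can actually resolve, without starting TensorFlow, is to try opening each one by name. This is a minimal sketch: the library names are copied from the log above, and the helper names can_load and report are hypothetical. Shared libraries are located through LD_LIBRARY_PATH and the system loader cache rather than through PATH, so executables in /usr/bin do not by themselves shadow the libraries under /usr/local.

```python
import ctypes

# Library names copied from the TensorFlow log above.
CUDA_LIBS = [
    "libcudart.so.10.1",
    "libcublas.so.10",
    "libcufft.so.10",
    "libcurand.so.10",
    "libcusolver.so.10",
    "libcusparse.so.10",
    "libcudnn.so.7",
]


def can_load(name):
    """Return True if the dynamic loader can open the shared library `name`."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        # Raised when the loader cannot find or open the library.
        return False


def report(names=CUDA_LIBS):
    """Map each library name to whether it could be loaded."""
    return {name: can_load(name) for name in names}
```

Running report() in a fresh shell shows which names still fail. LD_LIBRARY_PATH is read when the process starts, so the check needs to run in a shell that already has the intended value.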

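Things I have tried:

Things that worked:

I am at my wits' end. I have come to the conclusion that TensorFlow and CUDA are very hard to work with, but I need this to work. Can anyone help?

Thank you.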

Tags: python, tensorflow

Solution


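So I found the solution.

It really was a matter of setting the right environment variables. TensorFlow looks for specific shared object files that live under cuda-10.1/lib64, so I added that path, along with cuda-10.1/include, to LD_LIBRARY_PATH in ~/.bashrc. The include directory holds headers rather than shared libraries, so the lib64 entry is most likely the one that matters. The paths use /usr/local/cuda, which on my machine points to the cuda-10.1 installation: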

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/include
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
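After opening a new shell, a small check can confirm that the variable now leads the loader to the library that was missing in the log. This is a minimal sketch under stated assumptions: the function name find_in_ld_library_path is hypothetical, and it only mirrors the LD_LIBRARY_PATH part of the lookup, not the system loader cache.

```python
import os


def find_in_ld_library_path(lib_name, ld_library_path=None):
    """Return the first LD_LIBRARY_PATH directory that contains lib_name, or None."""
    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    for directory in ld_library_path.split(":"):
        # Empty entries are skipped here; the real loader treats them as the
        # current directory, which this sketch deliberately ignores.
        if directory and os.path.isfile(os.path.join(directory, lib_name)):
            return directory
    return None
```

For example, find_in_ld_library_path("libcudart.so.10.1") should return the lib64 directory once the export lines above have taken effect, and None otherwise.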
