collect2: error: ld returned 1 exit status (-lcudnn)

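Problem description

Edit 1

I forgot to mention that I want to use the Theano library.

After I contacted the administrators, they provided a cuDNN module. However, I still get the same error.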

$ module load devel/cudnn/10.2
$ python -c "import theano"
Can not use cuDNN on context None: cannot compile with cuDNN. We got this error:
/pfs/work7/workspace/scratch/ul_dco32-conda-0/conda/envs/my_env/bin/../lib/gcc/x86_64-conda_cos6-linux-gnu/7.3.0/../../../../x86_64-conda_cos6-linux-gnu/bin/ld: cannot find -lcudnn
collect2: error: ld returned 1 exit status

Mapped name None to device cuda: Tesla V100-SXM2-32GB (0000:3A:00.0)
$ vi $HOME/.theanorc
[global]
floatX = float32
device = cuda

[cuda]
root=$CUDA_HOME/bin

[dnn]
include_path=$CUDA_HOME/include
library_path=$CUDA_HOME/lib64

[lib]
cnmem=1

So I most likely have a linking problem, but I cannot work out where it is.

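Problem

I am struggling with ld: cannot find -lcudnn

Description

I want to use CUDA and cuDNN in my project. I work on a cluster where CUDA is already installed (note that I do not have root access). So I copied the CUDA files into a local folder and installed cuDNN by following the instructions on the official website. However, ld cannot find libcudnn.so.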

$ lsb_release -a
Description:    Red Hat Enterprise Linux Server release 7.7 (Maipo)
Release:        7.7

What I tried

$ ld -lcudnn --verbose
attempt to open //usr/x86_64-redhat-linux/lib64/libcudnn.so failed
attempt to open //usr/x86_64-redhat-linux/lib64/libcudnn.a failed
attempt to open //usr/lib64/libcudnn.so failed
attempt to open //usr/lib64/libcudnn.a failed
attempt to open //usr/local/lib64/libcudnn.so failed
attempt to open //usr/local/lib64/libcudnn.a failed
attempt to open //lib64/libcudnn.so failed
attempt to open //lib64/libcudnn.a failed
attempt to open //usr/x86_64-redhat-linux/lib/libcudnn.so failed
attempt to open //usr/x86_64-redhat-linux/lib/libcudnn.a failed
attempt to open //usr/local/lib/libcudnn.so failed
attempt to open //usr/local/lib/libcudnn.a failed
attempt to open //lib/libcudnn.so failed
attempt to open //lib/libcudnn.a failed
attempt to open //usr/lib/libcudnn.so failed
attempt to open //usr/lib/libcudnn.a failed
ld: cannot find -lcudnn

ld does find libcudnn.so if I add the library path manually:

$ ld -L "$CUDA_HOME/lib64" -lcudnn --verbose
attempt to open /home/ul/ul_student/ul_dco32/pkg/cuda/10.2/lib64/libcudnn.so succeeded
-lcudnn (/home/ul/ul_student/ul_dco32/pkg/cuda/10.2/lib64/libcudnn.so)
librt.so.1 needed by /home/ul/ul_student/ul_dco32/pkg/cuda/10.2/lib64/libcudnn.so
found librt.so.1 at /usr/lib64/librt.so.1
libpthread.so.0 needed by /home/ul/ul_student/ul_dco32/pkg/cuda/10.2/lib64/libcudnn.so
found libpthread.so.0 at /usr/lib64/libpthread.so.0
libdl.so.2 needed by /home/ul/ul_student/ul_dco32/pkg/cuda/10.2/lib64/libcudnn.so
found libdl.so.2 at /usr/lib64/libdl.so.2
libstdc++.so.6 needed by /home/ul/ul_student/ul_dco32/pkg/cuda/10.2/lib64/libcudnn.so
found libstdc++.so.6 at /usr/lib64/libstdc++.so.6
libm.so.6 needed by /home/ul/ul_student/ul_dco32/pkg/cuda/10.2/lib64/libcudnn.so
found libm.so.6 at /usr/lib64/libm.so.6
libgcc_s.so.1 needed by /home/ul/ul_student/ul_dco32/pkg/cuda/10.2/lib64/libcudnn.so
found libgcc_s.so.1 at /usr/lib64/libgcc_s.so.1
libc.so.6 needed by /home/ul/ul_student/ul_dco32/pkg/cuda/10.2/lib64/libcudnn.so
found libc.so.6 at /usr/lib64/libc.so.6
ld-linux-x86-64.so.2 needed by /home/ul/ul_student/ul_dco32/pkg/cuda/10.2/lib64/libcudnn.so
found ld-linux-x86-64.so.2 at /usr/lib64/ld-linux-x86-64.so.2
ld: warning: cannot find entry symbol _start; not setting start address

This is my LD_LIBRARY_PATH:

$ echo $LD_LIBRARY_PATH
/home/ul/ul_student/ul_dco32/pkg/cuda/10.2/lib64

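ld somehow ignores LD_LIBRARY_PATH. Since I do not have root access, I cannot create symlinks in the system library directories or change the ldconfig configuration. As far as I can tell, setting LD_LIBRARY_PATH is therefore the only way I have to fix this.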
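For context on why the variable appears to be ignored: ld resolves -l options from -L directories and its built-in search paths, while LD_LIBRARY_PATH is mainly read by the dynamic loader at run time. When linking through gcc, the directories in LIBRARY_PATH are searched as well. The Python sketch below mimics the per-directory lookup that the --verbose output shows (shared object first, then static archive), so you can check which variable actually exposes libcudnn. The helper names find_for_linker and split_path_var are hypothetical, and the sketch only tests for file existence; it is not how ld is implemented.

```python
import os


def find_for_linker(name, search_dirs):
    """Return the first lib<name>.so or lib<name>.a found in search_dirs, else None.

    Mirrors the "attempt to open ..." sequence printed by `ld --verbose`:
    in each directory the shared object is tried before the static archive.
    """
    for directory in search_dirs:
        for suffix in (".so", ".a"):
            candidate = os.path.join(directory, "lib" + name + suffix)
            if os.path.isfile(candidate):
                return candidate
    return None


def split_path_var(value):
    """Split an os.pathsep-separated variable, dropping empty entries."""
    return [entry for entry in value.split(os.pathsep) if entry]


if __name__ == "__main__":
    # LIBRARY_PATH is consulted by gcc at link time; LD_LIBRARY_PATH is a
    # run-time setting, so a hit there does not help `-lcudnn` resolve.
    for var in ("LIBRARY_PATH", "LD_LIBRARY_PATH"):
        dirs = split_path_var(os.environ.get(var, ""))
        print(var, "->", find_for_linker("cudnn", dirs))
```

If the file shows up only under LD_LIBRARY_PATH, the library exists but is not on the link-time search path, which matches the behaviour described above.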

Thanks for your help.

Tags: linux, linker, gnu, ld

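Solution

It turned out the problem was in the Theano configuration. Writing out the full path to the CUDA (or cuDNN) installation in .theanorc, instead of using $CUDA_HOME (or $CUDNN_HOME), solved my problem.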

[cuda]
root=path/to/cuda/bin

[dnn]
include_path=path/to/cudnn/include
library_path=path/to/cudnn/lib64
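The fix suggests that the $CUDA_HOME references in .theanorc were passed through literally rather than expanded the way a shell would. Assuming that is the case, one way to avoid typing the paths by hand is to expand the variables once and write the result back. The sketch below does this with the standard library only; expand_env_in_config is a hypothetical helper, not part of Theano.

```python
import configparser
import io
import os


def expand_env_in_config(text):
    """Return ini-style text with $VAR and ${VAR} replaced by environment values.

    Variables that are not set in the environment are left unchanged,
    which is the documented behaviour of os.path.expandvars.
    """
    # interpolation=None keeps configparser from treating '%' specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    for section in parser.sections():
        for key, value in parser.items(section):
            parser.set(section, key, os.path.expandvars(value))
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    sample = (
        "[dnn]\n"
        "include_path=$CUDA_HOME/include\n"
        "library_path=$CUDA_HOME/lib64\n"
    )
    # Prints the section with CUDA_HOME substituted, if it is set.
    print(expand_env_in_config(sample))
```

Any value that still contains a $ after this step points at a variable that is not set in the current environment, for example because the module has not been loaded yet.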

P.S. The cuDNN and CUDA modules ended up being installed by the administrators, which removed a major headache.

