Why is my Python code not running on the GPU? tensorflow-gpu, CUDA, and cuDNN are installed

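Problem description

I am a beginner at running Python code on a GPU. I have CNN code that I want to run on the GPU. tensorflow-gpu, CUDA, and cuDNN are installed on my laptop, but the Python code does not run on the GPU.

nvidia-smi

Below is everything I have tried, together with the output.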

  1. Code:

    pip freeze | grep tensorflow
    

    Output:

    tensorflow==2.0.0
    tensorflow-estimator==2.0.0
    tensorflow-gpu==2.0.0
    
  2. Code:

    nvcc --version
    

    Output:

    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2019 NVIDIA Corporation
    Built on Fri_Feb__8_19:08:17_PST_2019
    Cuda compilation tools, release 10.1, V10.1.105
    
  3. Code:

    cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
    

    Output:

    #define CUDNN_MAJOR 7
    #define CUDNN_MINOR 5
    #define CUDNN_PATCHLEVEL 0
    #define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
    #include "driver_types.h"
    
  4. Code:

    from __future__ import absolute_import, division, print_function, unicode_literals
    import tensorflow as tf
    
    print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
    

    Output:

    Num GPUs Available:  0
    
  5. Code:

    import tensorflow
    from tensorflow.python.client import device_lib
    print(device_lib.list_local_devices())
    

    Output:

    2019-10-16 22:11:15.280922: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
    2019-10-16 22:11:15.484734: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2808000000 Hz
    2019-10-16 22:11:15.508127: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x45d4c60 executing computations on platform Host. Devices:
    2019-10-16 22:11:15.508212: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Host, Default Version
    2019-10-16 22:11:15.784006: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2019-10-16 22:11:15.785226: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x45d6ad0 executing computations on platform CUDA. Devices:
    2019-10-16 22:11:15.785278: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): GeForce GTX 1060, Compute Capability 6.1
    2019-10-16 22:11:15.785605: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2019-10-16 22:11:15.786528: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
    name: GeForce GTX 1060 major: 6 minor: 1 memoryClockRate(GHz): 1.6705
    pciBusID: 0000:01:00.0
    2019-10-16 22:11:15.786826: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
    2019-10-16 22:11:15.787053: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
    2019-10-16 22:11:15.787266: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
    2019-10-16 22:11:15.787474: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
    2019-10-16 22:11:15.787682: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
    2019-10-16 22:11:15.787950: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
    2019-10-16 22:11:15.788010: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
    2019-10-16 22:11:15.788036: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
    Skipping registering GPU devices...
    2019-10-16 22:11:15.788073: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] 
    Device interconnect StreamExecutor with strength 1 edge matrix:
    2019-10-16 22:11:15.788094: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 
    2019-10-16 22:11:15.788111: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N 
    [name: "/device:CPU:0"
    device_type: "CPU"
    memory_limit: 268435456
    locality {
    }
    incarnation: 7400412130462543104
    ,name: "/device:XLA_CPU:0"
    
    device_type: "XLA_CPU"
    memory_limit: 17179869184
    locality {
    }
    incarnation: 10419596086097903998
    physical_device_desc: "device: XLA_CPU device"
    ,name: "/device:XLA_GPU:0"
    device_type: "XLA_GPU"
    memory_limit: 17179869184
    locality {
    }
    incarnation: 10970348491339008844
    physical_device_desc: "device: XLA_GPU device"
    ]
    

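I consulted several websites, which basically say that if you have a GPU and tensorflow-gpu installed, the program will automatically detect the GPU and run the code on it. I also know that there are similar questions on StackOverflow; the code above comes from the answers to those questions. I also tried this example from the official TensorFlow 2.0 website: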

import tensorflow as tf

tf.debugging.set_log_device_placement(True)

# Create some tensors
a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
c = tf.matmul(a, b)

print(c)

The output is:

RuntimeError: Device placement logging must be set at program startup

Why is my program not running on the GPU?

Tags: python, tensorflow, runtime-error, conv-neural-network

Solution


If you look here:

2019-10-16 22:11:15.786826: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
2019-10-16 22:11:15.787053: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
2019-10-16 22:11:15.787266: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
2019-10-16 22:11:15.787474: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
2019-10-16 22:11:15.787682: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
2019-10-16 22:11:15.787950: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/

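The log shows that TensorFlow is looking for CUDA 10.0 libraries (libcudart.so.10.0 and so on), but the libraries on the search path are from CUDA 10.1. So the first step is to uninstall and remove CUDA 10.1 and install CUDA 10.0. Also remove tensorflow and keep only tensorflow-gpu. For all other versions, follow the exact recommendations here.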
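
The mismatch can also be read off the log mechanically. The following is a minimal sketch, not part of the original answer: it pulls the library versions TensorFlow tried to load out of the dso_loader warnings and compares them with the toolkit release reported by nvcc --version. The helper names requested_cuda_versions and installed_cuda_release are hypothetical, and the sample strings are shortened copies of the output posted in the question.

```python
import re

# Shortened copies of the warnings and nvcc output posted in the question.
LOG = """\
W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory
W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory
I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
"""
NVCC = "Cuda compilation tools, release 10.1, V10.1.105"


def requested_cuda_versions(log_text):
    """Map each library that failed to load to the version TensorFlow asked for."""
    pattern = re.compile(r"Could not load dynamic library '(lib\w+)\.so\.([\d.]+)'")
    return {name: version for name, version in pattern.findall(log_text)}


def installed_cuda_release(nvcc_output):
    """Return the toolkit release from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


missing = requested_cuda_versions(LOG)
installed = installed_cuda_release(NVCC)
for name, wanted in sorted(missing.items()):
    print(f"{name}: TensorFlow wants {wanted}, installed toolkit is {installed}")
```

If every requested version differs from the installed release, as it does here, the TensorFlow build and the CUDA toolkit do not match, which is the situation described above.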

Let us know whether this solves your problem.
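
As for the RuntimeError in the question: as far as I can tell, TensorFlow 2.0 raises "Device placement logging must be set at program startup" when tf.debugging.set_log_device_placement is called after the runtime has already been initialized, for example by an op executed earlier in the same Python session. The sketch below assumes a fresh interpreter and enables logging before creating any tensor. The wrapper name run_placement_demo is hypothetical, and it returns None when TensorFlow is not importable.

```python
def run_placement_demo():
    """Enable placement logging first, then run the matmul from the question."""
    try:
        import tensorflow as tf
    except ImportError:
        return None  # TensorFlow is not installed in this environment

    # Must come before any tensor or op is created in this process.
    tf.debugging.set_log_device_placement(True)

    a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    c = tf.matmul(a, b)

    # Same device query as in step 4 of the question.
    gpus = tf.config.experimental.list_physical_devices("GPU")
    print("Num GPUs Available:", len(gpus))
    return c.numpy().tolist()


if __name__ == "__main__":
    print(run_placement_demo())
```

Run as a standalone script, the placement log should name the device chosen for the MatMul op, which gives a direct way to check whether the GPU is used once the CUDA versions match.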

