Why does TensorFlow still not see my GPU?

Problem description

This problem has been confusing me for the past few days. I have tried reinstalling my drivers, but I still cannot get my GPU to work with TensorFlow.

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 13159433722602582150
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 12266805389881928380
physical_device_desc: "device: XLA_CPU device"
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 1094190154514983639
physical_device_desc: "device: XLA_GPU device"
]

This is not what I expected; on a working setup the output should look more like this:

 [name: "/device:CPU:0"
    device_type: "CPU"
    memory_limit: 268435456
    locality {
    }
    incarnation: 4549764507052008926
    , name: "/device:XLA_CPU:0"
    device_type: "XLA_CPU"
    memory_limit: 17179869184
    locality {
    }
    incarnation: 5130440468361087955
    physical_device_desc: "device: XLA_CPU device"
    , name: "/device:GPU:0"
    device_type: "GPU"
    memory_limit: 3136264601
    locality {
    bus_id: 1
    links {
    }
    }
    incarnation: 8742529146709444949
    physical_device_desc: "device: 0, name: GeForce GTX 1050 Ti, pci bus id: 
    0000:01:00.0, compute capability: 6.1"
    , name: "/device:XLA_GPU:0"
    device_type: "XLA_GPU"
    memory_limit: 17179869184
    locality {
    }
    incarnation: 12774508348529661585
    physical_device_desc: "device: XLA_GPU device"
    ]
    [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
By contrast, PyTorch can see the GPU, but TensorFlow cannot:

import torch
torch.cuda.is_available()
>>> True

import tensorflow as tf
tf.config.experimental.list_physical_devices('GPU')
>>> []

Here are my driver details.

nvidia-smi
Sun Nov 29 11:34:23 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 457.30       Driver Version: 457.30       CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 1050   WDDM  | 00000000:01:00.0 Off |                  N/A |
| N/A   53C    P0    N/A /  N/A |    112MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     15232      C   ...ython\Python38\python.exe    N/A      |
+-----------------------------------------------------------------------------+

Some people say the problem is that my CUDA version is above 10. Others say I should uninstall tensorflow and reinstall tensorflow-gpu.

Tags: python, tensorflow

Solution


As pointed out in the comments, you should install a TensorFlow GPU build that supports your CUDA and cuDNN versions.
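
One quick check before reinstalling anything: recent TensorFlow releases (roughly 2.3 and newer) can report which CUDA and cuDNN versions the installed wheel was built against. A minimal sketch, assuming a pip-installed TensorFlow:

import tensorflow as tf

# Version of the wheel that is actually installed.
print(tf.__version__)

# Available in TF ~2.3+: reports the build configuration of the wheel,
# including cuda_version, cudnn_version and is_cuda_build.
print(tf.sysconfig.get_build_info())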

Your CUDA version is 11.1, but the prebuilt TensorFlow packages up to and including 2.3.0 do not support it. For TensorFlow 2.4.0 you can use CUDA 11 and cuDNN 8.0.2, as described in the release notes:

https://github.com/tensorflow/tensorflow/releases/tag/v2.4.0-rc3

TensorFlow pip packages are now built with CUDA 11 and cuDNN 8.0.2.
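
In practice that means upgrading the TensorFlow package itself. A rough sketch, assuming a pip-based install and that the matching CUDA 11 toolkit and cuDNN 8.0.2 are already on the system; the 2.4.0rc3 pin reflects the prerelease available at the time of writing and is an assumption:

# The CPU and GPU packages are unified since TF 2.1, so plain "tensorflow" is enough.
pip install --upgrade "tensorflow==2.4.0rc3"

# Then re-check whether the GPU shows up.
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"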

If you want to stay on 2.3.0, downgrade to CUDA 10.1 and cuDNN 7.6 instead.
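
If you go this route, one possible way to pull in the older toolkit is through conda; the channel and package versions below are assumptions, and installing CUDA 10.1 / cuDNN 7.6 directly from NVIDIA works just as well:

# Assumes an activated Anaconda/Miniconda environment on Windows.
conda install -c anaconda cudatoolkit=10.1 cudnn=7.6

# Keep (or reinstall) the matching TensorFlow release and verify.
pip install "tensorflow==2.3.0"
python -c "import tensorflow as tf; print(tf.config.experimental.list_physical_devices('GPU'))"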

