Unable to use a GeForce GTX 1070 with TensorFlow 2.3.1

Problem description

System specs:

Ubuntu 20.04
NVIDIA Corporation GP104BM [GeForce GTX 1070 Mobile]
Intel® Core™ i7-7700HQ CPU @ 2.80GHz × 8
RAM: 31.3 GiB
Cuda Driver Version: 450.80.02 
tensorflow-2.3.1
CUDA Version: 11.0
cudnn-11.0

user@Linux:~$ nvidia-smi

Sat Nov 21 15:52:38 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    Off  | 00000000:01:00.0  On |                  N/A |
| N/A   50C    P8     9W /  N/A |    297MiB /  8111MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       932      G   /usr/lib/xorg/Xorg                 92MiB |
|    0   N/A  N/A      1295      G   /usr/bin/gnome-shell              111MiB |
|    0   N/A  N/A      3277      G   /usr/lib/firefox/firefox            1MiB |
|    0   N/A  N/A      3828      C   /usr/bin/python3                   87MiB |
+-----------------------------------------------------------------------------+

user@Linux:~$ nvcc -V

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())


[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 11676041203334616666
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 9716226955421748203
physical_device_desc: "device: XLA_CPU device"
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 1948208650029266498
physical_device_desc: "device: XLA_GPU device"
]
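Note that the listing above contains `CPU:0`, `XLA_CPU:0`, and `XLA_GPU:0`, but no plain `GPU:0` device, which is what most TensorFlow ops are actually placed on. A quick way to express that check, using a hypothetical helper (not a TensorFlow API) over the device names shown:

```python
def has_plain_gpu(device_names):
    """Return True if a non-XLA GPU device is present in the list."""
    return any(name.startswith("/device:GPU:") for name in device_names)

# The device names from the listing above:
devices = ["/device:CPU:0", "/device:XLA_CPU:0", "/device:XLA_GPU:0"]
print(has_plain_gpu(devices))  # -> False: no usable GPU:0 device
```

The absence of `GPU:0` here is the real symptom: TensorFlow registered the XLA devices but did not create a regular GPU device.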

When I run the code, I get an error:

with tf.device('/device:XLA_GPU:0'):
    history=model.fit(train_generator,
                      epochs=10,
                      validation_data=valid_generator,
                      validation_steps=test.shape[0]//batch_size,
                      steps_per_epoch=train.shape[0]//batch_size
                        )

Epoch 1/10

---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-15-cc03ad3613b3> in <module>
      1 with tf.device('/device:XLA_GPU:0'):
----> 2     history=model.fit(train_generator,
      3                       epochs=10,
      4                       validation_data=valid_generator,
      5                       validation_steps=test.shape[0]//batch_size,
...


Without the `with tf.device('/device:XLA_GPU:0')` line everything works, but it takes a very long time (the GPU is not being used…).

Tags: tensorflow, gpu

Solution


Replace XLA_GPU:0 with GPU:0.

Use the snippet below to avoid this error.

with tf.device('/device:GPU:0'):
    history=model.fit(train_generator,
                      epochs=10,
                      validation_data=valid_generator,
                      validation_steps=test.shape[0]//batch_size,
                      steps_per_epoch=train.shape[0]//batch_size
                      )
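If you have device strings in variables, the same rename can be expressed as a tiny helper. This is an illustrative sketch, not a TensorFlow API; the function name is made up:

```python
def to_plain_device(name: str) -> str:
    # Map an XLA device string to the plain device name the runtime
    # placer expects, e.g. '/device:XLA_GPU:0' -> '/device:GPU:0'.
    return name.replace("XLA_GPU", "GPU").replace("XLA_CPU", "CPU")

print(to_plain_device("/device:XLA_GPU:0"))  # -> /device:GPU:0
```

The point is simply that `model.fit` should run under the plain `GPU:0` device, not the XLA variant.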
