How to run TensorFlow on GPU

Problem description

I have a problem: my Jupyter notebook does not run on the GPU. I updated my drivers (Nvidia GTX 1660 Ti), installed CUDA 11, copied the cuDNN files into the CUDA folders, and added the correct paths to my environment variables. After that, I created a new Anaconda environment with a GPU kernel and installed tensorflow-gpu (version 2.4, since CUDA 11 requires version >= 2.4.0), as explained in this video.
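As a quick sanity check (a sketch, not part of the original post), the CUDA and cuDNN versions a TensorFlow build was compiled against can be read from its build info; tf.sysconfig.get_build_info() is available from TF 2.3 onward:

import tensorflow as tf

# Show which CUDA/cuDNN versions this TensorFlow build targets.
info = tf.sysconfig.get_build_info()
print(info["cuda_version"])   # expected to report 11.x for a CUDA 11 build
print(info["cudnn_version"])  # expected to report 8 for cuDNN 8.x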

After that, I opened a Jupyter notebook with the new kernel. My code runs fine up to a certain point, but the GPU utilization shown in Task Manager stays below 1% while my RAM sits at 60%-99%. So I suspect my code is not running on the GPU. I ran a few tests:

import tensorflow.keras
import tensorflow as tf

print(tf.__version__)                          # TensorFlow version
print(tensorflow.keras.__version__)            # bundled Keras version

print(tf.test.is_built_with_cuda())            # built with CUDA support?
print(tf.config.list_physical_devices('GPU'))  # GPUs visible to TensorFlow
print(tf.test.is_gpu_available())              # is a GPU usable?

which outputs (correctly, I think):

2.4.0
2.4.0
True
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
True
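One caveat worth noting: tf.test.is_gpu_available() is deprecated in TF 2.x; the documented replacement is to check the physical device list directly:

import tensorflow as tf

# Preferred over the deprecated tf.test.is_gpu_available()
print(bool(tf.config.list_physical_devices('GPU')))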

The next test was:

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())  # every device TensorFlow can see

which outputs:

[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 9334837591848971536
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 4837251481
locality {
  bus_id: 1
  links {
  }
}
incarnation: 2660164806064353779
physical_device_desc: "device: 0, name: GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5"
]

So both the CPU and the GPU are available in that kernel, aren't they?

What can I do to make my neural network run on the GPU instead of the CPU?
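TensorFlow 2.x places ops on a visible GPU automatically, so in principle nothing extra is needed; a minimal way to verify placement (a sketch, not code from the post) is to enable device-placement logging and pin a small computation to the GPU:

import tensorflow as tf

tf.debugging.set_log_device_placement(True)  # log the device of every op

with tf.device('/GPU:0'):
    a = tf.random.uniform((1000, 1000))
    b = tf.matmul(a, a)

print(b.device)  # should end in /device:GPU:0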

My code runs fine until I try to train my neural network. Here is the code and the error that occurs:

model.fit([np.asarray(X_train).astype(np.float32), np.asarray(X_train_zusatz).astype(np.float32)], 
          y_train, epochs=10, batch_size=10)

It is a concatenated (two-input) neural network, in case you are wondering about the inputs, and it works fine on plain tensorflow (not tensorflow-gpu). But there, training takes a very, very long time. A sketch of such an architecture follows.
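For context, here is a minimal sketch of a two-input concatenated model of this kind; the Embedding -> LSTM branch matches the node names in the traceback below, while vocab_size, seq_len, n_extra and the 300-unit width are assumptions rather than values from the post:

from tensorflow.keras import layers, Model

vocab_size, seq_len, n_extra = 10000, 100, 8      # assumed sizes

text_in  = layers.Input(shape=(seq_len,), name="text")
extra_in = layers.Input(shape=(n_extra,), name="extra")

x = layers.Embedding(vocab_size, 300)(text_in)    # 300 matches shape[300,300] in the OOM
x = layers.LSTM(300)(x)
x = layers.Concatenate()([x, extra_in])
out = layers.Dense(1, activation="sigmoid")(x)

model = Model(inputs=[text_in, extra_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy")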

Epoch 1/10
---------------------------------------------------------------------------
ResourceExhaustedError                    Traceback (most recent call last)
<ipython-input-27-10813edc74c8> in <module>
      3 
      4 model.fit([np.asarray(X_train).astype(np.float32), np.asarray(X_train_zusatz).astype(np.float32)], 
----> 5           y_train, epochs=10, batch_size=10)#, 
      6           #validation_data=[[X_test, X_test_zusatz], y_test], class_weight=class_weight)

~\.conda\envs\tf-gpu\lib\site-packages\tensorflow\python\keras\engine\training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
       1098                 _r=1):
       1099               callbacks.on_train_batch_begin(step)
    -> 1100               tmp_logs = self.train_function(iterator)
       1101               if data_handler.should_sync:
       1102                 context.async_wait()
    
    ~\.conda\envs\tf-gpu\lib\site-packages\tensorflow\python\eager\def_function.py in __call__(self, *args, **kwds)
        826     tracing_count = self.experimental_get_tracing_count()
        827     with trace.Trace(self._name) as tm:
    --> 828       result = self._call(*args, **kwds)
        829       compiler = "xla" if self._experimental_compile else "nonXla"
        830       new_tracing_count = self.experimental_get_tracing_count()
    
    ~\.conda\envs\tf-gpu\lib\site-packages\tensorflow\python\eager\def_function.py in _call(self, *args, **kwds)
        886         # Lifting succeeded, so variables are initialized and we can run the
        887         # stateless function.
    --> 888         return self._stateless_fn(*args, **kwds)
        889     else:
        890       _, _, _, filtered_flat_args = \
    
    ~\.conda\envs\tf-gpu\lib\site-packages\tensorflow\python\eager\function.py in __call__(self, *args, **kwargs)
       2941        filtered_flat_args) = self._maybe_define_function(args, kwargs)
       2942     return graph_function._call_flat(
    -> 2943         filtered_flat_args, captured_inputs=graph_function.captured_inputs)  # pylint: disable=protected-access
       2944 
       2945   @property
    
    ~\.conda\envs\tf-gpu\lib\site-packages\tensorflow\python\eager\function.py in _call_flat(self, args, captured_inputs, cancellation_manager)
       1917       # No tape is watching; skip to running the function.
       1918       return self._build_call_outputs(self._inference_function.call(
    -> 1919           ctx, args, cancellation_manager=cancellation_manager))
       1920     forward_backward = self._select_forward_and_backward_functions(
       1921         args,
    
    ~\.conda\envs\tf-gpu\lib\site-packages\tensorflow\python\eager\function.py in call(self, ctx, args, cancellation_manager)
        558               inputs=args,
        559               attrs=attrs,
    --> 560               ctx=ctx)
        561         else:
        562           outputs = execute.execute_with_cancellation(
    
    ~\.conda\envs\tf-gpu\lib\site-packages\tensorflow\python\eager\execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
         58     ctx.ensure_initialized()
         59     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
    ---> 60                                         inputs, attrs, num_outputs)
         61   except core._NotOkStatusException as e:
         62     if name is not None:
    
    ResourceExhaustedError: 2 root error(s) found.
      (0) Resource exhausted:  OOM when allocating tensor with shape[300,300] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
         [[{{node model/lstm/while/body/_1/model/lstm/while/lstm_cell/split}}]]
    Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
    
         [[gradient_tape/model/embedding/embedding_lookup/Reshape/_74]]
    Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
    
      (1) Resource exhausted:  OOM when allocating tensor with shape[300,300] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
         [[{{node model/lstm/while/body/_1/model/lstm/while/lstm_cell/split}}]]
    Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
    
    0 successful operations.
    0 derived errors ignored. [Op:__inference_train_function_4691]
    
    Function call stack:
    train_function -> train_function

Why does this error occur?

-UPDATE- This is what my "nvidia-smi" output looks like while training my model (about 20 seconds into training).

Thank you and best regards, Daniel

Tags: tensorflow, neural-network, anaconda, gpu

Solution
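The traceback itself shows that training does run on the GPU: the OOM is raised by allocator GPU_0_bfc on /job:localhost/replica:0/task:0/device:GPU:0, meaning the model exhausted the roughly 4.8 GB TensorFlow could reserve on the card (the memory_limit shown above). The low GPU percentage in Task Manager is misleading, since its default graphs show the 3D engine rather than CUDA compute load; nvidia-smi is the more reliable view. The usual fixes are to let TensorFlow allocate GPU memory on demand and to shrink what must fit at once. A minimal sketch, to be run before any other TensorFlow code:

import tensorflow as tf

# Allocate GPU memory on demand instead of reserving it all up front;
# this must run before the GPU is used for the first time.
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)

If the OOM persists, the next steps are a smaller batch_size, shorter input sequences, or fewer LSTM units, since the failing [300,300] allocation happens inside the LSTM cell.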

