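tensorflow - How to run TensorFlow on the GPU
Problem description
I have a problem: my Jupyter notebook does not run on the GPU. I updated my driver (Nvidia GTX 1660 Ti), installed CUDA 11, put the cuDNN files into the folder, and added the correct paths to the environment variables. After that, I added a new environment to Anaconda, including a GPU kernel, and installed tensorflow-gpu (version 2.4, because CUDA 11 requires version >= 2.4.0), as explained in this video.
I then opened Jupyter Notebook with the new kernel. My code runs up to a certain step, but GPU utilization in Task Manager stays below 1% while RAM usage is at 60%-99%. So I think my code is not running on the GPU. I ran a few tests: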
import tensorflow.keras
import tensorflow as tf
print(tf.__version__)
print(tensorflow.keras.__version__)
print(tf.test.is_built_with_cuda())
print(tf.config.list_physical_devices('GPU'))
print(tf.test.is_gpu_available())
This prints (which I believe is correct):
2.4.0
2.4.0
True
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
True
The next test was:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
which prints:
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 9334837591848971536
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 4837251481
locality {
bus_id: 1
links {
}
}
incarnation: 2660164806064353779
physical_device_desc: "device: 0, name: GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5"
]
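So the kernel sees both the CPU and the GPU, doesn't it?
What do I have to do so that my neural network runs on the GPU instead of the CPU?
My code runs until I try to train my neural network. Here is the code and the error that occurs: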
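One way to check where operations actually run is to turn on device placement logging and execute a small op. The following is a minimal sketch, assuming TensorFlow 2.x; the matrix size is arbitrary and chosen only for illustration, and the log lines may show up in the terminal that started Jupyter rather than in the notebook cell.

```python
import tensorflow as tf

# Log the device every op is placed on. Call this before running any op.
tf.debugging.set_log_device_placement(True)

# Fall back to the CPU so the sketch also runs when no GPU is visible.
device = "/GPU:0" if tf.config.list_physical_devices("GPU") else "/CPU:0"

with tf.device(device):
    a = tf.random.uniform((1000, 1000))
    b = tf.random.uniform((1000, 1000))
    c = tf.matmul(a, b)

# The device string of the result names the device that produced it.
print(c.device)
```

If the printed device string ends in `GPU:0`, TensorFlow is placing work on the GPU in this kernel.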
model.fit([np.asarray(X_train).astype(np.float32), np.asarray(X_train_zusatz).astype(np.float32)],
y_train, epochs=10, batch_size=10)
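In case you are wondering about the inputs: it is a concatenated neural network, and it works fine with regular tensorflow (not tensorflow-gpu), but training takes a very, very long time.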
Epoch 1/10
---------------------------------------------------------------------------
ResourceExhaustedError Traceback (most recent call last)
<ipython-input-27-10813edc74c8> in <module>
3
4 model.fit([np.asarray(X_train).astype(np.float32), np.asarray(X_train_zusatz).astype(np.float32)],
----> 5 y_train, epochs=10, batch_size=10)#,
6 #validation_data=[[X_test, X_test_zusatz], y_test], class_weight=class_weight)
~\.conda\envs\tf-gpu\lib\site-packages\tensorflow\python\keras\engine\training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
1098 _r=1):
1099 callbacks.on_train_batch_begin(step)
-> 1100 tmp_logs = self.train_function(iterator)
1101 if data_handler.should_sync:
1102 context.async_wait()
~\.conda\envs\tf-gpu\lib\site-packages\tensorflow\python\eager\def_function.py in __call__(self, *args, **kwds)
826 tracing_count = self.experimental_get_tracing_count()
827 with trace.Trace(self._name) as tm:
--> 828 result = self._call(*args, **kwds)
829 compiler = "xla" if self._experimental_compile else "nonXla"
830 new_tracing_count = self.experimental_get_tracing_count()
~\.conda\envs\tf-gpu\lib\site-packages\tensorflow\python\eager\def_function.py in _call(self, *args, **kwds)
886 # Lifting succeeded, so variables are initialized and we can run the
887 # stateless function.
--> 888 return self._stateless_fn(*args, **kwds)
889 else:
890 _, _, _, filtered_flat_args = \
~\.conda\envs\tf-gpu\lib\site-packages\tensorflow\python\eager\function.py in __call__(self, *args, **kwargs)
2941 filtered_flat_args) = self._maybe_define_function(args, kwargs)
2942 return graph_function._call_flat(
-> 2943 filtered_flat_args, captured_inputs=graph_function.captured_inputs) # pylint: disable=protected-access
2944
2945 @property
~\.conda\envs\tf-gpu\lib\site-packages\tensorflow\python\eager\function.py in _call_flat(self, args, captured_inputs, cancellation_manager)
1917 # No tape is watching; skip to running the function.
1918 return self._build_call_outputs(self._inference_function.call(
-> 1919 ctx, args, cancellation_manager=cancellation_manager))
1920 forward_backward = self._select_forward_and_backward_functions(
1921 args,
~\.conda\envs\tf-gpu\lib\site-packages\tensorflow\python\eager\function.py in call(self, ctx, args, cancellation_manager)
558 inputs=args,
559 attrs=attrs,
--> 560 ctx=ctx)
561 else:
562 outputs = execute.execute_with_cancellation(
~\.conda\envs\tf-gpu\lib\site-packages\tensorflow\python\eager\execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
58 ctx.ensure_initialized()
59 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 60 inputs, attrs, num_outputs)
61 except core._NotOkStatusException as e:
62 if name is not None:
ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[300,300] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node model/lstm/while/body/_1/model/lstm/while/lstm_cell/split}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[gradient_tape/model/embedding/embedding_lookup/Reshape/_74]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
(1) Resource exhausted: OOM when allocating tensor with shape[300,300] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node model/lstm/while/body/_1/model/lstm/while/lstm_cell/split}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_4691]
Function call stack:
train_function -> train_function
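Why does this error occur?
-Update- This is what my "nvidia-smi" output looks like while training my model (about 20 seconds into training).
Thank you and best regards, Daniel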
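The numbers in the error message can be put in perspective with a little arithmetic. The sketch below uses only values quoted above (the `shape[300,300]` float tensor and the `memory_limit` reported for `/device:GPU:0`) and assumes that "type float" means 4-byte float32. The failed allocation is only 360,000 bytes, so the message suggests that GPU memory was already nearly full when this tensor was requested, rather than that this tensor is large. The message also names `device:GPU:0`, so the training step was being executed on the GPU.

```python
BYTES_PER_FLOAT32 = 4  # assumption: "type float" in the message is float32

# Shape of the tensor whose allocation failed, taken from the OOM message.
rows, cols = 300, 300
failed_bytes = rows * cols * BYTES_PER_FLOAT32

# memory_limit reported by device_lib.list_local_devices() for /device:GPU:0.
gpu_memory_limit = 4837251481

print(f"failed allocation: {failed_bytes} bytes ({failed_bytes / 1024:.1f} KiB)")
print(f"GPU memory limit:  {gpu_memory_limit / 1024**3:.2f} GiB")
print(f"share of limit:    {failed_bytes / gpu_memory_limit:.6%}")
```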
Solution
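One setting that may be worth trying here, offered as a sketch rather than a confirmed fix: by default TensorFlow reserves nearly all memory of every visible GPU for the process, so a second process (for example another notebook kernel that is still open) can leave too little for training. Enabling memory growth makes TensorFlow allocate GPU memory as it is needed instead. The snippet assumes TensorFlow 2.x and has to run before any GPU op, ideally as the first cell after restarting the kernel.

```python
import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
for gpu in gpus:
    try:
        # Allocate GPU memory on demand instead of reserving it all up front.
        tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as err:
        # Raised when the GPU was already initialized in this process.
        print(f"Could not enable memory growth for {gpu.name}: {err}")

print(f"Visible GPUs: {len(gpus)}")
```

If the out-of-memory error persists with memory growth enabled and no other process using the GPU, the model itself may need more memory than the card offers.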