pytorch - Why do we need to clone grad_output and assign it to grad_input when defining a ReLU autograd function?

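Problem description

I am going through the autograd part of the pytorch tutorial. I have two questions:

Why do we need to clone grad_output and assign the clone to grad_input during backpropagation, rather than doing a simple assignment?
What is the purpose of grad_input[input < 0] = 0? Does it mean that we don't update the gradient when the input is less than zero?

Here is the code: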

import torch

class MyReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        """
        In the forward pass we receive a Tensor containing the input and return
        a Tensor containing the output. ctx is a context object that can be used
        to stash information for backward computation. You can cache arbitrary
        objects for use in the backward pass using the ctx.save_for_backward method.
        """
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        """
        In the backward pass we receive a Tensor containing the gradient of the loss
        with respect to the output, and we need to compute the gradient of the loss
        with respect to the input.
        """
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input

The link is here: https://pytorch.org/tutorials/beginner/pytorch_with_examples.html#pytorch-defining-new-autograd-functions

Thanks a lot in advance.

Tags: pytorch, backpropagation, autograd

Why do we need to clone grad_output and assign it to grad_input, rather than doing a simple assignment, during backpropagation?

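tensor.clone() creates a copy of the tensor that mirrors the original tensor's requires_grad field. clone is a way to copy a tensor while still keeping the copy as part of the computation graph it came from.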
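The behaviour of clone described above can be checked with a small sketch. The tensor values and variable names here are made up for illustration; detach() is included only as a contrast, to show a copy that is cut off from the graph.

```python
import torch

x = torch.tensor([1.0, -2.0, 3.0], requires_grad=True)

y = x.clone()            # new memory, still connected to x in the graph
z = x.detach().clone()   # new memory, disconnected from the graph

print(y.requires_grad, y.grad_fn is not None)  # the clone tracks gradients
print(z.requires_grad)                         # the detached copy does not

# Gradients flow back through the clone to the original tensor.
y.sum().backward()
print(x.grad)
```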

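So grad_input is part of the same computation graph as grad_output, and if we compute the gradient for grad_output, the same will be done for grad_input.

Since we modify grad_input in place, we clone it first. A simple assignment (grad_input = grad_output) would only create a second name for the same tensor, so the masking step would overwrite grad_output itself.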
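The difference between cloning and a simple assignment can be seen outside of autograd as well. The following is a minimal sketch with made-up values; inp stands in for the saved input and the variable names are only illustrative.

```python
import torch

grad_output = torch.tensor([1.0, 2.0, 3.0])
inp = torch.tensor([-1.0, 0.5, -2.0])

# Clone: the mask is applied to a separate tensor.
cloned = grad_output.clone()
cloned[inp < 0] = 0
unchanged_after_clone = torch.equal(grad_output, torch.tensor([1.0, 2.0, 3.0]))
print(unchanged_after_clone)   # grad_output still holds its original values

# Simple assignment: both names refer to the same tensor.
alias = grad_output
alias[inp < 0] = 0
print(alias is grad_output)    # same object
print(grad_output)             # the incoming gradient was overwritten
```

This matters because the incoming gradient may also be used by other parts of the backward pass, so a backward function should leave it untouched.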

What is the purpose of 'grad_input[input < 0] = 0'? Does it mean that we don't update the gradient when the input is less than zero?

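This is done as per the definition of the ReLU function. The ReLU function is f(x)=max(0,x). It means that if x<=0 then f(x)=0, otherwise f(x)=x. In the first case, when x<0, the derivative of f(x) with respect to x is f'(x)=0. So, we perform grad_input[input < 0] = 0. In the second case, it is f'(x)=1, so we simply pass grad_output through to grad_input (it works like an open gate).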
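As a rough check of this rule, the sketch below runs the MyReLU class from the question on a few made-up values and compares the result with the built-in torch.relu. The input and the upstream gradient are arbitrary, and x=0 is deliberately avoided, since the two implementations may treat that point differently.

```python
import torch

class MyReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0   # f'(x) = 0 where x < 0
        return grad_input           # f'(x) = 1 elsewhere: pass through

x = torch.tensor([-2.0, -0.5, 1.0, 3.0], requires_grad=True)
upstream = torch.tensor([10.0, 20.0, 30.0, 40.0])  # stand-in for dL/d(output)

MyReLU.apply(x).backward(upstream)
custom_grad = x.grad.clone()

# Same computation with the built-in ReLU for comparison.
x2 = x.detach().clone().requires_grad_(True)
torch.relu(x2).backward(upstream)
builtin_grad = x2.grad

print(custom_grad)   # zero where x < 0, upstream value where x > 0
print(torch.equal(custom_grad, builtin_grad))
```

Where the input is negative the upstream gradient is blocked, and where it is positive the upstream gradient passes through unchanged.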
