torch.mul causes param.grad to be NoneType

Problem description

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(torch.nn.Module):
    def __init__(self, D_u, D_i, D_t, D_m):
        super(Net, self).__init__()
        self.lin_u = nn.Linear(D_u, 1)
        self.lin_i = nn.Linear(D_i, 1)
        self.lin_t = nn.Linear(D_t, 1)
        self.lin_m = nn.Linear(D_m, 1)
      
        self.output = nn.Linear(4, 1)

    def forward(self, args):
        (u, i, t, m) = args
        u = F.relu(self.lin_u(u))
        i = F.relu(self.lin_i(i))
        t = F.relu(self.lin_t(t))
        m = F.relu(self.lin_m(m))
        # element-wise product of the four scalar activations
        out = torch.mul(u, i)
        out = torch.mul(out, t)
        out = torch.mul(out, m)
        return out

I have this simple model class with four inputs, each going through its own linear layer. I want the output to be the product of the four nodes, but for some reason, no matter how I multiply them (with torch.mul or *), grad is always NoneType:

model = Net(N, 3, T, 1)
u_block, i_block, t_block, m_block, y_block = get_data_new(data)

loss_fn = torch.nn.MSELoss(reduction='sum')

learning_rate = 1e-4
for t in range(5000):
    y_pred = model((u_block, i_block, t_block, m_block))

    loss = loss_fn(y_pred, y_block)
    if t % 100 == 99:
        print(t, loss.item())

    model.zero_grad()

    loss.backward()

    # manual SGD step on every parameter
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad

TypeError                                 
--->   param -= learning_rate * param.grad

TypeError: unsupported operand type(s) for *: 'float' and 'NoneType'

I have set the inputs to requires_grad=True. I think the problem is that out is not a leaf tensor and therefore has no gradient, but I don't know how to fix this.
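(For reference, a quick way to see which parameters are actually missing their gradient right after loss.backward(); this is just a diagnostic loop, not part of the model:)

for name, param in model.named_parameters():
    # parameters that never take part in the forward pass keep grad == None
    print(name, 'None' if param.grad is None else tuple(param.grad.shape))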

Edit:

The data u_block, i_block, t_block, m_block, y_block are shown below; u_block, i_block, and t_block are one-hot vectors.

u_block:  tensor([[1., 0., 0.,  ..., 0., 0., 0.],
        [1., 0., 0.,  ..., 0., 0., 0.],
        [1., 0., 0.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 0., 0., 1.],
        [0., 0., 0.,  ..., 0., 0., 1.],
        [0., 0., 0.,  ..., 0., 0., 1.]], requires_grad=True)
i_block:  tensor([[1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        ...,
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 1., 0.]], requires_grad=True)
t_block:  tensor([[1., 0., 0.,  ..., 0., 0., 0.],
        [0., 1., 0.,  ..., 0., 0., 0.],
        [0., 0., 1.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 1., 0., 0.],
        [0., 0., 0.,  ..., 0., 1., 0.],
        [0., 0., 0.,  ..., 0., 0., 1.]], requires_grad=True)
m_block:  tensor([[ 0.0335],
        [ 0.0000],
        [ 0.0000],
        ...,
        [ 0.1515],
        [-0.2261],
        [-0.0402]], requires_grad=True)
y_block:  tensor([[ 0.0000],
        [ 0.0000],
        [ 0.0000],
        ...,
        [-0.2261],
        [-0.0402],
        [-0.1318]], requires_grad=True)

Tags: pytorch

Solution


Make the following change: you are not using self.output, so I commented it out. That is what makes the grad None: the layer's parameters have requires_grad=True by default, but since the layer never takes part in the forward pass, backward() never computes a gradient for them and their .grad stays None.

class Net(torch.nn.Module):
    def __init__(self, D_u, D_i, D_t, D_m):
        super(Net, self).__init__()
        self.lin_u = nn.Linear(D_u, 1)
        self.lin_i = nn.Linear(D_i, 1)
        self.lin_t = nn.Linear(D_t, 1)
        self.lin_m = nn.Linear(D_m, 1)
      
        # self.output = nn.Linear(4, 1)  # unused in forward, so its parameters' grad stays None

    def forward(self, args):
        (u, i, t, m) = args
        u = F.relu(self.lin_u(u))
        i = F.relu(self.lin_i(i))
        t = F.relu(self.lin_t(t))
        m = F.relu(self.lin_m(m))
        out = torch.mul(u, i)
        out = torch.mul(out, t)
        out = torch.mul(out, m)
        return out

I hope this solves your problem.
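To double check, here is a minimal sketch of one training step with the corrected class (the dimensions and batch size below are made up purely for illustration): with every registered layer now used in forward, no .grad is None and the manual update from the question runs without the TypeError.

N_users, N_items, N_times, D_misc, batch = 10, 3, 7, 1, 32   # hypothetical sizes

model = Net(N_users, N_items, N_times, D_misc)
u = torch.rand(batch, N_users)
i = torch.rand(batch, N_items)
t = torch.rand(batch, N_times)
m = torch.rand(batch, D_misc)
y = torch.rand(batch, 1)

loss = torch.nn.MSELoss(reduction='sum')(model((u, i, t, m)), y)
model.zero_grad()
loss.backward()

with torch.no_grad():
    for param in model.parameters():
        param -= 1e-4 * param.grad      # no longer None, so this update works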

Also, I have a couple of suggestions:

  1. Change the name args to something else, or if you want to keep it, make full use of it by changing the signature to *args.
  2. Don't pass requires_grad=True for the inputs, because that makes backward also compute d_Loss/d_input (only do it when that is actually your intention; see the sketch after this list).
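On the second point, a small standalone sketch (using a plain nn.Linear rather than the Net class) showing that requires_grad=True on an input only adds d_Loss/d_input on top of the parameter gradients, which are computed either way:

import torch

lin = torch.nn.Linear(3, 1)

# input that requires grad: backward also fills x.grad (d_loss/d_input)
x = torch.rand(4, 3, requires_grad=True)
lin(x).sum().backward()
print(x.grad is not None)           # True  -> extra gradient you usually don't need
print(lin.weight.grad is not None)  # True  -> parameter grads are there regardless

# input without requires_grad (the default): only parameter grads are kept
lin.zero_grad()
y = torch.rand(4, 3)
lin(y).sum().backward()
print(y.grad is None)               # True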
