Computing the loss with policy gradients

Problem description

I am trying to implement the loss using policy gradients, as in SeqGAN. I know that if we do this in a function, the arguments would be input, target and rewards. These are the arguments and their dimensions in my implementation:

    def batchPGLoss(self, inp: torch.Tensor, target: torch.Tensor, reward: List) -> torch.Tensor:
        batch_size, seq_len = inp.size()
        # swap the dimensions: (batch_size x seq_len) -> (seq_len x batch_size)
        inp = inp.permute(1, 0)
        target = target.permute(1, 0)
        # init hidden state
        h = self.init_hidden(batch_size)

        loss = torch.zeros(1)
        for i in range(seq_len):
            # feed the i-th token of every sequence in the batch
            out, h = self.forward(inp[i], h)
            for j in range(batch_size):
                # -log(P(y_t | Y_1, ..., Y_{t-1})) * Q
                loss += -out[j][target.data[i][j]] * reward[j]
        return loss / batch_size
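
For reference, the term inside the double loop is the per-step policy-gradient objective, -log P(y_t | Y_1, ..., Y_{t-1}) * Q. Below is a minimal standalone sketch of that same per-step term computed with torch.gather instead of the loop over j; the names out, target_i and reward and the dummy shapes are illustrative only, not taken from the original code:

    import torch

    batch_size, vocab_size = 4, 10                             # dummy sizes for illustration
    out = torch.log_softmax(torch.randn(batch_size, vocab_size, requires_grad=True), dim=1)
    target_i = torch.randint(vocab_size, (batch_size,))        # token indices y_t for one step
    reward = torch.rand(batch_size)                            # Q for each sequence in the batch

    # gather picks out[j, target_i[j]] for every j, i.e. log P(y_t | Y_1, ..., Y_{t-1})
    log_p = out.gather(1, target_i.unsqueeze(1)).squeeze(1)    # (batch_size,)
    step_loss = -(log_p * reward).sum()                        # equals the inner loop over j above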

When I run it, I get the following error:

RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.

I checked which part throws the error and found that -out[j][target.data[i][j]] is the problem, because it requires grad. I could use tensor.detach().numpy(), but that does not solve the problem, since I still need out to stay in the graph.
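
One possible cause, not confirmed by the question, is that the reward values enter the expression as numpy scalars or a numpy array, which forces an implicit conversion of the grad-requiring tensor. If that is the case, converting the rewards to a torch tensor once, before the loop, keeps out in the graph and avoids any numpy() call. A sketch of the loop in batchPGLoss rewritten under that assumption (it assumes reward can be converted to a 1-D float tensor of length batch_size and that target holds LongTensor token indices):

    # convert rewards once, so all later arithmetic stays inside autograd
    reward_t = torch.as_tensor(reward, dtype=torch.float)         # (batch_size,)

    loss = torch.zeros(1)
    for i in range(seq_len):
        out, h = self.forward(inp[i], h)                          # (batch_size, vocab_size) log-probs
        log_p = out.gather(1, target[i].unsqueeze(1)).squeeze(1)  # log P(y_t | Y_1, ..., Y_{t-1})
        loss = loss - (log_p * reward_t).sum()                    # multiply by Q, sum over the batch
    loss = loss / batch_size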

Any suggestions?

Tags: python, deep-learning, nlp, pytorch, generative-adversarial-network

Solution

