PyTorch: backpropagating multiple losses

Problem description

I want to backpropagate multiple samples, which means calling backward() on a PyTorch loss more than once, at specific timesteps. I'm trying this:

        losso = 0
        for g, logprob in zip(G, self.action_memory):
            losso += -g * logprob              # policy-gradient loss per step
        self.buffer.append(losso)

        if self.game_counter > self.pre_training_games:
            for element in self.buffer:
                self.policy.optimizer.zero_grad()
                element.backward(retain_graph=True)
                self.policy.optimizer.step()

But I get a runtime error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [91, 9]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
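Following the hint at the end of the error message, anomaly detection makes the traceback point at the forward operation whose saved tensors were later modified. A minimal sketch of turning it on:

```python
import torch

# Enable anomaly detection before running the training step: the traceback
# of a failing backward() will then also show where the offending forward
# operation was recorded.
torch.autograd.set_detect_anomaly(True)
```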

Tags: python, neural-network, pytorch, reinforcement-learning

Solution

It looks like you are reusing losso: on the one hand you accumulate each step's loss into losso, and on the other hand you call backward() through that same accumulated loss on every iteration. Because optimizer.step() updates the network's parameters in place between those backward() calls, the graph you retained refers to tensors that have since changed.

This is likely the cause of your error.
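One way around this is to sum the per-step losses into a single tensor and call backward() exactly once, so optimizer.step() never runs between backward passes over a shared graph. A minimal sketch with a hypothetical toy linear policy and fabricated returns (all names here are illustrative, not from the original code):

```python
import torch

# Toy stand-in for self.policy; the point is the update pattern, not the model.
policy = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(policy.parameters(), lr=1e-2)

states = torch.randn(3, 4)                 # fake episode states
G = torch.tensor([1.0, 0.5, -0.2])         # fake returns per timestep

# Log-probability of the chosen action (here: always action 0, for brevity).
logprobs = torch.log_softmax(policy(states), dim=1)[:, 0]
loss = (-G * logprobs).sum()               # sum the per-step losses

optimizer.zero_grad()
loss.backward()                            # single backward pass, no retain_graph
optimizer.step()                           # step only after backward is done
```

Since each episode's graph is consumed by one backward() call, retain_graph=True is no longer needed, and no parameter update can invalidate saved tensors mid-backward.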

