Validation loss curve in PyTorch - how do I store the losses from all epochs during training, not just the last one?

Problem description

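I'm very new to this field. The original code comes from this GitHub repository: https://github.com/YuliaRubanova/latent_ode (the run_models.py file). I want to plot a learning curve of the validation loss. This is the training part of the code: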

# Training

log_path = "logs/" + file_name + "_" + str(experimentID) + ".log"
if not os.path.exists("logs/"):
    utils.makedirs("logs/")
logger = utils.get_logger(logpath=log_path, filepath=os.path.abspath(__file__))
logger.info(input_command)

optimizer = optim.Adamax(model.parameters(), lr=args.lr)

num_batches = data_obj["n_train_batches"]

for itr in range(1, num_batches * (args.niters + 1)):
    optimizer.zero_grad()
    utils.update_learning_rate(optimizer, decay_rate = 0.999, lowest = args.lr / 10)

    wait_until_kl_inc = 10
    if itr // num_batches < wait_until_kl_inc:
        kl_coef = 0.
    else:
        kl_coef = (1-0.99** (itr // num_batches - wait_until_kl_inc))

    batch_dict = utils.get_next_batch(data_obj["train_dataloader"])
    train_res = model.compute_all_losses(batch_dict, n_traj_samples = 3, kl_coef = kl_coef)
    train_res["loss"].backward()
    optimizer.step()

    n_iters_to_viz = 1
    if itr % (n_iters_to_viz * num_batches) == 0:
        with torch.no_grad():

            test_res = compute_loss_all_batches(model, 
                data_obj["test_dataloader"], args,
                n_batches = data_obj["n_test_batches"],
                experimentID = experimentID,
                device = device,
                n_traj_samples = 3, kl_coef = kl_coef)

            message = 'Epoch {:04d} [Test seq (cond on sampled tp)] | Loss {:.6f} | Likelihood {:.6f} | KL fp {:.4f} | FP STD {:.4f}|'.format(
                itr//num_batches, 
                test_res["loss"].detach(), test_res["likelihood"].detach(), 
                test_res["kl_first_p"], test_res["std_first_p"])
        
            logger.info("Experiment " + str(experimentID))
            logger.info(message)
            logger.info("KL coef: {}".format(kl_coef))
            logger.info("Train loss (one batch): {}".format(train_res["loss"].detach()))
            logger.info("Train CE loss (one batch): {}".format(train_res["ce_loss"].detach()))
            
            if "auc" in test_res:
                logger.info("Classification AUC (TEST): {:.4f}".format(test_res["auc"]))

            if "mse" in test_res:
                logger.info("Test MSE: {:.4f}".format(test_res["mse"]))

            if "accuracy" in train_res:
                logger.info("Classification accuracy (TRAIN): {:.4f}".format(train_res["accuracy"]))

            if "accuracy" in test_res:
                logger.info("Classification accuracy (TEST): {:.4f}".format(test_res["accuracy"]))

            if "pois_likelihood" in test_res:
                logger.info("Poisson likelihood: {}".format(test_res["pois_likelihood"]))

            if "ce_loss" in test_res:
                logger.info("CE loss: {}".format(test_res["ce_loss"]))

        torch.save({
            'args': args,
            'state_dict': model.state_dict(),
        }, ckpt_path)
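To see why only one value ends up in the plot, it helps to trace when the evaluation branch runs. The sketch below keeps only the loop bounds and the if condition from the loop above. The small sizes standing in for data_obj["n_train_batches"] and args.niters are made-up assumptions, not the script's real values. With n_iters_to_viz = 1 the test set is evaluated once every num_batches iterations, i.e. once per epoch. Each evaluation rebinds test_res, so after the loop test_res only holds the metrics from the last epoch.

```python
# Only the loop bounds and the evaluation condition from the training loop are kept;
# the sizes below are illustrative stand-ins, not the script's defaults.
num_batches = 4      # stands in for data_obj["n_train_batches"]
niters = 3           # stands in for args.niters
n_iters_to_viz = 1

eval_points = []     # (itr, epoch) pairs at which the test set would be evaluated
for itr in range(1, num_batches * (niters + 1)):
    if itr % (n_iters_to_viz * num_batches) == 0:
        eval_points.append((itr, itr // num_batches))

print(eval_points)   # [(4, 1), (8, 2), (12, 3)]
```

So there are exactly niters evaluations. Those are the points where a per-epoch value has to be recorded if you want a full curve.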

I also added this part to plot the loss curve:

import matplotlib.pyplot as plt

import seaborn as sns

# Use plot styling from seaborn.
sns.set(style='darkgrid')

# Increase the plot size and font size.
sns.set(font_scale=1.5)
plt.rcParams["figure.figsize"] = (12,6)

# Plot the learning curve.
plt.plot(test_res["mse"], 'r-o')
#plt.plot(tsloss_val2, 'c-o')

plt.legend(["RNN"])
# Label the plot.
plt.title("Validation loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")

plt.savefig('/content/latent_ode/foo.png')
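plt.plot draws one marker per element of the sequence it receives. After training, test_res["mse"] holds a single number, so the plot above can only show one point. Below is a minimal sketch comparing a single value with a per-epoch list. plot_learning_curve is a hypothetical helper that reuses the labels from the snippet above. The loss values are made up purely for illustration, and the figures are written to the system temp directory (an assumption, so nothing is overwritten).

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render to files without needing a display
import matplotlib.pyplot as plt


def plot_learning_curve(values, path):
    """Hypothetical helper: plot one point per epoch and save the figure to `path`."""
    epochs = list(range(1, len(values) + 1))  # x axis: 1, 2, ..., number of epochs
    fig, ax = plt.subplots(figsize=(12, 6))
    (line,) = ax.plot(epochs, values, "r-o", label="RNN")
    ax.legend()
    ax.set_title("Validation loss")
    ax.set_xlabel("Epoch")
    ax.set_ylabel("Loss")
    fig.savefig(path)
    plt.close(fig)  # free the figure; the returned Line2D still holds its data
    return line


out_dir = tempfile.gettempdir()

# A single value, like the one left in test_res["mse"] after the loop -> one point.
single = plot_learning_curve([0.42], os.path.join(out_dir, "single_point.png"))
print(len(single.get_xdata()))  # 1

# One value per epoch (made-up numbers) -> a full learning curve.
curve_path = os.path.join(out_dir, "learning_curve.png")
full = plot_learning_curve([0.90, 0.61, 0.47, 0.41, 0.38], curve_path)
print(len(full.get_xdata()))  # 5
```

The x values are passed explicitly so the axis starts at epoch 1 instead of matplotlib's default index 0.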

The problem is that I only get a curve for the final loss, not for all iterations. My curve: [image]

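What should I add to the code to get a standard learning curve showing the loss for every epoch, not only the last one? Something like this (from another project): [image]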

Tags: python, plot, deep-learning, neural-network, pytorch

Solution


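Save the losses in a list: define an empty list before the loop over epochs and append each loss.item() to it. After training, plot that list. In the code above, the test metrics are computed once per epoch inside the if itr % (n_iters_to_viz * num_batches) == 0: block, so that is where to append test_res["mse"] (or test_res["loss"].item()). As written, plt.plot(test_res["mse"], 'r-o') only ever sees the value from the last epoch.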
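Here is a minimal sketch of this answer applied to the loop structure from the question. The real model, optimizer and compute_loss_all_batches are replaced by two hypothetical stand-ins, train_step and evaluate, that just return numbers so the bookkeeping can run on its own. In the real script, the training append would go right after optimizer.step(), and the validation appends would go right after test_res is computed inside the if block. Calling .item() on a one-element PyTorch tensor returns a plain Python float, so the lists hold numbers rather than tensors that (for the training loss) still carry autograd history. If test_res["mse"] is a tensor in your version, call .item() on it as well; if it is already a float, append it as-is.

```python
# Hypothetical stand-ins: they replace the model, optimizer and
# compute_loss_all_batches and only return numbers, so the bookkeeping runs alone.
def train_step(itr):
    return 1.0 / itr                                  # plays the role of train_res["loss"].item()


def evaluate(epoch):
    return {"loss": 1.0 / epoch, "mse": 0.5 / epoch}  # plays the role of test_res


num_batches = 4      # data_obj["n_train_batches"] in the real script
niters = 3           # args.niters
n_iters_to_viz = 1

# Define the lists BEFORE the loop so their contents survive every iteration.
train_loss_history = []   # one value per training batch
val_loss_history = []     # one value per epoch
val_mse_history = []      # one value per epoch

for itr in range(1, num_batches * (niters + 1)):
    train_loss_history.append(train_step(itr))        # real code: train_res["loss"].item()

    if itr % (n_iters_to_viz * num_batches) == 0:
        test_res = evaluate(itr // num_batches)
        val_loss_history.append(test_res["loss"])     # real code: test_res["loss"].item()
        val_mse_history.append(test_res["mse"])

print(len(train_loss_history), val_loss_history)      # 15 [1.0, 0.5, 0.3333333333333333]
```

After the loop, val_mse_history (or val_loss_history) is what goes into plt.plot in place of test_res["mse"]. Because the lists live at module level, they can also be saved alongside the checkpoint if you want to re-plot later.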

