MSELoss gives mismatched batch dimensions when batch size > 1

Problem description

Overview of the relevant code:


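import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from torch.utils.tensorboard import SummaryWriter
from tqdm import tqdm

# BasicDataset, dir_img and dir_gt are defined elsewhere in the asker's project (not shown).


def train_net(net,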
              device,
              epochs=5,
              batch_size=10,
              lr=0.1,
              val_percent=0.1,
              save_cp=True,
              img_scale=0.5,
              n_channels=5,
              n_classes=1):

    dataset = BasicDataset(dir_img, dir_gt, img_scale, n_channels)
    n_val = int(len(dataset) * val_percent)
    n_train = len(dataset) - n_val
    train, val = random_split(dataset, [n_train, n_val])
    train_loader = DataLoader(train, batch_size=batch_size, shuffle=True, num_workers=8, pin_memory=True)
    val_loader = DataLoader(val, batch_size=batch_size, shuffle=False, num_workers=8, pin_memory=True)

    writer = SummaryWriter(comment=f'LR_{lr}_BS_{batch_size}_SCALE_{img_scale}')    #Add folder location
    global_step = 0

    optimizer = optim.Adam(net.parameters(), lr=lr)
    
    criterion = nn.MSELoss()  # expects prediction and target to have the same shape

    for epoch in range(epochs):
        net.train()

        epoch_loss = 0
        with tqdm(total=n_train, desc=f'Epoch {epoch + 1}/{epochs}', unit='img') as pbar:
            for batch in train_loader:
                imgs = batch['image']
                true_masks = batch['mask']
                assert imgs.shape[1] == net.n_channels, \
                    f'Network has been defined with {net.n_channels} input channels, ' \
                    f'but loaded images have {imgs.shape[1]} channels. Please check that ' \
                    'the images are loaded correctly.'

                imgs = imgs.to(device=device, dtype=torch.float32)
                mask_type = torch.float32 if net.n_classes == 1 else torch.long
                true_masks = true_masks.to(device=device, dtype=mask_type)

                masks_pred = net(imgs)

                loss = criterion(masks_pred, true_masks)     
                
                epoch_loss += loss.item()
                writer.add_scalar('Loss/train', loss.item(), global_step)

                pbar.set_postfix(**{'loss (batch)': loss.item()})

                optimizer.zero_grad()
                loss.backward()
                nn.utils.clip_grad_value_(net.parameters(), 0.1)
                optimizer.step()

                pbar.update(imgs.shape[0])
                global_step += 1

When I set the batch size to 1, the loss function gives no warning. When I change it to 10, it complains:

:\workdir\Programs\Anaconda3\envs\XXXX\lib\site-packages\torch\nn\modules\loss.py:446: UserWarning: Using a target size (torch.Size([1, 1, 89, 99])) that is different to the input size (torch.Size([10, 1, 89, 99])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.

C:\workdir\Programs\Anaconda3\envs\XXXX\lib\site-packages\torch\nn\modules\loss.py:446: UserWarning: Using a target size (torch.Size([1, 1, 89, 99])) that is different to the input size (torch.Size([4, 1, 89, 99])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
  return F.mse_loss(input, target, reduction=self.reduction)
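What the warning means can be reproduced in isolation. The sketch below uses made-up data: sample i of a batch of 10 has a constant target value i, and the prediction is perfect. Only the 89 x 99 mask shape comes from the question; the values are hypothetical. With a matching target the loss is 0, but with a target whose batch dimension is 1, PyTorch broadcasts sample 0's target across the whole batch and the loss comes out around 28.5 (the mean of 0², 1², ..., 9²). That is the kind of "incorrect result due to broadcasting" the warning refers to.

```python
import warnings

import torch
import torch.nn as nn

criterion = nn.MSELoss()

# Hypothetical data: sample i in the batch has a constant target value i.
true_masks = torch.arange(10, dtype=torch.float32).view(10, 1, 1, 1).repeat(1, 1, 89, 99)
masks_pred = true_masks.clone()  # a "perfect" prediction for every sample

# Matching shapes [10, 1, 89, 99] vs [10, 1, 89, 99]: no warning, loss is 0.
loss_ok = criterion(masks_pred, true_masks)

# Target with batch dimension 1, as in the warning: it is broadcast, so
# every prediction is compared against sample 0's target.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    loss_bad = criterion(masks_pred, true_masks[:1])

size_warned = any("different to the input size" in str(w.message) for w in caught)
print(loss_ok.item(), loss_bad.item(), size_warned)
```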

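However, when I debug, the input and target both have shape [10, 1, 89, 99]. I think this has something to do with the sizes of the training and validation sets: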

Training size:   579
Validation size: 64
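The two sizes can be checked against the warnings with a little arithmetic. Assuming the default sampler and `drop_last=False` (the `DataLoader` default, and what the code above uses), a loader yields full batches followed by one shorter batch holding the remainder. The helper `batch_sizes` is hypothetical, written only for this check.

```python
def batch_sizes(n_samples, batch_size, drop_last=False):
    """Sizes of the batches a DataLoader would yield over n_samples items,
    assuming the default sampler and no custom batch_sampler (hypothetical helper)."""
    full, rest = divmod(n_samples, batch_size)
    sizes = [batch_size] * full
    if rest and not drop_last:
        sizes.append(rest)  # the short final batch
    return sizes


train_sizes = batch_sizes(579, 10)  # 57 full batches + one batch of 9
val_sizes = batch_sizes(64, 10)     # 6 full batches + one batch of 4
print(len(train_sizes), train_sizes[-1])
print(len(val_sizes), val_sizes[-1])
```

With `batch_size=10`, the last validation batch holds 4 samples, which matches the input size [4, 1, 89, 99] in the second warning, while the last training batch holds 9. That suggests, without proving it, that at least the second warning comes from the validation code, which is not part of the excerpt. In both warnings the target keeps a batch dimension of 1, so setting `drop_last=True` would only remove the short batch; it would not by itself make the target shape match the prediction.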

Does anyone know how to fix this, so that I can also use batch sizes larger than 1?
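Because the shapes look correct at the `criterion(...)` call in the training loop, one way to find the call that actually receives the [1, 1, 89, 99] target is to promote this specific warning to an exception, so Python prints a full traceback. Below is a minimal sketch using the standard `warnings` module. `run_with_strict_shapes` is a hypothetical helper name, and the filter matches the beginning of the warning text quoted above, so it assumes PyTorch keeps that wording.

```python
import warnings

import torch
import torch.nn as nn


def run_with_strict_shapes(fn, *args, **kwargs):
    """Call fn with PyTorch's 'Using a target size ... different to the input size'
    UserWarning turned into an exception, so the traceback shows which loss call
    (training step or validation code) got the mismatched target."""
    with warnings.catch_warnings():
        # Only this message is promoted; other warnings behave as usual.
        warnings.filterwarnings("error", message="Using a target size", category=UserWarning)
        return fn(*args, **kwargs)


criterion = nn.MSELoss()
pred = torch.zeros(4, 1, 89, 99)

try:
    run_with_strict_shapes(criterion, pred, torch.zeros(1, 1, 89, 99))
    raised = False
except UserWarning:
    raised = True  # in real code, let it propagate to see the traceback

ok_loss = run_with_strict_shapes(criterion, pred, torch.zeros(4, 1, 89, 99))
print(raised, ok_loss.item())
```

In the asker's setup, something like `run_with_strict_shapes(train_net, net=net, device=device, batch_size=10)` would stop at the first mismatched loss call instead of warning and continuing.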

Tags: pytorch, loss-function
