Memory keeps rising while training an FFNN in PyTorch

Problem description

I have a feed-forward neural network that classifies the MNIST dataset. For some reason, memory usage keeps climbing towards 99% no matter what batch size I use. Nothing I allocate grows in size - every dynamic variable is overwritten after the first epoch - yet memory keeps rising even past epoch 70.

I am running this on a machine with 8 GB of RAM and a 2.8 GHz Intel i5 (7th gen) quad-core CPU, under Ubuntu 18.04.
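To confirm that it is really the process's resident memory that grows with every epoch (and not just cache pressure), it helps to log the RSS once per epoch. A minimal sketch of such a check is below; it assumes the psutil package is available, which is not part of the original setup:

import os
import psutil  # assumed extra dependency, used only for monitoring

_process = psutil.Process(os.getpid())

def log_memory(epoch):
    # Resident set size of this Python process, in MB
    rss_mb = _process.memory_info().rss / 1024 ** 2
    print(f"epoch {epoch}: RSS = {rss_mb:.1f} MB")

Calling log_memory(e) at the end of each epoch shows whether the growth is roughly linear in the number of epochs, which usually means something is being accumulated per iteration rather than depending on the batch size.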

import torch
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable

batch_size = 50    # Number of x's we pass through the net at each iteration
num_epochs = 100   # Number of times the entire training set is passed through the net

n_train = len(xtrain)
n_batch_train = n_train // batch_size
n_val = len(xval)
n_batch_val = n_val // batch_size

# Loss / accuracy bookkeeping
train_acc, train_loss = [], []
val_acc, val_loss = [], []

test_acc, test_loss = [], []


# Collect the parameters of every layer in the net
par = []

for i in range(len(layers) - 1):
    par = par + list(net.L[i].parameters())



# Optimizer
optimizer = optim.Adam(par, lr=0.001)

# Index range of the x's belonging to batch i
get_slice = lambda i, size: range(i * size, (i + 1) * size)


for e in range(num_epochs):
  curr_loss = 0
  net.train()
  for i in range(n_batch_train):

    slze = get_slice(i, batch_size)
    # Batch norm on the input batch (note: a new BatchNorm1d module is built every iteration)
    bn = nn.BatchNorm1d(num_features=num_features)
    x_batch = bn(Variable(torch.from_numpy(xtrain[slze])))

    out = net(x_batch).double()
    target_batch = Variable(torch.from_numpy(ytrain[slze]).double())
    L = criterion(out, target_batch)

    # Update gradients
    optimizer.zero_grad()
    L.backward()
    optimizer.step()

    # Store training accuracy and loss
    train_acc.append(accuracy(target_batch, out).data)
    train_loss.append(L.data.numpy())

  #### Validation ####
  net.eval()
  for j in range(n_batch_val):
    slze = get_slice(j, batch_size)
    val_batch = Variable(torch.from_numpy(xval[slze]))
    # Re-uses the bn module left over from the last training batch
    val_out = net(bn(val_batch)).double()
    target_batch = Variable(torch.from_numpy(yval[slze]).double())

    # Store validation accuracy
    val_acc.append(accuracy(target_batch, val_out).data)

Tags: python, memory, neural-network, pytorch

Solution
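A frequent cause of this symptom in PyTorch training loops is keeping tensors that still reference the autograd graph (for example appending loss or accuracy tensors to a list) and running the validation pass with gradient tracking enabled; another thing worth checking in the code above is that a new nn.BatchNorm1d module is constructed on every iteration instead of once. A minimal sketch of the same loop with those points addressed is shown below. It reuses the question's net, criterion, accuracy, optimizer, get_slice and data arrays, drops the deprecated Variable wrapper, and assumes accuracy returns a zero-dimensional tensor so that .item() applies; treat it as an illustration of the general pattern rather than a verified fix for this exact model:

# Create the input-normalisation module once, outside the loops
bn = nn.BatchNorm1d(num_features=num_features)

for e in range(num_epochs):
    net.train()
    bn.train()
    for i in range(n_batch_train):
        slze = get_slice(i, batch_size)
        x_batch = bn(torch.from_numpy(xtrain[slze]))
        target_batch = torch.from_numpy(ytrain[slze]).double()

        out = net(x_batch).double()
        L = criterion(out, target_batch)

        optimizer.zero_grad()
        L.backward()
        optimizer.step()

        # .item() stores plain Python floats, so no graph references are kept
        train_loss.append(L.item())
        train_acc.append(accuracy(target_batch, out).item())

    net.eval()
    bn.eval()
    with torch.no_grad():  # validation builds no autograd graph at all
        for j in range(n_batch_val):
            slze = get_slice(j, batch_size)
            val_batch = bn(torch.from_numpy(xval[slze]))
            val_out = net(val_batch).double()
            target_batch = torch.from_numpy(yval[slze]).double()
            val_acc.append(accuracy(target_batch, val_out).item())

If memory still climbs with this version, logging the RSS per epoch as sketched in the problem description makes it easier to narrow down which of the remaining objects keeps growing.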

