deep-learning - Finding the number of nodes and GPUs for DistributedDataParallel
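Problem description
I would like to know what values I should choose for nodes and gpus.
I am using Tesla V100-SXM2 (8 boards).
I tried:
nodes=1, gpus=1 (only the first GPU works)
nodes=1, gpus=8 (it took a very long time and never ran)
Are the nodes and gpus parameters wrong, or is my code wrong? I would appreciate any help. The code below is a simplified DDP example.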
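import argparse
import os
from datetime import datetime

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('-n', '--nodes', default=1, type=int, metavar='N')
    parser.add_argument('-g', '--gpus', default=1, type=int,
                        help='number of gpus per node')
    parser.add_argument('-nr', '--nr', default=0, type=int,
                        help='ranking within the nodes')
    parser.add_argument('--epochs', default=200, type=int, metavar='N',
                        help='number of total epochs to run')
    args = parser.parse_args()
    # Total number of processes across all nodes (one process per GPU)
    args.world_size = args.gpus * args.nodes
    os.environ['MASTER_ADDR'] = 'host1'
    os.environ['MASTER_PORT'] = '7777'
    # Start one process per local GPU; each gets its local index as the first argument
    mp.spawn(train, nprocs=args.gpus, args=(args,))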


def train(gpu, args):
    # Global rank: node index * GPUs per node + local GPU index
    rank = args.nr * args.gpus + gpu
    dist.init_process_group(
        backend='nccl',
        init_method='env://',
        world_size=args.world_size,
        rank=rank
    )
    torch.manual_seed(0)
    model = ConvNet()
    torch.cuda.set_device(gpu)
    model.cuda(gpu)
    batch_size = 100
    # define loss function (criterion) and optimizer
    criterion = nn.CrossEntropyLoss().cuda(gpu)
    optimizer = torch.optim.SGD(model.parameters(), 1e-4)
    # Wrapper around our model to handle parallel training
    model = nn.parallel.DistributedDataParallel(model, device_ids=[gpu])
    # Data loading code
    train_dataset = get_datasets()
    # Sampler that takes care of the distribution of the batches such that
    # the data is not repeated in the iteration and sampled accordingly
    train_sampler = torch.utils.data.distributed.DistributedSampler(
        train_dataset,
        num_replicas=args.world_size,
        rank=rank
    )
    # We pass in the train_sampler which can be used by the DataLoader
    train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                               batch_size=batch_size,
                                               shuffle=False,
                                               num_workers=0,
                                               pin_memory=True,
                                               sampler=train_sampler)
    start = datetime.now()
    total_step = len(train_loader)
    for epoch in range(args.epochs):
        for i, (images, labels) in enumerate(train_loader):
            images = images.cuda(non_blocking=True)
            labels = labels.cuda(non_blocking=True)
            # Forward pass
            outputs = model(images)
            loss = criterion(outputs, labels)
            # Backward and optimize
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            # Log from the first local GPU only
            if (i + 1) % 100 == 0 and gpu == 0:
                print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'.format(
                    epoch + 1,
                    args.epochs,
                    i + 1,
                    total_step,
                    loss.item())
                )
    if gpu == 0:
        print("Training complete")


if __name__ == '__main__':
    main()
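With init_method='env://', every process connects to the rank-0 process at MASTER_ADDR:MASTER_PORT, and dist.init_process_group blocks until all world_size processes have joined. For the nodes=1, gpus=8 run that never starts, one thing that may be worth checking is whether 'host1' resolves on the machine at all. The sketch below is a minimal check using only the standard library; master_addr_resolves is a hypothetical helper name, not part of PyTorch, and using 127.0.0.1 for a single-machine run is an assumption about the setup rather than something the question confirms.

```python
import socket


def master_addr_resolves(addr: str) -> bool:
    """Return True if `addr` can be resolved to an IPv4 address on this machine.

    Hypothetical helper for illustration; it only tests name resolution,
    not whether anything is listening on MASTER_PORT.
    """
    try:
        socket.gethostbyname(addr)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


# Example usage (not executed here, since the result depends on the machine):
#   master_addr_resolves('host1')      # the value used in the question
#   master_addr_resolves('127.0.0.1')  # loopback address, assumed single-node setup
```

If the check fails for 'host1' and all eight GPUs sit in one machine, pointing MASTER_ADDR at the loopback address is a common choice for that case.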
Solution
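In the code above, world_size is gpus * nodes and each process computes its rank as nr * gpus + gpu. For a single machine with eight GPUs, nodes=1 and gpus=8 is therefore consistent with that arithmetic: eight processes with ranks 0 to 7, one per GPU. With nodes=1 and gpus=1, mp.spawn starts only one process, which matches the observation that only the first GPU is used. The sketch below just reproduces that arithmetic in plain Python; rank_layout is a hypothetical name introduced for illustration.

```python
def rank_layout(nodes: int, gpus_per_node: int) -> dict:
    """Map (node_rank, local_gpu) to a global rank.

    Mirrors the question's formula: rank = nr * gpus + gpu.
    """
    if nodes < 1 or gpus_per_node < 1:
        raise ValueError("nodes and gpus_per_node must be positive")
    return {
        (nr, gpu): nr * gpus_per_node + gpu
        for nr in range(nodes)
        for gpu in range(gpus_per_node)
    }


# One machine with eight GPUs, as described in the question
layout = rank_layout(nodes=1, gpus_per_node=8)
world_size = len(layout)  # equals nodes * gpus_per_node
print(world_size)
print(sorted(layout.values()))
```

On the machine itself, torch.cuda.device_count() reports how many GPUs are visible to PyTorch, which is a reasonable upper bound for the gpus argument.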