pytorch - Why does the scale become zero when using torch.cuda.amp.GradScaler?

When using PyTorch's automatic mixed precision package (amp), I use the following snippet to display the scale:

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=65536.0, growth_interval=1)
print(scaler.get_scale())

This is the output I get:

...
65536.0
32768.0
16384.0
8192.0
4096.0
...
1e-xxx
...
0
0
0

After this point, every loss becomes NaN (and the scale stays at 0).
Is there something wrong with my loss function or my training data?
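
For context, the halving pattern in the output looks like GradScaler's documented backoff rule: when inf or NaN gradients are found in a step, the scale is multiplied by backoff_factor (default 0.5); after growth_interval consecutive clean steps it is multiplied by growth_factor (default 2.0). The sketch below imitates that rule in plain Python so the behaviour can be reproduced without a GPU. It is not the library's code: the function name simulate_scale and the found_inf_flags list are hypothetical, and it uses Python floats, whereas the real scaler keeps its scale in float32 and so would underflow to zero sooner.

```python
def simulate_scale(found_inf_flags, init_scale=65536.0, growth_factor=2.0,
                   backoff_factor=0.5, growth_interval=1):
    """Imitate the documented GradScaler update rule (hypothetical helper).

    found_inf_flags: one bool per training step, True if that step's
    gradients contained inf or NaN.
    Returns the scale value after each step.
    """
    scale = init_scale
    clean_steps = 0  # consecutive steps without inf/NaN gradients
    history = []
    for found_inf in found_inf_flags:
        if found_inf:
            # Overflow detected: shrink the scale and restart the count.
            scale *= backoff_factor
            clean_steps = 0
        else:
            clean_steps += 1
            if clean_steps == growth_interval:
                # Enough clean steps in a row: grow the scale again.
                scale *= growth_factor
                clean_steps = 0
        history.append(scale)
    return history


# Every step reports inf/NaN gradients: the scale only ever halves.
print(simulate_scale([True] * 4))
# [32768.0, 16384.0, 8192.0, 4096.0]

# Alternating bad and clean steps with growth_interval=1: the scale recovers.
print(simulate_scale([True, False, True, False]))
# [32768.0, 65536.0, 32768.0, 65536.0]
```

Under this rule and with growth_interval=1, a single clean step would double the scale again, so a printout that only ever halves suggests that inf or NaN gradients were detected on every step shown.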

Tags: pytorch, loss, automatic-mixed-precision