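python - Weights & Biases sweep cannot import modules with pytorch lightning
Problem description
I am using pytorch-lightning to train a variational autoencoder. My pytorch-lightning code works with the Weights & Biases logger. I am trying to run a hyperparameter sweep using W&B parameter sweeps.
The hyperparameter search procedure is based on what I followed from this repo.
The run initializes correctly, but when my training script runs with the first set of hyperparameters, I get the following error: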
2020-08-14 14:09:07,109 - wandb.wandb_agent - INFO - About to run command: /usr/bin/env python train_sweep.py --LR=0.02537477586974176
Traceback (most recent call last):
  File "train_sweep.py", line 1, in <module>
    import yaml
ImportError: No module named yaml
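yaml is installed and works fine. I can train the network by setting the parameters manually, but not with a parameter sweep.
Here is my sweep script for training the VAE: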
import yaml
import numpy as np
import ipdb
import torch
from vae_experiment import VAEXperiment
import torch.backends.cudnn as cudnn
from pytorch_lightning import Trainer
from pytorch_lightning.loggers import WandbLogger
from pytorch_lightning.callbacks import EarlyStopping
from vae_network import VanillaVAE
import os
import wandb
from utils import get_config, log_to_wandb
# Sweep parameters
hyperparameter_defaults = dict(
    root='data_semantics',
    gpus=1,
    batch_size=2,
    lr=1e-3,
    num_layers=5,
    features_start=64,
    bilinear=False,
    grad_batches=1,
    epochs=20
)

wandb.init(config=hyperparameter_defaults)
config = wandb.config

def main(hparams):
    model = VanillaVAE(hparams['exp_params']['img_size'], **hparams['model_params'])
    model.build_layers()
    experiment = VAEXperiment(model, hparams['exp_params'], hparams['parameters'])
    logger = WandbLogger(
        project='vae',
        name=config['logging_params']['name'],
        version=config['logging_params']['version'],
        save_dir=config['logging_params']['save_dir']
    )
    # The logger instance is named `logger`, not `wandb_logger`
    logger.watch(model.net)
    early_stopping = EarlyStopping(
        monitor='val_loss',
        min_delta=0.00,
        patience=3,
        verbose=False,
        mode='min'
    )
    runner = Trainer(weights_save_path="../../Logs/",
                     min_epochs=1,
                     logger=logger,
                     log_save_interval=10,
                     train_percent_check=1.,
                     val_percent_check=1.,
                     num_sanity_val_steps=5,
                     early_stop_callback=early_stopping,
                     **config['trainer_params']
                     )
    runner.fit(experiment)

if __name__ == '__main__':
    main(config)
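Why am I getting this error?
Solution
The problem was that my code was structured incorrectly and I was running the wandb commands the wrong way. This pytorch-lightning example with wandb shows the correct structure to follow.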
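One detail in the log from the question is worth checking as well. The agent launches the script with `/usr/bin/env python`, and the message `ImportError: No module named yaml` leaves the module name unquoted, which is how Python 2 words it; Python 3 quotes the name. This suggests, though it does not prove, that `python` resolved to a different interpreter from the one where yaml is installed. Below is a minimal sketch for checking this, assuming you temporarily place it at the top of the training script. `describe_environment` is a hypothetical helper name, not part of wandb, and the code is written to run under both Python 2.7 and Python 3.

```python
import sys


def describe_environment(module_name="yaml"):
    """Report which interpreter is running and whether a module can be imported.

    Hypothetical diagnostic helper; not part of wandb or pytorch-lightning.
    """
    try:
        module = __import__(module_name)
        found = True
        location = getattr(module, "__file__", None)
    except ImportError:  # also catches ModuleNotFoundError on Python 3.6+
        found = False
        location = None
    return {
        "executable": sys.executable,          # path of the running interpreter
        "version": sys.version.split()[0],     # e.g. a 2.7.x or 3.x.y string
        "module_found": found,
        "module_location": location,
    }


if __name__ == "__main__":
    info = describe_environment()
    for key in sorted(info):
        print("{}: {}".format(key, info[key]))
```

If the reported executable differs from the interpreter you use when training manually, the sweep is running in another environment, and the restructuring described here (calling the agent from inside your own Python process) avoids that mismatch.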
Here is my refactored code:
#!/usr/bin/env python
import wandb
from utils import get_config
#---------------------------------------------------------------------------------------------
def main():
    """
    The training function used in each sweep of the model.
    For every sweep, this function will be executed as if it is a script on its own.
    """
    import wandb
    import yaml
    import numpy as np
    import torch
    from vae_experiment import VAEXperiment
    import torch.backends.cudnn as cudnn
    from pytorch_lightning import Trainer
    from pytorch_lightning.loggers import WandbLogger
    from pytorch_lightning.callbacks import EarlyStopping
    from vae_network import VanillaVAE
    import os
    from utils import log_to_wandb, format_config

    # Load the sweep definition and the default parameters
    path_to_config = 'sweep.yaml'
    config = get_config(path_to_config)
    path_to_defaults = 'defaults.yaml'
    param_defaults = get_config(path_to_defaults)

    # Start the run, then merge the values chosen by the sweep into the config
    wandb.init(config=param_defaults)
    config = format_config(config, wandb.config)

    model = VanillaVAE(config['meta']['img_size'], hidden_dims=config['hidden_dims'], latent_dim=config['latent_dim'])
    model.build_layers()
    experiment = VAEXperiment(model, config)
    early_stopping = EarlyStopping(
        monitor='val_loss',
        min_delta=0.00,
        patience=3,
        verbose=False,
        mode='min'  # val_loss should decrease, so stop once it no longer goes down
    )
    runner = Trainer(weights_save_path=config['meta']['save_dir'],
                     min_epochs=1,
                     train_percent_check=1.,
                     val_percent_check=1.,
                     num_sanity_val_steps=5,
                     early_stop_callback=early_stopping,
                     **config['trainer_params'])
    runner.fit(experiment)
    log_to_wandb(config, runner, experiment, path_to_config)
#---------------------------------------------------------------------------------------------
path_to_yaml = 'sweep.yaml'
sweep_config = get_config(path_to_yaml)
sweep_id = wandb.sweep(sweep_config)
wandb.agent(sweep_id, function=main)
#---------------------------------------------------------------------------------------------
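The contents of `sweep.yaml` and the `format_config` helper are not shown above. As a rough illustration only, the sketch below shows what a sweep definition could look like as a Python dict (the form passed to `wandb.sweep`) and one way the values chosen by the agent might be merged over the defaults. The parameter range, the choice of `val_loss` as the metric, and the `merge_sweep_values` function are assumptions made for illustration; only the `LR` parameter name and its sample value come from the log in the question.

```python
import copy

# Hypothetical sweep definition; a sweep.yaml file would hold the same keys.
sweep_config = {
    "method": "random",                                  # search strategy
    "metric": {"name": "val_loss", "goal": "minimize"},  # assumed metric
    "parameters": {
        "LR": {"min": 1e-4, "max": 1e-1},                # assumed range
    },
}


def merge_sweep_values(defaults, sweep_values):
    """Return a copy of `defaults` with keys overridden by the sweep's values.

    Hypothetical stand-in for a helper such as format_config; the real
    implementation is not shown in the original post.
    """
    merged = copy.deepcopy(defaults)  # leave the caller's defaults untouched
    for key, value in sweep_values.items():
        merged[key] = value
    return merged


if __name__ == "__main__":
    defaults = {"LR": 1e-3, "meta": {"img_size": 64}}  # illustrative defaults
    chosen = {"LR": 0.02537477586974176}               # value from the log line
    print(merge_sweep_values(defaults, chosen))
```

Copying the defaults before overriding them keeps each run's configuration independent, which matters here because the agent calls the training function once per set of hyperparameters within the same process.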