Can SAC be used instead of PPO in the Cartpole example?

Problem description

I am studying AzureML RL using the sample code.

I can run the cartpole example (cartpole_ci.ipynb), which trains a PPO model on a compute instance.

I tried SAC instead of PPO by changing training_algorithm = "PPO" to training_algorithm = "SAC", but it failed with the following message:

ray.rllib.utils.error.UnsupportedSpaceException: Action space Discrete(2) is not supported for SAC.

Has anyone tried the SAC algorithm on AzureML RL and gotten it to work?

Tags: azure-machine-learning-service

Solution

AzureML RL does support SAC with discrete action spaces, though not with parametric actions; I have confirmed this in the RLlib feature compatibility matrix: https://docs.ray.io/en/latest/rllib-algorithms.html#feature-compatibility-matrix. Note that the linked page documents the latest Ray release, so if the Ray version installed on your compute instance is older, its SAC may not yet include discrete-action support.
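
If you want to rule out a version problem before going back to AzureML, a minimal local sanity check (my own sketch, not part of the notebook) is to run RLlib's SAC directly on CartPole-v0; on a Ray version whose SAC lacks discrete-action support, this reproduces the same UnsupportedSpaceException:

import ray
from ray import tune

# Run SAC on CartPole-v0, whose action space is Discrete(2).
# The config and stop conditions mirror the cartpole sample below.
ray.init()
tune.run(
    "SAC",
    config={"env": "CartPole-v0", "num_gpus": 0, "num_workers": 1},
    stop={"episode_reward_mean": 200, "time_total_s": 300},
)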

Did you follow the code sample below?

from azureml.contrib.train.rl import ReinforcementLearningEstimator, Ray

training_algorithm = "PPO"
rl_environment = "CartPole-v0"

script_params = {
    # Training algorithm
    "--run": training_algorithm,

    # Training environment
    "--env": rl_environment,

    # Algorithm-specific parameters
    "--config": '\'{"num_gpus": 0, "num_workers": 1}\'',

    # Stop conditions
    "--stop": '\'{"episode_reward_mean": 200, "time_total_s": 300}\'',

    # Frequency of taking checkpoints
    "--checkpoint-freq": 2,

    # If a checkpoint should be taken at the end - optional argument with no value
    "--checkpoint-at-end": "",

    # Log directory
    "--local-dir": './logs'
}

training_estimator = ReinforcementLearningEstimator(
    # Location of source files
    source_directory='files',

    # Python script file
    entry_script='cartpole_training.py',

    # A dictionary of arguments to pass to the training script specified in ``entry_script``
    script_params=script_params,

    # The Azure Machine Learning compute target set up for Ray head nodes
    compute_target=compute_target,

    # Reinforcement learning framework. Currently must be Ray.
    rl_framework=Ray()
)
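
For completeness, here is a sketch of how the estimator is then submitted as an experiment run; it assumes ws is an already-loaded Workspace object, and the experiment name is illustrative:

from azureml.core.experiment import Experiment

# `ws` is assumed to be an already-loaded Workspace; the experiment
# name is illustrative.
experiment = Experiment(workspace=ws, name='CartPole-v0-SAC')
run = experiment.submit(config=training_estimator)
run.wait_for_completion(show_output=True)

To try SAC, set training_algorithm = "SAC" before building script_params, as in the question.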
