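python - PPOAgent + Cartpole = ValueError: actor_network output spec does not match action spec
Problem description
I am trying to use the PPOAgent from tf_agents with the CartPole-v1 environment, but I get the following error when declaring the agent itself: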
ValueError: actor_network output spec does not match action spec:
TensorSpec(shape=(2,), dtype=tf.float32, name=None)
vs.
BoundedTensorSpec(shape=(), dtype=tf.int64, name='action', minimum=array(0, dtype=int64), maximum=array(1, dtype=int64))
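I believe the problem is that my network's output is tf.float32 rather than tf.int64, but I could be wrong. I don't know how to make the network output an integer, and as far as I know that is either impossible or undesirable.
If I run a continuous environment such as MountainCarContinuous-v0, I get a different error: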
ValueError: Unexpected output from `actor_network`. Expected `Distribution` objects, but saw output spec: TensorSpec(shape=(1,), dtype=tf.float32, name=None)
Here is the relevant code (mostly taken from the DQN tutorial):
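import tensorflow as tf
import tf_agents
from tf_agents.environments import suite_gym, tf_py_environment
from tf_agents.networks import sequential
from tf_agents.specs import tensor_spec

# env_name = 'MountainCarContinuous-v0'
env_name = 'CartPole-v1'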
train_py_env = suite_gym.load(env_name)
eval_py_env = suite_gym.load(env_name)
train_env = tf_py_environment.TFPyEnvironment(train_py_env)
eval_env = tf_py_environment.TFPyEnvironment(eval_py_env)
train_env.reset()
eval_env.reset()
actor_layer_params = (100, 50)
critic_layer_params = (100, 50)
action_tensor_spec = tensor_spec.from_spec(train_env.action_spec())
num_actions = action_tensor_spec.maximum - action_tensor_spec.minimum + 1
# Define a helper function to create Dense layers configured with the right
# activation and kernel initializer.
def dense_layer(num_units):
    return tf.keras.layers.Dense(
        num_units,
        activation=tf.keras.activations.relu,
        kernel_initializer=tf.keras.initializers.VarianceScaling(
            scale=2.0, mode='fan_in', distribution='truncated_normal'))
# Actor network
dense_layers = [dense_layer(num_units) for num_units in actor_layer_params]
actions_layer = tf.keras.layers.Dense(
    1,
    name='actions',
    activation=None,
    kernel_initializer=tf.keras.initializers.RandomUniform(
        minval=-0.03, maxval=0.03),
    bias_initializer=tf.keras.initializers.Constant(-0.2))
ActorNet = sequential.Sequential(dense_layers + [actions_layer])
# Critic/value network
dense_layers = [dense_layer(num_units) for num_units in critic_layer_params]
criticism_layer = tf.keras.layers.Dense(
    1,
    activation=None,
    kernel_initializer=tf.keras.initializers.RandomUniform(
        minval=-0.03, maxval=0.03),
    bias_initializer=tf.keras.initializers.Constant(-0.2))
CriticNet = sequential.Sequential(dense_layers + [criticism_layer])
learning_rate = 1e-3  # not defined in the original post; assumed value
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
train_step_counter = tf.Variable(0)
# Error occurs here
agent = tf_agents.agents.PPOAgent(
    train_env.time_step_spec(),
    train_env.action_spec(),
    optimizer=optimizer,
    actor_net=ActorNet,
    value_net=CriticNet,
    train_step_counter=train_step_counter)
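I feel like I must be missing something obvious, or have a fundamental misunderstanding, so any and all help would be greatly appreciated. I could not find an example of PPOAgent in use.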
Solution
Figured it out: I needed to use a network that returns a distribution, such as ActorDistributionNetwork.