python - InvalidArgumentError when running a session in TensorFlow
Problem description
I am working through this a3c blog post to learn about agents. It uses a neural network to optimize performance. But when execution reaches the code below, it throws an error saying the shape of the input data is incompatible with a placeholder. I have tried feeding many different shapes and also considered reshaping, but the error still occurs when the sess.run() part runs. What should I do to fix it?
InvalidArgumentError: You must feed a value for placeholder tensor 'Placeholder_21' with dtype float and shape [?,2]
[[node Placeholder_21 (defined at <ipython-input-462-3c1b764fbd4e>:3) ]]
When I print the batched input data, I see:
print("State shape:", batch_states.shape)
print("Batch states:",batch_states)
print("Batch actions length:",len(batch_actions))
print("Batch Actions:", batch_actions)
print("Batch Rewards:", batch_rewards)
print("Batch Done:", batch_done)
print("Num actions:", n_actions)
State shape: (10, 2)
Batch states: [[1501.87201108 1501.87201108]
[1462.65450863 1462.65450863]
[1480.95616876 1480.95616876]
[1492.24380743 1492.24380743]
[1481.92809598 1481.92809598]
[1480.19257102 1480.19257102]
[1503.54571786 1503.54571786]
[1489.38563414 1489.38563414]
[1541.16797527 1541.16797527]
[1516.04036259 1516.04036259]]
Batch actions length: 10
Batch Actions: [[1. 0. 1. 0. 0.]
[1. 0. 1. 0. 0.]
[1. 0. 1. 0. 0.]
[1. 0. 1. 0. 0.]
[1. 0. 1. 0. 0.]
[1. 0. 1. 0. 0.]
[1. 0. 1. 0. 0.]
[1. 0. 1. 0. 0.]
[1. 0. 1. 0. 0.]
[1. 0. 1. 0. 0.]]
Batch Rewards: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
Batch Done: [False False False False False False False False False False]
Num actions: 5
This is the part of the code the error comes from:
states_ph = tf.placeholder('float32', [None,] + list(obs_shape))
next_states_ph = tf.placeholder('float32', [None,] + list(obs_shape))
actions_ph = tf.placeholder('int32', (None,n_actions))
rewards_ph = tf.placeholder('float32', (None,))
is_done_ph = tf.placeholder('float32', (None,))
# logits[n_envs, n_actions] and state_values[n_envs, n_actions]
logits, state_values = agent.symbolic_step(states_ph)
next_logits, next_state_values = agent.symbolic_step(next_states_ph)
# There is no next state if the episode is done!
next_state_values = next_state_values * (1 - is_done_ph)
# probabilities and log-probabilities for all actions
probs = tf.nn.softmax(logits, axis=-1) # [n_envs, n_actions]
logprobs = tf.nn.log_softmax(logits, axis=-1) # [n_envs, n_actions]
# log-probabilities only for agent's chosen actions
logp_actions = tf.reduce_sum(logprobs * tf.one_hot(actions_ph, n_actions), axis=-1) # [n_envs,]
# Compute advantage using rewards_ph, state_values and next_state_values.
gamma = 0.99
advantage = rewards_ph + gamma * (next_state_values - state_values)
assert advantage.shape.ndims == 1, "please compute advantage for each sample, vector of shape [n_envs,]"
# Compute policy entropy given logits_seq. Mind the "-" sign!
entropy = - tf.reduce_sum(probs * logprobs, 1)
assert entropy.shape.ndims == 1, "please compute pointwise entropy vector of shape [n_envs,] "
# Compute target state values using temporal difference formula. Use rewards_ph and next_step_values
target_state_values = rewards_ph + gamma*next_state_values
actor_loss = -tf.reduce_mean(logp_actions * tf.stop_gradient(advantage), axis=0) - 0.001 * tf.reduce_mean(entropy, axis=0)
critic_loss = tf.reduce_mean((state_values - tf.stop_gradient(target_state_values))**2, axis=0)
train_step = tf.train.AdamOptimizer(1e-4).minimize(actor_loss + critic_loss)
sess.run(tf.global_variables_initializer())
l_act, l_crit, adv, ent = sess.run([actor_loss, critic_loss, advantage, entropy], feed_dict = {
states_ph: batch_states,
actions_ph: batch_actions,
next_states_ph: batch_states,
rewards_ph: batch_rewards,
is_done_ph: batch_done,
})
Solution
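The original answer is not preserved in this copy, so what follows is a sketch of likely causes, not the accepted fix. Two things stand out. First, the error names `Placeholder_21` with shape `[?,2]` (the shape of `batch_states`), yet the code above defines far fewer placeholders; in a notebook, re-running the definition cell accumulates placeholders in the default graph, and the fetched ops may still reference a stale `states_ph` that is no longer being fed. Calling `tf.reset_default_graph()` before rebuilding the graph is the usual remedy. Second, `actions_ph` is declared with shape `(None, n_actions)` and fed the one-hot-style rows of `batch_actions`, but `tf.one_hot(actions_ph, n_actions)` expects integer class indices of shape `(None,)`; fed rank-2 indices it produces a rank-3 tensor, so `logp_actions` is no longer `[n_envs,]`. (Note also that the printed rows `[1. 0. 1. 0. 0.]` contain two ones, which suggests the action encoding itself may be off.) A NumPy sketch of the intended indexing, using a dummy uniform policy for illustration:

```python
import numpy as np

n_actions = 5
batch_actions = np.array([[1., 0., 1., 0., 0.]] * 10)  # the rows printed above

# tf.one_hot wants integer indices of shape (None,); convert one-hot rows to ids
action_ids = np.argmax(batch_actions, axis=1)  # shape (10,)

# with proper indices, one-hot masking equals direct fancy indexing
logprobs = np.log(np.full((10, n_actions), 1.0 / n_actions))  # dummy uniform policy
one_hot = np.eye(n_actions)[action_ids]                       # shape (10, 5)
masked = (logprobs * one_hot).sum(axis=-1)                    # shape (10,), as logp_actions should be
indexed = logprobs[np.arange(10), action_ids]
assert np.allclose(masked, indexed)
```

With `actions_ph = tf.placeholder('int32', (None,))` and `action_ids` fed instead of the raw rows, the `tf.one_hot` reduction in the posted code would yield the `[n_envs,]` vector its assertions expect.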