python - ValueError: operands could not be broadcast together with shapes - Keras
Problem description
I am training an agent using demonstrations from another agent, provided as (state, action, reward, next_state) tuples. I am using Keras and scikit-learn.
This is how the Q-learning part works:
import numpy as np
import tensorflow as tf

def q_learning_model():
    NUM_STATES = len(states)
    NUM_ACTIONS = 4
    GAMMA = 0.99

    model_in = tf.keras.layers.Input(shape=(1,), dtype=tf.int32)
    tmp = tf.one_hot(model_in, NUM_STATES)
    tmp = tf.keras.layers.Dense(NUM_ACTIONS, use_bias=False)(tmp)
    model_out = tf.squeeze(tmp, axis=1)
    q_function = tf.keras.Model(model_in, model_out)

    state = tf.keras.layers.Input(shape=(1,), dtype=tf.int32, name="State")
    action = tf.keras.layers.Input(shape=(1,), dtype=tf.int32, name="Action")
    reward = tf.keras.layers.Input(shape=(1,), name="Reward")
    next_state = tf.keras.layers.Input(shape=(1,), dtype=tf.int32, name="Next_State")

    td_target = reward + GAMMA * tf.reduce_max(q_function(next_state), axis=-1)
    predictions = tf.gather(q_function(state), action, axis=-1)

    train_model = tf.keras.Model(
        inputs=[state, action, reward, next_state],
        outputs=[predictions, td_target]
    )
    # to date it still feels as if tf.stop_gradient is a horrible
    # hack similar to DDQL to stabilize the algorithm
    td_error = 0.5 * tf.abs(tf.stop_gradient(td_target) - predictions) ** 2
    train_model.add_loss(td_error, [state, action, reward, next_state])

    predicted_action = tf.argmax(q_function(state), axis=-1)
    correct_predictions = tf.keras.metrics.categorical_accuracy(
        action, predicted_action)
    train_model.add_metric(correct_predictions,
                           name="Matched_Actions", aggregation="mean")
    return q_function, train_model
In the main function I load an external data file as follows:
states, actions, rewards, next_states = load_data("data.csv")
indices = np.arange(len(states))
Then I train my agent:
from sklearn.model_selection import KFold

q_scores = list()
policy_scores = list()
for train_idx, test_idx in KFold(shuffle=True).split(indices):
    train_data = [
        states[train_idx, ...],
        actions[train_idx, ...],
        rewards[train_idx, ...],
        next_states[train_idx, ...],
    ]
    test_data = [
        states[test_idx, ...],
        actions[test_idx, ...],
        rewards[test_idx, ...],
        next_states[test_idx, ...],
    ]

    q_function, train_q = q_learning_model()
    del q_function
    train_q.compile(optimizer="sgd", experimental_run_tf_function=False)
    train_q.fit(train_data)
    _, score = train_q.evaluate(test_data)
    q_scores.append(score)

    policy_fn, train_policy = q_learning_model()
    del policy_fn
    train_policy.compile(optimizer="sgd", experimental_run_tf_function=False)
    train_policy.fit(train_data)
    _, score = train_policy.evaluate(test_data)
    policy_scores.append(score)
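The loop above splits index arrays rather than the data itself, then uses fancy indexing to build each fold. A minimal standalone sketch of that pattern (the array sizes here are made up for illustration):

```python
import numpy as np
from sklearn.model_selection import KFold

indices = np.arange(10)

# KFold.split yields (train_idx, test_idx) pairs; with the default
# n_splits=5, each fold holds out 1/5 of the samples for evaluation.
fold_sizes = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(indices):
    fold_sizes.append((len(train_idx), len(test_idx)))

print(fold_sizes)  # five (8, 2) pairs
```

Splitting indices instead of arrays is convenient here because four parallel arrays (states, actions, rewards, next_states) must be sliced identically per fold.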
Everything seems to run, but I get the following error:
self.results[0] += batch_outs[0] * (batch_end - batch_start)
ValueError: operands could not be broadcast together with shapes (32,32,32) (3,3,3) (32,32,32)
even though the shapes of my train_data entries (states, actions, rewards, next_states) are:
train_data[0].shape -> (1123,)
train_data[1].shape -> (1123,)
train_data[2].shape -> (1123,)
train_data[3].shape -> (1123,)
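The error message itself is ordinary NumPy broadcasting: Keras accumulates per-batch metric outputs with `+=`, and an output of shape (32, 32, 32) from a full batch cannot be combined with one of shape (3, 3, 3) from the final partial batch, which suggests the per-batch outputs scale with the batch size instead of being scalars. A minimal reproduction of the broadcasting failure:

```python
import numpy as np

full_batch = np.ones((32, 32, 32))  # output accumulated from a full batch
last_batch = np.ones((3, 3, 3))     # output from the final partial batch

# This is effectively what Keras' aggregation attempts; neither shape can
# be broadcast to the other because no axis is 1 or matching.
try:
    full_batch + last_batch
except ValueError as err:
    print(err)  # operands could not be broadcast together ...
```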
Let me know if you have run into a similar problem and how you solved it. If you spot any other bugs in the code, feel free to point them out as well.
Thanks for your time and support.
Solution
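The original answer text is not preserved in this extract. A plausible diagnosis, offered as an assumption rather than the accepted fix: `tf.gather(q_function(state), action, axis=-1)` with an `action` tensor of shape (batch, 1) gathers across the entire batch, so `predictions` picks up an extra batch dimension; after broadcasting against `td_target`, the added loss and metric come out with shape (batch, batch, batch), which matches the (32, 32, 32) vs. (3, 3, 3) outputs that Keras then fails to average across batches of different sizes. The likely fix is a per-row gather (`tf.gather(..., batch_dims=1)` in TensorFlow). A NumPy sketch of the difference, with `np.take_along_axis` standing in for the batched gather:

```python
import numpy as np

batch, num_actions = 4, 3
q_values = np.arange(batch * num_actions, dtype=float).reshape(batch, num_actions)
actions = np.array([[0], [2], [1], [0]])  # one chosen action per sample, shape (4, 1)

# Gathering along the last axis with the raw indices mixes rows across
# the batch: every sample picks up one column per *sample in the batch*.
mixed = q_values[:, actions.ravel()]
print(mixed.shape)  # (4, 4) -- an unwanted extra batch dimension

# A per-row (batched) gather keeps exactly one Q-value per sample.
per_sample = np.take_along_axis(q_values, actions, axis=-1)
print(per_sample.shape)  # (4, 1)
```

With `predictions` reduced to shape (batch, 1), the TD error and the accuracy metric become per-sample quantities that Keras can aggregate across full and partial batches alike.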