ValueError: operands could not be broadcast together with shapes - Keras

Problem Description

I am training an agent using demonstrations from another agent, provided as (state, action, reward, next_state) tuples. I am using Keras and Sklearn.

This is how the Q-learning part works:

import tensorflow as tf

def q_learning_model():
    NUM_STATES = len(states)  # `states` is the global array loaded in main (see below)
    NUM_ACTIONS = 4
    GAMMA = 0.99

    model_in = tf.keras.layers.Input(shape=(1,), dtype=tf.int32)
    tmp = tf.one_hot(model_in, NUM_STATES)
    tmp = tf.keras.layers.Dense(NUM_ACTIONS, use_bias=False)(tmp)
    model_out = tf.squeeze(tmp, axis=1)
    q_function = tf.keras.Model(model_in, model_out)

    state = tf.keras.layers.Input(shape=(1,), dtype=tf.int32, name="State")
    action = tf.keras.layers.Input(shape=(1,), dtype=tf.int32, name="Action")
    reward = tf.keras.layers.Input(shape=(1,), name="Reward")
    next_state = tf.keras.layers.Input(shape=(1,), dtype=tf.int32, name="Next_State")

    td_target = reward + GAMMA * tf.reduce_max(q_function(next_state), axis=-1)
    predictions = tf.gather(q_function(state), action, axis=-1)
    train_model = tf.keras.Model(
        inputs=[state, action, reward, next_state],
        outputs=[predictions, td_target]
    )

    # to date it still feels as if tf.stop_gradient is a horrible
    # hack similar to DDQL to stabilize the algorithm
    td_error = 0.5 * tf.abs(tf.stop_gradient(td_target) - predictions) ** 2
    train_model.add_loss(td_error, [state, action, reward, next_state])

    predicted_action = tf.argmax(q_function(state), axis=-1)
    correct_predictions = tf.keras.metrics.categorical_accuracy(
        action, predicted_action)
    train_model.add_metric(correct_predictions,
                           name="Matched_Actions", aggregation="mean")

    return q_function, train_model
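
To make sure I am fitting the right target, here is a tiny worked example of the TD target reward + GAMMA * max_a Q(next_state, a) on made-up numbers (the reward and Q-values below are invented purely for illustration):

import numpy as np

GAMMA = 0.99
reward = np.array([1.0])                   # invented reward for one transition
q_next = np.array([[0.2, 0.5, 0.1, 0.3]])  # invented Q(next_state, a) for the 4 actions

# TD target: r + gamma * max_a Q(s', a) = 1.0 + 0.99 * 0.5 = 1.495
td_target = reward + GAMMA * np.max(q_next, axis=-1)
print(td_target)  # [1.495]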

In the main function I load an external data file like this:

states, actions, rewards, next_states = load_data("data.csv")
indices = np.arange(len(states))
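
load_data is not shown here; it is only a small helper, roughly along these lines (a sketch only, assuming the CSV has columns named state, action, reward and next_state):

import pandas as pd

def load_data(path):
    # Sketch only: assumes columns "state", "action", "reward", "next_state".
    df = pd.read_csv(path)
    return (
        df["state"].to_numpy(),
        df["action"].to_numpy(),
        df["reward"].to_numpy(),
        df["next_state"].to_numpy(),
    )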

Then I train my agent:

q_scores = list()
policy_scores = list()
for train_idx, test_idx in KFold(shuffle=True).split(indices):
    train_data = [
        states[train_idx, ...],
        actions[train_idx, ...],
        rewards[train_idx, ...],
        next_states[train_idx, ...],
    ]
    test_data = [
        states[test_idx, ...],
        actions[test_idx, ...],
        rewards[test_idx, ...],
        next_states[test_idx, ...],
    ]
    
    q_function, train_q = q_learning_model()
    del q_function
    train_q.compile(optimizer="sgd", experimental_run_tf_function=False)
    train_q.fit(train_data)

    _, score = train_q.evaluate(test_data)
    q_scores.append(score)

    policy_fn, train_policy = q_learning_model()
    del policy_fn
    train_policy.compile(optimizer="sgd", experimental_run_tf_function=False)
    train_policy.fit(train_data)
    _, score = train_policy.evaluate(test_data)
    policy_scores.append(score)
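
For what it's worth, the KFold splits themselves look sane; this is the kind of fold size I expect (a toy check with ten samples and the default of five folds):

import numpy as np
from sklearn.model_selection import KFold

toy_indices = np.arange(10)
for train_idx, test_idx in KFold(shuffle=True, random_state=0).split(toy_indices):
    # each fold holds out ~1/5 of the samples for evaluation
    print(len(train_idx), len(test_idx))  # 8 2 on every fold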

Everything seems to work, but I get the following error:

self.results[0] += batch_outs[0] * (batch_end - batch_start)
ValueError: operands could not be broadcast together with shapes (32,32,32) (3,3,3) (32,32,32)
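
For reference, the failing broadcast itself is easy to reproduce in plain NumPy; shapes (32, 32, 32) and (3, 3, 3) disagree in every axis, so the in-place addition Keras performs while aggregating batch results raises this ValueError:

import numpy as np

a = np.zeros((32, 32, 32))
b = np.ones((3, 3, 3))

try:
    a += b  # same kind of in-place add as self.results[0] += batch_outs[0] * ...
except ValueError as e:
    print(e)  # "operands could not be broadcast together with shapes ..."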

even though the shapes of my train_data (state, action, reward, next_state) are as follows:

train_data[0].shape -> (1123,)
train_data[1].shape -> (1123,)
train_data[2].shape -> (1123,)
train_data[3].shape -> (1123,)
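
One thing I keep wondering about: the Input layers are declared with shape=(1,), so I assume Keras would rather see each array as (N, 1) than as (N,). Reshaping would look like this, but this is just my assumption and I don't know whether it is related to the error:

import numpy as np

# assumption: add a trailing axis so each array becomes (1123, 1),
# matching the Input(shape=(1,)) declarations above
train_data = [np.reshape(arr, (-1, 1)) for arr in train_data]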

Let me know if you have run into a similar problem and how you solved it. If you spot any other mistakes in the code, please feel free to point them out as well.

Thanks for your time and support.

Tags: python, machine-learning, keras, valueerror, q-learning
