Solving BipedalWalker from OpenAI Gym using TensorFlow

Problem description

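I am trying to solve BipedalWalker from OpenAI Gym, but I always get an error. The action the environment expects is an array of 4 values, each between -1 and 1 (for example: [0.45099565 -0.7659952 -0.01972992 0.62626314]), so I defined the model like this: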

def build_model(states, actions):
    model = Sequential()
    model.add(Flatten(input_shape=(1, states)))
    model.add(Dense(24, activation='relu'))
    model.add(Dense(24, activation='relu'))
    model.add(Dense(actions, activation='linear'))
    return model

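Here states = 24 and actions = 4. When I try to train the model, I get the error: IndexError: invalid index to scalar variable. I think this happens because the model's outputs are greater than 1 or less than -1. Is there a way to fix this and force all 4 outputs to lie between -1 and 1?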

My full code is:

import gym
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam
from rl.agents import DQNAgent
from rl.policy import BoltzmannQPolicy
from rl.memory import SequentialMemory

env = gym.make("BipedalWalker-v3")
states = env.observation_space.shape[0]
actions = env.action_space.shape[0]

print(actions)
print(env.action_space.sample())


def build_model(states, actions):
    model = Sequential()
    model.add(Flatten(input_shape=(1, states)))
    model.add(Dense(24, activation='relu'))
    model.add(Dense(24, activation='relu'))
    model.add(Dense(actions, activation='linear'))
    return model


model = build_model(states, actions)
print(model.summary())


def build_agent(model, actions):
    policy = BoltzmannQPolicy()
    memory = SequentialMemory(limit=50000, window_length=1)
    dqn = DQNAgent(model=model, memory=memory, policy=policy,
                   nb_actions=actions, nb_steps_warmup=10, target_model_update=1e-2)
    return dqn


dqn = build_agent(model, actions)
dqn.compile(Adam(lr=1e-3), metrics=['mae'])
dqn.fit(env, nb_steps=50000, visualize=False, verbose=1)

_ = dqn.test(env, nb_episodes=15, visualize=True)

dqn.save_weights('dqn_weights.h5f', overwrite=True)

Tags: tensorflow, keras, openai-gym

Solution
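
No answer was recorded for this question, so what follows is only a hedged sketch and not a confirmed fix. On the narrow question of bounding outputs: a tanh squashing step maps any real number into the range -1 to 1, and in Keras that would correspond to using activation='tanh' in place of 'linear' on the final Dense layer. The pure-Python snippet below illustrates the mapping only; the helper name squash_to_action_range and the sample values are hypothetical and not taken from the question.

```python
import math


def squash_to_action_range(raw_outputs, low=-1.0, high=1.0):
    """Map unbounded values into [low, high] using tanh (illustrative helper)."""
    mid = (high + low) / 2.0
    half = (high - low) / 2.0
    return [mid + half * math.tanh(x) for x in raw_outputs]


# Hypothetical raw outputs of a 4-unit linear layer (made-up numbers).
raw = [3.2, -7.5, 0.0, 0.4]
action = squash_to_action_range(raw)

# Every component now lies inside the interval [-1, 1].
print(action)
```

Note that bounding the outputs may not remove the IndexError by itself. DQNAgent in keras-rl selects a single discrete action index, while BipedalWalker-v3 expects a 4-element continuous action array, so the scalar passed to the environment is a likely cause of "invalid index to scalar variable". If that is the case here, an agent designed for continuous actions, such as keras-rl's DDPGAgent, would probably be a better fit than DQN.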
