python - DQN can't solve CartPole-v1 - what am I doing wrong?
Problem
I have been trying to solve CartPole-v1, i.e. reach an average reward of 475 over 100 consecutive episodes.
I have tried many DQN architectures with fixed Q-targets. What am I doing wrong?
These are my hyperparameters:
```python
TOTAL_EPISODES = 5000
T = 500
LR = 0.01
GAMMA = 0.95
MIN_EPSILON = 0.01
EPSILON_DECAY_RATE = 0.9995
epsilon = 1.0  # moving epsilon
state_size = env.observation_space.shape[0]
action_size = env.action_space.n
batch_size = 64
C = 8
reward_discount = 10
deque_size = 2000
experience_replay = deque(maxlen=deque_size)
```
I have tried LR values in [0.01, 0.02, 0.001], lowering the epsilon decay rate, batch_size = 32, C = 4, and so on.
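One sanity check on the schedule above: with a multiplicative decay of 0.9995 applied once per episode, epsilon never reaches MIN_EPSILON within the 5000-episode budget. A rough calculation (assuming one decay step per episode, which is how the hyperparameters read):

```python
import math

EPSILON_DECAY_RATE = 0.9995
MIN_EPSILON = 0.01

# Number of decay steps needed for 1.0 * rate**n to fall below the floor:
# solve rate**n = MIN_EPSILON for n.
steps = math.ceil(math.log(MIN_EPSILON) / math.log(EPSILON_DECAY_RATE))
print(steps)  # roughly 9200 steps, well beyond TOTAL_EPISODES = 5000
```

So if epsilon is decayed per episode, the agent is still exploring fairly often at episode 5000; decaying per environment step instead would reach the floor much sooner.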
The implementation matches the algorithm in the picture; I'll include only the non-trivial parts:
```python
def train_on_batch(batch_size, memory, gamma, model, ddqn_target_model, losses):
    minibatch = random.sample(memory, batch_size)
    states = np.zeros((batch_size, 4))
    targets = np.zeros((batch_size, 2))
    for index, (state, action, reward, next_state, done) in enumerate(minibatch):
        states[index] = state.reshape(1, 4)
        # Q-values from the online network; only the taken action's entry
        # is overwritten with the Bellman target below.
        model_target = model.predict(state.reshape(1, 4))
        # Next-state Q-values from the (frozen) target network.
        target_pred = ddqn_target_model.predict(next_state.reshape(1, 4))
        if done:
            target = reward
        else:
            target = reward + gamma * np.amax(target_pred)
        model_target[0][action] = target
        targets[index] = model_target[0]
    history = model.fit(states, targets, batch_size=batch_size, epochs=1, verbose=0)
    losses.append(history.history['loss'][0])
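As an aside on the loop above: the target computation itself is just the one-step Bellman backup, and can be checked in isolation. A minimal dependency-free sketch (names are illustrative, not from the question's code):

```python
def bellman_targets(rewards, dones, q_next, gamma):
    """One-step DQN targets: r if terminal, else r + gamma * max_a Q_target(s', a)."""
    return [r if d else r + gamma * max(q)
            for r, d, q in zip(rewards, dones, q_next)]

targets = bellman_targets(
    rewards=[1.0, 1.0],
    dones=[False, True],
    q_next=[[0.5, 2.0], [3.0, 1.0]],  # target-network Q-values for next states
    gamma=0.95,
)
print(targets)  # non-terminal: 1 + 0.95 * 2.0; terminal: just the reward
```

In practice it is also much faster to call predict once on the whole batch of states and next states rather than once per transition, though that does not change the targets themselves.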
def build_model(state_size, action_size, learning_rate, layers_num=3):
    model = Sequential()
    if layers_num == 3:
        model.add(Dense(24, input_dim=state_size, activation='relu'))
        model.add(Dense(24, activation='relu'))
        model.add(Dense(24, activation='relu'))
    else:
        model.add(Dense(18, input_dim=state_size, activation='relu'))
        model.add(Dense(18, activation='relu'))
        model.add(Dense(18, activation='relu'))
        model.add(Dense(18, activation='relu'))
        model.add(Dense(18, activation='relu'))
    model.add(Dense(action_size, activation='linear'))
    model.compile(loss='mse',
                  optimizer=Adam(lr=learning_rate))
    return model
def sample_action(model, state, epsilon):
    if random.uniform(0, 1) < epsilon:
        action = env.action_space.sample()
    else:
        action_pred = model.predict(state)
        action = np.argmax(action_pred[0])
    return action
```
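The action selection above is standard epsilon-greedy; stripped of the env and model dependencies it reduces to this sketch (function name and signature are illustrative):

```python
import random

def epsilon_greedy(q_values, epsilon, n_actions, rng=random):
    """With probability epsilon explore uniformly, otherwise exploit argmax Q."""
    if rng.uniform(0, 1) < epsilon:
        return rng.randrange(n_actions)
    return max(range(n_actions), key=lambda a: q_values[a])

# With epsilon = 0 the choice is purely greedy, so the result is deterministic.
print(epsilon_greedy([0.2, 1.5], epsilon=0.0, n_actions=2))  # 1
```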
Initially I call ddqn_target_model.set_weights(model.get_weights()), and inside the episode loop I do:

```python
if episode % C == 0:
    ddqn_target_model.set_weights(model.get_weights())
```
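For reference, the two common target-update schemes can be sketched with plain lists standing in for layer tensors (the tau value here is an illustrative assumption, not from the question):

```python
def hard_update(online_w):
    """Hard sync: copy online weights into the target network every C steps."""
    return list(online_w)

def soft_update(target_w, online_w, tau=0.01):
    """Polyak averaging: target <- tau * online + (1 - tau) * target, every step."""
    return [tau * ow + (1 - tau) * tw for tw, ow in zip(target_w, online_w)]

# Toy flat weights stand in for real layer tensors.
online = [1.0, 2.0]
target = [0.0, 0.0]
target = soft_update(target, online, tau=0.1)
print(target)  # [0.1, 0.2]
```

Note that syncing on episode % C == 0 updates the target network only every C episodes, which for CartPole can mean thousands of gradient steps between syncs; counting C in environment steps (or using soft updates) keeps the target network much fresher.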
What am I missing?
Thanks