How to build a DQN that outputs 1 discrete and 1 continuous value as a pair?

Problem Description

I am building a DQN for an OpenAI Gym environment. My observation space is only a single discrete value, but my actions are:

from gym.spaces import Tuple, Discrete, Box

self.action_space = Tuple((Discrete(3), Box(-100, 100, (1,))))

e.g. [1, 56], [0, 24], [2, -78], ...
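For reference, sampling from such a tuple space produces pairs in exactly this format; a minimal standalone sketch (the sampled numbers shown in the comment are made up):

from gym.spaces import Tuple, Discrete, Box

space = Tuple((Discrete(3), Box(-100, 100, (1,))))
d, c = space.sample()   # e.g. d = 1, c = array([56.3], dtype=float32)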

My current neural network is:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(24, activation='relu', input_shape=(1,)))  # states = (1,)
model.add(Dense(24, activation='relu'))
model.add(Dense(2, activation='linear'))

(I copied it from a tutorial that only outputs 1 discrete value in the range [0, 1].)

I understand that I need to change the last layer of my neural network, but what should it be in my case?

My guess is that the last layer should have 3 binary outputs and 1 continuous output, but I don't know whether it is possible to have outputs of different natures within the same layer.

Tags: python, tensorflow, reinforcement-learning, openai-gym, dqn

Solution

As you have already pointed out in the comments, DQN is not compatible with continuous action spaces because of how DQN works: it is impossible to evaluate Q(s, a) for every a in order to take the argmax over a when a is continuous.
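To make that concrete, here is a small sketch with made-up Q-values (the numbers are illustrative only):

import numpy as np

# Greedy action selection in DQN enumerates Q(s, a) over all actions.
q_values = np.array([1.2, -0.4, 0.7])   # hypothetical Q(s, a) for Discrete(3)
greedy = int(np.argmax(q_values))       # trivial: only 3 values to compare
# The Box(-100, 100, (1,)) component has infinitely many actions, so Q(s, a)
# cannot be enumerated and the argmax over a cannot be computed this way.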

That being said, when applying this to a policy-gradient method (which is compatible with continuous action spaces), you will run into the same issue in your problem, since with policy gradients you need to provide a probability for each action you take. Something like this could work:

  • The Actor (in this case the neural network) provides 4 outputs.
  • The first 3 outputs are the probabilities for the discrete values (Discrete(3) needs one output per value).
  • The 4th output is your continuous value.

Take a softmax over the first 3 outputs to get your discrete value, and take the 4th output, which is continuous, as the continuous part; together these give you your action. You then need to derive the probability of that action, which is the combined probability of the two parts: the categorical probability of the discrete choice multiplied by the density of the continuous one (e.g. treating the 4th output as the mean of a Gaussian), as in the sketch below.
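Here is a minimal sketch of one way to wire this up in Keras. The layer sizes, the fixed Gaussian std SIGMA, the clipping to the Box bounds, and the sample_action helper are illustrative assumptions, not the only way to do it:

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model

SIGMA = 1.0  # assumed fixed std for the continuous head; it could also be learned

inputs = Input(shape=(1,))                  # the single discrete observation
h = Dense(24, activation='relu')(inputs)
h = Dense(24, activation='relu')(h)
logits = Dense(3, activation='linear')(h)   # discrete head: one logit per Discrete(3) value
mu = Dense(1, activation='linear')(h)       # continuous head: mean of a Gaussian
actor = Model(inputs, [logits, mu])

def sample_action(state):
    # Returns ((discrete, continuous), log-probability of the pair).
    logits_v, mu_v = actor(np.array([[state]], dtype=np.float32))
    probs = tf.nn.softmax(logits_v)[0].numpy()   # softmax over the 3 logits
    probs = probs / probs.sum()                  # guard against float rounding
    d = int(np.random.choice(3, p=probs))        # sample the discrete part
    mean = float(mu_v[0, 0])
    c = float(np.clip(np.random.normal(mean, SIGMA), -100, 100))  # keep inside the Box (a simplification)
    # The two heads are conditionally independent given the state, so the
    # probability of the pair is the product (log-probs add): the categorical
    # probability of d times the Gaussian density of c.
    log_prob = (np.log(probs[d])
                - 0.5 * ((c - mean) / SIGMA) ** 2
                - np.log(SIGMA * np.sqrt(2.0 * np.pi)))
    return (d, c), log_prob

You would then feed log_prob into whatever policy-gradient loss you use (e.g. REINFORCE: minimize -log_prob times the return).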
