python - How to build a DQN that outputs 1 discrete and 1 continuous value as a pair?
Problem description
I am building a DQN for an OpenAI Gym environment. My observation space is a single discrete value, but my actions are:
from gym.spaces import Box, Discrete, Tuple
self.action_space = Tuple((Discrete(3), Box(-100, 100, (1,))))
e.g. [1, 56], [0, 24], [2, -78], ...
My current neural network is:
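from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

states = (1,)  # observation shape: a single value
model = Sequential()
model.add(Dense(24, activation='relu', input_shape=states))
model.add(Dense(24, activation='relu'))
model.add(Dense(2, activation='linear'))  # one Q-value per action of the tutorial's 2-action environment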
(I copied it from a tutorial whose network outputs only 1 discrete value in the range [0, 1].)
I understand that I need to change the last layer of my neural network, but what should it be in my case?
My guess is that the last layer should have 3 binary outputs and 1 continuous output, but I don't know whether a single layer can produce outputs of different kinds.
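A network can produce outputs of different kinds; the usual way is a shared trunk with two output heads rather than one layer with mixed activations. Also, "3 binary outputs" for a Discrete(3) choice is more naturally a softmax over 3 mutually exclusive values. Below is a minimal pure-Python sketch of such a forward pass, assuming one hidden layer of 24 units, random illustrative weights, and a `tanh` squash into the Box(-100, 100) range; all helper names are hypothetical.

```python
import math
import random

def dense(inputs, weights, biases):
    """Fully connected layer: output j = sum_i weights[j][i] * inputs[i] + biases[j]."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    return [max(0.0, v) for v in values]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def random_layer(n_in, n_out, rng):
    """Hypothetical random initialization, for illustration only."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

rng = random.Random(0)
trunk = random_layer(1, 24, rng)            # shared hidden layer; input is the single observation value
discrete_head = random_layer(24, 3, rng)    # 3 logits, one per Discrete(3) value
continuous_head = random_layer(24, 1, rng)  # 1 raw continuous value

def forward(observation):
    h = relu(dense([observation], *trunk))
    probs = softmax(dense(h, *discrete_head))  # softmax head: probabilities summing to 1
    raw = dense(h, *continuous_head)[0]        # linear head: any real number
    value = 100.0 * math.tanh(raw)             # assumed squashing into Box(-100, 100)
    return probs, value

probs, value = forward(1.0)
print(probs, value)
```

In Keras the same shape is usually built with the functional API: a shared trunk feeding a `Dense(3, activation='softmax')` head and a `Dense(1)` head, passed together as `outputs=[...]` to `tf.keras.Model`.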
Solution
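As you already pointed out in the comments, DQN is not compatible with continuous action spaces because of how it works: it picks the action by taking $\arg\max_a Q(s, a)$, and when $a$ is continuous it is impossible to check $Q(s, a)$ for every $a$.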
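The argmax step only works when the actions can be enumerated, so the network can output one Q-value per action. One common workaround, which is not part of this answer, is to discretize the continuous part into a few bins so that every (discrete, continuous) pair becomes an ordinary DQN action. The sketch below assumes 5 hypothetical bin values; it trades precision for compatibility, and the last layer would then need `len(ACTIONS)` outputs.

```python
import itertools

DISCRETE_ACTIONS = [0, 1, 2]                          # Discrete(3)
CONTINUOUS_BINS = [-100.0, -50.0, 0.0, 50.0, 100.0]   # hypothetical discretization of Box(-100, 100)

# Every (discrete, continuous) pair becomes one ordinary DQN action index.
ACTIONS = list(itertools.product(DISCRETE_ACTIONS, CONTINUOUS_BINS))

def greedy_action(q_values):
    """Return the action with the highest Q(s, a); needs one Q-value per enumerated action."""
    if len(q_values) != len(ACTIONS):
        raise ValueError("need exactly one Q-value per enumerated action")
    best = max(range(len(ACTIONS)), key=lambda i: q_values[i])
    return ACTIONS[best]

# Pretend network output for one state: 3 * 5 = 15 Q-values.
q = [0.0] * len(ACTIONS)
q[7] = 1.0
print(greedy_action(q))  # -> (1, 0.0)
```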
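That said, if you apply this to a policy gradient method instead (policy gradients do work with continuous action spaces), you run into the same problem as in your question, because with policy gradients you need the probability of every action you take. Something like this could work: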
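- The actor (a neural network in this case) produces 3 outputs.
- The first 2 outputs give the probability of each discrete value.
- The third output is your continuous value.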
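The output layout in the list above can be decoded as in the following sketch. It assumes the first `n_discrete` outputs are logits (turned into probabilities with a softmax) and the next one is the continuous value, used as-is and clipped to the Box bounds. The answer's layout has 2 discrete values; for your Discrete(3) action the same code needs `n_discrete=3`, i.e. 4 outputs. The function name is hypothetical.

```python
import math
import random

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode_action(actor_outputs, n_discrete, rng, low=-100.0, high=100.0):
    """Turn the actor's raw outputs into a [discrete, continuous] action.

    Assumed layout: the first `n_discrete` outputs are logits for the discrete
    values, and the next output is the continuous value.
    """
    if len(actor_outputs) != n_discrete + 1:
        raise ValueError("expected n_discrete logits plus 1 continuous output")
    probs = softmax(actor_outputs[:n_discrete])
    discrete = rng.choices(range(n_discrete), weights=probs)[0]  # sample from the softmax
    continuous = min(max(actor_outputs[n_discrete], low), high)  # clip into the Box bounds
    return [discrete, continuous], probs

# The answer's layout: 2 discrete values + 1 continuous value = 3 outputs.
action, probs = decode_action([0.0, 50.0, 42.0], n_discrete=2, rng=random.Random(0))
print(action)  # -> [1, 42.0]

# Your Discrete(3) case: 3 logits + 1 continuous value = 4 outputs.
action3, probs3 = decode_action([0.1, 0.2, 0.3, 250.0], n_discrete=3, rng=random.Random(1))
```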
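Taking the softmax of the first two outputs gives a probability distribution from which you choose your discrete value; together with the third, continuous output, this gives your action. You then need the probability of that action, which is the joint probability of all the outputs: the product of the discrete choice's probability and the continuous value's probability, assuming the two are independent. For the continuous value to have a probability at all, its output is typically treated as the mean of a distribution such as a Gaussian, from which the value is sampled.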
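The joint probability described above can be sketched as a sum of log-probabilities, which is how it usually enters a policy gradient loss. This sketch assumes the continuous output is the mean of a Gaussian with a fixed, hypothetical standard deviation of 10 (in practice it is often learned as an extra output), that the discrete and continuous parts are independent, and a plain REINFORCE-style loss for a single step; clipping the sampled value to the Box bounds is left out, since it would change the density. All function names are hypothetical.

```python
import math
import random

def log_softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    log_total = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_total for z in logits]

def gaussian_log_density(x, mean, std):
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2 * math.pi)

def sample_action(actor_outputs, n_discrete, rng, std=10.0):
    """Sample the discrete part from the softmax and the continuous part from N(mean, std)."""
    probs = [math.exp(lp) for lp in log_softmax(actor_outputs[:n_discrete])]
    discrete = rng.choices(range(n_discrete), weights=probs)[0]
    continuous = rng.gauss(actor_outputs[n_discrete], std)
    return [discrete, continuous]

def action_log_prob(actor_outputs, action, n_discrete, std=10.0):
    """log pi(a|s) = log P(discrete) + log p(continuous), assuming independence."""
    discrete, continuous = action
    mean = actor_outputs[n_discrete]
    return (log_softmax(actor_outputs[:n_discrete])[discrete]
            + gaussian_log_density(continuous, mean, std))

def reinforce_loss(actor_outputs, action, ret, n_discrete, std=10.0):
    # Minimizing this raises the log-probability of actions that led to high returns.
    return -action_log_prob(actor_outputs, action, n_discrete, std) * ret

outputs = [0.0, 0.0, 5.0]  # 2 logits + 1 Gaussian mean, as in the answer's layout
sampled = sample_action(outputs, n_discrete=2, rng=random.Random(0))
loss = reinforce_loss(outputs, sampled, ret=2.0, n_discrete=2)
```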