KeyError: 'observation' when attempting multi-agent reinforcement learning with OpenAI stable-baselines3 and gym

Problem description

I was trying to use the hungry-geese gym [here](https://www.kaggle.com/victordelafuente/dqn-goose-with-stable-baselines3-pytorch#) to train PPO:

from kaggle_environments import make
from stable_baselines3 import PPO

directions = {0:'EAST', 1:'NORTH', 2:'WEST', 3:'SOUTH'}
loaded_model = PPO.load('logs\\dqn2ppo_nonvec\\model')

def agent_ppo(obs, config):
    a = directions[loaded_model.predict(obs)[0]]
    return a 
    
env = make('hungry_geese',debug=True)
env.run([agent_ppo,'agent_bfs.py'])
env.render(mode="ipython")

But my game only ran for a single step. After running with debug ON, I got the following trace:

Traceback (most recent call last):
  File "c:\users\crrma\.virtualenvs\hungry_geese-ept5y6nv\lib\site-packages\kaggle_environments\agent.py", line 151, in act
    action = self.agent(*args)
  File "<ipython-input-29-faad97d317d6>", line 5, in agent_ppo
    a = directions[loaded_model.predict(obs)[0]]
  File "c:\users\crrma\.virtualenvs\hungry_geese-ept5y6nv\lib\site-packages\stable_baselines3\common\base_class.py", line 497, in predict
    return self.policy.predict(observation, state, mask, deterministic)
  File "c:\users\crrma\.virtualenvs\hungry_geese-ept5y6nv\lib\site-packages\stable_baselines3\common\policies.py", line 262, in predict
    observation = ObsDictWrapper.convert_dict(observation)
  File "c:\users\crrma\.virtualenvs\hungry_geese-ept5y6nv\lib\site-packages\stable_baselines3\common\vec_env\obs_dict_wrapper.py", line 68, in convert_dict
    return np.concatenate([observation_dict[observation_key], observation_dict[goal_key]], axis=-1)
KeyError: 'observation'

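The trace points at the likely cause: in this stable-baselines3 version, `predict` routes any dict observation through `ObsDictWrapper.convert_dict`, which assumes the goal-conditioned layout (keys `observation` and `desired_goal`) used by `GoalEnv`-style environments. A minimal sketch of the mismatch, with hypothetical hungry_geese-like keys standing in for the real Kaggle observation:

```python
import numpy as np

# ObsDictWrapper.convert_dict expects a goal-env observation dict with
# 'observation' and 'desired_goal' keys, and concatenates them:
goal_env_obs = {
    "observation": np.zeros(3),
    "desired_goal": np.ones(3),
}
converted = np.concatenate(
    [goal_env_obs["observation"], goal_env_obs["desired_goal"]], axis=-1
)
print(converted.shape)  # (6,)

# A hungry_geese observation carries different keys (illustrative values),
# so the same lookup fails with KeyError: 'observation'.
kaggle_obs = {"geese": [[0]], "food": [5], "index": 0, "step": 1}
try:
    np.concatenate(
        [kaggle_obs["observation"], kaggle_obs["desired_goal"]], axis=-1
    )
except KeyError as e:
    print("KeyError:", e)
```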

So I debugged further in VS Code. As you can see in the screenshot below, neither the observation key nor the desired_goal key is present in observation_dict.

[screenshot: contents of observation_dict in the debugger]

Here is how I stepped into the call above:

[screenshot: debug session]

Am I using the API incorrectly in a way that causes this (I'm new to the API)? Or could this be a bug? That seems unlikely to me.
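If the goal-dict conversion is indeed the cause, one plausible fix is on the agent side: convert the raw Kaggle observation into the same numeric array the model was trained on before calling `predict`, so stable-baselines3 never sees a dict. The featurizer below is purely hypothetical (the real one has to mirror the preprocessing used in the linked training notebook); `loaded_model` and `directions` refer to the objects defined in the question's code:

```python
import numpy as np

def featurize(obs, config):
    # Hypothetical featurizer: flatten the board into a 1-D array, marking
    # goose cells with 1.0 and food cells with 2.0. The real encoding must
    # match whatever representation the PPO model was trained on.
    board = np.zeros(config["rows"] * config["columns"], dtype=np.float32)
    for goose in obs["geese"]:
        for cell in goose:
            board[cell] = 1.0
    for food in obs["food"]:
        board[food] = 2.0
    return board

def agent_ppo(obs, config):
    # Passing a plain ndarray avoids SB3's goal-dict conversion path.
    action, _ = loaded_model.predict(featurize(obs, config))
    return directions[int(action)]
```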

Colab notebook, model

Tags: python, machine-learning, reinforcement-learning, openai-gym, stable-baselines

Solution

