How do I get Pygame rendering to work with my neural network?

Problem Description

I'm trying to build a neural network to play Snake. Here is the training code:

def train(self):
    self.build_model()
    for episode in range(self.max_episodes):
        self.current_episode = episode
        env = SnakeEnv(self.screen)
        episode_reward = 0
        for timestep in range(self.max_steps):
            env.render(self.screen)
            state = env.get_state()
            action = None
            epsilon = self.current_eps
            if epsilon > random.random():
                action = np.random.choice(env.action_space) #explore
            else:
                values = self.policy_model.predict(env.get_state()) #exploit
                action = np.argmax(values)
            #print(action)
            experience = env.step(action)
            if experience['done']:
                break
            episode_reward += experience['reward']
            if(len(self.memory) < self.memory_size):
                self.memory.append(Experience(experience['state'], experience['action'], experience['reward'], experience['next_state']))
            else:
                self.memory[self.push_count % self.memory_size] = Experience(experience['state'], experience['action'], experience['reward'], experience['next_state'])
            self.push_count += 1
            self.decay_epsilon(episode)
            if self.can_sample_memory():
                memory_sample = self.sample_memory()
                #q_pred = np.zeros((self.batch_size, 1))
                #q_target = np.zeros((self.batch_size, 1))
                #i = 0
                for memory in memory_sample:
                    memstate = memory.state
                    action = memory.action
                    next_state = memory.next_state
                    reward = memory.reward
                    max_q = reward + self.discount_rate * self.replay_model.predict(next_state)
                    #q_pred[i] = q_value
                    #q_target[i] = max_q
                    #i += 1
                    self.policy_model.fit(memstate, max_q, epochs=1, verbose=0)
            env.render(self.screen)
        print("Episode: ", episode, " Total Reward: ", episode_reward)
        if episode % self.target_update == 0:
            self.replay_model.set_weights(self.policy_model.get_weights())
    pygame.quit()

The screen initialization code looks like this:

pygame.init()
self.screen = pygame.display.set_mode((600, 600))
pygame.display.set_caption("Snake") 

The environment's render code looks like this:

def render(self, screen):
    screen.fill((0, 0, 0))
    for i in range(20):
        pygame.draw.line(screen, (255, 255, 255), (0, 30*i), (600, 30*i))
        pygame.draw.line(screen, (255, 255, 255), (30*i, 0), (30*i, 600))
    self.food.render()
    self.snake.render()
    pygame.display.flip()

The food and snake render methods just draw simple squares at the appropriate coordinates. When I run the training code, all I get is a white screen. When I end the program by pressing Ctrl+C, I briefly see the screen render correctly before it suddenly closes. How can I get it to render properly?

Tags: python, machine-learning, pygame

Solution


Your code might behave differently on another operating system, but in general you have to let pygame process events from the window manager by calling pygame.event.get() (or pygame.event.pump()) regularly. Otherwise, nothing gets drawn to the screen.

So, in your loop, you should process the events in the event queue and handle at least the QUIT event, e.g.:

def render(self, screen):
    ...
    # or create a new function, it's up to you; just do this once per frame
    events = pygame.event.get()
    for e in events:
        if e.type == pygame.QUIT:
            sys.exit()  # or whatever you use to quit the program (requires `import sys`)

You can also do fancier things to decouple your training code from your drawing code, such as using callbacks or coroutines, but that's another topic.
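To illustrate the coroutine idea with a minimal, pygame-free sketch: the training loop is written as a generator that yields after every step, so the caller can interleave rendering and event handling between steps. The names `train_steps` and `run` are hypothetical, and the "training update" is a stand-in; in your code the driver loop is where `env.render(self.screen)` and `pygame.event.get()` would go.

```python
def train_steps(n):
    """Toy training loop written as a generator: it yields after each
    step, handing control back so the caller can render and pump events."""
    total = 0
    for step in range(n):
        total += step          # stand-in for one training update
        yield step, total      # pause here until the driver resumes us

def run(n):
    """Driver loop that interleaves 'rendering' with training steps."""
    frames = []
    for step, total in train_steps(n):
        # In real code: env.render(screen); pygame.event.get()/pump() here,
        # once per frame, so the window manager stays happy.
        frames.append(step)
    return frames, total
```

This keeps the training logic free of any rendering concerns: the generator never touches the screen, and the driver decides how often (or whether) to draw.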

