How to solve "KeyError: '/conv2d_1/kernel:0'"

Problem description

I am trying to use Colab to run the gym package with Pac-Man, because the specs in Colab are more powerful than my laptop. The program ran successfully in Jupyter on my laptop with TensorFlow 1.14. But when I run it in Google Colab it keeps throwing errors, and I have already debugged and modified parts of the code so that it can run on TensorFlow 2.0. Here is my code:

   #First we import all the necessary libraries
   import numpy as np
   import gym
   import tensorflow as tf
   from tensorflow import keras
   from tensorflow.keras.layers import Flatten, Conv2D, Dense
   #from tensorflow.contrib.layers import Flatten, conv2d, Dense
   from collections import deque, Counter
   import random
   from datetime import datetime

    #Now we define a function called preprocess_observation for preprocessing our input game screen.
    #We reduce the image size and convert the image into greyscale.

    color = np.array([210, 164, 74]).mean()
    def preprocess_observation(obs):
      # Crop and resize the image
      img = obs[1:176:2, ::2]
      # Convert the image to greyscale
      img = img.mean(axis=2)
      # Improve image contrast
      img[img==color] = 0
      # Next we normalize the image from -1 to +1
      img = (img - 128) / 128 - 1
      return img.reshape(88,80,1)
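
     # Illustration only (not part of the original program): a quick sanity check of the
     # preprocessing output shape, assuming a raw Ms. Pacman frame of shape (210, 160, 3).
     dummy_frame = np.zeros((210, 160, 3), dtype=np.uint8)
     print(preprocess_observation(dummy_frame).shape)   # expected: (88, 80, 1)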

    #Let us initialize our gym environment

    env = gym.make('MsPacman-v0')
    n_outputs = env.action_space.n
    print(n_outputs)
    print(env.env.get_action_meanings())
    observation = env.reset()
    import tensorflow as tf
    import matplotlib.pyplot as plt
    for i in range(22):
      if i > 20:
        plt.imshow(observation)
        plt.show()
      observation, _, _, _ = env.step(1)

    #Okay, now we define a function called q_network for building our Q network. We input the game state to the Q network
    #and get the Q values for all the actions in that state.
    #We build the Q network with three convolutional layers with 'same' padding followed by a fully connected layer.
    tf.compat.v1.reset_default_graph()

    def q_network(X, name_scope):

    # Initialize layers
      initializer = tf.compat.v1.keras.initializers.VarianceScaling(scale=2.0)
      with tf.compat.v1.variable_scope(name_scope) as scope:

        # initialize the convolutional layers
        #layer_1 = tf.keras.layers.Conv2D(X, 32, kernel_size=(8,8), stride=4, padding='SAME', weights_initializer=initializer)
        layer_1_set = Conv2D(32, (8,8), strides=4, padding="SAME", kernel_initializer=initializer)
        layer_1= layer_1_set(X)
        tf.compat.v1.summary.histogram('layer_1',layer_1)
        #layer_2 = tf.keras.layers.Conv2D(layer_1, 64, kernel_size=(4,4), stride=2, padding='SAME', weights_initializer=initializer)
        layer_2_set = Conv2D(64, (4,4), strides=2, padding="SAME", kernel_initializer=initializer)
        layer_2= layer_2_set(layer_1)
        tf.compat.v1.summary.histogram('layer_2',layer_2)
        #layer_3 = tf.keras.layers.Conv2D(layer_2, 64, kernel_size=(3,3), stride=1, padding='SAME', weights_initializer=initializer)
        layer_3_set = Conv2D(64, (3,3), strides=1, padding="SAME", kernel_initializer=initializer)
        layer_3= layer_3_set(layer_2)
        tf.compat.v1.summary.histogram('layer_3',layer_3)
        flatten_layer = Flatten()  # instantiate the layer
        flat = flatten_layer(layer_3)  
        fc_set = Dense(128, kernel_initializer=initializer)
        fc=fc_set(flat)
        tf.compat.v1.summary.histogram('fc',fc)

        #Add final output layer
        output_set = Dense(n_outputs, activation= None, kernel_initializer=initializer)
        output= output_set(fc)
        tf.compat.v1.summary.histogram('output',output)
        vars = {v.name[len(scope.name):]: v for v in tf.compat.v1.get_collection(key=tf.compat.v1.GraphKeys.TRAINABLE_VARIABLES, scope=scope.name)}

        #Return both variables and outputs together
        return vars, output

    #Next we define a function called epsilon_greedy for performing epsilon greedy policy. In epsilon greedy policy we either select the best action 
    #with probability 1 - epsilon or a random action with probability epsilon.
    #We use decaying epsilon greedy policy where value of epsilon will be decaying over time 
    #as we don't want to explore forever. So over time our policy will be exploiting only good actions.

    epsilon = 0.5
    eps_min = 0.05
    eps_max = 1.0
    eps_decay_steps = 500000

    def epsilon_greedy(action, step):
        p = np.random.random(1).squeeze()
        epsilon = max(eps_min, eps_max - (eps_max-eps_min) * step/eps_decay_steps)
        if p < epsilon:
            return np.random.randint(n_outputs)
        else:
            return action
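
    # Worked example (illustration only) of the decay schedule above,
    # epsilon = max(eps_min, eps_max - (eps_max - eps_min) * step / eps_decay_steps):
    #   step = 0       -> max(0.05, 1.0 - 0.95 * 0.0) = 1.0
    #   step = 250000  -> max(0.05, 1.0 - 0.95 * 0.5) = 0.525
    #   step = 500000  -> max(0.05, 1.0 - 0.95 * 1.0) = 0.05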

    #Now, we initialize our experience replay buffer of length 20000 which holds the experience.
    #We store all the agent's experience i.e (state, action, rewards) in the 
    #experience replay buffer and we sample from this minibatch of experience for training the network.

    buffer_len = 20000
    exp_buffer = deque(maxlen=buffer_len)
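
     # Illustration only: a hypothetical helper (not in the original program) showing how a
     # random minibatch could be drawn from the buffer, assuming each stored entry is a
     # (state, action, next_state, reward, done) tuple.
     def sample_memories_example(batch_size):
         indices = np.random.permutation(len(exp_buffer))[:batch_size]
         batch = [exp_buffer[i] for i in indices]
         # transpose the list of tuples into one array per field
         return [np.array(field) for field in zip(*batch)]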

   # Now we define our network hyperparameters,


   num_episodes = 800
   batch_size = 48
   input_shape = (None, 88, 80, 1)
   learning_rate = 0.001
   X_shape = (None, 88, 80, 1)
   discount_factor = 0.97

   global_step = 0
   copy_steps = 100
   steps_train = 4
   start_steps = 2000

    logdir = 'logs'
    tf.compat.v1.reset_default_graph()
    import tensorflow.compat.v1 as tf
    tf.disable_v2_behavior()
    # Now we define the placeholder for our input i.e game state
    X = tf.placeholder(tf.float32, shape=X_shape)
    #X = tf.Variable(tf.float32, tf.ones(shape=X_shape))
    # we define a boolean called in_training_model to toggle the training
    in_training_mode = tf.placeholder(tf.bool)

   # we build our Q network, which takes the input X and generates Q values for all the actions in the state
   mainQ, mainQ_outputs = q_network(X, 'mainQ')

   # similarly we build our target Q network, for policy evaluation
   targetQ, targetQ_outputs = q_network(X, 'targetQ')

    # define the placeholder for our action values
    X_action = tf.placeholder(tf.int32, shape=(None,))
    Q_action = tf.reduce_sum(targetQ_outputs * tf.one_hot(X_action, n_outputs), axis=-1, keepdims=True)
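
     # For illustration (not part of the original code): if n_outputs were 4 and X_action held [2],
     # tf.one_hot([2], 4) would be [[0., 0., 1., 0.]], so the element-wise product with
     # targetQ_outputs followed by reduce_sum keeps only the Q value of the chosen action
     # for each sample in the batch.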

   #Copy the primary Q network parameters to the target Q network
   copy_op = [tf.compat.v1.assign(main_name, targetQ[var_name]) for var_name, main_name in mainQ.items()]
   copy_target_to_main = tf.group(*copy_op)

    #Compute and optimize loss using gradient descent optimizer


    # define a placeholder for our output i.e action
    y = tf.placeholder(tf.float32, shape=(None,1))

    # now we calculate the loss which is the difference between actual value and predicted value
    loss = tf.reduce_mean(tf.square(y - Q_action))
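
     # Note (my assumption; the training loop is not shown here): y is presumably filled with the
     # TD target, reward + discount_factor * max over actions of the target network's Q values for
     # the next state, which is why it has shape (None, 1) to match Q_action.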

    # we use adam optimizer for minimizing the loss
    optimizer = tf.train.AdamOptimizer(learning_rate)
    training_op = optimizer.minimize(loss)

    init = tf.global_variables_initializer()

    loss_summary = tf.summary.scalar('LOSS', loss)
    merge_summary = tf.summary.merge_all()
    file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())

OK, up to here everything runs, but when I run this cell in Colab an error appears:

   #Copy the primary Q network parameters to the target Q network
   copy_op = [tf.compat.v1.assign(main_name, targetQ[var_name]) for var_name, main_name in mainQ.items()]
   copy_target_to_main = tf.group(*copy_op)

The error given is:

   ---------------------------------------------------------------------------
   KeyError                                  Traceback (most recent call last)
   <ipython-input-13-58715282cea8> in <module>()
   ----> 1 copy_op = [tf.compat.v1.assign(main_name, targetQ[var_name]) for var_name, main_name in mainQ.items()]
         2 copy_target_to_main = tf.group(*copy_op)

   <ipython-input-13-58715282cea8> in <listcomp>(.0)
   ----> 1 copy_op = [tf.compat.v1.assign(main_name, targetQ[var_name]) for var_name, main_name in mainQ.items()]
         2 copy_target_to_main = tf.group(*copy_op)

   KeyError: '/conv2d_1/kernel:0'
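
For debugging, the keys of the two dictionaries can be compared directly (a minimal check, not part of the program above). My guess is that Keras gives each layer a globally unique name, so the first network's layers are named conv2d, conv2d_1, conv2d_2 while the second network's are named conv2d_3, conv2d_4, conv2d_5; after stripping the scope prefix, the keys of mainQ and targetQ therefore never match:

    print(sorted(mainQ.keys()))
    print(sorted(targetQ.keys()))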

I have two questions. First, how can I solve the problem described above?

Second, on TensorFlow 2.0 the placeholder command is replaced by tf.Variable, so I rewrote the code:

     # before:
     X = tf.placeholder(tf.float32, shape=X_shape)
     # after:
     X = tf.Variable(tf.float32, tf.ones(shape=X_shape))

It still raised errors, so I had to use the following commands instead:

    import tensorflow.compat.v1 as tf
    tf.disable_v2_behavior()
    X = tf.placeholder(tf.float32, shape=X_shape)

but then I received this warning:

    WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
    Instructions for updating: non-resource variables are not supported in the long term
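
For the second question, my understanding is that native TensorFlow 2 has no direct replacement for tf.placeholder: with eager execution a batch of game states (a NumPy array) is simply passed to a tf.keras model. A rough sketch of how the same Q network could look in a fresh TF2 session, without disable_v2_behavior (the layer sizes are copied from above, but the names build_q_network, main_q and target_q are just my own placeholders, not from the original program):

    # Sketch only, assuming a fresh TF2 session (no tf.compat.v1 / disable_v2_behavior()).
    def build_q_network():
        return tf.keras.Sequential([
            tf.keras.layers.Conv2D(32, (8, 8), strides=4, padding="same",
                                   input_shape=(88, 80, 1)),
            tf.keras.layers.Conv2D(64, (4, 4), strides=2, padding="same"),
            tf.keras.layers.Conv2D(64, (3, 3), strides=1, padding="same"),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(128),
            tf.keras.layers.Dense(n_outputs),
        ])

    main_q = build_q_network()
    target_q = build_q_network()
    # Copying weights between the networks needs no variable names at all:
    target_q.set_weights(main_q.get_weights())
    # Instead of feeding a placeholder, a batch of preprocessed frames is passed in directly:
    q_values = main_q(np.zeros((1, 88, 80, 1), dtype=np.float32))

If something like this works, it would also sidestep the KeyError from the first question, since the weight copy no longer depends on variable names.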

I have searched Stack Overflow intensively using these keywords but could not find a solution. I am really looking forward to any suggestions. Thank you very much.

Tags: tensorflow, kernel, keyerror, openai-gym, pacman

Solution

