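I have a bunch of images that look like someone playing a video game (a simple game I created in Tkinter):

[Image: a video game with falling balls; the player's box is at the bottom]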


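My goal is to have a neural network output the position of the player at the bottom of the screen. If the player is all the way to the left, the network should output 0; if the player is in the middle, 0.5; all the way to the right, 1; and every value in between.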

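My images are 300x400 pixels. I stored my data very simply. In a 50-frame game, I recorded the image and the player position of each frame as a tuple, [(image, player position), ...], so the result is a list of 50 such tuples. I then pickled that list.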

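So in my code I try to create a very basic feed-forward network that takes in an image and outputs a value between 0 and 1 representing where the box at the bottom of the image is. But my neural network only outputs 1s.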



# machine learning code mostly from https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/

from keras.models import Sequential
from keras.layers import Dense
import numpy as np
import pickle

def pil_image_to_np_array(image):
    '''Takes an image and converts it to a numpy array'''
    # from https://stackoverflow.com/a/45208895
    # all my images are black and white, so I only need one channel
    return np.array(image)[:, :, 0:1]

def data_to_training_set(data):
    # split the list in the form [(frame 1 image, frame 1 player position), ...] into [[all images], [all player positions]]
    inputs, outputs = [list(val) for val in zip(*data)]
    for index, image in enumerate(inputs):
        # convert the PIL images into numpy arrays so Keras can process them
        inputs[index] = pil_image_to_np_array(image)
    return (inputs, outputs)

if __name__ == "__main__":
    # fix random seed for reproducibility

    # load data
    # data will be in the form [(frame 1 image, frame 1 player position), (frame 2 image, frame 2 player position), ...]
    with open("position_data1.pkl", "rb") as pickled_data:
        data = pickle.load(pickled_data)
    X, Y = data_to_training_set(data)

    # get the width of the images
    width = X[0].shape[1] # == 400
    # convert the player position (a value between 0 and the width of the image) to values between 0 and 1
    for index, output in enumerate(Y):
        Y[index] = output / width

    # flatten the image inputs so they can be passed to a neural network
    for index, inpt in enumerate(X):
        X[index] = np.ndarray.flatten(inpt)

    # keras expects an array (not a list) of image-arrays for input to the neural network
    X = np.array(X)
    Y = np.array(Y)

    # create model
    model = Sequential()
    # my images are 300 x 400 pixels, so each input will be a flattened array of 120000 gray-scale pixel values
    # keep it super simple by not having any deep learning
    model.add(Dense(1, input_dim=120000, activation='sigmoid'))

    # Compile model
    model.compile(loss='mean_squared_error', optimizer='adam')

    # Fit the model
    model.fit(X, Y, epochs=15, batch_size=10)

    # see what the model is doing
    predictions = model.predict(X, batch_size=10)
    print(predictions) # this prints all 1s! # TODO fix

Edit: print(Y) gives me:



  1. Make sure X and Y are of type float32 (currently X is of type uint8):

    X = np.array(X, dtype=np.float32)
    Y = np.array(Y, dtype=np.float32)
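  2. When training a neural network, it is much better to normalize the training data. Normalization helps the optimization process go smoothly and speeds up convergence to a solution. It also prevents large values from causing large gradient updates, which would be disruptive. Usually, the values of each feature in the input data should fall in a small range, two common ranges being [-1, 1] and [0, 1]. A common way to get there is to standardize: subtract from each feature its mean and divide by its standard deviation. This gives every feature zero mean and unit variance, so most values lie within a few units of zero, although they are not strictly bounded to [-1, 1]: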

    X_mean = X.mean(axis=0)
    X -= X_mean
    X_std = X.std(axis=0)
    X /= X_std + 1e-8  # add a very small constant to prevent division by zero


    # normalize test data with the mean and std computed on the training data
    X_test -= X_mean
    X_test /= X_std + 1e-8
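  3. If you apply the changes in points 1 and 2, you may notice that the network no longer predicts only ones or only zeros. Instead, it shows some faint signs of learning and predicts a mix of zeros and ones. That is not bad, but it is far from good, and we have high expectations! The predictions should be much better than a mix of zeros and ones. This is where you should take the (forgotten!) learning rate into account. Since the network has a relatively large number of parameters for a relatively simple problem (and there are only a few training samples), you should choose a smaller learning rate to smooth the gradient updates and the learning process: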

    from keras import optimizers
    # recent Keras versions name this argument learning_rate; older releases used lr
    model.compile(loss='mean_squared_error', optimizer=optimizers.Adam(learning_rate=0.0001))

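    You will notice the difference: the loss value reaches about 0.01 after 10 epochs, and the network no longer predicts a mix of zeros and ones; instead, the predictions are much more accurate and closer to what they should be (i.e. Y).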

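  4. Don't forget! We have high (and reasonable!) expectations. So how can we do better without adding any new layers to the network (obviously, we assume that adding more layers might help!)?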


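    4.2. Add weight regularization. The common choices are L1 and L2 regularization (I highly recommend the Jupyter notebooks of the book Deep Learning with Python, written by François Chollet, the creator of Keras. Specifically, here is the one that discusses regularization.)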

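  5. You should always evaluate your model in a proper and unbiased way. Evaluating it on the training data (the data you used to train it) tells you nothing about how the model will perform on unseen (i.e. new or real-world) data points. Consider, for example, a model that stores or memorizes all the training data: it would perform perfectly on the training data, but it would be a useless model that performs poorly on new data. So we should have separate training and test datasets: we train the model on the training data and evaluate it on the test (i.e. new) data. However, while searching for a good model you run a lot of experiments: for example, you first change the type and number of layers, train the model, and evaluate it on the test data to check that it is good. Then you change something else, say the learning rate, train again, and evaluate on the test data again. To cut a long story short, these cycles of tuning and evaluation end up overfitting to the test data. Therefore we need a third dataset, called the validation data (read more: What is the difference between test set and validation set?):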

    # first shuffle the data to make sure it isn't in any particular order
    indices = np.arange(X.shape[0])
    np.random.shuffle(indices)
    X = X[indices]
    Y = Y[indices]
    # this split assumes you have 200 images in total:
    # we select 100 images for training,
    # 50 images for validation and 50 images for test data
    X_train = X[:100]
    X_val = X[100:150]
    X_test = X[150:]
    Y_train = Y[:100]
    Y_val = Y[100:150]
    Y_test = Y[150:]
    # train and tune the model
    # you can train and tune the model multiple times,
    # each time with a different architecture, hyper-parameters, etc.
    model.fit(X_train, Y_train, epochs=15, batch_size=10, validation_data=(X_val, Y_val))
    # only after you have finished tuning your model
    # should you evaluate it on the test data, and only once
    model.evaluate(X_test, Y_test)
    # after you are satisfied with the model performance
    # and want to deploy your model for production use (i.e. real world)
    # you can train your model once more on the whole data available
    # with the best configurations you have found out in your tunings
    model.fit(X, Y, epochs=15, batch_size=10)

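    (In fact, when only a little training data is available, it is wasteful to set aside validation and test data from the whole dataset. In that case, and if the model is not computationally expensive, instead of separating out a fixed validation set you can use cross-validation: K-fold cross-validation, or iterated K-fold cross-validation when there are very few data samples.)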
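
As a rough illustration of the K-fold idea mentioned above, here is a minimal NumPy-only sketch. The helper name k_fold_indices and the sample count of 50 are assumptions made for this example; they are not part of Keras or of the original code.

```python
import numpy as np

def k_fold_indices(num_samples, k, seed=0):
    """Yield (train_idx, val_idx) index arrays for K-fold cross-validation.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    indices = rng.permutation(num_samples)  # shuffle once up front
    folds = np.array_split(indices, k)      # k nearly equal parts
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

# assume 50 samples, as in a single 50-frame game
splits = list(k_fold_indices(num_samples=50, k=5))
for fold, (train_idx, val_idx) in enumerate(splits):
    print(f"fold {fold}: {len(train_idx)} training samples, "
          f"{len(val_idx)} validation samples")
```

For each fold you would build a fresh model, fit it on X[train_idx], Y[train_idx], evaluate it on X[val_idx], Y[val_idx], and then average the validation scores over the folds. Each sample is used for validation exactly once, so no data is permanently set aside.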

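It was around 4 AM when I wrote this answer and I was feeling sleepy, but I would like to mention one more thing that is not directly related to your question: by using the NumPy library and its functions and methods, you can write more concise and efficient code and save yourself a lot of time too. So make sure you practice using it more, as it is heavily used in the machine learning community and its libraries. To demonstrate this, here is the same code you wrote, but with more use of NumPy (note that I have not applied all the changes mentioned above in this code):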

# machine learning code mostly from https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/

from keras.models import Sequential
from keras.layers import Dense
import numpy as np
import pickle

def pil_image_to_np_array(image):
    '''Takes an image and converts it to a numpy array'''
    # from https://stackoverflow.com/a/45208895
    # all my images are black and white, so I only need one channel
    return np.array(image)[:, :, 0]

def data_to_training_set(data):
    # split the list in the form [(frame 1 image, frame 1 player position), ...] into [[all images], [all player positions]]
    inputs, outputs = zip(*data)
    inputs = [pil_image_to_np_array(image) for image in inputs]
    inputs = np.array(inputs, dtype=np.float32)
    outputs = np.array(outputs, dtype=np.float32)
    return (inputs, outputs)

if __name__ == "__main__":
    # fix random seed for reproducibility

    # load data
    # data will be in the form [(frame 1 image, frame 1 player position), (frame 2 image, frame 2 player position), ...]
    with open("position_data1.pkl", "rb") as pickled_data:
        data = pickle.load(pickled_data)
    X, Y = data_to_training_set(data)

    # get the width of the images
    width = X.shape[2] # == 400
    # convert the player position (a value between 0 and the width of the image) to values between 0 and 1
    Y /= width

    # flatten the image inputs so they can be passed to a neural network
    X = np.reshape(X, (X.shape[0], -1))

    # create model
    model = Sequential()
    # my images are 300 x 400 pixels, so each input will be a flattened array of 120000 gray-scale pixel values
    # keep it super simple by not having any deep learning
    model.add(Dense(1, input_dim=120000, activation='sigmoid'))

    # Compile model
    model.compile(loss='mean_squared_error', optimizer='adam')

    # Fit the model
    model.fit(X, Y, epochs=15, batch_size=10)

    # see what the model is doing
    predictions = model.predict(X, batch_size=10)
    print(predictions) # this prints all 1s! # TODO fix
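
Since the code above leaves out the changes from the numbered points, here is a small NumPy-only sketch of how points 2, 3 and 4.2 fit together for a single sigmoid unit. Everything in it is an assumption made for illustration: the data is synthetic (one bright pixel per row marks the player position), the optimizer is plain gradient descent rather than Adam, and the learning rate and L2 strength are arbitrary small values, not tuned for the real images.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the game frames (not the real data): each "image"
# is a row of 30 pixels with one bright pixel at the player position.
num_samples, num_pixels = 40, 30
positions = rng.integers(0, num_pixels, size=num_samples)
X = np.zeros((num_samples, num_pixels), dtype=np.float32)
X[np.arange(num_samples), positions] = 255.0
Y = (positions / (num_pixels - 1)).astype(np.float32)  # targets in [0, 1]

# Point 2: standardize every pixel independently.
X_mean = X.mean(axis=0)
X_std = X.std(axis=0)
X = (X - X_mean) / (X_std + 1e-8)  # small constant avoids division by zero

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grads(w, b, l2):
    """Mean squared error plus an L2 weight penalty, with its gradients."""
    pred = sigmoid(X @ w + b)
    err = pred - Y
    loss = np.mean(err ** 2) + l2 * np.sum(w ** 2)
    dz = 2.0 * err * pred * (1.0 - pred) / len(Y)  # d(MSE)/d(pre-activation)
    return loss, X.T @ dz + 2.0 * l2 * w, dz.sum()

w = np.zeros(num_pixels, dtype=np.float32)
b = 0.0
learning_rate = 0.05  # point 3: a deliberately small step size
l2 = 1e-4             # point 4.2: L2 regularization strength

initial_loss, _, _ = loss_and_grads(w, b, l2)
for _ in range(500):
    _, grad_w, grad_b = loss_and_grads(w, b, l2)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b
final_loss, _, _ = loss_and_grads(w, b, l2)

print(f"loss before training: {initial_loss:.4f}")
print(f"loss after training:  {final_loss:.4f}")
```

In Keras the same three ideas would be expressed by normalizing X before calling fit, passing a smaller learning rate to the optimizer, and attaching a weight regularizer to the Dense layer; the sketch only makes the arithmetic behind them visible.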
