Why doesn't the NN predict well?

Problem description

Please help me understand why the predictions don't work correctly when the train/test accuracy is 0.97.

Is the problem in the data, or should the network be changed?

The input data consists of 32,500 RGB images (5 gestures, 6,500 images each) at 640*480 pixels.

Dataset

The images are loaded and resized with IMG_WIDTH = 100 and IMG_HEIGHT = 77. Here is the function that loads and resizes the images and returns them for conversion to an np.array.

import os
import cv2

IMG_WIDTH = 100
IMG_HEIGHT = 77
NUM_CATEGORIES = 5

def load_data(data_dir):
    """
    Load image data from directory `data_dir`.
    Assume `data_dir` has one directory named after each category, numbered
    0 through NUM_CATEGORIES - 1. Inside each category directory will be some
    number of image files.
    Return tuple `(images, labels)`. `images` should be a list of all
    of the images in the data directory, where each image is formatted as a
    numpy ndarray with dimensions IMG_WIDTH x IMG_HEIGHT x 3. `labels` should
    be a list of integer labels, representing the categories for each of the
    corresponding `images`.
    """
    images = []
    labels = []

    for category in range(NUM_CATEGORIES):
        # Path to the folder for this gesture category
        d = os.path.join(data_dir, str(category))
        # os.listdir(d) returns the names of all image files in that folder
        for image_name in os.listdir(d):
            # Full path of this specific image
            full_path = os.path.join(d, image_name)
            # Load the image from the file (OpenCV reads it as BGR)
            image = cv2.imread(full_path)
            # Target size; note that cv2.resize takes (width, height)
            dim = (IMG_WIDTH, IMG_HEIGHT)
            # Resize the image
            image_resized = cv2.resize(image, dim, interpolation=cv2.INTER_AREA)

            # Add the image and its category label to the lists
            images.append(image_resized)
            labels.append(category)

    return images, labels
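
One subtlety worth noting: `cv2.resize` takes its target size as `(width, height)`, so the arrays produced above have 77 rows and 100 columns, i.e. height x width x 3, not width x height as the docstring suggests. A quick check (a sketch, assuming a hypothetical `gestures/` data directory):

import numpy as np

images, labels = load_data("gestures")
print(np.array(images).shape)  # (32500, 77, 100, 3): N x height x width x channels
print(len(labels))             # 32500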

Here is my model.

import tensorflow as tf

def get_model():
    """
    Returns a compiled convolutional neural network model. Assume that the
    `input_shape` of the first layer is `(IMG_WIDTH=100, IMG_HEIGHT=77, 3)`.
    The output layer should have `NUM_GESTURE = 5` units, one for each category.
    """
    # Create a convolutional neural network
    model = tf.keras.models.Sequential([
        # Convolutional layer: learn 32 filters using a 5x5 kernel
        tf.keras.layers.Conv2D(
            32, (5, 5), activation='relu', input_shape=(IMG_WIDTH, IMG_HEIGHT, 3)
        ),
        # Max-pooling layer, using a 2x2 pool size
        tf.keras.layers.MaxPool2D(pool_size=(2, 2)),
        # Later layers infer their input shape from the previous layer,
        # so `input_shape` is only needed on the first one
        tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
        tf.keras.layers.MaxPool2D(pool_size=(2, 2)),
        tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
        tf.keras.layers.MaxPool2D(pool_size=(2, 2)),
        tf.keras.layers.Conv2D(128, (3, 3), activation='relu'),
        tf.keras.layers.MaxPool2D(pool_size=(2, 2)),

        tf.keras.layers.Flatten(),
        # Hidden layer with dropout
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dropout(0.3),
        # Output layer with one unit for each of the 5 gestures
        tf.keras.layers.Dense(5, activation='softmax')
    ])

    # Compile the network
    model.compile(
        optimizer='adam',
        loss="categorical_crossentropy",
        metrics=["accuracy"]
    )
    return model
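
One thing worth double-checking: Keras reads `input_shape` as `(rows, cols, channels)`, i.e. height first, so `input_shape=(IMG_WIDTH, IMG_HEIGHT, 3)` declares 100x77 inputs, while `load_data` produces 77x100 arrays. A small sanity check (a sketch, assuming `images` is the list returned by `load_data`):

# The model's declared input shape should match the data's actual shape
print(get_model().input_shape)   # (None, 100, 77, 3) as declared above
print(np.array(images).shape)    # (32500, 77, 100, 3) as produced by load_data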

import numpy as np
from sklearn.model_selection import train_test_split

labels = tf.keras.utils.to_categorical(labels)
x_train, x_test, y_train, y_test = train_test_split(
    np.array(images), np.array(labels), test_size=0.4)
model = get_model()
model.fit(x_train, y_train, batch_size=64, epochs=10)
model.evaluate(x_test, y_test, verbose=2)
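
Note that `x_train` here holds raw 0-255 pixel values, while the prediction loop below scales its input by 1/255. If that scaling is kept at inference time, the training data presumably needs the same treatment, e.g.:

# Scale pixels to [0, 1] so training matches the /255 used at prediction time
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255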

The result is 0.97. (Screenshot: fit results.)

I save frames from the video and want to predict the gesture in real time.

GESTURE = {0:"ok", 1:"down", 2:"up", 3:"palm", 4:"l"}
video = cv2.VideoCapture(0)

while True:
    # Capture the video frame
    ret, img = video.read()

    # Display the resulting frame
    # to flip the video with 180 degree 
    image = cv2.flip(img, 1)
    
    # save image for prediction
    image = cv2.imwrite('Frame'+str(0)+'.jpg', image)
    image_addr = "Frame0.jpg"
    image = cv2.imread(image_addr)
    
    dim = (100,77)
    
    image = tf.keras.preprocessing.image.load_img(image_addr, target_size=dim)
    # Converts a PIL Image instance to a Numpy array. Return a 3D Numpy array.
    input_arr = tf.keras.preprocessing.image.img_to_array(image)
    # Convert single image to a batch.
    input_arr = np.array([input_arr])
    input_arr = input_arr.astype('float32')/255
    # Generates output predictions for the input samples. Return Numpy array(s) of predictions.
    predictions = model.predict(input_arr)
    # Return the index_array of the maximum values along an axis.
    pre_class = np.argmax(predictions, axis=-1)
    # for writing in the video
    text = GESTURE[pre_class[0]]
    font = cv2.FONT_HERSHEY_SIMPLEX
    image = cv2.flip(img, 1)

    cv2.putText(image, 
            text, 
            (50, 50), 
            font, 2, 
            (0, 0, 0), 
            2, 
            cv2.LINE_4)
            
    cv2.imshow('video', image)


    # the 'q' button is set as the
    # quitting button you may use any
    # desired button of your choice

    k = cv2.waitKey(1)
    if k == ord('q'):
            break

video.release()       
cv2.destroyAllWindows()
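
Two details in this loop are easy to trip over: the frame is round-tripped through Frame0.jpg and re-read with `load_img`, which yields RGB, while the training images came from `cv2.imread`, which yields BGR; and `load_img`'s `target_size` is `(height, width)`, whereas training used `cv2.resize` with `(width, height)`. A sketch of a prediction step that preprocesses the in-memory frame the same way as training (and assumes the training data was also scaled by 1/255):

# Preprocess the captured frame exactly as the training images were built:
# keep OpenCV's BGR channel order and resize with (width, height)
frame = cv2.flip(img, 1)
frame = cv2.resize(frame, (IMG_WIDTH, IMG_HEIGHT), interpolation=cv2.INTER_AREA)
input_arr = np.expand_dims(frame.astype('float32') / 255, axis=0)
pre_class = np.argmax(model.predict(input_arr), axis=-1)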

GitHub link

Tags: python, tensorflow, computer-vision, conv-neural-network, tensorflow2.0

Solution


I'm not an expert, but usually, when you do well on your training and test data ("The result is 0.97") yet poorly on new end-user data, the cause is a data mismatch (although it could also be overfitting).

That is, the data you trained and tested on differs from the production data (in pixel values, in the probability distribution of the pixels, or in differences invisible to us that the model picks up on) so much that the model cannot generalize to it and performs poorly.
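
In this particular code there are a few candidate mismatches of exactly that kind: the training images come from `cv2.imread` (BGR, unscaled 0-255), while the prediction images come from `load_img` (RGB), are resized with transposed dimensions, and are scaled by 1/255. A minimal sketch of one shared preprocessing function that both training and prediction could call:

def preprocess(bgr_image):
    # One shared path for training and inference: resize, then scale to [0, 1]
    resized = cv2.resize(bgr_image, (IMG_WIDTH, IMG_HEIGHT), interpolation=cv2.INTER_AREA)
    return resized.astype('float32') / 255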

It is good practice to use the same kind of data that will be used in production / in the final product as your test set. Andrew Ng uses the following dataset split (applicable if you have enough data); a sketch follows the lists below.

From the training data:

  • Training set
  • Train-dev set (the same as a validation set, I believe)

From the final-product data:

  • Dev set
  • Test set
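
As a sketch of that split (assuming hypothetical arrays `train_images`/`train_labels` from the original dataset and `prod_images`/`prod_labels` collected from the webcam):

from sklearn.model_selection import train_test_split

# Training data -> training set and train-dev (validation) set
x_train, x_train_dev, y_train, y_train_dev = train_test_split(
    train_images, train_labels, test_size=0.2)

# Final-product (webcam) data -> dev set and test set
x_dev, x_test, y_dev, y_test = train_test_split(
    prod_images, prod_labels, test_size=0.5)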

You can check this article for more information on the reasons: https://cs230.stanford.edu/blog/split/

