Convolutional neural network seems to be randomly guessing

Problem description

So I'm currently trying to build a race-recognition program using a convolutional neural network. I'm feeding in 200px x 200px versions of the UTKFace dataset (I've put my dataset on Google Drive if you want to look at it). I'm using Keras and TensorFlow with 8 different classes (4 races * 2 genders), with about 700 images per class, though I've also run it with 1000. The problem is that when I run the network its accuracy maxes out at about 13.5%, with around 11-12.5% validation accuracy and a loss of about 2.079-2.081, and even after 50 or so epochs it doesn't improve at all. My current hypothesis is that it's randomly guessing / not learning, because 1/8 = 12.5%, which is about what it gets, whereas on another model I made with 3 classes it got about 33%.

I've noticed that the validation accuracy differs on the first and sometimes second epoch, but after that it stays constant. I've increased the pixel resolution, changed the number of layers, the layer types, and the neurons per layer; I've tried different optimizers (SGD at the normal learning rate as well as very large and very small ones, 0.1 and 10^-6), and I've tried different loss functions such as KLDivergence, but nothing seems to have any effect, except KLDivergence, which did well on one run (around 16%) but then failed again afterwards. Some of my ideas are that there may be too much noise in the dataset, or that it has something to do with the number of dense layers, but honestly I don't know why it isn't learning.

Here's the code that builds the tensors:

import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import os
import cv2
import random
import pickle

WIDTH_SIZE = 200
HEIGHT_SIZE = 200

CATEGORIES = []
for CATEGORY in os.listdir('./TRAINING'):
    CATEGORIES.append(CATEGORY)
DATADIR = "./TRAINING"
training_data = []
def create_training_data():
    for category in CATEGORIES:
        path = os.path.join(DATADIR, category)
        class_num = CATEGORIES.index(category)
        # Cap each class at 700 images so the classes stay balanced
        for img in os.listdir(path)[:700]:
            try:
                img_array = cv2.imread(os.path.join(path,img), cv2.IMREAD_COLOR)
                new_array = cv2.resize(img_array,(WIDTH_SIZE,HEIGHT_SIZE))
                training_data.append([new_array,class_num])
            except Exception as error:
                print(error)

create_training_data()

random.shuffle(training_data)
X = []
y = []

for features, label in training_data:
    X.append(features)
    y.append(label)

X = np.array(X).reshape(-1, WIDTH_SIZE, HEIGHT_SIZE, 3)
y = np.array(y)

pickle_out = open("X.pickle", "wb")
pickle.dump(X, pickle_out)
pickle_out = open("y.pickle", "wb")
pickle.dump(y, pickle_out)

Here's my model-building code:

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten, Conv2D, MaxPooling2D
import pickle

pickle_in = open("X.pickle","rb")
X = pickle.load(pickle_in)
pickle_in = open("y.pickle","rb")
y = pickle.load(pickle_in)
X = X/255.0

model = Sequential()
model.add(Conv2D(256, (2,2), activation = 'relu', input_shape = X.shape[1:]))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Conv2D(256, (2,2), activation = 'relu'))
model.add(Conv2D(256, (2,2), activation = 'relu'))
model.add(Conv2D(256, (2,2), activation = 'relu'))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Conv2D(256, (2,2), activation = 'relu'))
model.add(Conv2D(256, (2,2), activation = 'relu'))
model.add(Dropout(0.4))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Conv2D(256, (2,2), activation = 'relu'))
model.add(Conv2D(256, (2,2), activation = 'relu'))
model.add(Dropout(0.4))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Conv2D(256, (2,2), activation = 'relu'))
model.add(Conv2D(256, (2,2), activation = 'relu'))
model.add(Dropout(0.4))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Flatten())
model.add(Dense(8, activation="softmax"))

model.compile(optimizer='adam',loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),metrics=['accuracy'])

model.fit(X, y, batch_size=16,epochs=100,validation_split=.1)

Here is the log from the 10 epochs I ran:

Epoch 1/100
5040/5040 [==============================] - 55s 11ms/sample - loss: 2.0803 - accuracy: 0.1226 - val_loss: 2.0796 - val_accuracy: 0.1250
Epoch 2/100
5040/5040 [==============================] - 53s 10ms/sample - loss: 2.0797 - accuracy: 0.1147 - val_loss: 2.0798 - val_accuracy: 0.1161
Epoch 3/100
5040/5040 [==============================] - 53s 10ms/sample - loss: 2.0797 - accuracy: 0.1190 - val_loss: 2.0800 - val_accuracy: 0.1161
Epoch 4/100
5040/5040 [==============================] - 53s 11ms/sample - loss: 2.0797 - accuracy: 0.1173 - val_loss: 2.0799 - val_accuracy: 0.1107
Epoch 5/100
5040/5040 [==============================] - 52s 10ms/sample - loss: 2.0797 - accuracy: 0.1183 - val_loss: 2.0802 - val_accuracy: 0.1107
Epoch 6/100
5040/5040 [==============================] - 52s 10ms/sample - loss: 2.0797 - accuracy: 0.1226 - val_loss: 2.0801 - val_accuracy: 0.1107
Epoch 7/100
5040/5040 [==============================] - 52s 10ms/sample - loss: 2.0797 - accuracy: 0.1238 - val_loss: 2.0803 - val_accuracy: 0.1107
Epoch 8/100
5040/5040 [==============================] - 54s 11ms/sample - loss: 2.0797 - accuracy: 0.1169 - val_loss: 2.0802 - val_accuracy: 0.1107
Epoch 9/100
5040/5040 [==============================] - 52s 10ms/sample - loss: 2.0797 - accuracy: 0.1212 - val_loss: 2.0803 - val_accuracy: 0.1107
Epoch 10/100
5040/5040 [==============================] - 53s 11ms/sample - loss: 2.0797 - accuracy: 0.1177 - val_loss: 2.0802 - val_accuracy: 0.1107

So yeah, any help on why my network seems to just be guessing? Thanks!

Tags: tensorflow, keras, deep-learning, neural-network, conv-neural-network

Solution


The problem lies in your network design.

  • Generally, you want the first layers to learn high-level features and to use larger kernels with odd sizes. At the moment you are essentially just interpolating neighbouring pixels. Why odd sizes? Read here, for example.
  • The number of filters usually grows from a small count (e.g. 16, 32) to larger values as you go deeper into the network. In your network, every layer learns the same number of filters. The reasoning is that the deeper you go, the more fine-grained the features you want to learn; hence the increase in the number of filters.
  • In your network, every layer also removes valuable information from the border of the image (by default you are using 'valid' padding); see the sketch after this list.
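
As a minimal illustration of the padding point, compare the output shapes of a 'valid' and a 'same' 2x2 convolution on a 200x200 RGB input (a small sketch, not code from the question):

import tensorflow as tf

# Each 2x2 'valid' convolution trims one row and one column; 'same' padding keeps the full 200x200
inp = tf.keras.Input(shape=(200, 200, 3))
valid_out = tf.keras.layers.Conv2D(16, (2, 2), padding='valid')(inp)
same_out = tf.keras.layers.Conv2D(16, (2, 2), padding='same')(inp)
print(valid_out.shape)  # (None, 199, 199, 16)
print(same_out.shape)   # (None, 200, 200, 16)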

Here is a very basic network that gets over 95% training accuracy after 40 seconds and 10 epochs:

import pickle
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv2D, MaxPooling2D

pickle_in = open("X.pickle","rb")
X = pickle.load(pickle_in)
pickle_in = open("y.pickle","rb")
y = pickle.load(pickle_in)
X = X/255.0

model = Sequential()
# Larger, odd-sized kernel in the first layer; 'same' padding keeps border information
model.add(Conv2D(16, (5,5), activation = 'relu', input_shape = X.shape[1:], padding='same'))
model.add(MaxPooling2D(pool_size=(2,2)))

# Filter count grows (16 -> 32 -> 64) as the network gets deeper
model.add(Conv2D(32, (3,3), activation = 'relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Conv2D(64, (3,3), activation = 'relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Flatten())
model.add(Dense(512))
model.add(Dense(8, activation='softmax'))

model.compile(optimizer='adam',loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),metrics=['accuracy'])
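
The fit call isn't shown above; judging from the log below (5040 training / 560 validation samples, 10 epochs), it would look roughly like the following sketch, with the batch size being an assumption:

# Assumed fit call: 10 epochs and a 10% validation split to match the log; the batch size is a guess
model.fit(X, y, batch_size=16, epochs=10, validation_split=0.1)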

Architecture:

Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_19 (Conv2D)           (None, 200, 200, 16)      1216      
_________________________________________________________________
max_pooling2d_14 (MaxPooling (None, 100, 100, 16)      0         
_________________________________________________________________
conv2d_20 (Conv2D)           (None, 100, 100, 32)      4640      
_________________________________________________________________
max_pooling2d_15 (MaxPooling (None, 50, 50, 32)        0         
_________________________________________________________________
conv2d_21 (Conv2D)           (None, 50, 50, 64)        18496     
_________________________________________________________________
max_pooling2d_16 (MaxPooling (None, 25, 25, 64)        0         
_________________________________________________________________
flatten_4 (Flatten)          (None, 40000)             0         
_________________________________________________________________
dense_7 (Dense)              (None, 512)               20480512  
_________________________________________________________________
dense_8 (Dense)              (None, 8)                 4104      
=================================================================
Total params: 20,508,968
Trainable params: 20,508,968
Non-trainable params: 0

Training:

Train on 5040 samples, validate on 560 samples
Epoch 1/10
5040/5040 [==============================] - 7s 1ms/sample - loss: 2.2725 - accuracy: 0.1897 - val_loss: 1.8939 - val_accuracy: 0.2946
Epoch 2/10
5040/5040 [==============================] - 6s 1ms/sample - loss: 1.7831 - accuracy: 0.3375 - val_loss: 1.8658 - val_accuracy: 0.3179
Epoch 3/10
5040/5040 [==============================] - 6s 1ms/sample - loss: 1.4857 - accuracy: 0.4623 - val_loss: 1.9507 - val_accuracy: 0.3357
Epoch 4/10
5040/5040 [==============================] - 6s 1ms/sample - loss: 1.1294 - accuracy: 0.6028 - val_loss: 2.1745 - val_accuracy: 0.3250
Epoch 5/10
5040/5040 [==============================] - 6s 1ms/sample - loss: 0.8060 - accuracy: 0.7179 - val_loss: 3.1622 - val_accuracy: 0.3000
Epoch 6/10
5040/5040 [==============================] - 6s 1ms/sample - loss: 0.5574 - accuracy: 0.8169 - val_loss: 3.7494 - val_accuracy: 0.2839
Epoch 7/10
5040/5040 [==============================] - 6s 1ms/sample - loss: 0.3756 - accuracy: 0.8813 - val_loss: 4.9125 - val_accuracy: 0.2643
Epoch 8/10
5040/5040 [==============================] - 6s 1ms/sample - loss: 0.3001 - accuracy: 0.9036 - val_loss: 5.6300 - val_accuracy: 0.2821
Epoch 9/10
5040/5040 [==============================] - 6s 1ms/sample - loss: 0.2345 - accuracy: 0.9337 - val_loss: 5.7263 - val_accuracy: 0.2679
Epoch 10/10
5040/5040 [==============================] - 6s 1ms/sample - loss: 0.1549 - accuracy: 0.9581 - val_loss: 7.3682 - val_accuracy: 0.2732

As you can see, the validation score is terrible (the model quickly overfits), but the point was to demonstrate that a poor architecture can prevent training altogether.

