python - TensorFlow CNN for NLP does not converge
Problem description
I am trying to create a neural network for sentence classification based on the model proposed by Yoon Kim in this paper (https://arxiv.org/pdf/1408.5882.pdf). I built it in TensorFlow Keras, using padded (lemmatized) sentences as input and 3 categories ("positive", "neutral" or "negative") as output.
Here is the model I built:
import numpy as np
from tensorflow.keras.layers import (Input, Embedding, Conv1D, GlobalMaxPool1D,
                                     Dropout, Dense, concatenate)
from tensorflow.keras.models import Model
from tensorflow.keras.regularizers import l2


def create_CNN_model(window_sizes, feature_maps, sent_size, num_categs, embedding_matrix: np.array):
    inputs = Input(shape=(sent_size), dtype='float32', name='text_inputs')  # dim = (BATCH_SIZE, sent_size, embedding_dim)
    # initialize the embeddings with my own embeddings matrix
    embed = Embedding(embedding_matrix.shape[0], embedding_matrix.shape[1],
                      mask_zero=True, input_length=sent_size,
                      weights=[embedding_matrix])(inputs)
    # create array for max pooled vectors of features
    ta = []
    # as we have multiple window sizes:
    for n_window in window_sizes:
        con = Conv1D(feature_maps, n_window, padding='causal',
                     activation="relu", use_bias=True)(embed)  # (BATCH_SIZE, sent_size-window_size+1, feature_maps)
        # the convoluted tensor contains, for each window, a feature map of dimension feature_maps
        pooled = GlobalMaxPool1D(data_format='channels_last')(con)  # (BATCH_SIZE, sent_size-windows_size+1)
        # then, the max pooling operation extracts the maximum of each feature map, reducing the rank of the tensor
        # the max pooled tensor contains a feature for each window
        ta.append(pooled)
    concat = concatenate(ta, axis=1)
    dropped = Dropout(0.5)(concat)
    outputs = Dense(num_categs, activation="softmax", use_bias=True, kernel_regularizer=l2(l=3),
                    kernel_constraint=Dropout(0.5))(dropped)
    # create the model
    model = Model(inputs=[inputs], outputs=[outputs])
    # return the model
    return model
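For reference, here is roughly how the inputs that this function expects get prepared. This is a simplified sketch rather than my exact preprocessing code; sentences, labels and w2v are placeholders for the lemmatized sentence list, the class ids, and the pre-trained word-vector lookup:

import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical

# sentences: list of lemmatized sentences, labels: list of class ids in {0, 1, 2}
tokenizer = Tokenizer(oov_token="<UNK>")
tokenizer.fit_on_texts(sentences)
sequences = tokenizer.texts_to_sequences(sentences)

# pad every sentence to the same length (index 0 is reserved for padding)
sent_size = max(len(s) for s in sequences)
X_data = pad_sequences(sequences, maxlen=sent_size, padding='post')  # (num_sentences, sent_size)
y_data = to_categorical(labels, num_classes=3)                       # one-hot labels

# embedding matrix: row i holds the pre-trained vector of word index i
embedding_dim = 300
embedding_matrix = np.zeros((len(tokenizer.word_index) + 1, embedding_dim))
for word, idx in tokenizer.word_index.items():
    if word in w2v:                  # w2v: pre-trained vectors (e.g. gensim KeyedVectors)
        embedding_matrix[idx] = w2v[word]

# window sizes and feature map count follow the paper's defaults
model = create_CNN_model(window_sizes=[3, 4, 5], feature_maps=100,
                         sent_size=sent_size, num_categs=3,
                         embedding_matrix=embedding_matrix)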
I have tried training this model on 200 sentences to see if it would overfit the data. But instead of overfitting, the loss just bounces up and down between 0 and 1. I tried changing the learning rate to values as small as 1e-8, but it did nothing.
Here is the function I use for training:
import os
import numpy as np
import tensorflow as tf
from tensorflow import train
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import CategoricalCrossentropy
from tensorflow.keras.metrics import CategoricalAccuracy


def train_model(X_data, y_data, batch_sz, tf_model, max_patience, num_epochs, ln_rate):
    # Instantiate an optimizer to train the model.
    # optimizer = Adadelta(learning_rate=1e-3)
    optimizer = Adam(learning_rate=ln_rate)
    # Instantiate a loss function.
    loss_fn = CategoricalCrossentropy()
    # Prepare the metrics
    train_acc_metric = CategoricalAccuracy()
    val_acc_metric = CategoricalAccuracy()
    buffer_sz = len(X_data)
    patience = 0
    epochs = num_epochs
    last_val_acc = 0
    # Start random state for better reproducibility
    np.random.seed(123)
    # Create the checkpoints
    ckpt = train.Checkpoint(step=tf.Variable(1), optimizer=optimizer,
                            model=tf_model)
    manager = train.CheckpointManager(ckpt, './tf_ckpts', max_to_keep=3)
    # Create directory to save the trained model
    path = "./saved_model"
    print("\n----------------------------------------------")
    if not os.path.isdir(path):
        try:
            os.mkdir(path)
        except OSError:
            print("\nCreation of the directory %s failed \n" % path)
        else:
            print("\nSuccessfully created the directory %s \n" % path)
    else:
        print("\nDirectory %s already exists" % path)
    print("\n----------------------------------------------")
    print("\nStarting run script...\n",
          "Model will be saved to ", path, "\n",
          "Checkpoints will be restored from and saved to .\tf_ckpts")
    # Save model prior to training
    tf_model.save("./saved_model/tf_model")
    # Restart from last checkpoint, if available
    ckpt.restore(manager.latest_checkpoint)
    print("\n----------------------------------------------")
    if manager.latest_checkpoint:
        print("\nRestored from {}".format(manager.latest_checkpoint))
    else:
        print("\nInitializing from scratch.")
    # beginning training loop
    for epoch in range(epochs):
        print("\n----------------------------------------------")
        print('Start of epoch %d' % (epoch,))
        # re-shuffle data before each epoch
        np.random.shuffle(X_data)
        np.random.shuffle(y_data)
        # create the training dataset with 10-fold crossvalidation
        train_dataset = make_dataset(X_data, y_data, 10)
        # Iterate over the batches of the dataset.
        for x_train, y_train, x_val, y_val in train_dataset:
            train_batches = tf.data.Dataset.from_tensor_slices((x_train, y_train))
            train_batches = train_batches.batch(batch_sz)
            for x_batch_train, y_batch_train in train_batches:
                with tf.GradientTape() as tape:
                    # calculate the forward run
                    logits = tf_model(x_batch_train)
                    # assert if output and true label tensor shapes are equal
                    get_shape = y_batch_train.shape
                    tf.debugging.assert_shapes([
                        (logits, get_shape),
                    ], data=(y_batch_train, logits),
                        summarize=3, message="Inconsistent shape (labels,output): ",
                        name="assert_shapes")
                    # calculate loss function
                    loss_value = loss_fn(y_batch_train, logits)
                    # add 1 step to the steps variable
                    ckpt.step.assign_add(1)
                    # Add extra losses created during this forward pass:
                    loss_value += sum(tf_model.losses)
                # calculate gradients
                grads = tape.gradient(loss_value, tf_model.trainable_weights)
                # backpropagate the gradients
                optimizer.apply_gradients(zip(grads, tf_model.trainable_weights))
                # Update training metric.
                train_acc_metric(y_batch_train, logits)
                # Save & log every 100 steps.
                if int(ckpt.step) % 100 == 0:
                    save_path = manager.save()
                    print("Saved checkpoint for step {}: {}".format(int(ckpt.step), save_path))
                    print("loss {:1.2f}".format(loss_value))
                    print('Seen so far: %s samples' % (int(ckpt.step) * batch_sz))
            # Run a cross-validation loop on each 10-fold dataset
            val_logits = tf_model(x_val)
            # Update val metrics
            val_acc_metric(y_val, val_logits)
        # Display metrics at the end of each epoch.
        train_acc = train_acc_metric.result()
        print('Training accuracy: ', float(train_acc))
        # Reset training metrics at the end of each epoch
        train_acc_metric.reset_states()
        print("----------")
        val_acc = val_acc_metric.result()
        print('Validation accuracy: ', float(val_acc))
        print("----------------------------------------------\n")
        val_acc_metric.reset_states()
        # Early stopping part
        if val_acc < last_val_acc:
            # If the max_patience is exceeded stop the training
            if patience >= max_patience:
                print("\n------------------------------------------------")
                print("Early stopping training to prevent over-fitting!")
                print("------------------------------------------------\n")
                break
            else:
                patience += 1
        # update the validation accuracy
        last_val_acc = val_acc
    # save the trained model
    tf_model.save("./saved_model/tf_model")
    print("\n------------------------------------------------")
    print("\nEnd of Training!\n")
And here are the training results:
----------------------------------------------
Successfully created the directory ./saved_model
----------------------------------------------
Starting run script...
Model will be saved to ./saved_model
Checkpoints will be restored from and saved to . f_ckpts
INFO:tensorflow:Assets written to: ./saved_model/tf_model/assets
----------------------------------------------
Initializing from scratch.
----------------------------------------------
Start of epoch 0
Training accuracy: 0.38999998569488525
----------
Validation accuracy: 0.38999998569488525
----------------------------------------------
----------------------------------------------
Start of epoch 1
Saved checkpoint for step 100: ./tf_ckpts/ckpt-1
loss 1.05
Seen so far: 2000 samples
Training accuracy: 0.4050000011920929
----------
Validation accuracy: 0.4050000011920929
----------------------------------------------
----------------------------------------------
Start of epoch 2
Saved checkpoint for step 200: ./tf_ckpts/ckpt-2
loss 1.10
Seen so far: 4000 samples
Training accuracy: 0.36000001430511475
----------
Validation accuracy: 0.36000001430511475
----------------------------------------------
----------------------------------------------
Start of epoch 3
Saved checkpoint for step 300: ./tf_ckpts/ckpt-3
loss 1.15
Seen so far: 6000 samples
Training accuracy: 0.375
----------
Validation accuracy: 0.375
----------------------------------------------
----------------------------------------------
Start of epoch 4
Saved checkpoint for step 400: ./tf_ckpts/ckpt-4
loss 1.17
Seen so far: 8000 samples
Training accuracy: 0.38999998569488525
----------
Validation accuracy: 0.38999998569488525
----------------------------------------------
----------------------------------------------
Start of epoch 5
Saved checkpoint for step 500: ./tf_ckpts/ckpt-5
loss 1.18
Seen so far: 10000 samples
Training accuracy: 0.3799999952316284
----------
Validation accuracy: 0.3799999952316284
----------------------------------------------
----------------------------------------------
Start of epoch 6
Saved checkpoint for step 600: ./tf_ckpts/ckpt-6
loss 1.09
Seen so far: 12000 samples
Training accuracy: 0.35499998927116394
----------
Validation accuracy: 0.35499998927116394
----------------------------------------------
----------------------------------------------
Start of epoch 7
Saved checkpoint for step 700: ./tf_ckpts/ckpt-7
loss 1.12
Seen so far: 14000 samples
Training accuracy: 0.3700000047683716
----------
Validation accuracy: 0.3700000047683716
----------------------------------------------
Any suggestions on how to make it converge?