How to correctly predict a value from a series of inputs with 2D shape?

Problem Description

I am using an encoder-decoder architecture with 3 layers each in the encoder and decoder, and 128 neurons per hidden layer. The input is two-dimensional: the first column contains days, and the second column contains a time series that depends on the day (shape: (5780, 100, 2)). The output is a single value from the first column, the particular day on which a breakpoint occurs (shape: (5780, 1, 1)). The breakpoint is one of the time-dependent values, i.e., from the second column.

A clearer picture of the input:

array([[  0.        ,   1.        ],
       [  2.        ,   1.14469799],
       [  4.        ,   1.35245666],
       ...,
       [ 96.        ,   1.80030942],
       [ 98.        ,   1.79964733],
       [100.        ,   1.9898739]])

Days are in the first column, and the corresponding measurement points are in the second.

The output is just one value, representing the day on which the breakpoint occurs:

array([[1108.]])

The problem is that after training, the outputs for all the different test samples are almost identical, i.e., the breakpoint falls on the same day for all the different materials (with negligible variation in the decimal places). I have tried learning rates from high to low (in the range 1e-2 to 1e-5) and different numbers of training epochs (300 to 3000). I have also varied the number of layers and the number of neurons per layer.

What I have not done is batch normalization or any other kind of normalization, but I have done some operations on this same data with the same gradients, and that worked very well.
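(For reference, normalizing a 3D input like this usually means flattening it to 2D so the scaler is fit per feature across all samples and timesteps. A minimal sketch, assuming the array names `complete_inputs` and `kp_targets` used in the code below:)

import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical sketch: standardize each of the 2 input features across all
# samples and timesteps by flattening to 2D, then restoring the 3D shape.
# Note: this would shift the padding value 0, so the Masking layer's
# mask_value would need updating to match.
n_samples, n_steps, n_feats = complete_inputs.shape   # (5780, 100, 2)
x_scaler = StandardScaler()
complete_inputs_std = x_scaler.fit_transform(
    complete_inputs.reshape(-1, n_feats)).reshape(n_samples, n_steps, n_feats)

# Targets get their own scaler so predictions can be inverted back to days.
y_scaler = StandardScaler()
kp_targets_std = y_scaler.fit_transform(
    kp_targets.reshape(-1, 1)).reshape(kp_targets.shape)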

The architecture I am using here is the following:

from tensorflow.keras.layers import (Input, Masking, Bidirectional, LSTM,
                                     RepeatVector, TimeDistributed, Dense,
                                     Activation, concatenate, dot)
from tensorflow.keras.models import Model
from tensorflow.keras import optimizers

nodes = 128
drp = 0.01

# Defining input layers and shapes (output_train is only used for shape lookups)
input_train = Input(shape=(complete_inputs.shape[1], complete_inputs.shape[2]))
output_train = Input(shape=(kp_targets.shape[1], kp_targets.shape[2]))

# Masking layer (no input_shape argument needed when called on a tensor)
masking_layer = Masking(mask_value=0.)(input_train)

# Encoder layers. For a simple S2S model, we only need the last state_h and state_c.
# The first layer's states are never used, so return_state is omitted here
# (returning five tensors into a single variable would break the next layer's input).
enc_first_layer = Bidirectional(LSTM(nodes, dropout=drp, return_sequences=True))(masking_layer)
enc_first_layer, enc_fwd_h1, enc_fwd_c1, enc_back_h1, enc_back_c1 = Bidirectional(
    LSTM(nodes, dropout=drp, return_sequences=True, return_state=True))(enc_first_layer)
enc_stack_h, enc_fwd_h2, enc_fwd_c2, enc_back_h2, enc_back_c2 = Bidirectional(
    LSTM(nodes, dropout=drp, return_sequences=True, return_state=True))(enc_first_layer)

# Concatenated forward/backward states (only enc_last_h2 is used below)
enc_last_h1 = concatenate([enc_fwd_h1, enc_back_h1])
enc_last_h2 = concatenate([enc_fwd_h2, enc_back_h2])
enc_last_c1 = concatenate([enc_fwd_c1, enc_back_c1])
enc_last_c2 = concatenate([enc_fwd_c2, enc_back_c2])


# RepeatVector layer (using only the last hidden state of the encoder)
rv = RepeatVector(output_train.shape[1])(enc_last_h2)

# Stacked decoder layers for alignment score calculation
# (initialized from the encoder's forward/backward states)
dec_stack_h = Bidirectional(LSTM(nodes, dropout=drp, return_state=False, return_sequences=True))(
    rv, initial_state=[enc_fwd_h1, enc_fwd_c1, enc_back_h1, enc_back_c1])
dec_stack_h = Bidirectional(LSTM(nodes, dropout=drp, return_state=False, return_sequences=True))(dec_stack_h)
dec_stack_h = Bidirectional(LSTM(nodes, dropout=drp, return_state=False, return_sequences=True))(
    dec_stack_h, initial_state=[enc_fwd_h2, enc_fwd_c2, enc_back_h2, enc_back_c2])


# Attention layer (dots the stacked decoder output with the stacked encoder output)
attention_ = dot([dec_stack_h, enc_stack_h], axes=[2, 2])
attention_ = Activation('softmax')(attention_)

# Calculating the context vector
context = dot([attention_, enc_stack_h], axes=[2, 1])

# Concat the context vector and stacked hidden states of the decoder,
# and use it as input to the final dense layers
dec_combined_context = concatenate([context, dec_stack_h])


# Output TimeDistributed dense layers (the second layer must consume the
# first one's output, not dec_combined_context again)
out = TimeDistributed(Dense(nodes // 2, activation='relu'))(dec_combined_context)
out = TimeDistributed(Dense(output_train.shape[2], activation='linear'))(out)

# Compile model (masked_mae is a custom loss, defined elsewhere)
model_attn = Model(inputs=input_train, outputs=out)
opt = optimizers.Adam(learning_rate=0.004)
model_attn.compile(optimizer=opt, loss=masked_mae)
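The custom loss `masked_mae` is not shown in the question. A minimal sketch of what such a loss might look like, assuming zero-valued targets mark padded steps (this matches the Masking layer's mask_value of 0, but it is an assumption):

import tensorflow as tf

# Hypothetical reconstruction of the unshown masked_mae loss: an MAE that
# ignores positions where the target is the padding value 0.
def masked_mae(y_true, y_pred):
    mask = tf.cast(tf.not_equal(y_true, 0.0), y_pred.dtype)  # 1 = real, 0 = padded
    abs_err = tf.abs(y_true - y_pred) * mask
    return tf.reduce_sum(abs_err) / tf.maximum(tf.reduce_sum(mask), 1.0)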

What could be going wrong here?

To take a broader view of the problem, I also wondered: is this model overkill? Is there another machine/deep learning model that is better suited to predicting this kind of output from the data I have?

I have been working on this problem for a week without any improvement, so any help would be greatly appreciated.

Edit 1: Tried standardization with StandardScaler and a simpler architecture. No improvement so far. Below is the architecture, with the commented-out parts tried in all possible combinations.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Masking, Bidirectional, LSTM, Dense
from tensorflow.keras import optimizers

nodes = 130  # Tried with 10/30/40/80

model_attn = Sequential()
#model_attn.add(Masking(mask_value=0, input_shape=(complete_inputs.shape[1], complete_inputs.shape[2])))

#model_attn.add(Bidirectional(LSTM(nodes, dropout=0.1, return_sequences=True)))
#model_attn.add(Bidirectional(LSTM(nodes, dropout=0.1, return_sequences=True)))
model_attn.add(Bidirectional(LSTM(nodes, dropout=0.1, return_sequences=False)))

model_attn.add(Dense(1))
model_attn.compile(optimizer=optimizers.Adam(0.001), loss='mae')


The loss does not decrease over time:

# `callback` is defined elsewhere (not shown in the question)
model_attn.fit(complete_inputs, kp_targets, batch_size=350, epochs=300,
               shuffle=True, validation_split=0.1, callbacks=[callback])

Epoch 1/300
11/11 [==============================] - 18s 2s/step - loss: 0.7930 - val_loss: 0.3486
Epoch 2/300
11/11 [==============================] - 16s 1s/step - loss: 0.7544 - val_loss: 0.5152
Epoch 3/300
11/11 [==============================] - 16s 1s/step - loss: 0.7406 - val_loss: 0.4794
Epoch 4/300
11/11 [==============================] - 16s 1s/step - loss: 0.7385 - val_loss: 0.5361
Epoch 5/300
11/11 [==============================] - 16s 1s/step - loss: 0.7367 - val_loss: 0.4821
Epoch 6/300
11/11 [==============================] - 16s 1s/step - loss: 0.7350 - val_loss: 0.5518
Epoch 7/300
11/11 [==============================] - 18s 2s/step - loss: 0.7344 - val_loss: 0.5151
Epoch 8/300
11/11 [==============================] - 17s 2s/step - loss: 0.7339 - val_loss: 0.5646
Epoch 9/300
11/11 [==============================] - 16s 1s/step - loss: 0.7380 - val_loss: 0.5277
Epoch 10/300
11/11 [==============================] - 16s 1s/step - loss: 0.7382 - val_loss: 0.4879
Epoch 11/300
11/11 [==============================] - 16s 1s/step - loss: 0.7367 - val_loss: 0.5367
Epoch 12/300
11/11 [==============================] - 16s 1s/step - loss: 0.7382 - val_loss: 0.4910
Epoch 13/300
11/11 [==============================] - 16s 1s/step - loss: 0.7354 - val_loss: 0.5244
Epoch 14/300
11/11 [==============================] - 16s 1s/step - loss: 0.7386 - val_loss: 0.5043
Epoch 15/300
11/11 [==============================] - 16s 1s/step - loss: 0.7329 - val_loss: 0.5421
Epoch 16/300
11/11 [==============================] - 16s 1s/step - loss: 0.7376 - val_loss: 0.5023
Epoch 17/300
11/11 [==============================] - 16s 1s/step - loss: 0.7346 - val_loss: 0.4539
.....
.....

Epoch 27/300
11/11 [==============================] - 15s 1s/step - loss: 0.7388 - val_loss: 0.5649
Epoch 28/300
11/11 [==============================] - 16s 1s/step - loss: 0.7329 - val_loss: 0.6575
Epoch 29/300
11/11 [==============================] - 16s 1s/step - loss: 0.7400 - val_loss: 0.5123
Epoch 30/300
11/11 [==============================] - 16s 1s/step - loss: 0.7336 - val_loss: 0.4965
Epoch 31/300
11/11 [==============================] - 16s 1s/step - loss: 0.7328 - val_loss: 0.5069
Epoch 32/300
11/11 [==============================] - 17s 2s/step - loss: 0.7320 - val_loss: 0.5274
Epoch 33/300
11/11 [==============================] - 17s 2s/step - loss: 0.7302 - val_loss: 0.5968
Epoch 34/300
11/11 [==============================] - 16s 1s/step - loss: 0.7354 - val_loss: 0.6161
....
....
....
Epoch 184/300
11/11 [==============================] - 16s 1s/step - loss: 0.7088 - val_loss: 0.8242
Epoch 185/300
11/11 [==============================] - 16s 1s/step - loss: 0.7034 - val_loss: 0.7799
Epoch 186/300
11/11 [==============================] - 16s 1s/step - loss: 0.7098 - val_loss: 0.8179
Epoch 187/300
11/11 [==============================] - 16s 1s/step - loss: 0.7066 - val_loss: 0.7854
Epoch 188/300
11/11 [==============================] - 16s 1s/step - loss: 0.7142 - val_loss: 0.8340
Epoch 189/300
11/11 [==============================] - 16s 1s/step - loss: 0.7123 - val_loss: 0.7197

Neither loss increases or decreases in any particular pattern. I also stopped training at the epoch with the lowest loss, but saw no improvement.
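The `callback` passed to `fit()` above is not shown; stopping at the epoch with the lowest loss is typically done with an `EarlyStopping` callback that restores the best weights. A sketch of what may have been used (an assumption):

from tensorflow.keras.callbacks import EarlyStopping

# Assumed definition of the unshown `callback`: stop once val_loss stops
# improving and roll back to the weights from the best epoch.
callback = EarlyStopping(monitor='val_loss', patience=20,
                         restore_best_weights=True)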

Update 1: There was a bug in how StandardScaler was applied. After fixing it, the model does output different predictions for the test dataset!
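(The exact bug is not described. A common pitfall is fitting the scaler on the full dataset instead of the training split, or forgetting to invert the target scaling. A sketch of the usual pattern, reusing the hypothetical `x_scaler`, `y_scaler`, and `n_feats` from the earlier sketch and assuming train/test splits named `x_train` and `x_test`:)

# Fit scalers on the training split only, then apply them to the test split.
x_train_std = x_scaler.fit_transform(x_train.reshape(-1, n_feats)).reshape(x_train.shape)
x_test_std = x_scaler.transform(x_test.reshape(-1, n_feats)).reshape(x_test.shape)

# Predictions come out in scaled units and must be mapped back to days.
pred_days = y_scaler.inverse_transform(model_attn.predict(x_test_std).reshape(-1, 1))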

Update 2: A CNN is also a good candidate for this kind of prediction. However, a comparison of the two architectures is still needed. I will update my findings here!

Update 3: For this kind of prediction, a CNN turned out to be a better choice than an LSTM, since the data is closer to a classification problem. Even though more LSTM layers and hyperparameter tuning might help, my experiments show that for comparable results the CNN runs at least 12x faster than the LSTM, likely with lower memory usage as well.
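A minimal sketch of the 1D-CNN alternative (my own illustration under the shapes given above, not the exact model used):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, GlobalAveragePooling1D, Dense

# Hypothetical 1D-CNN counterpart to the LSTM model above: convolutions over
# the 100 timesteps with 2 features, regressing the breakpoint day.
model_cnn = Sequential([
    Conv1D(64, kernel_size=5, activation='relu', input_shape=(100, 2)),
    Conv1D(64, kernel_size=5, activation='relu'),
    GlobalAveragePooling1D(),
    Dense(32, activation='relu'),
    Dense(1),
])
model_cnn.compile(optimizer='adam', loss='mae')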

Tags: python, tensorflow, keras, deep-learning, prediction
