Initial state in Keras LSTM

Question

I've been trying to understand when the hidden state is reinitialized in my Keras LSTM model when stateful=False. The various tutorials I've seen imply that it is reset at the start of each batch, but as far as I can tell it is actually reset between every sample within a batch. Am I wrong?

I wrote the following code to test this:

from keras.models import Sequential
from keras.layers import Dense, LSTM
import numpy as np

# Three one-hot "symbols"
a = [1, 0, 0]
b = [0, 1, 0]
c = [0, 0, 1]

seq = [a, b, c, b, a]

# Predict the next symbol from the current one
x = seq[:-1]
y = seq[1:]
window_size = 1

x = np.array(x).reshape((len(x), window_size, 3))
y = np.array(y)

def run_with_batch_size(batch_size=1):
  model = Sequential()
  model.add(LSTM(20, input_shape=(1, 3)))
  model.add(Dense(3, activation='softmax'))
  model.compile(loss='mean_squared_error', optimizer='adam')

  # Train one epoch at a time, without shuffling, so the sample order is fixed
  for i in range(500):
    model.fit(x, y,
      batch_size=batch_size,
      epochs=1,
      verbose=0,
      shuffle=False
    )

  # Predict the same inputs in different batch pairings
  print(model.predict(np.array([[a], [b]]), batch_size=batch_size))
  print()
  print(model.predict(np.array([[b], [c]]), batch_size=batch_size))
  print()
  print(model.predict(np.array([[c], [b]]), batch_size=batch_size))


print('-'*30)
run_with_batch_size(1)
print('**')
run_with_batch_size(2)

The result of running this code:

------------------------------
# batch_size 1
[[0.01296294 0.9755857  0.01145133]
 [0.48558792 0.02751653 0.4868956 ]]

[[0.48558792 0.02751653 0.4868956 ]
 [0.01358072 0.9738273  0.01259203]]

[[0.01358072 0.9738273  0.01259203]
 [0.48558792 0.02751653 0.4868956 ]]
**
# batch_size 2
# output of batch (a, b)
[[0.0255649  0.94444686 0.02998832]
 [0.47172785 0.05804421 0.47022793]]

# output of batch (b, c)
# notice first output here is the same as the second output from above
[[0.47172785 0.05804421 0.47022793]
 [0.03059724 0.93813574 0.03126698]]

[[0.03059724 0.93813574 0.03126698]
 [0.47172785 0.05804421 0.47022793]]
------------------------------

When my batch_size is 1:

The prediction for b is identical whether it is preceded by a or by c (0.4855… in all three pairings), so each sample appears to start from a fresh state.

When my batch_size is 2:

The prediction for b is again identical whether it is the first or the second sample of its batch (compare the second row of the (a, b) batch with the first row of the (b, c) batch), so no state seems to carry over between samples within a batch either.

I'm still fairly new to this area, so it's quite possible I'm misunderstanding something. Is the initial state reset between every sample within a batch, rather than between batches?
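
As a quick sanity check of this reading, a sketch along the following lines (reusing a, b, and the same model layout as above; the probe name is just illustrative) should show the same effect even on an untrained model, since the forward pass is deterministic once the weights are fixed:

# Untrained probe model: weights are random but fixed within this script,
# so we can compare the same input under different batchings.
probe = Sequential()
probe.add(LSTM(20, input_shape=(1, 3)))
probe.add(Dense(3, activation='softmax'))

alone = probe.predict(np.array([[b]]))                           # b on its own
in_batch = probe.predict(np.array([[a], [b]]), batch_size=2)[1]  # b right after a
print(np.allclose(alone[0], in_batch))  # True -> a's state did not leak into b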

Tags: python, keras, lstm

Solution


Great test; you're on the right track. To answer the question directly: with stateful=False, the initial state is set fresh for every sample in the batch on every forward pass. Following the source code:

def get_initial_state(self, inputs):
  # build an all-zero tensor of shape (samples, output_dim)
  initial_state = K.zeros_like(inputs)  # (samples, timesteps, input_dim)
  initial_state = K.sum(initial_state, axis=(1, 2))  # (samples,)
  initial_state = K.expand_dims(initial_state)  # (samples, 1)
  # ...

This means every sample in the batch gets a clean initial state of zeros. This function is used in the call function:

if initial_state is not None:
  pass
elif self.stateful:
  initial_state = self.states
else:
  initial_state = self.get_initial_state(inputs)

So if stateful=False and you haven't provided any explicit initial_states, the code creates fresh initial states for the RNN; this includes LSTM, which inherits from the RNN layer. Now, since call is responsible for computing the forward pass, and, as you discovered, there is one forward pass per batch, you get fresh all-zero initial states, one per sample, on every batch.
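
To see the first branch above (initial_state is not None) in action, here is a minimal sketch using the functional API, which lets you hand explicit initial hidden and cell states to an LSTM; the names inp, h0, and c0 are just illustrative:

from keras.layers import Input, LSTM
from keras.models import Model
import numpy as np

units = 20
inp = Input(shape=(1, 3))
h0 = Input(shape=(units,))  # explicit initial hidden state
c0 = Input(shape=(units,))  # explicit initial cell state
out = LSTM(units)(inp, initial_state=[h0, c0])
model = Model([inp, h0, c0], out)

x = np.array([[[1, 0, 0]]], dtype='float32')
zeros = np.zeros((1, units), dtype='float32')
# Feeding all-zero states reproduces the stateful=False default
print(model.predict([x, zeros, zeros]))

By contrast, with stateful=True the states computed for one batch are kept and reused as the initial states for the next batch, until you call model.reset_states() yourself.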

