Multivariate, multi-step LSTM model for time series forecasting

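Problem description

I have 10 indicators from which to predict 1 indicator. The data is arranged as a time series and is reported monthly.

I think I should use an LSTM.

Here is how the stacked data is split: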

from numpy import array, hstack

# stack the 10 input columns and the target column side by side
stacked = hstack((x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, yy))
print("stacked.shape", stacked.shape)

# split a multivariate sequence into samples
def split_sequences(sequences, n_steps_in, n_steps_out):
    X, y = list(), list()
    for i in range(len(sequences)):
        # find the end of this pattern
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out - 1
        # check if we are beyond the dataset
        if out_end_ix > len(sequences):
            break
        # gather input and output parts of the pattern
        # (the output window starts on the last input row)
        seq_x, seq_y = sequences[i:end_ix, :-1], sequences[end_ix-1:out_end_ix, -1]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

n_steps_in, n_steps_out = 24, 12
# convert into input/output
X, yy = split_sequences(stacked, n_steps_in, n_steps_out)
print("X.shape", X.shape)
n_datasets, n_steps_in, n_features = X.shape

print("y.shape", yy.shape)
n_datasets, n_steps_out = yy.shape

# splitting data
# 30 years train : 5 years valid : remaining 4 years for test
split_point = 12*30
split_point2 = 12*5 + split_point
train_X, train_y = X[:split_point, :], yy[:split_point, :]
valid_X, valid_y = X[split_point:split_point2, :], yy[split_point:split_point2, :]
#test_X, test_y = X[split_point2:, :], yy[split_point2:, :]

So I plan to use 24 months of data from 10 variables (X) to predict one variable (y).

stacked.shape (597, 11)
X.shape (563, 24, 10)
y.shape (563, 12)
train_x shape: (360, 24, 10)
train_y shape: (360, 12)
valid_x shape: (0, 24, 10)
valid_y shape: (0, 12)
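
The shapes above follow from the windowing arithmetic: with 597 rows, 24 input steps, and 12 output steps that begin on the last input row, there are 597 - 24 - 12 + 2 = 563 windows. The following is a minimal sketch on synthetic data (random values stand in for the real indicators, which are not shown) that reproduces those shapes. It also shows that a validation slice taken between indices 360 and 420 would hold 60 windows; an empty validation set like the one printed above is what would result if the second split index were not larger than the first.

```python
import numpy as np

def split_sequences(sequences, n_steps_in, n_steps_out):
    """Same windowing convention as the question: y starts on the last input row."""
    X, y = [], []
    for i in range(len(sequences)):
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out - 1
        if out_end_ix > len(sequences):
            break
        X.append(sequences[i:end_ix, :-1])              # all columns except the target
        y.append(sequences[end_ix - 1:out_end_ix, -1])  # target column only
    return np.array(X), np.array(y)

# Synthetic stand-in for the real stacked array (assumption: 597 rows, 11 columns)
rng = np.random.default_rng(0)
stacked_demo = rng.normal(size=(597, 11))

X_demo, y_demo = split_sequences(stacked_demo, 24, 12)
print("X:", X_demo.shape, "y:", y_demo.shape)

# Intended split from the code comments: 30 years train, 5 years validation
split_point = 12 * 30
split_point2 = split_point + 12 * 5
valid_X_demo = X_demo[split_point:split_point2]
print("valid_X:", valid_X_demo.shape)
```

If the real run prints a validation shape with zero rows, it may be worth printing `split_point` and `split_point2` right before slicing.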

I then train an LSTM model with the Adam optimizer:

import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# fix the random seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

# optimizer learning rate
opt = keras.optimizers.Adam(learning_rate=0.001)
# define model: two stacked LSTM layers, then one dense unit per forecast step
model = Sequential()
model.add(LSTM(50, activation='relu', return_sequences=True, input_shape=(n_steps_in, n_features)))
model.add(LSTM(50, activation='relu'))
model.add(Dense(n_steps_out))
model.compile(loss='mse', optimizer=opt, metrics=['mse'])
model.summary()

# fit network
history = model.fit(train_X, train_y,
                    epochs=40, verbose=0,
                    validation_data=(valid_X, valid_y), shuffle=False)
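
The snippet does not show whether the inputs were scaled before training. If they were not, indicators on very different scales may contribute to a poorly trained network, so one thing worth checking is whether each feature is standardized with statistics computed on the training windows only. The following is a sketch under that assumption: `standardize_with_train_stats` is a hypothetical helper that is not part of the original code, and the arrays are synthetic.

```python
import numpy as np

def standardize_with_train_stats(train_X, other_X, eps=1e-8):
    """Standardize 3D windows (samples, steps, features) per feature.

    The mean and standard deviation come from the training windows only,
    so no information from validation or test data leaks into the scaling.
    """
    mean = train_X.mean(axis=(0, 1), keepdims=True)
    std = train_X.std(axis=(0, 1), keepdims=True)
    train_scaled = (train_X - mean) / (std + eps)
    other_scaled = (other_X - mean) / (std + eps)
    return train_scaled, other_scaled, mean, std

# Synthetic stand-ins with deliberately different feature scales (assumption)
rng = np.random.default_rng(42)
scales = np.array([1, 10, 100, 1000, 1, 10, 100, 1000, 1, 10], dtype=float)
train_X_demo = rng.normal(size=(360, 24, 10)) * scales
valid_X_demo = rng.normal(size=(60, 24, 10)) * scales

train_scaled, valid_scaled, mean, std = standardize_with_train_stats(
    train_X_demo, valid_X_demo
)
print("per-feature std before:", train_X_demo.std(axis=(0, 1)).round(1))
print("per-feature std after: ", train_scaled.std(axis=(0, 1)).round(3))
```

The same `mean` and `std` would then be applied to any window passed to `model.predict`, so that training and inference see inputs on the same scale.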

Then I run the prediction inside a function:

from sklearn.preprocessing import MinMaxScaler

def prep_data(x_test, y_test, start, end, last):
    # prepare test data X: one window reshaped to (1, steps, features)
    dataset_test_X = x_test[start:end, :]
    print("dataset_test_X :", dataset_test_X.shape)
    test_X_new = dataset_test_X.reshape(1, dataset_test_X.shape[0], dataset_test_X.shape[1])
    print("test_X_new :", test_X_new.shape)
    # prepare past data and ground truth
    past_data = y_test[:end, :]
    dataset_test_y = y_test[end-1:last-1, :]
    # note: this scaler is fitted on the ground-truth window itself
    scaler1 = MinMaxScaler(feature_range=(0, 1))
    scaler1.fit(dataset_test_y)
    print("dataset_test_y :", dataset_test_y.shape)
    print("past_data :", past_data.shape)
    # predictions
    y_pred = model.predict(test_X_new)
    y_pred_inv = scaler1.inverse_transform(y_pred)
    y_pred_inv = y_pred_inv.reshape(n_steps_out, 1)
    y_pred_inv = y_pred_inv[:, 0]
    print("y_pred :", y_pred.shape)
    print("y_pred_inv :", y_pred_inv.shape)

    return y_pred_inv, dataset_test_y, past_data

# start can be any point in the 4 years of test data, 0-24
start = 20
end = start + n_steps_in
last = end + n_steps_out
stacked_x = hstack((x1, x2, x3, x4, x5, x6, x7, x8, x9, x10))
stacked_x = stacked_x[split_point2:, :]
print('stacked_x shape: ', stacked_x.shape)
y_test = y[split_point2:, :]
print('y test shape', y_test.shape)
y_pred_inv, dataset_test_y, past_data = prep_data(stacked_x, y_test, start, end, last)

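I can train the model and use it without errors, but the output is not very good. It is usually just a nearly horizontal line with almost no turns. Even if the training result is poor, shouldn't the forecast at least start where past_data ends? What am I doing wrong here?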

Output

Tags: python, tensorflow, machine-learning, keras, lstm

Solution
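
One possible reason the forecast does not line up with the end of `past_data`: `prep_data` fits a new `MinMaxScaler` on the ground-truth window and uses it to invert the predictions. An inverse transform only recovers the original units when it uses the same minimum and maximum as the forward transform that was applied to the training targets. The sketch below illustrates the effect with plain NumPy. It assumes the targets were min-max scaled before training, which the question does not show; `fit_minmax`, `transform`, and `inverse_transform` are hypothetical stand-ins for the scaler, and all values are made up.

```python
import numpy as np

def fit_minmax(values):
    """Return the (min, max) pair that defines a min-max scaling."""
    return float(values.min()), float(values.max())

def transform(values, params):
    lo, hi = params
    return (values - lo) / (hi - lo)

def inverse_transform(scaled, params):
    lo, hi = params
    return scaled * (hi - lo) + lo

# Made-up training targets and a made-up ground-truth window from the test period
train_y = np.linspace(100.0, 200.0, 50)
test_window = np.array([180.0, 185.0, 190.0, 195.0])

train_params = fit_minmax(train_y)       # what the model would have been trained with
window_params = fit_minmax(test_window)  # what prep_data effectively uses

# Pretend the model is perfect in scaled units
perfect_pred_scaled = transform(test_window, train_params)

restored_ok = inverse_transform(perfect_pred_scaled, train_params)
restored_bad = inverse_transform(perfect_pred_scaled, window_params)
print("inverted with training scaler:", restored_ok)
print("inverted with window scaler:  ", restored_bad)
```

In this toy case, even a perfect prediction comes back offset and squeezed into a narrower range when the window scaler is used, which resembles a flattened curve that does not start at the last observed value. Fitting the scaler once on the training targets and reusing that same object at prediction time avoids the mismatch, and also avoids using ground truth that would not be available for a real forecast.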

