python - Accuracy really bad with LSTM and cross_val_predict
Problem description
I am trying to validate the score of an LSTM that predicts a stock-market time series (link to the dataset: https://www.kaggle.com/camnugent/sandp500; I am using the AAL stock). The data has the following shape:
       open   high
0     15.07  15.12
1     14.89  15.01
2     14.45  14.51
3     14.30  14.94
4     14.94  14.96
...     ...    ...
1254  54.00  54.64
1255  53.49  53.99
1256  51.99  52.39
1257  49.32  51.50
1258  50.91  51.98

1259 rows × 2 columns
When using model.fit and model.predict, I can see the results are not great, but they at least appear to follow the real data. (The image only shows the prediction, so the training set is 80% of the dataset.)
Now, when using cross_val_predict or cross_val_score, the results are really bad, going from around 0.30 down to 0.003 by the end. The full code is:
import numpy as np
import math
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import cross_val_predict
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout
from tscv import GapKFold
from keras.wrappers.scikit_learn import KerasClassifier
sc = MinMaxScaler()
# define parameters
prevision_days = 5
verbose, epochs, batch_size = 1, 20, 50
size_test = 0.2 #20%
# fix random seed for reproducibility
seed = 7
np.random.seed(seed)
# load the dataset file
original_dataset = pd.read_csv('..\\dataset\\all_stocks_5yr.csv')
original_dataset.loc[original_dataset['low'].isnull(),'low'] = original_dataset['close']
original_dataset.loc[original_dataset['open'].isnull(),'open'] = original_dataset['close']
original_dataset.loc[original_dataset['high'].isnull(),'high'] = original_dataset['close']
dataset = original_dataset[original_dataset.Name == 'AAL'].drop(['date', 'volume', 'Name'], axis=1)
dataset = dataset[['open','high']]
#breaking in train/test
test_size = -1*int(prevision_days * round((math.floor(len(dataset)*size_test))/prevision_days))
dataset_scaled = sc.fit_transform(dataset)
#Preparing the data
data = []
target = []
for i in range(prevision_days, len(dataset_scaled)):
    data.append(dataset_scaled[i-prevision_days:i, 0])
    target.append(dataset_scaled[i, 0])
data, target = np.array(data), np.array(target)
data = np.reshape(data, (data.shape[0], data.shape[1], 1))
# Function to create model, required for KerasClassifier
def create_model():
    # create model
    model = Sequential()
    model.add(LSTM(units = 50, return_sequences = True, input_shape = (data.shape[1], 1)))
    model.add(Dropout(0.2))
    model.add(LSTM(units = 50, return_sequences = True))
    model.add(Dropout(0.2))
    model.add(LSTM(units = 50, return_sequences = True))
    model.add(Dropout(0.2))
    model.add(LSTM(units = 50))
    model.add(Dropout(0.2))
    model.add(Dense(units = 1))
    model.compile(optimizer = 'adam', loss = 'mean_squared_error', metrics=['accuracy'])
    return model
model = KerasClassifier(build_fn=create_model, epochs=epochs, batch_size=batch_size, verbose=verbose)
results = cross_val_predict(model, data, target, cv=5)
print(results)
The result is:
[[0.30032859]
[0.30032859]
[0.30032859]
...
[0.00306681]
[0.00306681]
[0.00306681]]
Any idea what could be causing these results? I have already increased the epochs to 50, and the batch_size to 50 as well, but the results are exactly the same, which is also strange.
Many thanks, João
Solution
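One likely culprit in the question's code is the wrapper choice: KerasClassifier wraps what is in fact a regression model (a single linear output trained with mean_squared_error), and 'accuracy' is not a meaningful metric for a continuous target. As a minimal sketch of the regression path, the same sliding-window construction can be cross-validated with an ordinary scikit-learn regressor; LinearRegression here is only a lightweight stand-in for the LSTM, and the synthetic series is illustrative:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict, cross_val_score

# Toy series standing in for the scaled 'open' column
series = np.linspace(0.0, 1.0, 100)

prevision_days = 5
data, target = [], []
for i in range(prevision_days, len(series)):
    data.append(series[i - prevision_days:i])  # previous 5 values as features
    target.append(series[i])                   # next value as the target
data, target = np.array(data), np.array(target)

# A regressor (not a classifier) for the continuous target,
# scored with R^2 rather than accuracy
model = LinearRegression()
preds = cross_val_predict(model, data, target, cv=5)
r2 = cross_val_score(model, data, target, cv=5, scoring='r2')
print(preds.shape, r2.mean())
```

With a Keras model, the analogous change would be wrapping it in KerasRegressor instead of KerasClassifier and dropping metrics=['accuracy'] from compile().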
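Separately, cv=5 in cross_val_predict means plain K-fold splits, so some folds train on data that comes after their test window, which is questionable for a time series (the question imports GapKFold from tscv but never passes it via cv=). A small sketch, using toy data, of scikit-learn's built-in time-order-preserving splitter TimeSeriesSplit:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

n_samples = 20
X = np.arange(n_samples).reshape(-1, 1)

# Expanding-window splits: each test fold comes strictly after its train fold
tscv = TimeSeriesSplit(n_splits=4)
for train_idx, test_idx in tscv.split(X):
    # Training indices always precede test indices, so no future data leaks
    assert train_idx.max() < test_idx.min()
    print(train_idx[-1], test_idx[0], test_idx[-1])
```

This splitter works with cross_val_score(model, data, target, cv=TimeSeriesSplit(n_splits=5)); cross_val_predict, by contrast, expects every sample to land in exactly one test fold, so it rejects non-partition splitters like this one.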