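python - Predicting the market with an LSTM: how to train the model on all the data and forecast future days
Problem description
I am new to machine learning, and I do not invest in stocks or cryptocurrencies. Yesterday I had a discussion with my supervisor, who believes that with Long Short-Term Memory (LSTM) we can predict the trend of a cryptocurrency. I, on the other hand, think we cannot predict a coin's market trend from its past trend alone. To settle the discussion, I got a large dataset and implemented an LSTM model to see what happens (see the full code below).
My problem is that I do not know how to feed the algorithm all the available data and look at the results a few days ahead. In tutorials I have seen how to split the data into training and test sets, but that is not really what I want. How can I do this?
Minimal reproducible example: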
import math
import numpy as np
import pandas_datareader.data as web
from keras.models import Sequential
from keras.layers import LSTM, Dense
from sklearn.preprocessing import MinMaxScaler
# As an example, get the ETH-USD quotes from 2017 to 2021 from Yahoo
df = web.DataReader('ETH-USD', 'yahoo', '2017-01-01', '2021-04-01')
# With the Sequential API it is not straightforward to define models that have
# multiple input sources, produce multiple outputs, or re-use layers,
# so just keep the 'Close' column
data = df.filter(['Close'])
dataset = data.values
# Fraction of the data used for training (80%)
training_data_len = math.ceil(len(dataset) * .80)
# math.ceil() rounds a number up to the nearest integer, if necessary
# In ML it is good practice to scale the data
scaler = MinMaxScaler(feature_range=(0, 1))  # values from 0 to 1 inclusive
# Note: fitting on the whole dataset lets test-period values influence the scaling
scaled_data = scaler.fit_transform(dataset)
# Create the training data set
train_data = scaled_data[0:training_data_len, :]
x_train = []
y_train = []
# Split the data into x_train (the previous 100 values) and y_train (the next value)
for i in range(100, len(train_data)):
    x_train.append(train_data[i-100:i, 0])
    y_train.append(train_data[i, 0])
x_train, y_train = np.array(x_train), np.array(y_train)
# Reshape: the LSTM expects 3 dimensions:
# number of samples, number of time steps, and number of features
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
print(x_train.shape)
# Build the LSTM model
model = Sequential()
model.add(LSTM(50,return_sequences=True, input_shape = (x_train.shape[1],1)))
model.add(LSTM(50,return_sequences=False))
model.add(Dense(25))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(x_train,y_train,batch_size=1,epochs=1)
# Create the test data set; start 100 steps early so the first test window is complete
test_data = scaled_data[training_data_len - 100:, :]
x_test = []
y_test = dataset[training_data_len:, :]
for i in range(100, len(test_data)):
    x_test.append(test_data[i-100:i, 0])
x_test = np.array(x_test)
x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1))
# Predict and convert the values back to the original price scale
predictions = model.predict(x_test)
predictions = scaler.inverse_transform(predictions)
# Put the actual and predicted closing prices side by side
train = data[:training_data_len]
valid = data[training_data_len:].copy()  # copy to avoid pandas' SettingWithCopyWarning
valid["prediction"] = predictions
# Output
                  Close   prediction
Date
2020-05-24   205.319748    87.644363
2020-05-25   201.902313    87.254120
2020-05-26   208.863434    86.593002
2020-05-27   219.840424    86.674240
2020-05-28   220.675125    88.306488
...                 ...          ...
2021-03-28  1691.355957  1175.725708
2021-03-28  1819.684937  1178.748413
2021-03-29  1846.033691  1197.531372
2021-03-30  1918.362061  1222.752075
2021-03-31  1977.276855  1255.324951
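The predictions I get are pretty bad, but I have seen that if I use a window of 60 time steps instead of 100, the predictions improve a lot. This is something I need to explore, because I do not know the reason.
Also, if anyone notices an error in the code or something I am doing wrong, I would appreciate any comments.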
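One way to put a number on "pretty bad" is to compare the model's error with a naive baseline that simply repeats the previous day's close. The sketch below does that with the first five rows of the output above; `rmse` and `naive_forecast` are hypothetical helper names, and five rows are only enough to illustrate the comparison, not to draw conclusions.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equally long series."""
    actual = np.asarray(actual, dtype=float).ravel()
    predicted = np.asarray(predicted, dtype=float).ravel()
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def naive_forecast(actual):
    """Persistence baseline: the forecast for day t is the close of day t-1."""
    actual = np.asarray(actual, dtype=float).ravel()
    return actual[:-1]

# First five rows of the output table (Close and prediction columns)
closes = np.array([205.319748, 201.902313, 208.863434, 219.840424, 220.675125])
model_preds = np.array([87.644363, 87.254120, 86.593002, 86.674240, 88.306488])

# The baseline has no forecast for the first day, so score both from day 2 onward
model_error = rmse(closes[1:], model_preds[1:])
baseline_error = rmse(closes[1:], naive_forecast(closes))
print(f"model RMSE: {model_error:.2f}, naive baseline RMSE: {baseline_error:.2f}")
```

Running the same comparison on the full `valid` frame, once with a window of 100 and once with 60, would show whether either setting actually beats the baseline.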
Solution
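A common approach to this kind of question, sketched here under assumptions rather than given as a definitive answer, is to build the training windows from the whole series (no train/test split) and then forecast recursively: predict one day, append that prediction to the input window, and repeat. The names `make_windows`, `forecast_recursive`, and `mean_predictor` are hypothetical, and `mean_predictor` is only a stand-in so the sketch runs without Keras.

```python
import numpy as np

def make_windows(series, window):
    """Turn a 1-D series into (samples, window, 1) inputs and next-step targets."""
    series = np.asarray(series, dtype=float).ravel()
    if len(series) <= window:
        raise ValueError("series must be longer than the window")
    x = np.stack([series[i - window:i] for i in range(window, len(series))])
    y = series[window:]
    return x[..., np.newaxis], y

def forecast_recursive(predict_fn, history, window, n_days):
    """Forecast n_days ahead by feeding each prediction back in as input."""
    buffer = list(np.asarray(history, dtype=float).ravel()[-window:])
    forecast = []
    for _ in range(n_days):
        x = np.array(buffer[-window:]).reshape(1, window, 1)
        next_value = float(np.asarray(predict_fn(x)).reshape(-1)[0])
        forecast.append(next_value)
        buffer.append(next_value)  # the prediction becomes part of the next input
    return np.array(forecast)

def mean_predictor(x):
    """Stand-in for a trained model: predicts the mean of the window."""
    return x.mean(axis=1)

scaled = np.linspace(0.0, 1.0, 50)          # placeholder for the scaled closes
x_all, y_all = make_windows(scaled, window=10)
future = forecast_recursive(mean_predictor, scaled, window=10, n_days=5)
print(x_all.shape, y_all.shape, future.shape)
```

With the model from the question, `x_all, y_all` built from the full `scaled_data[:, 0]` would go to `model.fit`, `model.predict` would take the place of `mean_predictor`, and `scaler.inverse_transform(future.reshape(-1, 1))` would map the forecast back to prices. Because each step consumes earlier predictions, errors compound, so forecasts more than a few days out should be treated with caution.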