Predict market with LSTM: how to feed the model with all the data and predict days into the future

Problem description

I am new to machine learning and I do not invest in stocks or cryptocurrencies. Yesterday I was discussing with my supervisor: he believes that with a Long Short-Term Memory (LSTM) network we can predict the trend of a cryptocurrency, whereas I think we cannot predict the market's trend from a coin's own past market behaviour. To settle the argument, I got a large dataset and implemented an LSTM model to see what happens (see the full code below).

My problem is that I do not know how to feed the algorithm all the available data and then look at the results several days into the future. In tutorials I have seen how to split the data into train and test sets, but that is not really what I want. How can I do this?
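What I have in mind is something like the following recursive loop, where each prediction is appended to the input window to predict the next day. This is only a sketch and I am not sure it is the right approach: to keep it self-contained it uses a stand-in `predict_next` function instead of the real trained model, and the window values are made up; with the real model one would call `model.predict(x)[0, 0]` and undo the scaling with `scaler.inverse_transform`:

```python
import numpy as np

# Stand-in for model.predict so this sketch runs on its own: it just
# echoes the last value in the window. With the real trained model this
# would be model.predict(x)[0, 0] instead.
def predict_next(x):
    return x[0, -1, 0]

window_size = 100   # same window length as in the training loop below
n_future = 7        # how many days ahead to predict

# Pretend these are the last `window_size` scaled closing prices.
window = np.linspace(0.5, 0.6, window_size)

future = []
for _ in range(n_future):
    x = window.reshape(1, window_size, 1)        # (samples, timesteps, features)
    next_scaled = predict_next(x)
    future.append(next_scaled)
    window = np.append(window[1:], next_scaled)  # slide the window forward

print(len(future))  # n_future scaled predictions; undo with scaler.inverse_transform
```

Is recursively feeding predictions back in like this how people usually do it, or does it just accumulate error?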

Minimal reproducible example:


import math
import numpy as np
import pandas_datareader.data as web
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense
from sklearn.preprocessing import MinMaxScaler

# As an example, get the stock quote of ETH from 2017 to 2021 from the 'yahoo' source
df = web.DataReader('ETH-USD', 'yahoo', '2017-01-01', '2021-04-01')

# With a Sequential model it is not straightforward to define models that have
# multiple input sources, multiple outputs, or re-used layers,
# so let's just use the column 'Close'
data = df.filter(['Close'])
dataset = data.values

# I define here the % of the data that will be used for training
training_data_len = math.ceil(len(dataset) * .80)
# math.ceil() rounds a number UP to the nearest integer, if necessary, and returns the result.


# In ML it is good practice to scale the data
scaler = MinMaxScaler(feature_range=(0,1)) # Values from 0 to 1 inclusive.
scaled_data = scaler.fit_transform(dataset)

# Create the training data set
train_data = scaled_data[0:training_data_len , :]

x_train = []
y_train = []
# Split the data into x_train and y_train
for i in range(100,len(train_data)):
    x_train.append(train_data[i-100:i,0])
    y_train.append(train_data[i,0])

x_train, y_train = np.array(x_train),np.array(y_train)


# Reshape: the LSTM expects 3 dimensions:
# number of samples (rows), number of timesteps (columns), number of features
x_train = np.reshape(x_train,(x_train.shape[0],x_train.shape[1],1))
x_train.shape


# Build the LSTM model
model = Sequential()
model.add(LSTM(50,return_sequences=True, input_shape = (x_train.shape[1],1)))
model.add(LSTM(50,return_sequences=False))
model.add(Dense(25))
model.add(Dense(1))

model.compile(optimizer='adam', loss='mean_squared_error')

model.fit(x_train,y_train,batch_size=1,epochs=1)


test_data = scaled_data[training_data_len - 100: , :]
x_test = []
y_test = dataset[training_data_len:, :]

for i in range(100, len(test_data)):
    x_test.append(test_data[i-100:i, 0])

x_test = np.array(x_test)
x_test = np.reshape(x_test,(x_test.shape[0],x_test.shape[1],1))


predictions  = model.predict(x_test)
predictions = scaler.inverse_transform(predictions) 

# Plot the data
train = data[:training_data_len]
valid = data[training_data_len:].copy()  # .copy() avoids pandas' SettingWithCopyWarning
valid["prediction"] = predictions

# Output

                Close   prediction
Date                                
2020-05-24   205.319748    87.644363
2020-05-25   201.902313    87.254120
2020-05-26   208.863434    86.593002
2020-05-27   219.840424    86.674240
2020-05-28   220.675125    88.306488
...                 ...          ...
2021-03-28  1691.355957  1175.725708
2021-03-28  1819.684937  1178.748413
2021-03-29  1846.033691  1197.531372
2021-03-30  1918.362061  1222.752075
2021-03-31  1977.276855  1255.324951

The predictions I get are pretty bad, but I have seen that if I use windows of 60 time steps instead of 100, the predictions improve a lot. This is the kind of thing I need to explore, because I do not understand the reason for it.
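To compare window sizes systematically, I suppose I could factor the window construction into a helper (mirroring the training loop above) and rebuild and retrain the model once per size. Here is a self-contained sketch that only checks the resulting dataset shapes for 60 vs 100, using a dummy series in place of `scaled_data[:, 0]`:

```python
import numpy as np

def make_windows(series, window_size):
    # Build (x, y) sliding-window pairs, mirroring the training loop above.
    x, y = [], []
    for i in range(window_size, len(series)):
        x.append(series[i - window_size:i])
        y.append(series[i])
    return np.array(x), np.array(y)

series = np.arange(500, dtype=float)  # dummy stand-in for scaled_data[:, 0]
for window_size in (60, 100):
    x, y = make_windows(series, window_size)
    print(window_size, x.shape, y.shape)
# 60 (440, 60) (440,)
# 100 (400, 100) (400,)
```

Note that a shorter window also yields more training samples from the same series, which may be part of why 60 appears to work better than 100.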

Also, if anyone notices an error in the code or something I am doing wrong, I would appreciate any comments.

Tags: python, tensorflow, machine-learning, keras, lstm
