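python - How to automate a SARIMA model for time series forecasting?
Problem description
I am trying to find the right p, d, q parameters for SARIMA in a time series forecasting task. I need to forecast house prices for 1000 zip codes. The problem is that a grid search takes too long, and I cannot inspect the ACF/PACF of each zip code by hand because the process has to be automated.
I tried a grid search over 8 different parameter combinations and kept the parameter set with the best AIC.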
import itertools
import pandas as pd
import statsmodels.api as sm

# Candidate orders: p, d, q each in {0, 1}, giving 8 non-seasonal combinations
p = d = q = range(0, 2)
pdq = list(itertools.product(p, d, q))
# The same 8 combinations reused as seasonal orders with period 12
seasonal_pdq = [(x[0], x[1], x[2], 12) for x in pdq]

parameters = []
for param in pdq:
    for param_seasonal in seasonal_pdq:
        try:
            # y_new is the price series for one zip code.
            # method='css' is dropped here: SARIMAX fits by maximum likelihood.
            model = sm.tsa.statespace.SARIMAX(y_new,
                                              order=param,
                                              seasonal_order=param_seasonal,
                                              enforce_stationarity=False,
                                              enforce_invertibility=False)
            results = model.fit(disp=False)
        except Exception:
            # Skip combinations that fail to fit
            continue
        parameters.append([param, param_seasonal, results.aic])

result_table = pd.DataFrame(parameters,
                            columns=['parameters', 'parameters_seasonal', 'aic'])
# Sort ascending: the lower the AIC, the better
result_table = result_table.sort_values(by='aic', ascending=True).reset_index(drop=True)
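I cannot get a model that beats the naive forecast. Can you give me some guidance on how to proceed?
Solution
The best approach is to use the Pyramid library (the pyramid-arima package, since renamed pmdarima), which selects the p, d, q parameters automatically. You will need to prepare your data so that all 1000 time series can be fed through it, but here is an example of how to run it on a single series.
Suppose we have a dataset of daily maximum temperatures recorded over time, and the goal is to select the ARIMA p, d, q parameters automatically. This can be done as follows: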
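import matplotlib.pyplot as plt
from pyramid.arima import auto_arima
from pyramid.arima.stationarity import ADFTest

# Augmented Dickey-Fuller test for stationarity at the 5% level
adf_test = ADFTest(alpha=0.05)
adf_test.is_stationary(series)

# Hold out the tail of the series for testing.
# The test slice starts at 741 so that no observation is skipped between the two sets.
train, test = series[1:741], series[741:927]
train.shape
test.shape

plt.plot(train)
plt.plot(test)
plt.title("Training and Test Data")
plt.show()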
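Since the question is about beating the naive forecast, it helps to pin down that benchmark on the same split. The following is only a minimal sketch of a seasonal naive baseline and an RMSE score; the helper names `seasonal_naive` and `rmse` and the toy data are made up for illustration and do not come from Pyramid.

```python
import math

def seasonal_naive(train, horizon, m=12):
    """Forecast by repeating the last full season of `train` (hypothetical helper)."""
    if len(train) < m:
        raise ValueError("need at least one full season of history")
    last_season = list(train[-m:])
    return [last_season[i % m] for i in range(horizon)]

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("length mismatch")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Toy data (made up): a pattern that repeats every 4 steps
history = [1.0, 2.0, 3.0, 4.0] * 3
future = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0]

baseline = seasonal_naive(history, horizon=len(future), m=4)
baseline_rmse = rmse(future, baseline)
print(baseline, baseline_rmse)
```

With the variables above, the same helpers could be called as `seasonal_naive(train, len(test), m=12)`, and the result compared against `rmse(test, Arima_model.predict(n_periods=len(test)))` once the model below has been fitted.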
As you can see, the ARIMA model selection in this case is based on the configuration with the lowest AIC:
>>> Arima_model=auto_arima(train, start_p=1, start_q=1, max_p=8, max_q=8, start_P=0, start_Q=0, max_P=8, max_Q=8, m=12, seasonal=True, trace=True, d=1, D=1, error_action='warn', suppress_warnings=True, random_state = 20, n_fits=30)
Fit ARIMA: order=(1, 1, 1) seasonal_order=(0, 1, 0, 12); AIC=-667.202, BIC=-648.847, Fit time=3.710 seconds
Fit ARIMA: order=(0, 1, 0) seasonal_order=(0, 1, 0, 12); AIC=-270.700, BIC=-261.522, Fit time=0.354 seconds
Fit ARIMA: order=(1, 1, 0) seasonal_order=(1, 1, 0, 12); AIC=-625.446, BIC=-607.090, Fit time=2.365 seconds
Fit ARIMA: order=(0, 1, 1) seasonal_order=(0, 1, 1, 12); AIC=-1090.370, BIC=-1072.014, Fit time=7.584 seconds
Fit ARIMA: order=(0, 1, 1) seasonal_order=(1, 1, 1, 12); AIC=-1088.657, BIC=-1065.712, Fit time=10.024 seconds
Fit ARIMA: order=(0, 1, 1) seasonal_order=(0, 1, 0, 12); AIC=-653.939, BIC=-640.172, Fit time=1.733 seconds
Fit ARIMA: order=(0, 1, 1) seasonal_order=(0, 1, 2, 12); AIC=-1087.889, BIC=-1064.944, Fit time=25.853 seconds
Fit ARIMA: order=(0, 1, 1) seasonal_order=(1, 1, 2, 12); AIC=-1087.188, BIC=-1059.655, Fit time=31.205 seconds
Fit ARIMA: order=(1, 1, 1) seasonal_order=(0, 1, 1, 12); AIC=-1105.233, BIC=-1082.288, Fit time=10.266 seconds
Fit ARIMA: order=(1, 1, 0) seasonal_order=(0, 1, 1, 12); AIC=-887.349, BIC=-868.994, Fit time=9.558 seconds
Fit ARIMA: order=(1, 1, 2) seasonal_order=(0, 1, 1, 12); AIC=-1086.931, BIC=-1059.397, Fit time=11.649 seconds
Fit ARIMA: order=(0, 1, 0) seasonal_order=(0, 1, 1, 12); AIC=-724.814, BIC=-711.047, Fit time=4.372 seconds
Fit ARIMA: order=(2, 1, 2) seasonal_order=(0, 1, 1, 12); AIC=-1085.480, BIC=-1053.358, Fit time=17.619 seconds
Fit ARIMA: order=(1, 1, 1) seasonal_order=(1, 1, 1, 12); AIC=-1072.933, BIC=-1045.400, Fit time=13.924 seconds
Fit ARIMA: order=(1, 1, 1) seasonal_order=(0, 1, 2, 12); AIC=-1102.926, BIC=-1075.392, Fit time=28.082 seconds
Fit ARIMA: order=(1, 1, 1) seasonal_order=(1, 1, 2, 12); AIC=-1102.342, BIC=-1070.219, Fit time=35.426 seconds
Fit ARIMA: order=(2, 1, 1) seasonal_order=(0, 1, 1, 12); AIC=-1010.837, BIC=-983.303, Fit time=8.926 seconds
Total fit time: 222.656 seconds
>>>
>>> Arima_model.summary()
<class 'statsmodels.iolib.summary.Summary'>
"""
                                 Statespace Model Results
==========================================================================================
Dep. Variable:                                  y   No. Observations:                  740
Model:             SARIMAX(1, 1, 1)x(0, 1, 1, 12)   Log Likelihood                 557.617
Date:                            Thu, 14 Mar 2019   AIC                          -1105.233
Time:                                    16:33:59   BIC                          -1082.288
Sample:                                         0   HQIC                         -1096.379
                                            - 740
Covariance Type:                              opg
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
intercept   1.359e-06   6.75e-06      0.201      0.840   -1.19e-05    1.46e-05
ar.L1          0.1558      0.034      4.575      0.000       0.089       0.223
ma.L1         -0.9847      0.013    -75.250      0.000      -1.010      -0.959
ma.S.L12      -0.9933      0.092    -10.837      0.000      -1.173      -0.814
sigma2         0.0118      0.001     11.259      0.000       0.010       0.014
===================================================================================
Ljung-Box (Q):                       54.38   Jarque-Bera (JB):              3179.66
Prob(Q):                              0.06   Prob(JB):                         0.00
Heteroskedasticity (H):               0.77   Skew:                            -1.46
Prob(H) (two-sided):                  0.04   Kurtosis:                        12.82
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
"""
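To make the selection criterion concrete, the AIC, BIC and HQIC in the summary can be recomputed from the reported log likelihood. This is a sketch under two assumptions: the parameter count is 5 (the five rows of the coefficient table), and the effective sample size is 740 minus the 13 observations lost to differencing (d + D*m = 1 + 12). The second assumption is inferred from the reported BIC, not stated in the output.

```python
import math

def information_criteria(log_likelihood, n_params, n_obs):
    """Return (AIC, BIC, HQIC) using the standard textbook definitions."""
    aic = 2 * n_params - 2 * log_likelihood
    bic = n_params * math.log(n_obs) - 2 * log_likelihood
    hqic = 2 * n_params * math.log(math.log(n_obs)) - 2 * log_likelihood
    return aic, bic, hqic

log_likelihood = 557.617   # "Log Likelihood" in the summary
n_params = 5               # intercept, ar.L1, ma.L1, ma.S.L12, sigma2
n_obs = 740 - (1 + 1 * 12) # assumption: observations left after differencing

aic, bic, hqic = information_criteria(log_likelihood, n_params, n_obs)
print(round(aic, 3), round(bic, 3), round(hqic, 3))
```

Under these assumptions the three values agree with the summary to within the rounding of the log likelihood. The candidate with the lowest AIC in the trace, order (1, 1, 1) with seasonal order (0, 1, 1, 12), is the one that was kept.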
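If you are familiar with R, you can also use the auto.arima command. In fact, I would recommend it, because in some cases it may give you a better automated configuration than Pyramid, which was developed more recently.
That said, Pyramid will go a long way toward automating this for you.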
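For the 1000 zip codes in the question, the single-series call still needs an outer loop that survives individual failures. The sketch below shows one way to structure it; `fit_many`, `mean_fitter`, the `min_length` threshold and the demo data are all hypothetical, and the fitting function is passed in so the loop itself does not depend on any particular library.

```python
def fit_many(series_by_key, fit_fn, min_length=24):
    """Fit one model per series and keep going when a single fit fails.

    Returns (models, errors), both dicts keyed like `series_by_key`.
    """
    models, errors = {}, {}
    for key, series in series_by_key.items():
        if len(series) < min_length:
            # Too little history to estimate a seasonal model
            errors[key] = "series too short"
            continue
        try:
            models[key] = fit_fn(series)
        except Exception as exc:
            # Record the failure instead of aborting the whole batch
            errors[key] = repr(exc)
    return models, errors

def mean_fitter(series):
    """Stand-in 'model' for the demo only: the series mean."""
    return sum(series) / len(series)

# Made-up data: one usable series and one that is too short
demo = {"10001": [1.0] * 24, "10002": [2.0] * 5}
models, errors = fit_many(demo, mean_fitter)
print(sorted(models), sorted(errors))
```

With Pyramid installed, `fit_fn` could be something like `lambda y: auto_arima(y, m=12, seasonal=True, stepwise=True, error_action='ignore', suppress_warnings=True)`. Keeping `max_p`, `max_q`, `max_P` and `max_Q` small should cut the per-series fit time considerably compared with the bounds of 8 used above.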