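time-series - Can I calculate the confidence interval of a Prophet model that contains a given value?
Question
Can I use the y-hat variance, the bounds, and the point estimate from the forecast dataframe to calculate the confidence level at which the interval would contain a given value?
I have seen that I can change the interval level before fitting, but doing that programmatically feels like a lot of expensive trial and error. Is there a way to estimate the confidence level using only information from the model parameters and the forecast dataframe?
Something like: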
for level in [.05, .1, .15, ... , .95]:
    if value_in_question in (yhat - Z_{level}*yhat_variance/N, yhat + Z_{level}*yhat_variance/N):
        print('I am in the bound level {level}')
# This is pseudocode, not meant to run in a console
Edit: working Prophet example
# CSV from Prophet's working examples: https://github.com/facebook/prophet/blob/master/examples/example_wp_log_peyton_manning.csv
import numpy as np
import pandas as pd
import scipy.stats as st
from fbprophet import Prophet

df = pd.read_csv('example_wp_log_peyton_manning.csv')
m = Prophet()
m.fit(df)
future = m.make_future_dataframe(periods=30)
forecast = m.predict(future)
# Goal: the smallest confidence level such that the confidence interval of the 30th prediction contains 9
## My current approach
def __probability_calculation(estimate, forecast, j=30):
    # The default interval width is 80%, so yhat_lower is treated as lying 1.28 standard deviations below yhat
    sd_residuals = (forecast.yhat_lower[j] - forecast.yhat[j]) / (-1.28)
    for alpha in np.arange(.5, .95, .01):
        z_val = st.norm.ppf(alpha)
        if forecast.yhat[j] - z_val * sd_residuals < estimate < forecast.yhat[j] + z_val * sd_residuals:
            return alpha

prob = __probability_calculation(9, forecast)
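Answer
fbprophet uses the numpy.percentile method to estimate the percentiles, as you can see in the source code: https://github.com/facebook/prophet/blob/0616bfb5daa6888e9665bba1f95d9d67e91fed66/python/prophet/forecaster.py#L1448
How to go the other way and compute the percentile of a given value has already been answered here: Map each list value to its corresponding percentile
Putting everything together, based on your code example: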
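As a minimal sketch of that forward and backward relationship, the snippet below uses synthetic normal draws in place of Prophet's simulations. The sample array, the seed, and the interval width of 0.8 are assumptions chosen for illustration, not values read from a fitted model.

```python
import numpy as np
import scipy.stats as st

# Synthetic stand-in for the simulated yhat values of ONE forecast row (assumption).
rng = np.random.default_rng(0)
samples = rng.normal(loc=9.0, scale=0.5, size=1000)

# Forward: percentile -> value, which is how interval bounds are read off simulations.
interval_width = 0.8
lower_p = 100 * (1 - interval_width) / 2  # about the 10th percentile
upper_p = 100 * (1 + interval_width) / 2  # about the 90th percentile
lower, upper = np.percentile(samples, [lower_p, upper_p])

# Backward: value -> percentile.
lower_back = st.percentileofscore(samples, lower, kind='weak')
upper_back = st.percentileofscore(samples, upper, kind='weak')
print(lower, upper, lower_back, upper_back)
```

Feeding the bounds back into percentileofscore should recover approximately the percentiles they were computed from.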
import pandas as pd
import numpy as np
import scipy.stats as st
from fbprophet import Prophet
url = 'https://raw.githubusercontent.com/facebook/prophet/master/examples/example_wp_log_peyton_manning.csv'
df = pd.read_csv(url)
# put the amount of uncertainty samples in a variable so we can use it later.
uncertainty_samples = 1000 # 1000 is the default
m = Prophet(uncertainty_samples=uncertainty_samples)
m.fit(df)
future = m.make_future_dataframe(periods=30)
# You need to replicate some of the preparation steps which are part of the predict() call internals
tmpdf = m.setup_dataframe(future)
tmpdf['trend'] = m.predict_trend(tmpdf)
sim_values = m.sample_posterior_predictive(tmpdf)
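The sim_values object contains 1000 simulations for each data point; these are the simulations on which the confidence intervals are based.
Now you can call the scipy.stats.percentileofscore method with any target value: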
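target_value = 8
# Flatten to 1-D: this pools the simulations of all dates, so no division by uncertainty_samples is needed
st.percentileofscore(np.asarray(sim_values['yhat']).ravel(), target_value, 'weak')
# returned 44.26 in the original run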
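To show that this works in both directions, you can take the output of the np.percentile method and feed it into the scipy.stats.percentileofscore method. The two agree to a precision of 4 decimal places: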
ACCURACY = 4
# Pool every simulation into one 1-D array so percentileofscore returns a plain percentile
all_sims = np.asarray(sim_values['yhat']).ravel()
for test_percentile in np.arange(0, 100, 0.5):
    target_value = np.percentile(all_sims, test_percentile)
    if not np.round(st.percentileofscore(all_sims, target_value, 'weak'), ACCURACY) == np.round(test_percentile, ACCURACY):
        print(test_percentile)
        raise ValueError("This doesn't work")
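To come back to the original question, the percentile of a value can be turned into the narrowest central interval that contains it. The sketch below is one possible way to do that and is not part of Prophet's API: `smallest_interval_level` is a hypothetical helper, and it assumes the interval is central, i.e. bounded by the (1 - w)/2 and (1 + w)/2 quantiles of the simulations for a single forecast row. Synthetic draws stand in for one row of `sim_values['yhat']`.

```python
import numpy as np

def smallest_interval_level(samples, value):
    """Hypothetical helper: narrowest central interval width w (between 0 and 1)
    whose [(1 - w)/2, (1 + w)/2] quantile range of `samples` contains `value`."""
    samples = np.asarray(samples, dtype=float).ravel()
    # Empirical CDF at `value`: fraction of simulations at or below it.
    p = np.mean(samples <= value)
    # `value` is inside the central interval iff (1 - w)/2 <= p <= (1 + w)/2,
    # which rearranges to w >= |2p - 1|.
    return float(abs(2.0 * p - 1.0))

# Synthetic stand-in for one row of simulations (assumption for illustration).
rng = np.random.default_rng(0)
row_samples = rng.normal(loc=9.0, scale=0.5, size=1000)

level_near_median = smallest_interval_level(row_samples, 9.0)
level_in_tail = smallest_interval_level(row_samples, 10.0)
print(level_near_median, level_in_tail)
```

With a fitted model, `row_samples` would presumably be `sim_values['yhat'][j]` for the row of interest. The result carries sampling noise and can only be resolved to roughly 1/uncertainty_samples.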