Causal impact analysis in Python: the p-value seems incorrect

Problem description

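I am running a causal impact analysis in Python. It helps measure the effect of an intervention on a treatment group compared with a control group (A/B testing). To get started in Python, I followed https://github.com/jamalsenouci/causalimpact/blob/master/GettingStarted.ipynb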

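My data contains the series Period_1 and Period_2 (the screenshot of the data is not reproduced here).

Treat Period_1 as the treatment and Period_2 as the control.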

The following code runs without errors:

import pandas as pd
from causalimpact import CausalImpact

# start_date, cut_date_1, cut_date_2, end_date and end_date_AA are defined elsewhere
pre_period = [pd.to_datetime(date) for date in [start_date, cut_date_1]]
post_period = [pd.to_datetime(date) for date in [cut_date_2, end_date]]
# Weekly seasonality: nseasons=7
impact = CausalImpact(df_AA.loc[start_date:end_date_AA], pre_period, post_period, model_args={"nseasons": 7})
impact.run()
impact.plot()

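I get two charts (the images are not reproduced here). Because the confidence interval of the predicted values overlaps the actual values, the movement does not appear to be statistically significant.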
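The visual check described above can be approximated in code. This is only a rough sketch: the function name `fraction_inside` and all the numbers are made up for illustration and are not taken from the library or from the actual data. Pointwise overlap is also a weaker statement than a test on the cumulative effect, so treat it as a sanity check rather than a significance test.

```python
def fraction_inside(actual, lower, upper):
    """Share of time points whose actual value lies within the interval."""
    if not (len(actual) == len(lower) == len(upper)):
        raise ValueError("all three series must have the same length")
    inside = sum(
        1
        for a, lo, hi in zip(actual, lower, upper)
        # min/max tolerate bounds that are reported in reversed order
        if min(lo, hi) <= a <= max(lo, hi)
    )
    return inside / len(actual)


# Hypothetical post-period values, invented for illustration only.
actual = [15.2, 14.8, 15.1, 15.4]
pred_lower = [14.5, 14.4, 14.6, 14.7]
pred_upper = [15.6, 15.5, 15.7, 15.8]

print(fraction_inside(actual, pred_lower, pred_upper))
```

If nearly every actual value sits inside the prediction interval, a reported p-value of 0 deserves a second look.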

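However, I want a definitive answer on whether the movement is statistically significant, and what the p-value between treatment and control is. For that I use: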

print(impact.summary())
print(impact.summary("report"))

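The result is shown below. It reports a p-value of 0.0 and a statistically significant positive movement. This seems incorrect. I also tried different data where the difference between actual and predicted values is very large and the prediction CI does not overlap the actual values, and I still get a p-value of 0. It looks like the p-value is computed incorrectly. Are there any pointers for computing the p-value myself for this causal impact library, or is there a way to fix the library?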

                              Average     Cumulative
Actual                             15            247
Predicted                          15            246
95% CI                       [15, 15]     [244, 249]
                                                    
Absolute Effect                     0              1
95% CI                         [0, 0]        [3, -1]
                                                    
Relative Effect                  0.4%           0.4%
95% CI                  [1.5%, -0.6%]  [1.5%, -0.6%]
                                                    
P-value                          0.0%               
Prob. of Causal Effect         100.0%               
None
 During the post-intervention period, the response variable had an average value of approx. 15.  By contrast, in  the
absence of an intervention, we would have expected an average response of 15. The 90% interval of this counterfactual
prediction is [15, 15]. Subtracting this prediction from the observed response yields an estimate of the causal effect
the intervention had on the response variable. This effect is 0 with a 90% interval of [0, 0]. For a discussion of the
significance of this effect, see below.


 Summing up the individual data points during the post-intervention period (which can only sometimes be meaningfully
interpreted), the response variable had an overall value of 247.  By contrast, had  the intervention not taken place, we
would have expected a sum of 247. The 90% interval of this prediction is [244, 249]


 The above results are given in terms of absolute numbers. In relative terms, the response variable showed  an increase
of  0.4%. The 90% interval of this percentage is [1.5%, -0.6%]


 This means that the positive effect observed during the intervention period is statistically significant and unlikely
to be due to random fluctuations. It should be noted, however, that the question of whether this increase also bears
substantive significance can only be answered by comparing the absolute effect 0 to the original goal of the underlying
intervention.
None
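One way to cross-check the reported p-value by hand is a one-sided tail-area probability over simulated counterfactual totals, which, as far as I understand, is similar in spirit to what the original R CausalImpact package reports. The sketch below is an assumption-laden illustration, not the Python library's actual implementation: `tail_area_p_value` and `spans_zero` are hypothetical helpers, and the simulated draws are generated from a normal distribution whose mean (246) and spread (about 1.3) are only loosely read off the cumulative prediction and its 95% CI [244, 249] above.

```python
import random


def tail_area_p_value(simulated_sums, observed_sum):
    """One-sided tail-area probability of the observed post-period total.

    simulated_sums: counterfactual post-period totals drawn from the model.
    observed_sum:   actual post-period total.
    """
    # Including the observed value keeps the result strictly above zero.
    draws = list(simulated_sums) + [observed_sum]
    at_or_above = sum(1 for s in draws if s >= observed_sum)
    at_or_below = sum(1 for s in draws if s <= observed_sum)
    return min(at_or_above, at_or_below) / len(draws)


def spans_zero(bound_a, bound_b):
    """True if the interval between the two bounds contains zero."""
    return min(bound_a, bound_b) <= 0 <= max(bound_a, bound_b)


# Hypothetical counterfactual draws; NOT output of the causalimpact library.
rng = random.Random(0)
simulated = [rng.gauss(246, 1.3) for _ in range(1000)]

p = tail_area_p_value(simulated, observed_sum=247)
print(f"tail-area p-value: {p:.3f}")

# Cumulative absolute-effect interval from the summary above: [3, -1].
print(spans_zero(3, -1))
```

Computed this way, the p-value can never be exactly 0, and an effect interval that spans zero (as [3, -1] does) would normally go together with a non-significant result. A reported 0.0% alongside such an interval suggests the summary's p-value is worth verifying against the model's own posterior draws.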

Tags: python, causality

Solution

