Discrete survival function in PyMC3 (custom likelihood)

Problem description

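I am trying to follow a marketing tutorial. The tutorial uses a frequentist/MLE approach; I like PyMC3 and decided to use it instead. The tutorial's author uses a survival function, namely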

S(t|churn_rate) = (1-churn_rate)**(t-1). This contrasts with the geometric distribution, whose probability mass function just multiplies the above by one more term: P(t|churn_rate) = churn_rate*(1-churn_rate)**(t-1).
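
As a quick check of how the two expressions above relate, the sketch below evaluates both in plain Python. The function names `survival` and `geometric_pmf` and the value `churn_rate = 0.2` are made up for illustration; they come from neither the tutorial nor PyMC3.

```python
def survival(t, churn_rate):
    """Survival function as written above: (1 - churn_rate) ** (t - 1)."""
    return (1.0 - churn_rate) ** (t - 1)


def geometric_pmf(t, churn_rate):
    """Geometric probability of churning exactly at period t (t = 1, 2, ...)."""
    return churn_rate * survival(t, churn_rate)


if __name__ == "__main__":
    churn_rate = 0.2  # illustrative value, not taken from the data
    for t in range(1, 6):
        print(t, survival(t, churn_rate), geometric_pmf(t, churn_rate))
```

The only difference between the two is the leading `churn_rate` factor, which is why the built-in geometric distribution cannot simply be reused as the survival term.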

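PyMC3 has the geometric distribution built in, so that is not where my problem lies. The problem is finding a way to write the survival function as a likelihood.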

import arviz as az
import pymc3 as pm
import numpy as np
from pipe import traverse
wins =   [1000, 631, 468, 382, 326]
geo = [[idx+1 for i in range(n)] for idx,n in enumerate(wins)]
geo = np.array(list(geo | traverse)) #flattens the array

with pm.Model() as model:    
    beta_alpha = pm.Uniform('beta_alpha', 0.0001, 5)
    beta_beta = pm.Uniform('beta_beta', 0.0001, 5)
    
    churn = pm.Beta('churn',
                   alpha=beta_alpha,
                   beta=beta_beta)
    renewal = pm.Deterministic('renewal', 1-churn)

    def log_likelihood(theta, t):
      return (t-1)*np.log(theta)
    
    lik = pm.Potential('like', log_likelihood(theta=renewal, t=geo))
    trace = pm.sample(chains=4)

Unfortunately, the sampler goes haywire...

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (4 chains in 1 job)
NUTS: [churn, beta_beta, beta_alpha]

 100.00% [2000/2000 00:02<00:00 Sampling chain 0, 821 divergences]

 100.00% [2000/2000 00:03<00:00 Sampling chain 1, 562 divergences]

 100.00% [2000/2000 00:02<00:00 Sampling chain 2, 628 divergences]

 100.00% [2000/2000 00:03<00:00 Sampling chain 3, 364 divergences]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 12 seconds.
There were 822 divergences after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.498263496658037, but should be close to 0.8. Try to increase the number of tuning steps.
There were 1385 divergences after tuning. Increase `target_accept` or reparameterize.
There were 2013 divergences after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.6553072990104106, but should be close to 0.8. Try to increase the number of tuning steps.
There were 2378 divergences after tuning. Increase `target_accept` or reparameterize.
The estimated number of effective samples is smaller than 200 for some parameters.

Earlier I had written a plain likelihood function rather than a log_likelihood function, but the sampler was not happy with that either.
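
One possible reason the plain likelihood version also struggled, offered as a guess rather than a diagnosis: `pm.Potential` adds its argument to the model's log-probability, so it expects a term on the log scale, and a product of several thousand probabilities underflows in floating point in any case. The sketch below uses only the standard library; `theta = 0.5` is an arbitrary illustrative value, not an estimate.

```python
import math

wins = [1000, 631, 468, 382, 326]
# Same construction as in the question: period index idx + 1, repeated n times.
geo = [idx + 1 for idx, n in enumerate(wins) for _ in range(n)]

theta = 0.5  # arbitrary renewal probability, for illustration only

# Raw likelihood: product of theta ** (t - 1) over all observations.
likelihood = 1.0
for t in geo:
    likelihood *= theta ** (t - 1)

# Log-likelihood: sum of (t - 1) * log(theta) over the same observations.
log_likelihood = sum((t - 1) * math.log(theta) for t in geo)

print(likelihood)      # underflows to 0.0 in double precision
print(log_likelihood)  # finite negative number
```

Working on the log scale keeps the quantity finite, which is the usual reason for writing a log-likelihood rather than a likelihood.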

A few of my suspicions:

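  1. It is not clear whether I should use pm.Potential or pm.DensityDist. The SO community seems to think pm.Potential is the better choice.
  2. I am passing an array called geo to log_likelihood. Maybe it expects a scalar and is not sure what to make of an array...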
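
On the second suspicion, a small check without PyMC3 may help. Assuming the elementwise terms are simply summed into one log-probability, the array `geo` collapses to a single coefficient on log(theta). That total is increasing in theta on (0, 1), so the term by itself favors renewal close to 1 (churn close to 0). Whether this explains the divergences is an assumption, not something confirmed here. `total_log_likelihood` is a hypothetical helper name.

```python
import math

wins = [1000, 631, 468, 382, 326]
# Same flattened data as in the question, built without the `pipe` package.
geo = [idx + 1 for idx, n in enumerate(wins) for _ in range(n)]


def total_log_likelihood(theta, ts):
    """Sum of the per-observation terms (t - 1) * log(theta)."""
    return sum((t - 1) * math.log(theta) for t in ts)


# The sum equals one coefficient times log(theta).
coefficient = sum(t - 1 for t in geo)

if __name__ == "__main__":
    print("coefficient on log(theta):", coefficient)
    for theta in (0.1, 0.5, 0.9, 0.99):
        # Values rise toward 0 as theta approaches 1.
        print(theta, total_log_likelihood(theta, geo))
```

If this reading is right, passing an array is not the issue in itself; the shape of the summed term is worth examining instead.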

Sources:

http://www.brucehardie.com/papers/021/sbg_2006-05-30.pdf

Black-box likelihood example

Tags: python, survival-analysis, pymc3

Solution

