Integrating a Bayesian model with a mix of stochastic and deterministic input encoding

Problem

I cannot figure out how to implement a Bayesian framework for a predictive model that has many deterministic inputs and a few stochastic inputs. Conceptually the problem seems straightforward, but from a coding standpoint I am not sure how to specify this model as a single integrated unit in the JAGS/BUGS language (i.e., the entire equation described within one JAGS/BUGS model).

The way forward
I have been told that the entire analysis should live within the scope of a single JAGS/BUGS model, rather than as separate JAGS models that each feed 100,000 data points, simulated from their posterior predictive distributions (PPDs), into the master equation.

Perhaps all that is needed is to copy and paste all of the JAGS code into one place and add the larger equation?

The equation

The basic components of the equation are:

Outcome = probability of the event (blue) * exposure rate (green) * ability to avoid the event (orange)

The full equation is:

[Image: the full equation]

The red variables (G, P·Q+R, and Avoid) are the stochastic ones; P·Q+R is a linear regression that helps predict annual variation in exposure. I have empirical data for each of these variables and need to use a Bayesian framework (or another probabilistic method) to simulate the outcome of this equation from the combination of deterministic and stochastic inputs.

Initial attempt

Although I am a fan of Bayesian methods in theory and am working toward implementing this approach, my training to date has focused more on resampling procedures. The most logical course therefore seemed to be to write separate JAGS code for each stochastic input, then use each saved step of the MCMC chain for each parameter (s = 100,000), plugged into the likelihood, to generate datasets from the posterior predictive distribution (PPD) as inputs to the larger equation.

Although this does not integrate the model into a single unit of JAGS code, it seemed mathematically sound, based on @baruuum's answer to @Fred L.'s question on understanding Bayesian predictive distributions (version: 2018-08-19), https://stats.stackexchange.com/q/335496, excerpted here:

Now, how do we obtain samples from p(y~|y)? The method you describe is sometimes called the composition method, and it works as follows:

for NewSimulatedData = 1, 2, ..., s:

    draw θ(s) from p(θ|y)       # these are the MCMC samples from JAGS
    draw y~(s) from p(y~|θ(s))  # this draws a simulated dataset from the PPD

where, in most cases, we already have the draws from p(θ|y), so only the second step is required.

A side note: I have not done the second step within JAGS, because I could not find an example of how to do it, so I simply take the 100,000 θ(s) draws from p(θ|y) and generate the y~(s) outside JAGS as input to the larger equation.
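The second (composition) step done outside JAGS can be sketched as follows. This is a minimal Python illustration assuming a lognormal likelihood like the one used for G below; the posterior draws are faked with random arrays purely as stand-ins for the saved coda samples:

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-ins for the 100,000 MCMC draws of (muOfLogY, sigmaOfLogY) from p(theta | y).
# In practice these come from the saved coda samples, not from rng draws.
mu_draws = rng.normal(loc=1.0, scale=0.05, size=100_000)
sigma_draws = rng.uniform(0.4, 0.6, size=100_000)

# Composition step: for each theta(s), draw one y~(s) from p(y~ | theta(s)).
# numpy broadcasts the per-draw parameters, so this yields one PPD draw per MCMC step.
y_ppd = rng.lognormal(mean=mu_draws, sigma=sigma_draws)

print(y_ppd.shape)
```

Each element of `y_ppd` is then available as a stochastic input to the larger equation.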

Step 1

I made four separate JAGS models to generate a posterior predictive distribution for each stochastic variable:

  1. One to estimate the parameters needed to generate the PPD for variable G;
  2. One to estimate the parameters needed to generate the PPD for "Avoid";
  3. One to estimate the P (slope) and R (intercept) parameters of the linear relationship and to generate Q (X values) for producing expected values of Y;

Each of these JAGS models shares the following settings:

#JAGS ITERATIONS
adaptSteps = 5000 #Number of steps to "tune" the samplers
burnInSteps = 50000 #Number of steps to "burn-in" the samplers
nChains = 4 #Number of MCMC chains to run
thinSteps = 5 #Number of steps to "thin" (1=keep every step)
numSavedSteps = 100000 #Total number of steps in chains to save
nPerChain = ceiling( ( numSavedSteps * thinSteps ) / nChains ) #Steps per chain
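As a quick arithmetic check (not part of the original script), the per-chain iteration count implied by these settings works out like this:

```python
import math

numSavedSteps = 100_000  # total saved steps across all chains
thinSteps = 5            # keep every 5th step
nChains = 4              # chains run in parallel

# Iterations needed per chain so that, after thinning,
# the four chains together yield numSavedSteps samples.
nPerChain = math.ceil((numSavedSteps * thinSteps) / nChains)
print(nPerChain)  # 125000 iterations per chain; 125000/5 = 25000 kept
                  # per chain, and 4 * 25000 = 100000 saved in total
```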

Here are the JAGS models and the code that runs them:

  1. G:
 #JAGS MODEL CODE FOR G
modelString = "
model {

#Likelihood function: Log-normal
#Note that JAGS uses precision as scale parameter for log-normal distributions rather than variance (which is the inverse of precision) 
    for( i in 1 : N ) {
      y[i] ~ dlnorm( muOfLogY , 1/sigmaOfLogY^2 ) 
    }

#Prior for mu of log(y): Normal distribution with a location parameter matching mean of log(y) of flight speed with a large spread about the location 
  muOfLogY ~ dnorm( meanOfLogY , 1/(10*sdOfLogY)^2 ) 

#Prior for sigma of log(y): Uniform distribution with a wide range that included all probable values of variance for flight speed 
  sigmaOfLogY ~ dunif( 0.001*sdOfLogY , 1000*sdOfLogY )  

#Derived mean of Y: Not essential for calculations of collision risk but tracked so that parameter distributions could be viewed on a non-logarithmic scale (more intuitive)
  muOfY <- exp( muOfLogY+sigmaOfLogY^2/2 )

#Derived mode of Y: Not essential for calculations of collision risk but tracked so that parameter distributions could be viewed on a non-logarithmic scale (more intuitive)
  modeOfY <- exp( muOfLogY-sigmaOfLogY^2 )

#Derived sigma of Y: Not essential for calculations of collision risk but tracked so that parameter distributions could be viewed on a non-logarithmic scale (more intuitive)
  sigmaOfY <- sqrt( exp(2*muOfLogY+sigmaOfLogY^2)*(exp(sigmaOfLogY^2)-1) )
#-----------------
#References:
#Code adapted from Kruschke (2015): Doing Bayesian Data Analysis-A Tutorial with R, JAGS, and Stan (Jags-Ymet-Xnom1grp-MlogNormal-Script.R) 
}
" # close quote for modelString
writeLines( modelString , con="JAGS_C1.txt" )

#PARAMETERS TO TRACK
parameters = c("muOfLogY" , "sigmaOfLogY" , "muOfY" , "modeOfY" , "sigmaOfY" ) 

# CREATE, INITIALIZE, AND ADAPT THE MODEL:
library(rjags) #Required for jags.model() and coda.samples()
jagsModel = jags.model( "JAGS_C1.txt" , data=dataList ,
                        n.chains=nChains , n.adapt=adaptSteps )
# BURN IN THE MCMC CHAIN:
cat( "Burning in the MCMC chain...\n" )
update( jagsModel , n.iter=burnInSteps )

# SAVE MCMC CHAIN:
cat( "Sampling final MCMC chain...\n" )
mcmcCoda1 = coda.samples( jagsModel , variable.names=parameters ,
                            n.iter=nPerChain , thin=thinSteps )
  2. Avoid:
#JAGS MODEL FOR AVOID
#-----------------
modelString = "
model {

#Likelihood function: Weibull
#Note that the parameterization of this distribution by JAGS and R (dweibull) use different notation/parameters
#Shape parameter: JAGS is 'nu', R (dweibull) is 'a'
#Scale parameter: JAGS is 'lambda', R (dweibull) is 'b'
  for ( i in 1:N ) {
    y[i] ~ dweib( nu , lambda )  #This is the JAGS parameterization of Weibull distribution
   }

#Prior for nu: Uniform distribution that included all probable values of the shape parameter
  nu <- a   #Renaming parameter to match R language rather than JAGS language (more intuitive)
  a ~ dunif( 0,200 )

#Prior for lambda: Uniform distribution that included all probable values of the scale parameter
  lambda <- 1/b^a #Put parameter in terms of R language rather than JAGS language (more intuitive)
  b ~ dunif(0,200)  

#-----------------
#References:
#Code adapted from Kruschke (2015): Doing Bayesian Data Analysis-A Tutorial with R, JAGS, and Stan (Jags-YmetCensored-Xnom2grp-Mweibull.R) 
}
" # close quote for modelString
writeLines( modelString , con="JAGS_C2.txt" )


#PARAMETERS TO TRACK
parameters = c("nu","a","b")
nPerChain = ceiling( ( numSavedSteps * thinSteps ) / nChains ) # Steps per chain.

# CREATE, INITIALIZE, AND ADAPT THE MODEL:
jagsModel = jags.model( "JAGS_C2.txt" , data=dataList ,
                        n.chains=nChains , n.adapt=adaptSteps )
# BURN IN THE MCMC CHAIN:
cat( "Burning in the MCMC chain...\n" )
update( jagsModel , n.iter=burnInSteps )

# SAVE MCMC CHAIN:
cat( "Sampling final MCMC chain...\n" )
mcmcCoda2 = coda.samples( jagsModel, variable.names=parameters ,
                            n.iter=nPerChain, thin=thinSteps )
#Note: coda.samples() has no "method" argument; parallel dispatch requires a different interface (e.g. runjags)
  3. P, Q, R and derived Y: For simplicity I will skip the JAGS code for #3, since it is probably not needed for a reproducible example. It is an adaptation of code provided by John Kruschke, available here: http://doingbayesiandataanalysis.blogspot.com/2015/10/postterior-predicted-distribution-for.html
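The JAGS-to-R Weibull conversion used in the "Avoid" model above (nu = a, lambda = 1/b^a) can be checked numerically; this Python snippet compares the two density parameterizations directly:

```python
import math

def dweib_jags(x, nu, lam):
    """JAGS parameterization: f(x) = nu * lam * x^(nu-1) * exp(-lam * x^nu)."""
    return nu * lam * x ** (nu - 1) * math.exp(-lam * x ** nu)

def dweibull_r(x, a, b):
    """R parameterization: f(x) = (a/b) * (x/b)^(a-1) * exp(-(x/b)^a)."""
    return (a / b) * (x / b) ** (a - 1) * math.exp(-((x / b) ** a))

a, b = 2.0, 3.0
lam = 1 / b ** a  # the conversion used in the model code
for x in (0.5, 1.0, 2.5):
    assert abs(dweib_jags(x, a, lam) - dweibull_r(x, a, b)) < 1e-12
print("parameterizations agree")
```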

Step 2

The results of each JAGS model are saved as "mcmcCoda" objects, which contain the 100,000 saved steps (samples) of the parameters that determine the shape/scale/location of the posterior predictive distribution. I use these to generate new data from the PPD as inputs to the larger equation.

For the purposes of a reproducible example

Suppose the equation is greatly simplified to just stochastic variables 1 and 2 plus one additional deterministic variable, so that the equation would be something like:

Outcome = StochasticVar1 * (1 - StochasticVar2) * DeterministicVar1

Would the best approach be something like the following?

#INTEGRATED JAGS MODEL
#-----------------
modelString = "
model {
### Main equation ###
outcome <- SV1 * (1 - SV2) * DV1

### Likelihoods ###

#Likelihood function for G: Log-normal
#Note that JAGS uses precision as the scale parameter for log-normal distributions rather than variance (precision is the inverse of variance) 
    for( i in 1 : N1 ) {
      yG[i] ~ dlnorm( muOfLogY , 1/sigmaOfLogY^2 ) 
    }

#Likelihood function for "avoid": Weibull
#Note that the parameterization of this distribution by JAGS and R (dweibull) use different notation/parameters
#Shape parameter: JAGS is 'nu', R (dweibull) is 'a'
#Scale parameter: JAGS is 'lambda', R (dweibull) is 'b'
#Note that the two likelihoods need distinct data vectors (yG, yA) and lengths (N1, N2); reusing y[i] in both loops would redefine the same node and fail in JAGS
  for ( i in 1:N2 ) {
    yA[i] ~ dweib( nu , lambda )  #This is the JAGS parameterization of the Weibull distribution
   }

### Priors ###

#All the priors from the JAGS code above

}
" # close quote for modelString

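For comparison, the resampling route I have been using so far (combining draws from the separate PPDs outside JAGS, step by step through the simplified equation) can be sketched in Python, with placeholder arrays standing in for the saved coda samples:

```python
import numpy as np

rng = np.random.default_rng(0)
S = 100_000  # one draw per saved MCMC step

# Placeholder PPD draws; in practice these come from the separate JAGS models.
stochastic_var1 = rng.lognormal(mean=1.0, sigma=0.5, size=S)  # e.g. G
stochastic_var2 = rng.beta(a=20, b=2, size=S)                 # e.g. Avoid, in [0, 1]
deterministic_var1 = 0.37                                      # fixed input

# Propagate uncertainty through the simplified equation draw by draw.
outcome = stochastic_var1 * (1 - stochastic_var2) * deterministic_var1

# Summarize the resulting distribution of the outcome.
print(np.percentile(outcome, [2.5, 50, 97.5]))
```

This produces a full distribution for the outcome, but as separate models it loses any joint dependence among the parameters, which is part of why a single integrated model is preferable.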
The most complex JAGS/BUGS model I have found to date is from Pardo et al. (2015), https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0120727, and its supplementary code, https://doi.org/10.1371/journal.pone.0120727.s002, which I reproduce here:

##### Short-beaked common dolphins: #####
  ##### PRIORS: #####
  a0 ~ dunif(-0.2,0.6)        
  a1 ~ dunif(-0.3,-0.02)       
  a2 ~ dunif(0.05,0.42)         
  sd.eps.a ~ dunif(0.15,0.65)   
  sd.eps.a2 ~ dunif(0.05,0.42)  
  logmu.gs ~ dunif(3.5,5.5)     
  logsd.gs ~ dunif(8.5,13.5)    
  nu0 ~ dunif(-3.6,-2.1)         
  nu1 ~ dunif(-14,-7.5)             
  nu2 ~ dunif(-45,-18)           
  sd.eps.nu ~ dunif(1.12,1.95)   
  #### OBSERVATION MODELS ####
  ## Models from sightings (j): ##
  logvar.gs <- log(1+pow(logsd.gs/logmu.gs,2))  
  logtau.gs <- 1/logvar.gs   
  eps.a.tau <- 1/(sd.eps.a*sd.eps.a) 
  for (j in 1:1723){
    perp.dist.sigh[j] ~ dnorm(0,perp.dist.tau[j]) # Eq. 1
    perp.dist.tau[j] <- 1/(pow(w.sigh[j],2)*(2/pi)) # Eq. 2
    gp.sz.sigh[j] ~ dlnorm(logmu.gs,logtau.gs) # Eq. 4
    eps.a[j] ~ dnorm(0,eps.a.tau)                   
    w.sigh[j] <- exp(a0+(a1*beauf.sigh[j])+(a2*log(gp.sz.sigh[j]))+eps.a[j]) # Eq. 6
  }
  mean.gs <- exp(logmu.gs+0.5*logvar.gs) # Eq. 7
  #### g(0): ####
  g0.barlow ~ dbeta(102.8362,3.180502) # Eqs. 13 to 16
  #### Models from cells (i): ####
  eps.a2.tau <- 1/(sd.eps.a2*sd.eps.a2)
  eps.nu.tau <- 1/(sd.eps.nu*sd.eps.nu)
  for (i in 1:11173){
    #### Effective strip half-width model for cells: ####
    eps.a2[i] ~ dnorm(0,eps.a2.tau) 
    w.cell[i] <-exp(a0+(a1*beauf.cell[i])+(a2*log(mean.gs))+eps.a2[i]) # Eq. 9
    ## Predicted group counts: ##
    n.groups.cell[i] ~ dpois(pred.gp[i]) # Eq. 10
    pred.gp[i] <- (2*w.cell[i]*eff.cell[i]*dens.cell[i]*g0.barlow)/mean.gs # Eq. 12
    #### Check the group counts likelihood: ####
    squared.res.obs.gp.counts[i] <- pow(n.groups.cell[i]-pred.gp[i],2)                   
    new.n.groups.cell[i] ~ dpois(pred.gp[i])                                              
    new.squared.res.gp.counts[i] <- pow(new.n.groups.cell[i] - pred.gp[i], 2)
    ###################################### ECOLOGICAL MODEL: ######################################
    eps.nu[i] ~ dnorm(0,eps.nu.tau)
    dens.cell[i] <- exp(nu0+(nu1*ssh.cells[i])+(nu2*pow(ssh.cells[i],2))+eps.nu[i]) # Eq. 17
  }
  #### POSTERIOR PREDICTIVE CHECK: ####
  fit.obs.counts <- sum(squared.res.obs.gp.counts[]) 
  fit.new.counts <- sum(new.squared.res.gp.counts[])
  test.fit.counts <- step(fit.new.counts-fit.obs.counts)
  b.p.value.counts <- mean(test.fit.counts) # Bayesian p-value 
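The posterior-predictive-check pattern at the end of the Pardo et al. code (squared discrepancy of observed vs. replicated counts, then the mean of an indicator as a Bayesian p-value) can be sketched outside JAGS; the arrays below are random placeholders, not the paper's data:

```python
import numpy as np

rng = np.random.default_rng(1)
S, n_cells = 2_000, 50

# Placeholder draws: predicted Poisson mean per cell at each MCMC step.
pred = rng.gamma(shape=2.0, scale=1.5, size=(S, n_cells))
observed = rng.poisson(lam=3.0, size=n_cells)  # stand-in for n.groups.cell

# Discrepancy of the observed data vs. each step's predictions
# (analogous to squared.res.obs.gp.counts summed over cells).
fit_obs = ((observed - pred) ** 2).sum(axis=1)

# Replicated data drawn from the same predictions
# (analogous to new.n.groups.cell and new.squared.res.gp.counts).
replicated = rng.poisson(lam=pred)
fit_new = ((replicated - pred) ** 2).sum(axis=1)

# Bayesian p-value: fraction of steps where the replicated fit
# exceeds the observed fit (the step(...) / mean(...) construction).
b_p_value = (fit_new >= fit_obs).mean()
print(b_p_value)
```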

The bottom line

I feel like I am missing something fundamental. Any suggestions, resources, or expertise to help create a single integrated model, or any thoughts on the most appropriate approach for this particular situation, would be greatly appreciated.

Thank you in advance!

Tags: r, prediction, bayesian, jags, stochastic
