python - 如何创建非中心学生的 T 分布以及与分布一起使用的先验？

问题描述

我一直在使用以下链接，使用 Scipy (Python) 将经验分布拟合到理论分布？

我一直在将我的数据用于链接中的代码，并发现我的数据的常见分布是非中心学生的 T 分布。我在 pymc3 包中找不到发行版，所以，我决定用 scipy 看一下，以了解发行版是如何形成的。我创建了一个自定义发行版，但我有几个问题：

我想知道我创建分发的方法是否正确？
如何将自定义分布实现到模型中？
关于先验分布，我是否在正态分布先验（mu 和 sigma）中使用相同的步骤，结合自由度和非中心值的半范数？

我的自定义分布：

import numpy as np
import theano.tensor as tt
from scipy import stats
from scipy.special import hyp1f1, nctdtr
import warnings
from pymc3.theanof import floatX
from pymc3.distributions.dist_math import bound, gammaln
from pymc3.distributions.continuous import assert_negative_support, get_tau_sigma
from pymc3.distributions.distribution import Continuous, draw_values, generate_samples

class NonCentralStudentT(Continuous):
    """
    Parameters
    ----------
    nu: float
        Degrees of freedom, also known as normality parameter (nu > 0).
    mu: float
        Location parameter.
    sigma: float
        Scale parameter (sigma > 0). Converges to the standard deviation as nu increases. (only required if lam is not specified)
    lam: float
        Scale parameter (lam > 0). Converges to the precision as nu increases. (only required if sigma is not specified)
    """

    def __init__(self, nu, nc, mu=0, lam=None, sigma=None, sd=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        super(NonCentralStudentT, self).__init__(*args, **kwargs)
        if sd is not None:
            sigma = sd
            warnings.warn("sd is deprecated, use sigma instead", DeprecationWarning)
        self.nu = nu = tt.as_tensor_variable(floatX(nu))
        self.nc = nc = tt.as_tensor_variable(floatX(nc))
        lam, sigma = get_tau_sigma(tau=lam, sigma=sigma)
        self.lam = lam = tt.as_tensor_variable(lam)
        self.sigma = self.sd = sigma = tt.as_tensor_variable(sigma)
        self.mean = self.median = self.mode = self.mu = mu = tt.as_tensor_variable(mu)
        self.variance = tt.switch((nu > 2) * 1, (1 / self.lam) * (nu / (nu - 2)), np.inf)

        assert_negative_support(lam, 'lam (sigma)', 'NonCentralStudentT')
        assert_negative_support(nu, 'nu', 'NonCentralStudentT')
        assert_negative_support(nc, 'nc', 'NonCentralStudentT')

    def random(self, point=None, size=None):
        """
        Draw random values from Non-Central Student's T distribution.
        Parameters
        ----------
        point: dict, optional
            Dict of variable values on which random values are to be
            conditioned (uses default point if not specified).
        size: int, optional
            Desired size of random sample (returns one sample if not
            specified).
        Returns
        -------
        array
        """
        nu, nc, mu, lam = draw_values([self.nu, self.nc, self.mu, self.lam], point=point, size=size)
        return generate_samples(stats.nct.rvs, nu, nc, loc=mu, scale=lam ** -0.5, dist_shape=self.shape, size=size)

    def logp(self, value):
        """
        Calculate log-probability of Non-Central Student's T distribution at specified value.
        Parameters
        ----------
        value: numeric
            Value(s) for which log-probability is calculated. If the log probabilities for multiple
            values are desired the values must be provided in a numpy array or theano tensor
        Returns
        -------
        TensorVariable
        """
        nu = self.nu
        nc = self.nc
        mu = self.mu
        lam = self.lam

        n = nu * 1.0
        nc = nc * 1.0
        x2 = value * value
        ncx2 = nc * nc * x2
        fac1 = n + x2
        trm1 = n / 2. * tt.log(n) + gammaln(n + 1)
        trm1 -= n * tt.log(2) + nc * nc / 2. + (n / 2.) * tt.log(fac1) + gammaln(n / 2.)
        Px = tt.exp(trm1)
        valF = ncx2 / (2 * fac1)
        trm1 = tt.sqrt(2) * nc * value * hyp1f1(n / 2 + 1, 1.5, valF)
        trm1 /= np.asarray(fac1 * tt.gamma((n + 1) / 2))
        trm2 = hyp1f1((n + 1) / 2, 0.5, valF)
        trm2 /= np.asarray(np.sqrt(fac1) * tt.gamma(n / 2 + 1))
        Px *= trm1 + trm2
        return bound(Px, lam > 0, nu > 0, nc > 0)

    def logcdf(self, value):
        """
        Compute the log of the cumulative distribution function for Non-Central Student's T distribution
        at the specified value.
        Parameters
        ----------
        value: numeric
            Value(s) for which log CDF is calculated. If the log CDF for multiple
            values are desired the values must be provided in a numpy array or theano tensor.
        Returns
        -------
        TensorVariable
        """
        nu = self.nu
        nc = self.nc

        return nctdtr(nu, nc, value)

我的自定义模型：

with pm.Model() as model:
    # Prior Distributions for unknown model parameters:
    mu = pm.Normal('sigma', 0, 10)
    sigma = pm.Normal('sigma', 0, 10)
    nc= pm.HalfNormal('nc', sigma=10)
    nu= pm.HalfNormal('nu', sigma=1)

    # Observed data is from a Likelihood distributions (Likelihood (sampling distribution) of observations):
    => (input custom distribution) observed_data = pm.Beta('observed_data', alpha=alpha, beta=beta, observed=data)

    # draw 5000 posterior samples
    trace = pm.sample(draws=5000, tune=2000, chains=3, cores=1)

    # Obtaining Posterior Predictive Sampling:
    post_pred = pm.sample_posterior_predictive(trace, samples=3000)
    print(post_pred['observed_data'].shape)
    print('\nSummary: ')
    print(pm.stats.summary(data=trace))
    print(pm.stats.summary(data=post_pred))

编辑1：

我重新设计了自定义模型以包含自定义分布，但是，基于用于获取似然分布的方程或有时张量锁定并且代码只是冻结，我不断地得到错误。在下面找到我的代码，

with pm.Model() as model:
                # Prior Distributions for unknown model parameters:
                mu = pm.Normal('mu', mu=0, sigma=1)
                sd = pm.HalfNormal('sd', sigma=1)
                nc = pm.HalfNormal('nc', sigma=10)
                nu = pm.HalfNormal('nu', sigma=1)

                # Custom distribution:
                # observed_data = pm.DensityDist('observed_data', NonCentralStudentT, observed=data_list)

                # Observed data is from a Likelihood distributions (Likelihood (sampling distribution) of observations):
                observed_data = NonCentralStudentT('observed_data', mu=mu, sd=sd, nc=nc, nu=nu, observed=data_list)

                # draw 5000 posterior samples
                trace_S = pm.sample(draws=5000, tune=2000, chains=3, cores=1)

                # Obtaining Posterior Predictive Sampling:
                post_pred_S = pm.sample_posterior_predictive(trace_S, samples=3000)
                print(post_pred_S['observed_data'].shape)
                print('\nSummary: ')
                print(pm.stats.summary(data=trace_S))
                print(pm.stats.summary(data=post_pred_S))

编辑2：

为了将函数转换为theano，我正在网上查找，我发现定义函数的唯一内容是来自以下GitHub链接hyp1f1函数GitHub

这足以将函数转换为theano吗？
另外，我有个问题，用theano用NumPy数组可以吗？

另外，我想到了另一种方法，但我不确定这是否可以实现，我查看了 scipy 中的 nct 函数，他们写了以下内容，

如果 Y 是标准正态随机变量，V 是具有 k 个自由度的独立卡方随机变量 ( chi2 )，则

X=(Y+c) / sqrt(V/k)

在实线上具有非中心学生 t 分布。自由度参数 k（在实现中表示为 df）满足 k>0，非中心性参数 c（在实现中表示为 nc）为实数。

上面的概率密度以“标准化”形式定义。要移动和/或缩放分布，请使用 loc 和 scale 参数。具体来说， nct.pdf(x, df, nc, loc, scale) 等同于 nct.pdf(y, df, nc) / scale ，其中 y = (x - loc) / scale 。

所以，我想只使用先验作为它们分布中的正态和 chi2 随机变量代码部分，并将代码中前面提到的自由度变量用于 SciPy 中提到的方程，是否足以得到分布？

编辑3：

我设法运行关于拟合经验分布的链接中的代码，发现第二好的是学生 t 分布，所以，我将使用它。谢谢您的帮助。我只是有一个附带问题，我用学生 t 分布运行我的模型，但我收到了这些警告：

调整后有52个分歧。增加 target_accept 或重新参数化。接受概率与目标不匹配。它是 0.7037574708196309，但应该接近 0.8。尝试增加调整步骤的数量。某些参数的有效样本数小于 10%。

我只是对这些警告感到困惑，你知道这意味着什么吗？我知道这不会影响我的代码，但是，我可以减少分歧吗？关于有效样本，是否需要增加跟踪代码中的样本数量？

标签： pythonstatisticsdistributionpymc3pymc

python - 如何创建非中心学生的 T 分布以及与分布一起使用的先验？

问题描述

解决方案

推荐阅读