首页 > 解决方案 > 从implemented_function 或rv_continuous 构造一个ContinuousRV

问题描述

我想构造一个ContinuousRV给定的 python 实现的概率密度函数 (pdf)。以下是一个最小的工作示例,其最后一条语句产生ValueError

import numpy as np
from scipy.stats import gaussian_kde, norm
from sympy import Interval, oo, symbols
from sympy.stats import ContinuousRV
from sympy.utilities.lambdify import implemented_function

# Example Data
measures = np.concatenate([norm.rvs(loc=-2, size=64), norm.rvs(loc=3, size=32)])

# Definition of the PDF
pdf_kde = gaussian_kde(measures)
pdf_sym = implemented_function("pdf", pdf_kde)

# Create the symbolic variable
XName, x = symbols('X x')
X = ContinuousRV(XName, pdf_sym(x), set=Interval(-oo, oo))

该示例失败并出现以下错误:

.../lib/python3.8/site-packages/sympy/stats/crv_types.py in check(pdf, set)
    149         x = Dummy('x')
    150         val = integrate(pdf(x), (x, set))
--> 151         _value_check(val == S.One, "The pdf on the given set is incorrect.")
    152 
    153 

.../lib/python3.8/site-packages/sympy/stats/rv.py in _value_check(condition, message)
   1450     truth = fuzzy_and(condition)
   1451     if truth == False:
-> 1452         raise ValueError(message)
   1453     return truth == True
   1454 

ValueError: The pdf on the given set is incorrect.

我已经确认 pdf 是一个很好的近似值。

from scipy import integrate
value, err = integrate.quad(pdf_kde, -np.inf, np.inf)
print(value, err)
>>> 0.9999999999999996 2.318795975521764e-09

如果相关的话,我目前正在使用 Python 3.8.0、 Sympy 1.6、 Scipy1.4.1和 Numpy 。1.18.5

标签: pythonpython-3.xscipysympy

解决方案


ContinuousRV方法构造了一个实例,ContinuousDistributionHandmade并在这样做时调用了一个检查方法,该方法失败了,因为 Sympy 不会自动执行数值计算。可以构建一个产生所需分布并以数字方式执行所述检查的包装器。

from sympy.stats.crv import SingleContinuousDistribution

def _check_dist(pdf, set):
    from sympy import Dummy, integrate, N
    x = Dummy('x')
    integrand = pdf(x)
    integral = integrate(integrand, (x, set.start, set.end))
    v = float(N(integral))
    assert np.isclose(float(v), 1.0)

def EmpiricalRV(name: str, m: np.ndarray) -> SingleContinuousDistribution:
    from scipy.stats import gaussian_kde
    from sympy import Interval, oo
    from sympy.stats import ContinuousDistributionHandmade
    from sympy.stats.crv import SingleContinuousPSpace
    from sympy.utilities.lambdify import implemented_function

    pdf_kde = gaussian_kde(m)
    pdf_sym = implemented_function(f"f_{name}", lambda y: pdf_kde(float(y)))
    domain  = Interval(-oo, oo)

    _check_dist(pdf_sym, domain)

    dist = ContinuousDistributionHandmade(pdf_sym, domain)
    pspace = SingleContinuousPSpace(name, dist)
    return pspace.value

以下测试代码演示了随机变量的作用。

from scipy.stats import norm
data = np.concatenate([norm.rvs(loc=-2, size=64), norm.rvs(loc=3, size=32)])

from sympy import N
from sympy.stats import density, Normal, E, P, median
Y = EmpiricalRV('Y',data)

ev = E(Y)
evfloat = float(N(ev))

print("Y          :", Y)
print("density(Y) :", density(Y))
print("E(Y)       :", ev)
print(f"N(E(Y))    : {evfloat:.4f}")
print(f"data.mean(): {data.mean():.4f}")

>>> Y          : Y
>>> density(Y) : ContinuousDistributionHandmade(f_Y, Interval(-oo, oo))
>>> E(Y)       : Integral(Y*f_Y(Y), (Y, -oo, oo))
>>> N(E(Y))    : -0.3882
>>> data.mean(): -0.3882

子类化ContinuousDistributionHandmadeSingleContinuousDistribution


推荐阅读