How to calculate the Kullback-Leibler divergence for kernel estimates in R

Problem description

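I use kernel estimation to obtain nonparametric probability density functions. I then want to use the Kullback-Leibler divergence to compare the tail "distance" between the kernel distributions of two continuous variables. I tried the following code: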

kl_l <- function(x,y) {
    integrand <- function(x,y) {

            f.x <- fitted(density(x, bw="nrd0"))
            f.y <- fitted(density(y, bw="nrd0"))

            return((log(f.x)-log(f.y))*f.x) 
    }
    return(integrate(integrand, lower=-Inf,upper=quantile(density(x,  bw="nrd0"),0.25))$value)
    #the Kullback-leiber equation
}

When I run kl_l(a, b), with a, b = 19 continuous variables, it returns the following error:

 Error in density(y, bw = "nrd0") : argument "y" is missing, with no default 

Is there a way to compute this?

(In case anyone wants to see the actual equation: https://www.bankofengland.co.uk/-/media/boe/files/working-paper/2019/attention-to-the-tails-global-financial-conditions-and-exchange-rate-risks.pdf, page 13.)

Tags: r

Solution


In short, I think you just need to move the f.x and f.y outside the integrand (and possibly replace fitted with approxfun):

kl_l <- function(x, y) {
    f.x <- approxfun(density(x, bw = "nrd0"))
    f.y <- approxfun(density(y, bw = "nrd0"))
    integrand <- function(z) {
        return((log(f.x(z)) - log(f.y(z))) * f.x(z)) 
    }
    # the Kullback-Leibler integrand, integrated over the lower tail
    return(integrate(integrand, lower = -Inf, upper = quantile(density(x, bw = "nrd0"), 0.25))$value)
}

Expanding a little:

Looking at the paper you referenced, it appears as though you need to first create the two fitted distributions f and g. So if your variable a contains observations under the 1-standard-deviation increase in global financial conditions, and b contains the observations under average global financial conditions, you can create two functions as in your example:

f <- approxfun(density(a))
g <- approxfun(density(b))
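As a quick sanity check on this step, the sketch below turns a kernel density estimate into a function and confirms that it integrates to roughly 1 over the grid that density() evaluated. The data in a are made up for illustration (19 simulated values), so the numbers are not those of the question.

```r
set.seed(42)
a <- rnorm(19)                  # made-up sample, stands in for the real data

d <- density(a, bw = "nrd0")    # kernel density estimate on a grid of x values
f <- approxfun(d)               # linear interpolation between the grid points

# Integrate only over the grid range, where f is defined
area <- integrate(f, lower = min(d$x), upper = max(d$x),
                  subdivisions = 1000L)$value
area                            # should be close to 1
```

The area falls slightly short of 1 because the grid stops a few bandwidths beyond the most extreme observations.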

Then define the integrand:

integrand <- function(x) log(f(x) / g(x)) * f(x)

The upper bound:

upper <- quantile(density(b, bw = "nrd0"), 0.25)
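As far as I can tell, base R's stats package does not define a quantile method for density objects (some add-on packages do), so the line above may fail unless such a package is loaded. Two possible stand-ins are sketched below, both of which are assumptions rather than the method used in the paper: the empirical quantile of the sample, and a quantile read off the kernel density by accumulating its area. The sample b is made up for illustration.

```r
set.seed(7)
b <- rnorm(19)                           # made-up sample

# Option 1 (assumption): empirical 25% quantile of the raw observations
upper_sample <- unname(quantile(b, 0.25))

# Option 2 (assumption): 25% quantile implied by the kernel density itself
d <- density(b, bw = "nrd0")
# cumulative area under the density curve by the trapezoidal rule
cdf <- c(0, cumsum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2))
cdf <- cdf / cdf[length(cdf)]            # rescale so the total area is exactly 1
upper_kde <- approx(cdf, d$x, xout = 0.25)$y   # invert the CDF at 0.25

c(sample = upper_sample, kde = upper_kde)
```

The two values generally differ a little, because the kernel estimate spreads mass beyond the observed points.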

And finally do the integration on x within the specified bounds. Note that each value of x in the numerical computation has to go into both f and g. In your function kl_l, x and y were passed to the integrand separately, which I think is incorrect; in any case, integrate only supplies values for the first argument, which is why y was reported as missing.

integrate(integrand, lower = -Inf, upper = upper)$value
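The point about integrate only feeding the first argument can be reproduced with a toy function. The name two_arg below is purely illustrative; it mirrors the two-argument shape of the original integrand.

```r
# A two-argument function, shaped like the integrand in the question
two_arg <- function(x, y) x + y

# integrate() supplies values for x only, so y is never given a value
res <- try(integrate(two_arg, lower = 0, upper = 1), silent = TRUE)
inherits(res, "try-error")      # TRUE: 'argument "y" is missing, with no default'

# Extra arguments can be forwarded through integrate's `...`
val <- integrate(two_arg, lower = 0, upper = 1, y = 2)$value
val                             # 2.5, the integral of x + 2 over [0, 1]
```

Forwarding y this way would fix the error message, but not the logic: for the divergence, the same integration variable still has to be evaluated in both densities.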

One thing to check for is that approxfun returns NA for values outside the range specified in the density, which can mess up your operation, so you'll need to adjust for those (if you expect the density to go to zero, for example).
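One way to handle this is sketched below, under several assumptions that are mine rather than the paper's: both densities are evaluated on one shared grid, values outside the grid are treated as zero, the lower bound -Inf is replaced by the left edge of the grid, points where either density is zero contribute nothing to the integral, and the upper bound is the empirical 25% quantile of b. The samples a and b are made up for illustration.

```r
set.seed(1)
a <- rnorm(19, mean = 0.5, sd = 1.2)     # made-up sample for f
b <- rnorm(19)                           # made-up sample for g

# Shared grid, extended three bandwidths beyond the pooled data range
pad <- 3 * max(bw.nrd0(a), bw.nrd0(b))
lo  <- min(a, b) - pad
hi  <- max(a, b) + pad
da  <- density(a, bw = "nrd0", from = lo, to = hi)
db  <- density(b, bw = "nrd0", from = lo, to = hi)

# yleft/yright = 0 makes the interpolants return 0 instead of NA off the grid
f <- approxfun(da$x, da$y, yleft = 0, yright = 0)
g <- approxfun(db$x, db$y, yleft = 0, yright = 0)

integrand <- function(x) {
  fx <- f(x)
  gx <- g(x)
  # assumption: where either density is zero, the contribution is set to 0
  ifelse(fx > 0 & gx > 0, fx * log(fx / gx), 0)
}

upper   <- unname(quantile(b, 0.25))     # assumption: empirical quantile of b
kl_left <- integrate(integrand, lower = lo, upper = upper,
                     subdivisions = 1000L)$value
kl_left
```

Because this integrates over the lower tail only, the result can be negative, unlike the full Kullback-Leibler divergence. Setting the contribution to zero where g is zero but f is not understates the divergence, so it is worth checking that this case does not occur inside the integration range.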

