Edge effects in a 2D density plot using KDE

Problem description

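I am plotting a simple 2D density map obtained with scipy.stats.gaussian_kde. There is always a plotting artifact toward the edges, where the density appears to be lower:

[Figure: the resulting density map, with lower density along the edges]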

I have tried every interpolation method available in imshow(), but none of them gets rid of it. Is there a proper way to handle this?

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

x_data = np.random.uniform(1., 2000., 1000)
y_data = np.random.uniform(1., 2000., 1000)
xmin, xmax = np.min(x_data), np.max(x_data)
ymin, ymax = np.min(y_data), np.max(y_data)
values = np.vstack([x_data, y_data])

# Gaussian KDE.
kernel = stats.gaussian_kde(values, bw_method=.2)
# Grid density (number of points).
gd_c = complex(0, 50)
# Define x,y grid.
x_grid, y_grid = np.mgrid[xmin:xmax:gd_c, ymin:ymax:gd_c]
positions = np.vstack([x_grid.ravel(), y_grid.ravel()])
# Evaluate kernel in grid positions.
k_pos = kernel(positions)

ext_range = [xmin, xmax, ymin, ymax]
kde = np.reshape(k_pos.T, x_grid.shape)
im = plt.imshow(np.rot90(kde), cmap=plt.get_cmap('RdYlBu_r'), extent=ext_range)

plt.show()

Tags: python, matplotlib, kernel-density

Solution


After a while I found a way to solve this, by applying the neat trick explained by Flabetvibes in this excellent answer.

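I used the code shown there to mirror the data, as in the first figure of that answer. The only modification I made is to trim the mirrored data to a perc pad (10% by default), so as not to carry around a lot of unnecessary values.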
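The mirror-and-trim step described above is easier to follow in one dimension. The sketch below is only an illustration: the function name mirror_and_trim_1d and the sample values are hypothetical and not taken from the linked answer. It reflects the data about both bounds and then keeps only the points that fall within a pad of perc times the data range.

```python
import numpy as np


def mirror_and_trim_1d(x, lo, hi, perc=0.1):
    """Reflect x about lo and hi, then keep points inside a perc-sized pad."""
    x = np.asarray(x, dtype=float)
    left = 2 * lo - x    # reflection about the lower bound
    right = 2 * hi - x   # reflection about the upper bound
    points = np.concatenate([x, left, right])
    pad = np.ptp(x) * perc  # pad width as a fraction of the data range
    keep = (points > lo - pad) & (points < hi + pad)
    return points[keep]


# Hypothetical sample on [0, 1]; only reflections within 0.1 of a bound survive.
sample = np.array([0.0, 0.05, 0.5, 0.95, 1.0])
padded = mirror_and_trim_1d(sample, lo=0.0, hi=1.0)
```

Note that points lying exactly on a bound are reflected onto themselves and therefore appear twice in the result. The same happens in the 2D version when the bounding box is taken from the data minimum and maximum.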

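The result is shown below, with the original non-mirrored data on the left and the mirrored data on the right:

[Figure: KDE of the original data (left) and of the mirrored data (right)]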

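As can be seen, the change in the resulting density map is not trivial. I personally believe the mirrored-data KDE represents the actual density better.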

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

def in_box(towers, bounding_box):
    return np.logical_and(np.logical_and(bounding_box[0] <= towers[:, 0],
                                         towers[:, 0] <= bounding_box[1]),
                          np.logical_and(bounding_box[2] <= towers[:, 1],
                                         towers[:, 1] <= bounding_box[3]))


def dataMirror(towers, bounding_box, perc=.1):
    # Select towers inside the bounding box
    i = in_box(towers, bounding_box)
    # Mirror points
    points_center = towers[i, :]
    points_left = np.copy(points_center)
    points_left[:, 0] = bounding_box[0] - (points_left[:, 0] - bounding_box[0])
    points_right = np.copy(points_center)
    points_right[:, 0] = bounding_box[1] + (bounding_box[1] - points_right[:, 0])
    points_down = np.copy(points_center)
    points_down[:, 1] = bounding_box[2] - (points_down[:, 1] - bounding_box[2])
    points_up = np.copy(points_center)
    points_up[:, 1] = bounding_box[3] + (bounding_box[3] - points_up[:, 1])
    points = np.concatenate([points_center, points_left, points_right,
                             points_down, points_up], axis=0)

    # Trim mirrored frame to within a 'perc' pad
    xr, yr = np.ptp(towers.T[0]) * perc, np.ptp(towers.T[1]) * perc
    xmin, xmax = bounding_box[0] - xr, bounding_box[1] + xr
    ymin, ymax = bounding_box[2] - yr, bounding_box[3] + yr
    msk = (points[:, 0] > xmin) & (points[:, 0] < xmax) &\
        (points[:, 1] > ymin) & (points[:, 1] < ymax)
    points = points[msk]

    return points.T


def KDEplot(xmin, xmax, ymin, ymax, values):
    # Gaussian KDE.
    kernel = stats.gaussian_kde(values, bw_method=.2)
    # Grid density (number of points).
    gd_c = complex(0, 50)
    # Define x,y grid.
    x_grid, y_grid = np.mgrid[xmin:xmax:gd_c, ymin:ymax:gd_c]
    positions = np.vstack([x_grid.ravel(), y_grid.ravel()])
    # Evaluate kernel in grid positions.
    k_pos = kernel(positions)

    ext_range = [xmin, xmax, ymin, ymax]
    kde = np.reshape(k_pos.T, x_grid.shape)

    plt.imshow(np.rot90(kde), cmap=plt.get_cmap('RdYlBu_r'), extent=ext_range)


x_data = np.random.uniform(1., 2000., 1000)
y_data = np.random.uniform(1., 2000., 1000)

xmin, xmax = np.min(x_data), np.max(x_data)
ymin, ymax = np.min(y_data), np.max(y_data)
values = np.vstack([x_data, y_data])

# Plot non-mirrored data
plt.subplot(121)
KDEplot(xmin, xmax, ymin, ymax, values)
plt.scatter(*values, s=3, c='k')
plt.xlim(xmin, xmax)
plt.ylim(ymin, ymax)

# Plot mirrored data
bounding_box = (xmin, xmax, ymin, ymax)
values = dataMirror(values.T, bounding_box)
plt.subplot(122)
KDEplot(xmin, xmax, ymin, ymax, values)
plt.scatter(*values, s=3, c='k')
plt.xlim(xmin, xmax)
plt.ylim(ymin, ymax)

plt.show()
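As a rough sanity check on the difference between the two panels, the same comparison can be made in one dimension, where the expected answer is known. The sketch below is an illustration under stated assumptions, not part of the original answer: the seed, the sample size, the unit interval and the helper name edge_to_center_ratio are all hypothetical choices, and the mirrored sample is left untrimmed. It compares the estimated density at the lower edge with the density at the midpoint. A ratio is used rather than absolute values because the mirrored KDE is normalized over the enlarged point set.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the check is repeatable
lo, hi = 0.0, 1.0
x = rng.uniform(lo, hi, 5000)


def edge_to_center_ratio(sample):
    """Ratio of the KDE at the lower edge to the KDE at the midpoint."""
    kde = stats.gaussian_kde(sample, bw_method=0.2)
    edge, center = kde([lo, (lo + hi) / 2])
    return edge / center


# Reflect the sample about both bounds (no trimming in this sketch).
mirrored = np.concatenate([x, 2 * lo - x, 2 * hi - x])

ratio_plain = edge_to_center_ratio(x)
ratio_mirrored = edge_to_center_ratio(mirrored)
print(f"plain: {ratio_plain:.2f}, mirrored: {ratio_mirrored:.2f}")
```

For uniform data the plain ratio should come out near 0.5, because roughly half of the kernel mass centered on the boundary has no data under it, while the mirrored ratio should be close to 1. If the mirrored points are trimmed to a narrow pad, the pad needs to span a few kernel widths for the correction to be complete.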
