2D histogram normalized to probabilities

Question

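I have a two-dimensional data set and I want to plot a 2D histogram in which each cell represents the probability of a data point falling in that cell. To get probabilities, I need to normalize the histogram data so that it sums to 1. Here is my example, adapted from the numpy.histogram2d documentation: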

import numpy as np
import matplotlib.pyplot as plt

# create edges of bins
xedges = [0, 1, 3, 5]
yedges = [0, 2, 3, 4, 6]

# create random data points
x = np.random.normal(2, 1, 100)
y = np.random.normal(1, 1, 100)
H, xedges, yedges = np.histogram2d(x, y, bins=(xedges, yedges))
# setting normed=True in histogram2d doesn't seem to do what I need

# histogram2d returns x along the first axis, so transpose before passing to imshow
H = H.T

fig = plt.figure(figsize=(7, 3))
plt.imshow(H, interpolation='nearest', origin='lower', extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]])
plt.show()

[Figure: the resulting plot]

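First, np.sum(H) gives a value like 86. I want each cell to represent the probability that a data point lies in that bin, so the cells should sum to 1. Also, how do you draw a legend for imshow that maps color intensity to its value?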

Thanks!

Tags: python, matplotlib, histogram

Answer


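Try the normed parameter (spelled density in current NumPy releases, where normed was deprecated and later removed). According to the documentation, the values in H are then computed as bin_count / sample_count / bin_area, where sample_count only includes the points that fall inside the bin edges. So we compute the area of each bin and multiply it by H to get the probability of each bin. For the legend, plt.colorbar draws a color scale for the image returned by imshow.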

import numpy as np
import matplotlib.pyplot as plt

# create edges of bins
xedges = [0, 1, 3, 5]
yedges = [0, 2, 3, 4, 6]

x = np.random.normal(2, 1, 100)  # create random data points
y = np.random.normal(1, 1, 100)
# density=True is the current spelling of the older normed=True
H, xedges, yedges = np.histogram2d(x, y, bins=(xedges, yedges), density=True)
# area of each bin, with the same shape as H (x along the first axis)
areas = np.outer(np.diff(xedges), np.diff(yedges))

fig = plt.figure(figsize=(7, 3))
# transpose so that y runs along the rows, which is what imshow expects
im = plt.imshow((H * areas).T, interpolation='nearest', origin='lower', extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]])
plt.colorbar(im)  # legend mapping color to probability
plt.show()
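
As a quick sanity check on the normalization, the sketch below compares density times bin area against simply dividing the raw counts by their total. It is only an illustration: the fixed seed and the names counts, prob_from_density, prob_in_range and prob_all are assumptions made for this example, not part of the original answer. It also shows a variant that divides by the full sample size, which may be what you want if points outside the bin edges should keep their share of the probability.

```python
import numpy as np

# Same bin edges and distributions as above; the seed is an arbitrary choice.
xedges = [0, 1, 3, 5]
yedges = [0, 2, 3, 4, 6]
rng = np.random.default_rng(0)
x = rng.normal(2, 1, 100)
y = rng.normal(1, 1, 100)

# Raw counts per bin; samples outside the edges are not counted.
counts, _, _ = np.histogram2d(x, y, bins=(xedges, yedges))

# Density per bin, converted to a probability by multiplying by the bin area.
density, _, _ = np.histogram2d(x, y, bins=(xedges, yedges), density=True)
areas = np.outer(np.diff(xedges), np.diff(yedges))
prob_from_density = density * areas

# Equivalent shortcut: divide the counts by the number of in-range samples.
prob_in_range = counts / counts.sum()

# Variant: divide by all samples, so the total can be less than 1
# when some points fall outside the bin edges.
prob_all = counts / len(x)

print(np.allclose(prob_from_density, prob_in_range))
print(prob_in_range.sum(), prob_all.sum())
```

Either prob_in_range.T or prob_all.T can be passed to imshow in place of (H * areas).T, depending on which normalization is intended.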
