Comparing the distribution of values in two arrays (of the same size) using the range parameter

Question

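I want to compare the distribution of values in two arrays (of the same size), but the histograms look different when I specify the range parameter: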

import matplotlib.pyplot as plt

def plot_compare(values1, values2, bins=100, range=None):
    # Overlay both histograms on one axes; `range` is forwarded to hist as (min, max).
    fig = plt.figure(figsize=(8, 6))
    ax = fig.add_subplot(111)
    ax.hist(values1.ravel(), alpha=0.5, bins=bins, range=range, color='b', label='1')
    ax.hist(values2.ravel(), alpha=0.5, bins=bins, range=range, color='r', label='2')
    ax.legend(loc='upper right', prop={'size': 14})
    plt.show()


plot_compare(a1, a2)

plot_compare(a1, a2, range=(-1200, 300))

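How do I make a correct comparison? My goal is to see visually how the values in the two arrays differ.

Both arrays contain the same number of values.

Should I use the same number of bins for both arrays (with different bin widths), or a different number of bins (with the same bin width)?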

Tags: python, matplotlib, plot, histogram, data-visualization

Answer


To compare two histograms, use bins of the same width for both. Your second plot is therefore the correct one.

The two plots differ because, when range is specified, the bin width is computed from that range (the range is divided by the number of bins), so both histograms share the same bins. In the first plot no range is given, so each histogram uses the minimum and maximum of its own array. The two arrays span different ranges, so their bin widths differ.
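As a rough illustration of the point above, the sketch below uses NumPy to show that the default bin widths differ between two arrays, and then builds one set of edges that both histograms can share. The helper name `shared_bin_edges` and the sample data standing in for `a1` and `a2` are assumptions made for this example; they are not part of the original question.

```python
import numpy as np

def shared_bin_edges(values1, values2, bins=100):
    """Return bins + 1 equally spaced edges spanning both arrays (hypothetical helper)."""
    v1 = np.asarray(values1).ravel()
    v2 = np.asarray(values2).ravel()
    lo = min(v1.min(), v2.min())
    hi = max(v1.max(), v2.max())
    return np.linspace(lo, hi, bins + 1)

# Made-up sample data standing in for a1 and a2 (same number of values).
rng = np.random.default_rng(0)
a1 = rng.normal(-400.0, 200.0, size=10_000)
a2 = rng.normal(-100.0, 50.0, size=10_000)

# Without a range, each array gets edges from its own min and max,
# so the bin widths are not the same.
_, edges1 = np.histogram(a1, bins=100)
_, edges2 = np.histogram(a2, bins=100)
width1 = edges1[1] - edges1[0]
width2 = edges2[1] - edges2[0]

# With shared edges, both histograms are counted over identical bins.
edges = shared_bin_edges(a1, a2, bins=100)
counts1, _ = np.histogram(a1, bins=edges)
counts2, _ = np.histogram(a2, bins=edges)
```

The same `edges` array can be passed as `bins=edges` to both `ax.hist` calls in `plot_compare`, which has the same effect as supplying a common `range` without having to pick the limits by hand.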

