What is the difference between `scipy.stats.expon.rvs()` and `numpy.random.exponential()`?





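If you assume that no-hitters are described by a Poisson process, then the time between no-hitters is exponentially distributed. The exponential distribution has a single parameter, which we will call $\tau$, the typical interval time.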

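The value of $\tau$ that makes the exponential distribution best match the data is the mean interval time between no-hitters (where time is measured in number of games).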
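As a short justification of that statement, here is a sketch of the maximum-likelihood argument. It assumes the observed intervals $t_1, \dots, t_n$ are independent draws from an exponential density $f(t \mid \tau) = \frac{1}{\tau} e^{-t/\tau}$; the symbols $t_i$, $n$ and $\ell$ are introduced here for illustration only.

$$
\ell(\tau) = \sum_{i=1}^{n} \ln f(t_i \mid \tau) = -n \ln \tau - \frac{1}{\tau} \sum_{i=1}^{n} t_i,
\qquad
\frac{d\ell}{d\tau} = -\frac{n}{\tau} + \frac{1}{\tau^{2}} \sum_{i=1}^{n} t_i = 0
\;\Longrightarrow\;
\hat{\tau} = \frac{1}{n} \sum_{i=1}^{n} t_i .
$$

Under these assumptions the best-fit $\tau$ is the sample mean, which is what `np.mean(nohitter_times)` computes in the code that follows.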

import numpy as np
import matplotlib.pyplot as plt

# The data: number of games between consecutive no-hitters
nohitter_times = np.array([ 843, 1613, 1101,  215,  684,  814,  278,  324,  161,  219,  545,
        715,  966,  624,   29,  450,  107,   20,   91, 1325,  124, 1468,
        104, 1309,  429,   62, 1878, 1104,  123,  251,   93,  188,  983,
        166,   96,  702,   23,  524,   26,  299,   59,   39,   12,    2,
        308, 1114,  813,  887,  645, 2088,   42, 2090,   11,  886, 1665,
       1084, 2900, 2432,  750, 4021, 1070, 1765, 1322,   26,  548, 1525,
         77, 2181, 2752,  127, 2147,  211,   41, 1575,  151,  479,  697,
        557, 2267,  542,  392,   73,  603,  233,  255,  528,  397, 1529,
       1023, 1194,  462,  583,   37,  943,  996,  480, 1497,  717,  224,
        219, 1531,  498,   44,  288,  267,  600,   52,  269, 1086,  386,
        176, 2199,  216,   54,  675, 1243,  463,  650,  171,  327,  110,
        774,  509,    8,  197,  136,   12, 1124,   64,  380,  811,  232,
        192,  731,  715,  226,  605,  539, 1491,  323,  240,  179,  702,
        156,   82, 1397,  354,  778,  603, 1001,  385,  986,  203,  149,
        576,  445,  180, 1403,  252,  675, 1351, 2983, 1568,   45,  899,
       3260, 1025,   31,  100, 2055, 4043,   79,  238, 3931, 2351,  595,
        110,  215,    0,  563,  206,  660,  242,  577,  179,  157,  192,
        192, 1848,  792, 1693,   55,  388,  225, 1134, 1172, 1555,   31,
       1582, 1044,  378, 1687, 2915,  280,  765, 2819,  511, 1521,  745,
       2491,  580, 2072, 6450,  578,  745, 1075, 1103, 1549, 1520,  138,
       1202,  296,  277,  351,  391,  950,  459,   62, 1056, 1128,  139,
        420,   87,   71,  814,  603, 1349,  162, 1027,  783,  326,  101,
        876,  381,  905,  156,  419,  239,  119,  129,  467])


import scipy.stats as stats

# computing the distribution parameter
avg_interval = np.mean(nohitter_times)

# Simulate the distribution (no seed is set, so the draws differ between runs)
rvs = stats.expon.rvs(avg_interval, size=100000)

#Plotting the distribution
#sns.histplot(rvs, kde=True, bins=100, color='skyblue', stat='density');
_ = plt.hist(rvs, bins=50, density=True, histtype="step")
_ = plt.xlabel('Games between no-hitters')
_ = plt.ylabel('PDF');



# No seed is set here, so the draws differ between runs

# Compute mean no-hitter time: tau
tau = np.mean(nohitter_times)

# Draw out of an exponential distribution with parameter tau: inter_nohitter_time
inter_nohitter_time = np.random.exponential(tau, 100000)

# Plot the PDF and label axes
_ = plt.hist(inter_nohitter_time, bins=50, density=True, histtype="step")
_ = plt.xlabel('Games between no-hitters')
_ = plt.ylabel('PDF')


As you can see, the two plots have completely different x-axis scales. I don't understand why.

Tags: numpy, random, scipy, statistics, exponential



# computing the distribution parameter
avg_interval = np.mean(nohitter_times)

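# Simulate the distribution. Pass the mean with the `scale=` keyword:
# the first positional argument of expon.rvs is `loc`, which only shifts
# the distribution, while np.random.exponential takes `scale` first.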
rvs = stats.expon.rvs(scale=avg_interval, size=100000)

#Plotting the distribution
#sns.histplot(rvs, kde=True, bins=100, color='skyblue', stat='density');
_ = plt.hist(rvs, bins=50, density=True, histtype="step")
_ = plt.xlabel('Games between no-hitters')
_ = plt.ylabel('PDF');
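To see why the x-axes differ, it may help to compare the three calls numerically. The sketch below is only an illustration: `tau = 700.0` is a hypothetical mean interval (not computed from the data above), and the fixed seeds are there just to make the run repeatable. It relies on the documented signatures `expon.rvs(loc=0, scale=1, size=1, random_state=None)` and `Generator.exponential(scale=1.0, size=None)`.

```python
import numpy as np
from scipy import stats

tau = 700.0   # hypothetical mean interval, chosen only for illustration
n = 100_000

# First positional argument of expon.rvs is `loc`: this draws from an
# exponential with scale 1, shifted to start at tau.
shifted = stats.expon.rvs(tau, size=n, random_state=0)

# Passing tau as `scale` gives an exponential with mean tau.
scaled = stats.expon.rvs(scale=tau, size=n, random_state=0)

# NumPy's first positional argument is already the scale.
numpy_draws = np.random.default_rng(0).exponential(tau, n)

for name, sample in [("loc=tau", shifted),
                     ("scale=tau", scaled),
                     ("numpy", numpy_draws)]:
    print(f"{name:>10}: min={sample.min():.2f}  mean={sample.mean():.2f}")
```

With `loc=tau` every sample lies at or above `tau` and the mean is close to `tau + 1`, so the histogram covers only a narrow band just above `tau`. The other two samples start near zero and have a mean close to `tau`, which matches the plot produced by `np.random.exponential`.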

