How to create a function that tests each variable for normality

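Problem description

I am trying to build a function that iteratively returns i) the Jarque-Bera test statistic, ii) the Jarque-Bera p-value, iii) the slope, intercept, and coefficient of determination of the probplot, and iv) the probplot itself, all for a single variable at a time.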

def normality(c):
    JB_test_stat = ss.jarque_bera(c)[0]
    JB_pval = ss.jarque_bera(c)[1]
    probplot_slope = ss.probplot(c, plot = plt)[1][0]
    probplot_interc = ss.probplot(c, plot = plt)[1][1]
    probplot_r = ss.probplot(c, plot = plt)[1][2]
    return(print("Skewness:",c.skew(),"\nExcess kurtosis:",c.kurt(),"\nJarque-Bera stat:",JB_test_stat," pvalue:", JB_pval,"\nSlope:",probplot_slope,"Intercept:",probplot_interc, "r:",probplot_r,"\n"))

Unfortunately, when I call the function on the columns of my data frame, with numeric_cols being a list of those columns,

for c in numeric_cols:    
    normality(df[c])

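I get all the numeric results from the return statement correctly, but only a single probability plot at the bottom, with all variables drawn on it in a messy way, whereas I expected to get the numeric results for each variable together with its corresponding probability plot.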

Skewness: 0.1004187952160102 Excess kurtosis: -0.543819517693596 Jarque-Bera stat: 7.593972235734294 pvalue: 0.022438296430201454 Slope: 4.3135147782152465 Intercept: 25.5 r: 0.9947611456706487

Skewness: -0.1560130144763728 Excess kurtosis: -1.2824901951466612 Jarque-Bera stat: 38.56183464454786 pvalue: 4.23061985443951e-09 Slope: 11.492550446207257 Intercept: 19.535714285714285 r: 0.9668502992894236

Skewness: 0.2347601433103727 Excess kurtosis: -1.242639192300385 Jarque-Bera stat: 39.0662449724179 pvalue: 3.287552452491127e-09 Slope: 11.545683807955731 Intercept: 15.714285714285714 r: 0.9647448407831439

Skewness: 0.24353437856100904 Excess kurtosis: -1.1969521906230485 Jarque-Bera stat: 36.98912338336009 pvalue: 9.287822622106034e-09 Slope: 1013.985374629207 Intercept: 1411.4436090225563 r: 0.9682492605786011

Skewness: 2.837876986150242 Excess kurtosis: 9.516628330654008 Jarque-Bera stat: 2675.4455000782764 pvalue: 0.0 Slope: 2.6057664781688454 Intercept: 1.85338345864660578 508647605778

Skewness: 2.406153102778617 Excess kurtosis: 7.002529753885085 Jarque-Bera stat: 1573.6596724989513 pvalue: 0.0 Slope: 1.714847443415902 Intercept: 1.287593984962490614181814 r:

Skewness: 0.9337529310147361 Excess kurtosis: 0.45862734243889847 Jarque-Bera stat: 81.22389376608798 pvalue: 0.0 Slope: 605.3354149443196 Intercept: 717.75 r: 0.9895040

Skewness: -3.030640857636996 Excess kurtosis: 15.686541621050898 Jarque-Bera stat: 6154.761075129672 pvalue: 0.0 Slope: 11.37955609488042 Intercept: 77.82387251904514056 r:

Skewness: 6.398317104228115 Excess kurtosis: 49.10097819497357 Jarque-Bera stat: 56029.69126113364 pvalue: 0.0 Slope: 0.41431397013222515 Intercept: 0.1917293233082707 r: 0.48503363895959983

Skewness: 6.204252341215679 Excess kurtosis: 47.28662289867727 Jarque-Bera stat: 52010.755388690835 pvalue: 0.0 Slope: 0.4947086253584861 Intercept: 0.23496240601503762 r: 0.5050004904368586

Skewness: 2.06633193738682 Excess kurtosis: 5.770784034742405 Jarque-Bera stat: 1098.0175308306793 pvalue: 0.0 Slope: 0.0

Skewness: 2.9189857433086495 Excess kurtosis: 16.837230233306762 Jarque-Bera stat: 6909.724155123523 pvalue: 0.0 Slope: 0.0

Skewness: 1.2633082232077495 Excess kurtosis: 1.5265390704578943 Jarque-Bera stat: 190.6495836394772 pvalue: 0.0 Slope: 2.09821120102269 Intercept: 2.114661650735138282165473513821192

Skewness: 3.091346622737553 Excess kurtosis: 8.530683362863476 Jarque-Bera stat: 2421.371001114453 pvalue: 0.0 Slope: 0.16657862407594715 Intercept: 0.09022556390977444 r: 0.5658043763386988

[Image: the single probability plot with all variables drawn on it]

How can I fix this? Thanks, everyone.

Tags: python, function, loops, scipy.stats

Solution


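Just add a plt.figure() call inside your function, so that every call to the function opens a new figure. pyplot always draws on the current figure, so without that call each probplot is added to the same axes.
On an entirely separate note, return(print('stuff')) is redundant: print returns None, so the function returns nothing useful. If you really just want to print the results, use print with no return.
Returning the values you are currently printing, and then printing them outside the function, would be more Pythonic and is generally better practice: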

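import matplotlib.pyplot as plt
import scipy.stats as ss

def normality(c):
    plt.figure()  # open a new figure so each variable gets its own plot
    # jarque_bera returns (statistic, pvalue); call it once and unpack
    JB_test_stat, JB_pval = ss.jarque_bera(c)
    # probplot returns ((osm, osr), (slope, intercept, r)); a single call
    # draws the plot once and gives all three fit parameters
    _, (probplot_slope, probplot_interc, probplot_r) = ss.probplot(c, plot=plt)
    return c.skew(), c.kurt(), JB_test_stat, JB_pval, probplot_slope, probplot_interc, probplot_r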


for c in numeric_cols:
    # c is a column name, so unpack into plain variable names
    skew, kurt, JB_test_stat, JB_pval, probplot_slope, probplot_interc, probplot_r = normality(df[c])
    print("Skewness:", skew,
          "\nExcess kurtosis:", kurt,
          "\nJarque-Bera stat:", JB_test_stat,
          " pvalue:", JB_pval,
          "\nSlope:", probplot_slope,
          "Intercept:", probplot_interc,
          "r:", probplot_r, "\n")

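As a variation on the same idea, here is a minimal, self-contained sketch that gives each variable its own Axes object instead of relying on pyplot's current figure. The names normality_report, demo_df, results, figures and summary are made up for illustration, the data is random, and the Agg backend is only assumed so the snippet runs without a display; treat it as one possible arrangement rather than the definitive way to do it.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy.stats as ss


def normality_report(series, title=None):
    """Return (stats dict, figure) for one numeric pandas Series."""
    fig, ax = plt.subplots()  # a fresh figure per call keeps the plots separate
    jb_stat, jb_pval = ss.jarque_bera(series)
    # One probplot call both draws on `ax` and returns the fit parameters.
    _, (slope, intercept, r) = ss.probplot(series, plot=ax)
    if title is not None:
        ax.set_title(f"Probability plot: {title}")
    stats = {
        "skewness": series.skew(),
        "excess_kurtosis": series.kurt(),
        "jb_stat": jb_stat,
        "jb_pvalue": jb_pval,
        "slope": slope,
        "intercept": intercept,
        "r": r,
    }
    return stats, fig


# Made-up demo data standing in for df[numeric_cols]
rng = np.random.default_rng(0)
demo_df = pd.DataFrame({
    "normal_col": rng.normal(size=200),
    "skewed_col": rng.exponential(size=200),
})

results = {}
figures = {}
for col in demo_df.columns:
    results[col], figures[col] = normality_report(demo_df[col], title=col)

summary = pd.DataFrame(results).T  # one row per variable, one column per statistic
print(summary)
```

Collecting the numbers in a DataFrame makes the variables easy to compare side by side, and keeping the Figure objects lets you save or show each plot individually later, for example with figures["skewed_col"].savefig(...).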