Plotting p-values on a scatter plot using statsmodels (pandas/matplotlib)

Question

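I need help adding p-values to my figures, but I am running into three problems. 1) Whenever I use statsmodels to calculate p-values, I get two of them: one for the "Intercept" and one for the y variable (the one I want to plot). 2) I am using a loop to create several figures at once. 3) I don't know how to isolate the specific p-value I want to plot, because when I print the p-values, both are shown for every figure I am producing. Here is my code, in case you want to see what I mean by the two p-values: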

###(this is sample data in case you are trying to recreate the code)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf
import statsmodels.api as sm

dpm=pd.DataFrame({'pm10_3135_2018':[30,34,32,44,45,46,59,54,59,30],
'nox_3135(ppb)':[20,29,27,31,33,33,34,23,32,31],
'CO_3135(ppm)':[0.8,0.9,0.1,0.2,0.5,0.5,0.7,0.8,0.9,0.3],
'O3_mda8_3135':[42,45,47,51,52,52,57,67,69,70],
'pm25_3135_2018':[6,7,6,7,4,5,2,11,9,18]})

##PM2.5 vs variables - whole year

dpm = dpm.reset_index()

x = [dpm.pm10_3135_2018,dpm['nox_3135(ppb)'],dpm['CO_3135(ppm)'],dpm.O3_mda8_3135]
y = dpm.pm25_3135_2018
xlab = ["PM10 (ug/m^3)", "NOx (ppb)", "CO (ppm)", "O3 MDA8 (ppb)"]
fnames = ['NOMR2_PM10vsPM25_yr_2018.png','NOMR2_NOxvsPM25_yr_2018.png','NOMR2_COvsPM25_yr_2018.png','NOMR2_O3vsPM25_yr_2018.png']

for xcol,lab,fname in zip(x,xlab,fnames):

    correlation_matrix1 = np.corrcoef(xcol, y)
    correlation_xy1 = correlation_matrix1[0,1]
    R2_1 = correlation_xy1**2
    m, b = np.polyfit(xcol,y,1)
    equation = 'y = ' + str(round(m,4)) + 'x' ' + ' + str(round(b,4))
    R2 = '$R^2$ =' + str(round(R2_1,3))
    fig, ax = plt.subplots()
    ax.plot(xcol, y, color='xkcd:red',linestyle='None',marker='o')
    ax.set_xlabel(lab,fontsize=15)
    ax.set_ylabel('PM2.5 (ug/m^3)',fontsize=15)
    ax.set_ylim(0,)
    ax.set_xlim(0,)
    plt.text(0.75, 0.65, equation, horizontalalignment='center',
             verticalalignment='center',
             transform=ax.transAxes)
    plt.text(0.7, 0.6, R2, horizontalalignment='center',
         verticalalignment='center',
         transform=ax.transAxes)
    model = smf.ols('xcol ~ y', data=dpm).fit()
    print(model.summary())
    print(model.pvalues)

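For the next part of the code I have the snippet below, but I need a way to get the p-value for the y variable out of the statsmodels result, store it in a new variable P, and then draw P on the figure, and I am not sure how to do that. (Disclaimer: this is not my actual data, so there is not much correlation between the data points, but the process is the same.)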

plt.text(0.7, 0.55, P, horizontalalignment='center',
     verticalalignment='center',
     transform=ax.transAxes)

fig.tight_layout()
#plt.savefig(fname)

Tags: python, pandas, matplotlib, statsmodels, p-value

Answer


model.pvalues is a pandas Series (you can confirm this with type(model.pvalues)), so if you want to extract the p-value for y, you simply index it by name:

model.pvalues['y']
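
As a minimal, self-contained sketch of what that Series looks like: the example below fits one regression on two columns of the question's sample data. The short column names pm10 and pm25 are assumptions made for brevity, not names from the original code. The index of pvalues holds one entry per term in the formula, which is why two p-values are printed.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Two columns from the question's sample data, renamed for brevity (assumed names).
df = pd.DataFrame({
    "pm10": [30, 34, 32, 44, 45, 46, 59, 54, 59, 30],
    "pm25": [6, 7, 6, 7, 4, 5, 2, 11, 9, 18],
})

model = smf.ols("pm10 ~ pm25", data=df).fit()

# pvalues is a pandas Series indexed by the term names of the formula.
print(type(model.pvalues))
print(list(model.pvalues.index))  # ['Intercept', 'pm25']

# Select a single p-value by its term name.
p_slope = model.pvalues["pm25"]
print(p_slope)
```

Selecting by label rather than by position keeps the code correct if more terms are added to the formula later.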

To add the p-value to the plot, you can add:

print(model.pvalues)
plt.text(0.7, 0.8, "y p-values: %.2f" %(model.pvalues['y']), horizontalalignment='center',
     verticalalignment='center',
     transform=ax.transAxes)

I added some text formatting ("y p-values: ...") to make it clearer what is being drawn on the plot.
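
One caveat with the "%.2f" format: a p-value below roughly 0.005 is rendered as 0.00, which can be misleading on a figure. A possible workaround is a small formatting helper; format_p and its threshold parameter are hypothetical names introduced here for illustration and are not part of statsmodels or matplotlib.

```python
def format_p(p, threshold=0.001):
    """Return a plot label for a p-value (hypothetical helper)."""
    # Very small values are reported as an upper bound instead of 0.000.
    if p < threshold:
        return "p < %g" % threshold
    return "p = %.3f" % p


print(format_p(0.0421))   # p = 0.042
print(format_p(0.00002))  # p < 0.001
```

The returned string can then be passed to plt.text in place of the formatted string used above.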

Here is the complete loop:

for xcol,lab,fname in zip(x,xlab,fnames):

    correlation_matrix1 = np.corrcoef(xcol, y)
    correlation_xy1 = correlation_matrix1[0,1]
    R2_1 = correlation_xy1**2
    m, b = np.polyfit(xcol,y,1)
    equation = 'y = ' + str(round(m,4)) + 'x' ' + ' + str(round(b,4))
    R2 = '$R^2$ =' + str(round(R2_1,3))
    fig, ax = plt.subplots()
    ax.plot(xcol, y, color='xkcd:red',linestyle='None',marker='o')
    ax.set_xlabel(lab,fontsize=15)
    ax.set_ylabel('PM2.5 (ug/m^3)',fontsize=15)
    ax.set_ylim(0,)
    ax.set_xlim(0,)
    plt.text(0.75, 0.65, equation, horizontalalignment='center',
             verticalalignment='center',
             transform=ax.transAxes)
    plt.text(0.7, 0.6, R2, horizontalalignment='center',
         verticalalignment='center',
         transform=ax.transAxes)
    model = smf.ols('xcol ~ y', data=dpm).fit()
    print(model.summary())
    print(model.pvalues)

    #added code:
    plt.text(0.7, 0.8, "y p-values: %.2f" %(model.pvalues['y']), horizontalalignment='center',
         verticalalignment='center',
         transform=ax.transAxes)

Also, if I am reading your code, your comments, and standard statistical practice correctly, your formula should be

model = smf.ols('y ~ xcol', data=dpm).fit()

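In that case you want the p-value for the x variable, so change the code above to use model.pvalues['xcol']. The key is the string 'xcol', because terms are named after the variable names in the formula string, not after the Series itself.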
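
Putting the pieces together, here is a sketch of the loop with y as the response, assuming only two of the question's predictors to keep it short. It differs from the original in two ways that are choices of this sketch, not requirements: the per-iteration DataFrame fit_data is built explicitly so the formula does not rely on variables from the surrounding scope, and the Agg backend is selected so the script runs without a display. The names fit_data and slope_pvalues are introduced here for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to show figures on screen
import matplotlib.pyplot as plt
import pandas as pd
import statsmodels.formula.api as smf

# Subset of the question's sample data.
dpm = pd.DataFrame({
    "pm10_3135_2018": [30, 34, 32, 44, 45, 46, 59, 54, 59, 30],
    "O3_mda8_3135": [42, 45, 47, 51, 52, 52, 57, 67, 69, 70],
    "pm25_3135_2018": [6, 7, 6, 7, 4, 5, 2, 11, 9, 18],
})

y = dpm["pm25_3135_2018"]
x = [dpm["pm10_3135_2018"], dpm["O3_mda8_3135"]]
xlab = ["PM10 (ug/m^3)", "O3 MDA8 (ppb)"]

slope_pvalues = {}  # one slope p-value per figure, keyed by x-axis label
for xcol, lab in zip(x, xlab):
    # Give the formula explicit column names instead of relying on loop variables.
    fit_data = pd.DataFrame({"xcol": xcol, "y": y})
    model = smf.ols("y ~ xcol", data=fit_data).fit()

    # The term name is the string 'xcol', so that is the key into pvalues.
    p = model.pvalues["xcol"]
    slope_pvalues[lab] = p

    fig, ax = plt.subplots()
    ax.plot(xcol, y, color="xkcd:red", linestyle="None", marker="o")
    ax.set_xlabel(lab, fontsize=15)
    ax.set_ylabel("PM2.5 (ug/m^3)", fontsize=15)
    ax.text(0.7, 0.8, "p-value: %.3f" % p,
            ha="center", va="center", transform=ax.transAxes)
    fig.tight_layout()
    plt.close(fig)  # free the figure; call fig.savefig(...) before this to keep it

print(slope_pvalues)
```

Because the p-value is read inside the loop, each figure is annotated with only its own value, which addresses the question's third point.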

