Why are most of the parameter p-values in a multiple linear regression smaller than 0.05?

Problem description

import pandas as pd
import statsmodels.api as sm

df = pd.read_csv('new0110_1.csv', encoding='unicode_escape')

df = df.dropna(axis=0, how='any')
df_array = df.values
train_group = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]  # columns Z, T1..T13
values = df_array[:, train_group]
reframed = pd.DataFrame(values, columns=
    ['Z', 'T1', 'T2', 'T3', 'T4', 'T5', 'T6', 'T7', 'T8', 'T9', 'T10', 'T11', 'T12', 'T13'])
X = reframed[['T1', 'T2', 'T3', 'T4', 'T5', 'T6', 'T7', 'T8', 'T9', 'T10', 'T11', 'T12', 'T13']]
Y = reframed[['Z']]

est = sm.OLS(Y, X)  # note: no intercept column is added to X here
est2 = est.fit()
print(est2.summary())

(screenshot: OLS regression summary output)

df.corr()

(screenshot: output of df.corr(), the pairwise correlation matrix)

The dependent variable is Z. The independent variables are [T1, T2, ..., T13].

The p-values are all below 0.05. Is this because the dependent variable and the independent variables are strongly correlated?
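A side note on the correlation question: strong correlation *among the predictors* (multicollinearity) tends to inflate standard errors and therefore makes p-values larger, not smaller. A common diagnostic is the variance inflation factor. Here is a minimal sketch on synthetic data (the T1/T2/T3 columns are hypothetical stand-ins, assuming statsmodels is installed):

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical predictors: T2 is nearly a copy of T1, T3 is independent.
rng = np.random.default_rng(1)
n = 300
T1 = rng.normal(size=n)
T2 = T1 + rng.normal(scale=0.1, size=n)  # strongly collinear with T1
T3 = rng.normal(size=n)

X = pd.DataFrame({'T1': T1, 'T2': T2, 'T3': T3})
vifs = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
print(dict(zip(X.columns, vifs)))
```

A VIF well above 10 (as for T1 and T2 here) flags a predictor that is largely explained by the others; T3's VIF stays near 1.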

Tags: python, linear-regression, p-value

Solution

