Dual beta in Python: multiple linear regression with dummy variables in statsmodels

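Question

I am trying to compute a dual beta in Python using a statsmodels regression. Unfortunately, I get an error message.

The regression equation for the dual beta is given here:

[image: dual beta formula]

I am ignoring the risk-free rate (rf) for now, but the implementation should be similar once I add it. My code currently looks like this, where my 'spx.xlsx' file simply has two columns of returns, called 'SPXrets' and 'AAPLrets' (plus a column with dates):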

import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile

import statsmodels.api as sm
import statsmodels.formula.api as smf
import numpy as np


df = pd.read_excel('spx.xlsx')
print(df.columns)

mod = smf.ols(formula='AAPLrets ~ SPXrets', data=df)
res = mod.fit()
print(res.summary())

This raises a patsy error:

PatsyError: intercept term cannot interact with anything else
    AAPLrets ~ SPXrets:c(D) + SPXrets:(1-c(D))

Thanks for your help, much appreciated!

Tags: python, regression, linear-regression, statsmodels, beta

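Answer


Edit:

After my initial suggestion, the OP changed the title and the code snippet provided. My suggestion has been edited accordingly.


New suggestion:

I suspect there is some problem with your dataset. I suggest you tell us more about the data source, how you load the data, what it looks like (its structure), and what types your columns have (string, float, etc.).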
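
As a starting point for those checks, here is a minimal sketch of how the column types could be inspected and coerced. The small frame below is made-up stand-in data (the real 'spx.xlsx' is not available here), and the idea that the returns might have been read as text is only a guess at one possible cause.

```python
import pandas as pd

# Made-up stand-in for the spreadsheet: returns accidentally stored as text
df = pd.DataFrame({
    'Date': ['2019-01-02', '2019-01-03', '2019-01-04'],
    'SPXrets': ['0.0013', '-0.0248', '0.0343'],
    'AAPLrets': ['0.0011', '-0.0996', 'n/a'],
})

print(df.dtypes)   # text dtypes on the return columns would signal a problem
print(df.head())

# Coerce the return columns to floats; unparseable cells become NaN
ret_cols = ['SPXrets', 'AAPLrets']
df[ret_cols] = df[ret_cols].apply(pd.to_numeric, errors='coerce')

# Parse the dates and use them as the index
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index('Date')

# Drop rows with missing returns before fitting
clean = df.dropna(subset=ret_cols)
print(clean.dtypes)
```

If the return columns already show up as float64 in the first printout, the data types are probably not the issue.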

What I can tell you right now is that your code snippet runs fine on some sample data, as shown here:

Sample data (last five rows):

               CONret  DAXret:c(D)  DAXret:(1-c(D))  AAPLrets  SPXrets  dummy
2017-01-08     109          107              122       101      100      0
2017-01-09     117          108              124       113      147      0
2017-01-10     142          108              130       107      103      1
2017-01-11     106          121              149       103      104      1
2017-01-12     124          149              143       112      126      0

Output:

                            OLS Regression Results                            
==============================================================================
Dep. Variable:               AAPLrets   R-squared:                       0.095
Model:                            OLS   Adj. R-squared:                  0.004
Method:                 Least Squares   F-statistic:                     1.044
Date:                Thu, 14 Feb 2019   Prob (F-statistic):              0.331
Time:                        16:00:01   Log-Likelihood:                -48.388
No. Observations:                  12   AIC:                             100.8
Df Residuals:                      10   BIC:                             101.7
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     84.3198     31.143      2.708      0.022      14.929     153.711
SPXrets        0.2635      0.258      1.022      0.331      -0.311       0.838
==============================================================================
Omnibus:                        5.649   Durbin-Watson:                   1.882
Prob(Omnibus):                  0.059   Jarque-Bera (JB):                2.933
Skew:                           1.202   Prob(JB):                        0.231
Kurtosis:                       3.290   Cond. No.                         872.
==============================================================================

Here is the whole thing:

# imports
import statsmodels.formula.api as smf
import pandas as pd
import numpy as np
import statsmodels.api as sm

# sample data
np.random.seed(1)
rows = 12
listVars= ['CONret','DAXret:c(D)', 'DAXret:(1-c(D))', 'AAPLrets', 'SPXrets']
rng = pd.date_range('1/1/2017', periods=rows, freq='D')
df = pd.DataFrame(np.random.randint(100,150,size=(rows, len(listVars))), columns=listVars) 
df = df.set_index(rng)
df['dummy'] = np.random.randint(2, size=df.shape[0])

mod = smf.ols(formula='AAPLrets ~ SPXrets', data=df)
res = mod.fit()
print(res.summary())
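
As for the PatsyError quoted in the question: a likely cause (my reading of patsy's formula rules, not something the OP confirmed) is that inside a formula '1' stands for the intercept and '-' removes a term, so 'SPXrets:(1-c(D))' asks patsy to interact SPXrets with the intercept. Wrapping the arithmetic in 'I(...)' makes patsy evaluate it as ordinary Python instead. The sketch below assumes a 0/1 integer column named D (1 in up markets, 0 otherwise), which does not exist in the sample data above, and it uses simulated returns with known betas.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated returns (assumption: this is not the OP's data)
rng = np.random.default_rng(0)
n = 500
spx = rng.normal(0.0, 0.01, n)
D = (spx > 0).astype(int)  # hypothetical dummy: 1 when the market is up

# True up-market beta 1.5, true down-market beta 0.7
noise = rng.normal(0.0, 0.001, n)
aapl = 0.0005 + 1.5 * spx * D + 0.7 * spx * (1 - D) + noise
df = pd.DataFrame({'AAPLrets': aapl, 'SPXrets': spx, 'D': D})

# I(1 - D) is evaluated as Python arithmetic rather than patsy term algebra
mod = smf.ols(formula='AAPLrets ~ SPXrets:D + SPXrets:I(1 - D)', data=df)
res = mod.fit()
print(res.params)
```

The two slope estimates should land close to the simulated up-market and down-market betas. Note also that patsy's categorical marker is a capital 'C(...)'; a lowercase 'c(D)' is not something patsy provides, as far as I know.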

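Another suggestion:


Personally, I would feel more comfortable without patsy.

The snippet below lets you run a linear regression and choose whether to return the model summary or a dataframe with additional details such as the coefficients, p-values, and R-squared.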

# Imports
import pandas as pd
import numpy as np
import statsmodels.api as sm

# sample data
np.random.seed(1)
rows = 12
listVars= ['CONret','DAXret:c(D)', 'DAXret:(1-c(D))', 'AAPLrets', 'SPXrets']
rng = pd.date_range('1/1/2017', periods=rows, freq='D')
df = pd.DataFrame(np.random.randint(100,150,size=(rows, len(listVars))), columns=listVars) 
df = df.set_index(rng)
df['dummy'] = np.random.randint(2, size=df.shape[0])

def LinReg(df, y, x, const, results):

    betas = x.copy()

    # Fit the model with or without a constant
    if const:
        x = sm.add_constant(df[x])
        model = sm.OLS(df[y], x).fit()
    else:
        model = sm.OLS(df[y], df[x]).fit()

    # Estimates of R2 and p
    res1 = {'Y': [y],
            'R2': [format(model.rsquared, '.4f')],
            'p': [model.pvalues.tolist()],
            'start': [df.index[0]], 
            'stop': [df.index[-1]],
            'obs' : [df.shape[0]],
            'X': [betas]}
    df_res1 = pd.DataFrame(data = res1)

    # Regression Coefficients
    theParams = model.params[0:]
    coefs = theParams.to_frame()
    df_coefs = pd.DataFrame(coefs.T)
    xNames = list(df_coefs)
    xValues = list(df_coefs.loc[0].values)
    xValues2 = [ '%.2f' % elem for elem in xValues ]
    res2 = {'Independent': [xNames],
            'beta': [xValues2]}
    df_res2 = pd.DataFrame(data = res2)

    # All results
    df_res = pd.concat([df_res1, df_res2], axis = 1)
    df_res = df_res.T
    df_res.columns = ['results']


    # Return the full statsmodels summary, or the compact results frame
    if results == 'summary':
        return model.summary()
    else:
        return df_res

df_regression = LinReg(df = df, y = 'CONret', x = ['DAXret:c(D)', 'DAXret:(1-c(D))', 'dummy'], const = True, results = 'summary')

print(df_regression)

Test run 1:

df_regression = LinReg(df = df, y = 'CONret', x = ['DAXret:c(D)', 'DAXret:(1-c(D))', 'dummy'], const = True, results = '')

Output 1:

                                                       results
Y                                                       CONret
R2                                                      0.0813
p            [0.13194822614949883, 0.45726622261432304, 0.9...
start                                      2017-01-01 00:00:00
stop                                       2017-01-12 00:00:00
obs                                                         12
X                        [DAXret:c(D), DAXret:(1-c(D)), dummy]
Independent       [const, DAXret:c(D), DAXret:(1-c(D)), dummy]
beta                                [88.94, 0.24, -0.01, 2.20]

Test run 2:

df_regression = LinReg(df = df, y = 'CONret', x = ['DAXret:c(D)', 'DAXret:(1-c(D))', 'dummy'], const = True, results = 'summary')

Output 2:

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 CONret   R-squared:                       0.081
Model:                            OLS   Adj. R-squared:                 -0.263
Method:                 Least Squares   F-statistic:                    0.2361
Date:                Thu, 14 Feb 2019   Prob (F-statistic):              0.869
Time:                        16:04:02   Log-Likelihood:                -47.138
No. Observations:                  12   AIC:                             102.3
Df Residuals:                       8   BIC:                             104.2
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
===================================================================================
                      coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------
const              88.9438     53.019      1.678      0.132     -33.318     211.205
DAXret:c(D)         0.2350      0.301      0.781      0.457      -0.459       0.929
DAXret:(1-c(D))    -0.0060      0.391     -0.015      0.988      -0.908       0.896
dummy               2.2005      8.973      0.245      0.812     -18.490      22.891
==============================================================================
Omnibus:                        1.025   Durbin-Watson:                   2.354
Prob(Omnibus):                  0.599   Jarque-Bera (JB):                0.720
Skew:                           0.540   Prob(JB):                        0.698
Kurtosis:                       2.477   Cond. No.                     2.15e+03
==============================================================================
