Stata fixed effects

celine227 2021-06-19 16:48

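Panel data can be estimated in several ways, including pooled OLS, fixed effects (FE), random effects (RE) and least-squares dummy variables (LSDV). The most commonly used is still fixed effects (the within estimator). Stata's official fixed-effects command is xtreg, but it is not always the most convenient choice (for example, it requires the data to be declared as a panel and it can be slow), so reg, areg and reghdfe are also widely used to estimate fixed-effects models.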
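For reference, the two-way fixed-effects model estimated in the examples below can be written as follows. The notation is ours, not the original author's: i indexes individuals (provinces), t indexes time (years), u_i is the individual effect (as in Stata's output) and λ_t the time effect. The within transformation subtracts each individual's time average, which removes u_i; when time effects enter as dummies (i.year), those dummies are treated as part of x_it and are demeaned along with it.

```latex
\begin{aligned}
y_{it} &= \mathbf{x}_{it}'\boldsymbol{\beta} + u_i + \lambda_t + \varepsilon_{it}, \\
\ddot{y}_{it} &= y_{it} - \bar{y}_i, \qquad \ddot{\mathbf{x}}_{it} = \mathbf{x}_{it} - \bar{\mathbf{x}}_i, \\
\hat{\boldsymbol{\beta}}_{\mathrm{FE}} &= \Big(\sum_{i=1}^{N}\sum_{t=1}^{T}\ddot{\mathbf{x}}_{it}\ddot{\mathbf{x}}_{it}'\Big)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\ddot{\mathbf{x}}_{it}\ddot{y}_{it}.
\end{aligned}
```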

xtreg

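xtreg, fe is Stata's official fixed-effects command, and the coefficients it produces are the fixed-effects (within) estimator in its purest form. xtreg places strict requirements on the data: it works only on panel data, so before calling it you must declare the panel with xtset, specifying the cross-sectional (individual) dimension and the time dimension. Adding the fe option requests the within estimator, and by default the individual fixed effects are defined on the panel variable set by xtset. Time fixed effects must be added explicitly as dummy variables, i.year, one for each period.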
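A minimal sketch on simulated data can make "within estimator" concrete. The variables id, year, x, y and the data-generating process are hypothetical (this is not lin_1992.dta); the check is that xtreg, fe gives the same slope as OLS on data demeaned by panel unit.

```stata
* Hypothetical simulated panel: 30 units observed for 10 periods
clear
set seed 12345
set obs 30                          // 30 panel units
generate id = _n
generate a = rnormal()              // unit effect u_i, correlated with x below
expand 10                           // 10 periods per unit
bysort id: generate year = _n
generate x = a + rnormal()
generate y = 1 + 2*x + a + rnormal()

xtset id year                       // declare panel and time dimensions
xtreg y x, fe                       // within (FE) estimator
scalar b_fe = _b[x]

* Within transformation by hand: subtract each unit's mean
bysort id: egen ybar = mean(y)
bysort id: egen xbar = mean(x)
generate y_dm = y - ybar
generate x_dm = x - xbar
regress y_dm x_dm, noconstant       // same slope; the SE here ignores the 30 estimated means
scalar b_dm = _b[x_dm]

display "xtreg, fe: " scalar(b_fe) "   demeaned OLS: " scalar(b_dm)
```

Only the point estimate is comparable: the naive regression on demeaned data does not subtract the estimated unit means from its degrees of freedom, so its standard error is slightly too small.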

Below we use the data set lin_1992.dta from Justin Yifu Lin's (1992) AER paper "Rural Reforms and Agricultural Growth in China" to demonstrate the command and its output.

. xtset province year
       panel variable:  province (strongly balanced)
        time variable:  year, 70 to 87
                delta:  1 unit
                
. xtreg ltvfo ltlan ltwlab ltpow ltfer hrs mci ngca i.year, fe vce(cluster province)

Fixed-effects (within) regression               Number of obs     =        476
Group variable: province                        Number of groups  =         28

R-sq:                                           Obs per group:
     within  = 0.8932                                         min =         17
     between = 0.6596                                         avg =       17.0
     overall = 0.7156                                         max =         17

                                                F(23,27)          =     949.82
corr(u_i, Xb)  = -0.3425                        Prob > F          =     0.0000

                              (Std. Err. adjusted for 28 clusters in province)
------------------------------------------------------------------------------
             |               Robust
       ltvfo |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       ltlan |   .5833594   .1745834     3.34   0.002     .2251439    .9415749
      ltwlab |   .1514909   .0585107     2.59   0.015     .0314368     .271545
       ltpow |   .0971114    .090911     1.07   0.295    -.0894225    .2836453
       ltfer |   .1693346   .0438098     3.87   0.001     .0794444    .2592248
         hrs |   .1503752   .0587581     2.56   0.016     .0298136    .2709368
         mci |   .1978373   .0810587     2.44   0.022     .0315186     .364156
        ngca |   .7784081   .4016301     1.94   0.063    -.0456688    1.602485
             |
        year |
         71  |  -.0240404    .023366    -1.03   0.313    -.0719836    .0239027
         72  |  -.1323624   .0404832    -3.27   0.003    -.2154272   -.0492977
         73  |  -.0377336   .0357883    -1.05   0.301     -.111165    .0356979
         74  |   .0058554   .0500774     0.12   0.908     -.096895    .1086058
         75  |   .0096731   .0566898     0.17   0.866    -.1066448    .1259911
         76  |  -.0476465    .061423    -0.78   0.445    -.1736761    .0783832
         77  |  -.0869336   .0680579    -1.28   0.212    -.2265767    .0527096
         78  |  -.0325205   .0766428    -0.42   0.675    -.1897785    .1247376
         79  |  -.0076332   .0833462    -0.09   0.928    -.1786454     .163379
         81  |   -.093479   .1093614    -0.85   0.400    -.3178701    .1309121
         82  |  -.0447862   .1207405    -0.37   0.714    -.2925251    .2029528
         83  |  -.0309435   .1377207    -0.22   0.824     -.313523    .2516361
         84  |   .0442535   .1428764     0.31   0.759    -.2489048    .3374117
         85  |  -.0033372   .1561209    -0.02   0.983    -.3236709    .3169965
         86  |     .00484    .157992     0.03   0.976    -.3193329    .3290129
         87  |   .0386475   .1639608     0.24   0.815    -.2977723    .3750674
             |
       _cons |   2.651286   .7738994     3.43   0.002     1.063376    4.239196
-------------+----------------------------------------------------------------
     sigma_u |  .29344594
     sigma_e |  .09930555
         rho |  .89724523   (fraction of variance due to u_i)
------------------------------------------------------------------------------

reg

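Adding a dummy variable for each individual to the regression gives exactly the same slope estimates as the fixed-effects within estimator (FE); this equivalence is a standard result (it follows from the Frisch-Waugh-Lovell theorem). The approach is called least-squares dummy variables (LSDV), and some textbooks and papers also call it fixed-effects estimation. Its advantage is that it yields estimates of the individual heterogeneity u_i, whereas FE eliminates u_i through the within transformation. However, if the number of individuals N is large, many dummy variables are needed: many degrees of freedom are used up, and the number of regressors may even exceed the limit Stata allows.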

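The Stata command for LSDV is reg y x i.id i.year, where id is the individual identifier and year is the time variable. reg does not require the data to be declared as a panel, so it is more flexible, but it prints a long list of dummy-variable estimates.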
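The equivalence can be checked with a small sketch on simulated data (variable names and the data-generating process are hypothetical). LSDV reproduces the two-way FE slope from xtreg and, unlike xtreg, also reports each unit's intercept shift relative to the base unit.

```stata
* Hypothetical simulated panel: 20 units observed for 8 periods
clear
set seed 2021
set obs 20                          // 20 panel units
generate id = _n
generate a = rnormal()              // true unit effect
expand 8                            // 8 periods per unit
bysort id: generate year = _n
generate x = a + rnormal()
generate y = 1 + 0.5*x + a + rnormal()

xtset id year
xtreg y x i.year, fe                // within estimator with year dummies
scalar b_fe = _b[x]

regress y x i.id i.year             // LSDV: one dummy per unit (unit 1 is the base)
scalar b_lsdv = _b[x]
scalar u2 = _b[2.id]                // estimated effect of unit 2 relative to unit 1

display "FE: " scalar(b_fe) "   LSDV: " scalar(b_lsdv) "   unit 2 vs 1: " scalar(u2)
```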

. reg ltvfo ltlan ltwlab ltpow ltfer hrs mci ngca i.province i.year, vce(cluster province)

Linear regression                               Number of obs     =        476
                                                F(22, 27)         =          .
                                                Prob > F          =          .
                                                R-squared         =     0.9695
                                                Root MSE          =     .09931

                               (Std. Err. adjusted for 28 clusters in province)
-------------------------------------------------------------------------------
              |               Robust
        ltvfo |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
        ltlan |   .5833594   .1800436     3.24   0.003     .2139404    .9527783
       ltwlab |   .1514909   .0603407     2.51   0.018      .027682    .2752998
        ltpow |   .0971114   .0937543     1.04   0.309    -.0952565    .2894792
        ltfer |   .1693346   .0451799     3.75   0.001     .0766331    .2620362
          hrs |   .1503752   .0605958     2.48   0.020      .026043    .2747075
          mci |   .1978373   .0835939     2.37   0.025     .0263169    .3693578
         ngca |   .7784081   .4141914     1.88   0.071    -.0714423    1.628259
              |
     province |
     beijing  |  -.1865095   .1172887    -1.59   0.123     -.427166     .054147
      fujian  |   .0434646   .0473107     0.92   0.366    -.0536089    .1405381
       gansu  |  -.7945197   .1228202    -6.47   0.000    -1.046526   -.5425134
   guangdong  |  -.0278664   .0609608    -0.46   0.651    -.1529476    .0972149
     guangxi  |  -.2539549   .0614801    -4.13   0.000    -.3801015   -.1278082
     guizhou  |  -.2526439   .0598147    -4.22   0.000    -.3753736   -.1299142
       hebei  |   -.270106   .0948694    -2.85   0.008    -.4647619     -.07545
heilongjiang  |  -.0926732     .26542    -0.35   0.730      -.63727    .4519237
       henan  |  -.0920743   .0396983    -2.32   0.028    -.1735284   -.0106201
       hubei  |   .1024438   .0368811     2.78   0.010     .0267701    .1781176
       hunan  |  -.0434275   .0581142    -0.75   0.461    -.1626679    .0758129
     jiangsu  |   .1153335   .0352061     3.28   0.003     .0430965    .1875705
     jiangxi  |  -.1401737   .0596644    -2.35   0.026    -.2625949   -.0177525
       jilin  |  -.1783839   .2109985    -0.85   0.405    -.6113171    .2545493
    liaoning  |  -.2517315   .1563399    -1.61   0.119    -.5725145    .0690515
     neimong  |  -.8860432   .2325209    -3.81   0.001    -1.363137   -.4089498
     ningxia  |  -.8489859   .1732579    -4.90   0.000    -1.204482     -.49349
     qinghai  |  -.6982553   .1268849    -5.50   0.000    -.9586017   -.4379089
     shaanxi  |   -.320607   .0887091    -3.61   0.001     -.502623   -.1385911
   shangdong  |   .0040812   .0547494     0.07   0.941    -.1082554    .1164177
    shanghai  |   .0864336   .0982642     0.88   0.387    -.1151878     .288055
      shanxi  |  -.5005347   .1388718    -3.60   0.001     -.785476   -.2155934
     sichuan  |   .0335563   .0392453     0.86   0.400    -.0469685    .1140811
     tianjin  |     -.3011   .1049208    -2.87   0.008    -.5163796   -.0858203
    xinjiang  |  -.3740561   .2053926    -1.82   0.080    -.7954869    .0473746
      yunnan  |  -.2854833   .0590488    -4.83   0.000    -.4066415   -.1643251
    zhejiang  |   .1615248   .0760427     2.12   0.043     .0054981    .3175515
              |
         year |
          71  |  -.0240404   .0240968    -1.00   0.327     -.073483    .0254022
          72  |  -.1323624   .0417494    -3.17   0.004    -.2180251   -.0466998
          73  |  -.0377336   .0369076    -1.02   0.316    -.1134616    .0379945
          74  |   .0058554   .0516436     0.11   0.911    -.1001086    .1118193
          75  |   .0096731   .0584628     0.17   0.870    -.1102827     .129629
          76  |  -.0476465   .0633441    -0.75   0.458    -.1776178    .0823249
          77  |  -.0869336   .0701864    -1.24   0.226    -.2309442     .057077
          78  |  -.0325205   .0790398    -0.41   0.684    -.1946968    .1296559
          79  |  -.0076332   .0859529    -0.09   0.930    -.1839939    .1687275
          81  |   -.093479   .1127818    -0.83   0.414     -.324888    .1379301
          82  |  -.0447862   .1245167    -0.36   0.722    -.3002733     .210701
          83  |  -.0309435    .142028    -0.22   0.829    -.3223608    .2604739
          84  |   .0442535    .147345     0.30   0.766    -.2580735    .3465804
          85  |  -.0033372   .1610037    -0.02   0.984    -.3336895    .3270151
          86  |     .00484   .1629333     0.03   0.977    -.3294716    .3391516
          87  |   .0386475   .1690888     0.23   0.821    -.3082941    .3855891
              |
        _cons |   2.874582   .7510459     3.83   0.001     1.333563    4.415601
-------------------------------------------------------------------------------

areg

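areg is a refinement of reg and likewise places no requirements on the data structure. When we want to control for a large set of dummy variables (such as i.id) but do not want to generate them or report their estimates, we can use areg and simply put the categorical variable to be controlled for inside the absorb() option. Because the within estimator and LSDV give the same slope estimates, areg can also be used to estimate fixed-effects models.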

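However, areg's absorb() accepts only one variable. To estimate two-way or higher-dimensional fixed effects, the remaining dimensions still have to be entered as dummy variables with i.var.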
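A minimal sketch of the two-way case on simulated data (hypothetical variables, not lin_1992.dta): the unit id goes into absorb(), the year dimension still enters as i.year, and the slope matches the LSDV regression that reports every dummy.

```stata
* Hypothetical simulated panel: 25 units observed for 6 periods
clear
set seed 99
set obs 25                                  // 25 panel units
generate id = _n
generate a = rnormal()                      // unit effect
expand 6                                    // 6 periods per unit
bysort id: generate year = _n
generate x = a + rnormal()
generate y = 1 + 1.5*x + a + 0.2*year + rnormal()

areg y x i.year, absorb(id) vce(cluster id) // unit dummies absorbed, not reported
scalar b_areg = _b[x]

regress y x i.id i.year, vce(cluster id)    // LSDV with every dummy reported
scalar b_reg = _b[x]

display "areg: " scalar(b_areg) "   regress with dummies: " scalar(b_reg)
```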

. areg ltvfo ltlan ltwlab ltpow ltfer hrs mci ngca i.year, absorb(province) vce(cluster province)

Linear regression, absorbing indicators         Number of obs     =        476
Absorbed variable: province                     No. of categories =         28
                                                F(  23,     27)   =     893.08
                                                Prob > F          =     0.0000
                                                R-squared         =     0.9695
                                                Adj R-squared     =     0.9659
                                                Root MSE          =     0.0993

                              (Std. Err. adjusted for 28 clusters in province)
------------------------------------------------------------------------------
             |               Robust
       ltvfo |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       ltlan |   .5833594   .1800436     3.24   0.003     .2139404    .9527783
      ltwlab |   .1514909   .0603407     2.51   0.018      .027682    .2752998
       ltpow |   .0971114   .0937543     1.04   0.309    -.0952565    .2894792
       ltfer |   .1693346   .0451799     3.75   0.001     .0766331    .2620362
         hrs |   .1503752   .0605958     2.48   0.020      .026043    .2747075
         mci |   .1978373   .0835939     2.37   0.025     .0263169    .3693578
        ngca |   .7784081   .4141914     1.88   0.071    -.0714423    1.628259
             |
        year |
         71  |  -.0240404   .0240968    -1.00   0.327     -.073483    .0254022
         72  |  -.1323624   .0417494    -3.17   0.004    -.2180251   -.0466998
         73  |  -.0377336   .0369076    -1.02   0.316    -.1134616    .0379945
         74  |   .0058554   .0516436     0.11   0.911    -.1001086    .1118193
         75  |   .0096731   .0584628     0.17   0.870    -.1102827     .129629
         76  |  -.0476465   .0633441    -0.75   0.458    -.1776178    .0823249
         77  |  -.0869336   .0701864    -1.24   0.226    -.2309442     .057077
         78  |  -.0325205   .0790398    -0.41   0.684    -.1946968    .1296559
         79  |  -.0076332   .0859529    -0.09   0.930    -.1839939    .1687275
         81  |   -.093479   .1127818    -0.83   0.414     -.324888    .1379301
         82  |  -.0447862   .1245167    -0.36   0.722    -.3002733     .210701
         83  |  -.0309435    .142028    -0.22   0.829    -.3223608    .2604739
         84  |   .0442535    .147345     0.30   0.766    -.2580735    .3465804
         85  |  -.0033372   .1610037    -0.02   0.984    -.3336895    .3270151
         86  |     .00484   .1629333     0.03   0.977    -.3294716    .3391516
         87  |   .0386475   .1690888     0.23   0.821    -.3082941    .3855891
             |
       _cons |   2.651286   .7981036     3.32   0.003     1.013713    4.288859
------------------------------------------------------------------------------

Note: if Stata stops with the error "matsize too small", raise the maximum matrix size:

set matsize 5000

reghdfe

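reghdfe is mainly used for linear regressions with multiple (high-dimensional) fixed effects. Sometimes we need to control for fixed effects along several dimensions (for example city, industry and year). xtreg and similar commands can handle this too, but they become very slow; reghdfe addresses exactly this problem and is far faster than xtreg and similar commands. reghdfe is a user-written command by Sergio Correia (see the author's GitHub page for more details), and it must be installed before use (ssc install reghdfe).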

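reghdfe can include several sets of fixed effects simply via absorb(var1 var2 var3 ...), with the variables separated by spaces, and there is no need to create dummy variables with i.var. This is much more convenient than xtreg and similar commands, and the output does not include a long list of dummy-variable estimates.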
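A minimal sketch, assuming reghdfe is taken from SSC: recent versions of reghdfe also depend on the ftools package, so the snippet installs both if reghdfe is missing. It then checks on hypothetical simulated data that absorb(id year) reproduces the slope from xtreg with i.year.

```stata
* Install reghdfe (and its ftools dependency) only if it is not yet available
capture which reghdfe
if _rc {
    ssc install ftools, replace
    ssc install reghdfe, replace
}

* Hypothetical simulated panel: 30 units observed for 10 periods
clear
set seed 42
set obs 30
generate id = _n
generate a = rnormal()
expand 10
bysort id: generate year = _n
generate x = a + rnormal()
generate y = 1 + 2*x + a + rnormal()

xtset id year
xtreg y x i.year, fe vce(cluster id)         // two-way FE with xtreg
scalar b_xt = _b[x]

reghdfe y x, absorb(id year) vce(cluster id) // space-separated list, no commas
scalar b_hd = _b[x]

display "xtreg: " scalar(b_xt) "   reghdfe: " scalar(b_hd)
```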

. reghdfe ltvfo ltlan ltwlab ltpow ltfer hrs mci ngca, absorb(year province) vce(cluster province)
(MWFE estimator converged in 2 iterations)

HDFE Linear regression                            Number of obs   =        476
Absorbing 2 HDFE groups                           F(   7,     27) =     229.56
Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                  R-squared       =     0.9695
                                                  Adj R-squared   =     0.9658
                                                  Within R-sq.    =     0.6751
Number of clusters (province) =         28        Root MSE        =     0.0994

                              (Std. Err. adjusted for 28 clusters in province)
------------------------------------------------------------------------------
             |               Robust
       ltvfo |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       ltlan |   .5833594   .1745834     3.34   0.002     .2251439    .9415749
      ltwlab |   .1514909   .0585107     2.59   0.015     .0314368     .271545
       ltpow |   .0971114    .090911     1.07   0.295    -.0894225    .2836453
       ltfer |   .1693346   .0438098     3.87   0.001     .0794444    .2592248
         hrs |   .1503752   .0587581     2.56   0.016     .0298136    .2709368
         mci |   .1978373   .0810587     2.44   0.022     .0315186     .364156
        ngca |   .7784081   .4016301     1.94   0.063    -.0456688    1.602485
       _cons |   2.625513   .7307092     3.59   0.001     1.126221    4.124804
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
        year |        17           0          17     |
    province |        28          28           0    *|
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation
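
reghdfe y x, absorb(id year industry) controls for fixed effects along several dimensions at once (here id, year and industry).

reghdfe y x, absorb(year#industry) controls for interacted (year × industry) fixed effects, that is, a separate effect for each industry-year cell.

reghdfe can also cluster the standard errors in the same call, e.g. with vce(cluster id), as in the example above.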
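These variants can be tried on a hypothetical firm-year panel (id = firm, industry, year are simulated, not from the original data; reghdfe must be installed). Because each simulated firm stays in one industry, the industry effect is collinear with the firm effect, so adding it leaves the slope unchanged; the interacted year#industry effect instead absorbs industry-specific year shocks.

```stata
* Hypothetical firm-year panel: 200 firms in 10 industries, 6 years each
clear
set seed 7
set obs 200                              // 200 firms
generate id = _n
generate industry = mod(_n, 10) + 1      // each firm belongs to one industry
expand 6                                 // 6 years per firm
bysort id: generate year = 2000 + _n
generate trend = (year - 2000) * (industry > 5)   // shock specific to some industries
generate x = rnormal() + industry/10 + 0.02*trend
generate y = 1 + 0.8*x + 0.05*trend + rnormal()

reghdfe y x, absorb(id year) vce(cluster id)           // two-way FE
scalar b_2way = _b[x]

reghdfe y x, absorb(id year industry) vce(cluster id)  // multi-way FE
scalar b_3way = _b[x]

reghdfe y x, absorb(id year#industry) vce(cluster id)  // firm + industry-year FE
scalar b_int = _b[x]
```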

Summary

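The regression output shows that xtreg, reg, areg and reghdfe produce identical slope coefficients; only the standard errors differ slightly. (The intercepts also differ, because each command normalizes the fixed effects differently.) xtreg and reghdfe report the same standard errors as each other, and so do reg and areg. The gap comes from the small-sample degrees-of-freedom correction applied to the clustered standard errors: because province is nested within the clusters, xtreg and reghdfe do not count the absorbed province effects as estimated parameters (reghdfe states this explicitly under "Absorbed degrees of freedom"), whereas reg and areg, which fit the LSDV regression (a pooled OLS with dummy variables), do count them.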
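The reported numbers are consistent with this explanation. Stata scales the cluster-robust variance by (N-1)/(N-K) · G/(G-1); N = 476 and G = 28 are the same for all commands, so only K differs. Counting K ourselves as 7 regressors + 16 year dummies + constant = 24 for xtreg/reghdfe, and 24 + 27 province dummies = 51 for reg/areg, the implied ratio of standard errors is:

```latex
\frac{\widehat{\mathrm{se}}_{\text{reg/areg}}}{\widehat{\mathrm{se}}_{\text{xtreg/reghdfe}}}
= \sqrt{\frac{(N-1)/(N-K_{\text{LSDV}})}{(N-1)/(N-K_{\text{FE}})}}
= \sqrt{\frac{N-K_{\text{FE}}}{N-K_{\text{LSDV}}}}
= \sqrt{\frac{476-24}{476-51}}
= \sqrt{\frac{452}{425}} \approx 1.03128
```

For ltlan, for example, 0.1745834 × 1.03128 ≈ 0.180044, which matches the .1800436 reported by reg and areg.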
