首页 > 解决方案 > 如何在 survreg 区间回归中模拟乘法异方差?

问题描述

我想比较 R 中的两个区间回归模型,使用survival::survival. 第二个模型由相同的基本模型组成,但也应该对条件方差进行建模。我已经知道如何在 Stata 中计算这个,但我想知道如何在 R 中做到这一点。

在 R 中,为了获得第一个模型,我在下限和上限之外创建了一个生存对象,y.min并将y.max其用作因变量。

library(survival)
intdat$y <- with(intdat, Surv(y.min, y.max, type="interval2"))
summary(sfit <- survreg(y ~ x1 + x2, data=intdat, dist="gaussian"))
# Call:
#   survreg(formula = y ~ x1 + x2, data = intdat, dist = "gaussian")
#                Value Std. Error     z      p
# (Intercept) 230.4794    11.2796 20.43 <2e-16
# x11          17.0302     9.4905  1.79  0.073
# x2           -4.9073     2.8457 -1.72  0.085
# Log(scale)    3.5762     0.0728 49.14 <2e-16
# 
# Scale= 35.7 
# 
# Gaussian distribution
# Loglik(model)= -470.5   Loglik(intercept only)= -473.1
# Chisq= 5.2 on 2 degrees of freedom, p= 0.074 
# Number of Newton-Raphson Iterations: 3 
# n= 100

在 Stata 中,我得到完全相同的结果intreg

. use intdat, clear

. intreg y_min y_max i.x1 x2

现在我想将条件方差指定为乘性异方差并添加自变量来对方差建模。在 Stata 中,我可以使用该het()选项。

. intreg y_min y_max i.x1 x2, het(i.cl y_min)

[...]

Interval regression                             Number of obs     =        100
                                                Wald chi2(2)      =       1.30
Log likelihood =  -436.8589                     Prob > chi2       =     0.5222

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
model        |
          x1 |
          1  |   5.186822    5.18936     1.00   0.318    -4.984136    15.35778
          x2 |  -1.259959   1.803787    -0.70   0.485    -4.795317    2.275399
       _cons |   237.1218   7.644733    31.02   0.000     222.1384    252.1052
-------------+----------------------------------------------------------------
lnsigma      |
          cl |
          B  |  -.2222544   .5215087    -0.43   0.670    -1.244393    .7998839
          C  |  -1.601658    .447561    -3.58   0.000    -2.478861   -.7244541
          D  |  -1.063609   .4233763    -2.51   0.012    -1.893411   -.2338066
          E  |  -.8029018   .4324614    -1.86   0.063     -1.65051    .0447069
          F  |  -1.698248   .4749063    -3.58   0.000    -2.629047    -.767449
          G  |  -.9595613   .4689475    -2.05   0.041    -1.878681   -.0404412
             |
       y_min |   -.002804   .0023935    -1.17   0.241    -.0074952    .0018873
       _cons |   5.096241   .4189039    12.17   0.000     4.275205    5.917278
------------------------------------------------------------------------------
             0  left-censored observations
            94     uncensored observations
             0 right-censored observations
             6       interval observations

我现在想在 R 中复制这个结果。但是我还没有找到一种方法来添加自变量来模拟方差。

一个r-help 帖子建议包含+ strata()类似的问题,我尝试了:

survreg(y ~ x1 + x2 + strata(cl) + strata(y.min), data=intdat, dist="gaussian")

但是,结果与预期的结果完全不同。进一步的搜索结果也将我引向了 GARCH 和其他不同的方法,但我找不到解决这个看似简单的问题的方法。有谁知道如何为 R 解决这个问题?


数据:

intdat <- structure(list(x1 = structure(c(1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 
1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 
1L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 
1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L), .Label = c("0", 
"1"), class = "factor"), x2 = c(8.45510499910282, 5.91889385427315, 
4.17438726989564, 2.89037175789616, 3.95124371858143, 5.32300997913841, 
3.58351893845611, 4.02535169073515, 4.48863636973214, 5.20948615284142, 
2.30258509299405, 5.74300318780948, 1.94591014905531, 4.969813299576, 
4.24849524204936, 2.83321334405622, 3.93182563272433, 3.36729582998647, 
2.99573227355399, 2.99573227355399, 3.66356164612965, 3.76120011569356, 
3.80666248977032, 3.87120101090789, 5.08759633523238, 0.693147180559945, 
5.08140436498446, 2.07944154167984, 5.2040066870768, 4.39444915467244, 
4.51085950651685, 1.79175946922805, 3.73766961828337, 4.56434819146784, 
0.693147180559945, 4.20469261939097, 2.30258509299405, 4.31748811353631, 
3.09104245335832, 2.07944154167984, 4.35670882668959, 5.12989871492307, 
3.58351893845611, 4.20469261939097, 4.81218435537242, 4.15888308335967, 
3.17805383034795, 3.58351893845611, 4.57471097850338, 3.2188758248682, 
1.38629436111989, 4.44265125649032, 5.91889385427315, 3.98898404656427, 
6.34388043412633, 2.99573227355399, 3.61091791264422, 0.693147180559945, 
5.92425579741453, 4.23410650459726, 3.09104245335832, 3.09104245335832, 
1.09861228866811, 3.78418963391826, 6.83840520084734, 3.85014760171006, 
2.07944154167984, 5.2040066870768, 2.89037175789616, 5.30330490805908, 
8.45489216521886, 2.39789527279837, 3.61091791264422, 2.63905732961526, 
2.83321334405622, 4.15888308335967, 1.94591014905531, 1.79175946922805, 
3.2188758248682, 5.11198778835654, 4.53259949315326, 4.56434819146784, 
4.11087386417331, 5.50125821054473, 4.06044301054642, 4.33073334028633, 
3.71357206670431, 2.39789527279837, 2.39789527279837, 1.09861228866811, 
5.61677109766657, 4.76217393479776, 4.59511985013459, 4.48863636973214, 
3.29583686600433, 2.77258872223978, 5.97126183979046, 3.17805383034795, 
4.04305126783455, 3.3322045101752), cl = structure(c(6L, 2L, 
4L, 3L, 6L, 3L, 3L, 3L, 3L, 3L, 3L, 6L, 6L, 3L, 3L, 3L, 5L, 5L, 
1L, 4L, 3L, 3L, 3L, 3L, 7L, 3L, 1L, 3L, 3L, 3L, 5L, 7L, 5L, 5L, 
5L, 7L, 6L, 1L, 3L, 5L, 6L, 3L, 6L, 5L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 2L, 2L, 6L, 3L, 1L, 3L, 1L, 3L, 3L, 4L, 3L, 4L, 7L, 3L, 
4L, 3L, 6L, 4L, 6L, 6L, 7L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 5L, 5L, 
6L, 3L, 4L, 3L, 3L, 6L, 7L, 3L, 4L, 3L, 3L, 3L, 7L, 3L, 3L, 5L, 
4L, 3L), .Label = c("A", "B", "C", "D", "E", "F", "G"), class = "factor"), 
    y.min = c(0, 131, 234.33, 230.9, 252, 245.09, 224.11, 235.52, 
    236, 234.04, 237, 247, 234.5, 251.02, 223, 230.8, 200, 181.01, 
    120, 204, 232.5, 222, 225, 200.04, 206.99, 266, 101, 220.87, 
    225.5, 206, 200.81, 191, 200.99, 200, 160.98, 200.5, 230, 
    120, 0, 193, 246, 235, 248, 221, 231, 236.55, 223, 240.5, 
    226, 246, 231.99, 251.01, 146, 151.5, 231.02, 250, 124, 0, 
    96, 255, 251, 221, 241, 236, 201.51, 235.5, 239, 230, 263, 
    191, 0, 0, 222, 232.77, 219.55, 207.77, 215, 251, 191.88, 
    237, 207.67, 180, 218, 233, 201, 272, 261, 0, 171.53, 200.99, 
    153.99, 243.23, 233.99, 231, 201, 281, 221.5, 178, 201, 231
    ), y.max = c(279, 131, 234.33, 230.9, 252, 245.09, 224.11, 
    235.52, 236, 234.04, 237, 247, 234.5, 251.02, 223, 230.8, 
    200, 181.01, 120, 204, 232.5, 222, 225, 200.04, 206.99, 266, 
    101, 220.87, 225.5, 206, 200.81, 191, 200.99, 200, 160.98, 
    200.5, 230, 120, 349, 193, 246, 235, 248, 221, 231, 236.55, 
    223, 240.5, 226, 246, 231.99, 251.01, 146, 151.5, 231.02, 
    250, 124, 210, 96, 255, 251, 221, 241, 236, 201.51, 235.5, 
    239, 230, 263, 191, 279, 310, 222, 232.77, 219.55, 207.77, 
    215, 251, 191.88, 237, 207.67, 180, 218, 233, 201, 272, 261, 
    310, 171.53, 200.99, 153.99, 243.23, 233.99, 231, 201, 281, 
    221.5, 178, 201, 231)), row.names = c(4673L, 1092L, 2140L, 
3803L, 3096L, 257L, 3936L, 467L, 5061L, 2913L, 1104L, 4386L, 
4579L, 1142L, 2907L, 5082L, 5396L, 3155L, 176L, 1810L, 3447L, 
2256L, 3980L, 2496L, 72L, 505L, 3397L, 531L, 3845L, 3858L, 1138L, 
3191L, 293L, 2338L, 2903L, 5468L, 2534L, 5256L, 4640L, 1438L, 
4166L, 4300L, 2935L, 5368L, 4488L, 4342L, 2150L, 4513L, 2904L, 
459L, 3307L, 5357L, 1090L, 3625L, 554L, 871L, 4545L, 2937L, 3513L, 
4517L, 4008L, 2912L, 484L, 735L, 4874L, 4177L, 965L, 1765L, 1487L, 
1898L, 4674L, 941L, 4424L, 4968L, 2755L, 4722L, 4260L, 1013L, 
762L, 3399L, 2835L, 3704L, 4936L, 2014L, 2126L, 4041L, 1556L, 
945L, 1963L, 3366L, 2278L, 2112L, 3844L, 5476L, 782L, 4742L, 
2440L, 2021L, 3168L, 4960L), class = "data.frame")

readstata13::save.dta13(intdat, file="intdat.dta", convert.underscore=TRUE)

标签: rstatisticsregressionstatasurvival-analysis

解决方案


推荐阅读