首页 > 解决方案 > MASS::glm.nb 包对某些数据集返回错误

问题描述

我正在尝试将特定的二项式 glm 拟合到以下数据。

testData <- structure(list(lb = c(0.102, 0.128, 0.161, 0.203, 0.256, 0.323, 
0.406, 0.512, 0.645, 0.813, 1.02, 1.29, 1.63, 2.05, 2.58, 3.25, 
4.1, 5.16, 6.5, 8.19, 10.3, 13, 16.4, 20.6, 26), TotalParticles = c(50612, 
20541, 18851, 8058, 5606, 4123, 1995, 1234, 677, 381, 202, 111, 
75, 42, 64, 118, 127, 83, 9, 2, 4, 7, 67, 8, 143), binsize = c(0.026, 
0.033, 0.042, 0.053, 0.067, 0.083, 0.106, 0.133, 0.168, 0.207, 
0.27, 0.34, 0.42, 0.53, 0.67, 0.85, 1.06, 1.34, 1.69, 2.11, 2.7, 
3.4, 4.2, 5.4, 6), vol = c(309.76, 309.76, 309.76, 309.76, 309.76, 
309.76, 309.76, 309.76, 309.76, 309.76, 309.76, 309.76, 309.76, 
309.76, 309.76, 309.76, 309.76, 309.76, 309.76, 309.76, 309.76, 
309.76, 309.76, 309.76, 309.76)), class = c("tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -25L))

我可以毫不费力地将泊松 glm 拟合到数据中。

fit_model <- function(df) glm(TotalParticles ~ log(lb), offset = log(binsize * vol), data = df, family = "poisson")
summary(fit_model(testData))
Call:
glm(formula = TotalParticles ~ log(lb), family = "poisson", data = df, 
    offset = log(binsize * vol))

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-42.592   -3.216    1.539   14.514   40.162  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  1.276493   0.013243   96.39   <2e-16 ***
log(lb)     -3.216526   0.006667 -482.44   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 1152482.7  on 24  degrees of freedom
Residual deviance:    6489.9  on 23  degrees of freedom
AIC: 6677.5

Number of Fisher Scoring iterations: 5

我也可以拟合一个准泊松模型。这给了我完全相同的系数,虽然稍微/更高/ p值。

fit_quasi <- function(df) glm(TotalParticles ~ log(lb), offset = log(binsize * vol), data = df, family = "quasipoisson")
summary(fit_quasi(testData))
Call:
glm(formula = TotalParticles ~ log(lb), family = "quasipoisson", 
    data = df, offset = log(binsize * vol))

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-42.592   -3.216    1.539   14.514   40.162  

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   1.2765     0.9662   1.321    0.199    
log(lb)      -3.2165     0.4864  -6.613 9.54e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for quasipoisson family taken to be 5322.668)

    Null deviance: 1152482.7  on 24  degrees of freedom
Residual deviance:    6489.9  on 23  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 5

但是,当我尝试拟合负二项式模型时,我收到了这条神秘的错误消息

fit_nb = function(df) MASS::glm.nb(TotalParticles ~ log(lb) + offset(log(vol * binsize)), data = df)
fit_nb(testData)

glm.fitter 中的错误(x = X,y = Y,w = w,etastart = eta,offset = offset,:'x' 中的 NA/NaN/Inf

traceback()
3: glm.fitter(x = X, y = Y, w = w, etastart = eta, offset = offset, 
       family = fam, control = list(maxit = control$maxit, epsilon = control$epsilon, 
           trace = control$trace > 1), intercept = attr(Terms, "intercept") > 
           0)
2: MASS::glm.nb(TotalParticles ~ log(lb) + offset(log(vol * binsize)), 
       data = df) at #1
1: fit_nb(testData)

这个问题似乎与数据的非线性性质有关。例如,如果我将最后五个值归零,我可以进行二项式回归,但会收到警告。

testDataWorks <- testData
testDataWorks[(nrow(testDataWorks)-4):nrow(testDataWorks),"TotalParticles"] <- 0
fit_nb(testDataWorks)
glm.fit: algorithm did not converge
Call:  MASS::glm.nb(formula = TotalParticles ~ log(lb) + offset(log(vol * 
    binsize)), data = df, init.theta = 1.66759784, link = log)

Coefficients:
(Intercept)      log(lb)  
      1.765       -2.892  

Degrees of Freedom: 24 Total (i.e. Null);  23 Residual
Null Deviance:        4954 
Residual Deviance: 29.95  AIC: 315.8

为什么我的二项式回归失败了?有没有我可以做的解决方法让它运行?我知道这些数据并不是特别线性,但出于某种原因,我希望能够运行模型(我有一堆类似的数据,这主要适用于他们)。

我看到我的问题类似于这个R: NA/NaN/Inf in X 错误但是,共识似乎是该用户的模型有太多参数,而在我的情况下,我预测总粒子的参数要少得多.

这篇文章https://stats.stackexchange.com/questions/52527/unable-to-fit-negative-binomial-regression-in-r-attempting-to-replicate-publish也可能是相关的,尽管我不完全是确定我将如何在实践中应用它。

感谢您的任何建议。

标签: rglm

解决方案


推荐阅读