r - MASS::glm.nb 包对某些数据集返回错误
问题描述
我正在尝试将特定的二项式 glm 拟合到以下数据。
testData <- structure(list(lb = c(0.102, 0.128, 0.161, 0.203, 0.256, 0.323,
0.406, 0.512, 0.645, 0.813, 1.02, 1.29, 1.63, 2.05, 2.58, 3.25,
4.1, 5.16, 6.5, 8.19, 10.3, 13, 16.4, 20.6, 26), TotalParticles = c(50612,
20541, 18851, 8058, 5606, 4123, 1995, 1234, 677, 381, 202, 111,
75, 42, 64, 118, 127, 83, 9, 2, 4, 7, 67, 8, 143), binsize = c(0.026,
0.033, 0.042, 0.053, 0.067, 0.083, 0.106, 0.133, 0.168, 0.207,
0.27, 0.34, 0.42, 0.53, 0.67, 0.85, 1.06, 1.34, 1.69, 2.11, 2.7,
3.4, 4.2, 5.4, 6), vol = c(309.76, 309.76, 309.76, 309.76, 309.76,
309.76, 309.76, 309.76, 309.76, 309.76, 309.76, 309.76, 309.76,
309.76, 309.76, 309.76, 309.76, 309.76, 309.76, 309.76, 309.76,
309.76, 309.76, 309.76, 309.76)), class = c("tbl_df", "tbl",
"data.frame"), row.names = c(NA, -25L))
我可以毫不费力地将泊松 glm 拟合到数据中。
fit_model <- function(df) glm(TotalParticles ~ log(lb), offset = log(binsize * vol), data = df, family = "poisson")
summary(fit_model(testData))
Call: glm(formula = TotalParticles ~ log(lb), family = "poisson", data = df, offset = log(binsize * vol)) Deviance Residuals: Min 1Q Median 3Q Max -42.592 -3.216 1.539 14.514 40.162 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 1.276493 0.013243 96.39 <2e-16 *** log(lb) -3.216526 0.006667 -482.44 <2e-16 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for poisson family taken to be 1) Null deviance: 1152482.7 on 24 degrees of freedom Residual deviance: 6489.9 on 23 degrees of freedom AIC: 6677.5 Number of Fisher Scoring iterations: 5
我也可以拟合一个准泊松模型。这给了我完全相同的系数,虽然稍微/更高/ p值。
fit_quasi <- function(df) glm(TotalParticles ~ log(lb), offset = log(binsize * vol), data = df, family = "quasipoisson")
summary(fit_quasi(testData))
Call: glm(formula = TotalParticles ~ log(lb), family = "quasipoisson", data = df, offset = log(binsize * vol)) Deviance Residuals: Min 1Q Median 3Q Max -42.592 -3.216 1.539 14.514 40.162 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 1.2765 0.9662 1.321 0.199 log(lb) -3.2165 0.4864 -6.613 9.54e-07 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for quasipoisson family taken to be 5322.668) Null deviance: 1152482.7 on 24 degrees of freedom Residual deviance: 6489.9 on 23 degrees of freedom AIC: NA Number of Fisher Scoring iterations: 5
但是,当我尝试拟合负二项式模型时,我收到了这条神秘的错误消息
fit_nb = function(df) MASS::glm.nb(TotalParticles ~ log(lb) + offset(log(vol * binsize)), data = df)
fit_nb(testData)
glm.fitter 中的错误(x = X,y = Y,w = w,etastart = eta,offset = offset,:'x' 中的 NA/NaN/Inf
traceback()
3: glm.fitter(x = X, y = Y, w = w, etastart = eta, offset = offset, family = fam, control = list(maxit = control$maxit, epsilon = control$epsilon, trace = control$trace > 1), intercept = attr(Terms, "intercept") > 0) 2: MASS::glm.nb(TotalParticles ~ log(lb) + offset(log(vol * binsize)), data = df) at #1 1: fit_nb(testData)
这个问题似乎与数据的非线性性质有关。例如,如果我将最后五个值归零,我可以进行二项式回归,但会收到警告。
testDataWorks <- testData
testDataWorks[(nrow(testDataWorks)-4):nrow(testDataWorks),"TotalParticles"] <- 0
fit_nb(testDataWorks)
glm.fit: algorithm did not converge Call: MASS::glm.nb(formula = TotalParticles ~ log(lb) + offset(log(vol * binsize)), data = df, init.theta = 1.66759784, link = log) Coefficients: (Intercept) log(lb) 1.765 -2.892 Degrees of Freedom: 24 Total (i.e. Null); 23 Residual Null Deviance: 4954 Residual Deviance: 29.95 AIC: 315.8
为什么我的二项式回归失败了?有没有我可以做的解决方法让它运行?我知道这些数据并不是特别线性,但出于某种原因,我希望能够运行模型(我有一堆类似的数据,这主要适用于他们)。
我看到我的问题类似于这个R: NA/NaN/Inf in X 错误但是,共识似乎是该用户的模型有太多参数,而在我的情况下,我预测总粒子的参数要少得多.
这篇文章https://stats.stackexchange.com/questions/52527/unable-to-fit-negative-binomial-regression-in-r-attempting-to-replicate-publish也可能是相关的,尽管我不完全是确定我将如何在实践中应用它。
感谢您的任何建议。
解决方案
推荐阅读
- pyspark - 根据 pyspark 中复杂 JSON 的 1 个字段中存在的关键字过滤行
- pytorch - 这个 LSTM 循环代码会破坏 PyTorch 中的计算图吗?
- google-cloud-functions - 将 Google Cloud Functions 中的 Webhook 与 Dialogflow CX 结合使用
- c++ - 错误 E0304;函数 std::string.insert(),显示为具有 9 个重载的重载函数 VS2019
- wpf - C# WPF 如果两个背景颜色不同的 Rectangle 完全重叠,就会出现下面这个 Rectangele 的边框
- c - 计算 C 中两个标签之间的代码大小(操作码字节数)
- sql - 在 UiPath 中执行 sql 查询以从雪花中提取数据
- javascript - NuxtJS 如何使用 emit 重新加载 asyncData
- sql - PostgreSQL 选择方法,如 excel
- angular - Angular's ngOnInit vs constructor. What to place In each?