r - Linear regression: calculate confidence and prediction intervals with the standard errors of the fitted parameters the correlation coefficient
问题描述
In many fields of the natural sciences it is common practice to report the results of linear regression analysis as y = (a1 +- u(a1)) + (a2 +- u(a2)) * x
, including R2 and p, but not the original data. u(a1) and u(a2) are the uncertainties (standard error) of a1 and a2. How could I calculate with this information the confidence and prediction intervals, or have a “reasonable” estimation?
Let me clarify with an example. This is a dummy dataset, with a line of slope 1 and gaussian noise 10:
set.seed(1961)
npoints <- 1e2
(x <- 1:npoints)
(y <-1:npoints + rnorm(npoints, 0, npoints/10))
Now I perform the linear regression:
par(mar = c(4, 4, 1, 1))
xy.model <- lm(y ~ x)
plot(x, y, pch = 16)
abline(xy.model, col = "orange", lwd = 2)
(xy.sum <- summary(xy.model))
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) -1.28106 1.94918 -0.657 0.513
# x 1.00484 0.03351 29.987 <2e-16 ***
# Residual standard error: 9.673 on 98 degrees of freedom
# Multiple R-squared: 0.9017, Adjusted R-squared: 0.9007
# F-statistic: 899.2 on 1 and 98 DF, p-value: < 2.2e-16
I calculate the confidence and prediction intervals:
x.new <- data.frame(x = 1:npoints)
xy.conf <- predict.lm(xy.model, se.fit = TRUE, interval = "confidence", newdata = x.new)
xy.pred <- predict.lm(xy.model, se.fit = TRUE, interval = "prediction", newdata = x.new)
For example, the confidence and prediction intervals of the first point are:
xy.conf$fit[1, ]
# fit lwr upr
# -0.2762127 -4.0867009 3.5342755
xy.pred$fit[1, ]
# fit lwr upr
# -0.2762127 -19.8462821 19.2938568
If the regression equation is reported as y = (-1.28106 +- 1.94918) + (1.00484 +- 0.03351) * x, R2 = 0.9017, p < 0.05, but the original data are not provided, how can I reproduce (at least approximately) the values, of the confidence and prediction intervals?
解决方案
如果没有原始数据,您还需要一条信息:两个变量的均值。您提供的统计数据允许构建线性回归线,但置信区间和预测区间在均值 (x)、均值 (y) 处最窄,因此如果没有这些,您将无法计算它们。
一个简单的例子可能会更清楚地说明这一点。从一些数据开始:
z <- structure(list(x = c(5, 5.1, 5.4, 5.8, 4.7, 5.7, 4.8, 5.1, 4.6,
5.4, 5.2, 5, 5, 5.5, 5.2, 5.1, 4.7, 5.2, 4.8, 5.4, 4.8, 5.1,
5, 4.6, 4.8), y = c(3.4, 3.7, 3.4, 4, 3.2, 3.8, 3, 3.5, 3.1,
3.7, 4.1, 3.4, 3.6, 4.2, 3.5, 3.3, 3.2, 3.4, 3, 3.9, 3.1, 3.5,
3.5, 3.4, 3.4)), row.names = c(NA, -25L), class = "data.frame")
计算回归线并将其与数据一起绘制:
z.lm <- lm(y~x, z)
z.lm
#
# Call:
# lm(formula = y ~ x, data = z)
#
# Coefficients:
# (Intercept) x
# -0.4510 0.7762
#
plot(y~x, z, xlim=c(0, 20), ylim=c(0, 20))
abline(z.lm)
现在从原始数据创建一个新数据集并计算回归:
x2 <- z$x + 10
y2 <- z$y+(10 * coef(z.lm)[2])
z2 <- data.frame(x=x2, y=y2)
points(y~x, z2, col="red")
z2.lm <- lm(y~x, z2)
z2.lm
#
# Call:
# lm(formula = y ~ x, data = z2)
#
# Coefficients:
# (Intercept) x
# -0.4510 0.7762
请注意,回归系数与原始数据相同。事实上,将 10 更改为任何其他值都会产生另一组具有相同回归结果的数据。
推荐阅读
- node.js - How to optionally set Winston transports when logging to Stackdriver from GKE
- shell - Cygwin64 终端中缺少文件夹
- javascript - 尝试在 nuxt 应用程序中初始化 firebase 时无法读取 null 的属性“firestore”
- asynchronous - How to capture removed item in AsyncTypeahead when removing item from the list
- ios - Swift observe(_:options:changeHandler:) newValue 总是 nil
- java - Android通过反射覆盖私有方法
- google-oauth - 谷歌 oauth API 响应是否针对电子邮件范围进行了更改?电子邮件类型已从“帐户”更改为“帐户”
- sugarcrm - 如何阻止用户在默认保存时输入重复记录
- java - 将a.class文件添加到war文件中的库jar文件中
- javascript - Chrome 扩展:通过内容脚本修改 DOM 元素的问题