
Question: regression line, prediction and confidence intervals in R

How can I create the following two plots:

  1. A regression line with a prediction interval
  2. A regression line with a confidence interval

I used this script, but I don't know what to do next:

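# fit the model once; the data argument (instead of dta$ in the formula) lets predict() use newdata
pred <- lm(Number.of.species ~ Latitude, data = dta)
# lower bound of the 99% prediction interval
pred_interval <- predict(pred, newdata = dta, level = .99, interval = "prediction")[,2]
# upper bound of the 95% confidence interval
conf_interval <- predict(pred, newdata = dta, interval = "confidence")[,3]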
par(mfrow=c(2,2))
plot(
  dta$Latitude, 
  dta$Number.of.species, 
  pch = 1, 
  ylim = c(0, 180), 
  xlim = c(37, 40)
  )


plot(
  dta$Latitude, 
  dta$Number.of.species, 
  pch = 1, 
  ylim = c(0, 180), 
  xlim = c(37, 40)
) 

abline(pred)

Thanks for your time.

Tags: r, linear-regression, intervals, lm

Answer


If you are just starting to learn R, I would make two suggestions.

First, I suggest learning the ggplot2 package rather than the base R plotting system. In ggplot2, every plot starts with a call to ggplot().
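That said, the question's base R attempt can be finished without ggplot2. The sketch below is illustrative only: because the question's `dta` is not available, it assumes the built-in `mtcars` data as a stand-in (`mpg` playing the role of `Number.of.species`, `disp` the role of `Latitude`), and the 100-point grid and line colors are arbitrary choices. It evaluates both intervals on an evenly spaced grid with `predict()` and draws each with `matlines()` in its own panel.

```r
# Sketch only: mtcars stands in for the question's dta
# (mpg ~ Number.of.species, disp ~ Latitude)
m <- lm(mpg ~ disp, data = mtcars)

# Evenly spaced x values so the interval bands are drawn as smooth curves
grid <- data.frame(disp = seq(min(mtcars$disp), max(mtcars$disp), length.out = 100))

# Each call returns a matrix with columns fit, lwr, upr
pred_int <- predict(m, newdata = grid, interval = "prediction", level = 0.95)
conf_int <- predict(m, newdata = grid, interval = "confidence", level = 0.95)

# Two side-by-side panels, one per interval type
par(mfrow = c(1, 2))

# Panel 1: regression line with the prediction interval (dashed red)
plot(mpg ~ disp, data = mtcars, pch = 1,
     ylim = range(mtcars$mpg, pred_int), main = "95% prediction interval")
matlines(grid$disp, pred_int, lty = c(1, 2, 2), col = c("black", "red", "red"))

# Panel 2: regression line with the confidence interval (dashed blue)
plot(mpg ~ disp, data = mtcars, pch = 1,
     ylim = range(mtcars$mpg, conf_int), main = "95% confidence interval")
matlines(grid$disp, conf_int, lty = c(1, 2, 2), col = c("black", "blue", "blue"))
```

To adapt it, swap in `dta`, `Number.of.species` and `Latitude`. Fitting with `data = dta` rather than `dta$...` inside the formula is what allows `predict()` to evaluate the model on `newdata`.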

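Second, several packages aim to make model results in R easier to work with. The most prominent are broom and the easystats collection of packages (modelbased, performance, parameters, etc.). Of the two, I would recommend easystats.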
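For comparison with the easystats approach, here is a minimal sketch of the same data preparation with broom. It assumes a broom release in which `augment()` for `lm` models accepts the `interval` argument (older releases lack it) and uses `mtcars` as in the rest of this answer. `augment()` returns the model data together with `.fitted`, `.lower` and `.upper` columns, so no merging step is needed.

```r
library(broom)
library(ggplot2)

m <- lm(mpg ~ disp, data = mtcars)

# One row per observation: model data plus .fitted, .lower and .upper
m_ci <- augment(m, interval = "confidence")
m_pi <- augment(m, interval = "prediction")

# Plot the prediction band; use m_ci instead for the confidence band
p <- ggplot(m_pi, aes(x = disp)) +
  geom_ribbon(aes(ymin = .lower, ymax = .upper), fill = "lightblue", alpha = 0.4) +
  geom_point(aes(y = mpg)) +
  geom_line(aes(y = .fitted)) +
  theme_minimal()
print(p)
```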

Below I show how to build a data frame to plot the model manually, and then how to do the same with modelbased.

Constructing the data frame manually

library(ggplot2)

# fit the model
m <- lm(mpg ~ disp, data = mtcars)

# construct prediction and confidence intervals using predict()
m_ci <- predict(m, interval = "confidence") |> 
  as.data.frame() |> 
  setNames(c("fit", "ci_lo", "ci_hi"))
m_pi <- predict(m, interval = "prediction") |> 
  as.data.frame() |> 
  setNames(c("fit", "pi_lo", "pi_hi"))
#> Warning in predict.lm(m, interval = "prediction"): predictions on current data refer to _future_ responses

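# combine the data used in the model with both sets of intervals (all rows are in the same order);
# merging on the shared 'fit' column would duplicate rows whose fitted values are tied
m_data <- cbind(model.frame(m), m_ci, m_pi[, c("pi_lo", "pi_hi")])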

# make a plot using the merged model data frame
ggplot(m_data) + # use m_data in the plot
  aes(x = disp) + # put the 'disp' variable on the x axis
  geom_point(aes(y = mpg)) + # add points, put the 'mpg' variable on the y axis for these
  geom_ribbon(aes(ymin = pi_lo, ymax = pi_hi), fill = "lightblue", alpha = .4) + # add a ribbon for the prediction interval, put the pi_lo/pi_hi values on the y axis for this, color it lightblue and make it semitransparent
  geom_ribbon(aes(ymin = ci_lo, ymax = ci_hi), fill = "lightblue", alpha = .4) + # add a ribbon for the confidence interval, put the ci_lo/ci_hi values on the y axis for this, color it lightblue and make it semitransparent
  geom_line(aes(y = fit)) + # add a line for the fitted values, put the 'fit' values on the y axis
  theme_minimal() # use a white background for the plot
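The question asks for two separate plots. A variation on the code above, assuming it is acceptable to predict on an evenly spaced grid of 100 `disp` values rather than on the observed rows: this gives smooth ribbons, avoids the "predictions on current data" warning, and stacks both interval types in one long data frame so that `facet_wrap()` draws one panel per type.

```r
library(ggplot2)

m <- lm(mpg ~ disp, data = mtcars)

# Regular grid of x values spanning the observed range
grid <- data.frame(disp = seq(min(mtcars$disp), max(mtcars$disp), length.out = 100))

# Long format: fit/lwr/upr for each grid point, labelled by interval type
bands <- rbind(
  data.frame(grid, predict(m, newdata = grid, interval = "prediction"),
             type = "Prediction interval"),
  data.frame(grid, predict(m, newdata = grid, interval = "confidence"),
             type = "Confidence interval")
)

# One panel per interval type; mtcars has no 'type' column,
# so the observed points appear in both panels
p <- ggplot(bands, aes(x = disp)) +
  geom_ribbon(aes(ymin = lwr, ymax = upr), fill = "lightblue", alpha = 0.4) +
  geom_line(aes(y = fit)) +
  geom_point(data = mtcars, aes(y = mpg)) +
  facet_wrap(~ type) +
  labs(y = "mpg") +
  theme_minimal()
print(p)
```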

Using modelbased to simplify some of the steps above

library(modelbased)

# compute intervals; the results include the fitted values and the data used to fit the model
ci <- estimate_expectation(m) # model fitted values and confidence intervals (uncertainty intervals on the expected values/predicted means)
pi <- estimate_prediction(m) # model fitted values and prediction intervals (uncertainty intervals on the individual predictions) 

plot(ci) + # this produces a ggplot with points, fitted line, and confidence ribbon
  geom_ribbon(aes(x = disp, ymin = CI_low, ymax = CI_high), data = pi, alpha = .4) + # add a prediction ribbon
  theme_minimal() # use a white background
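The comments above distinguish uncertainty about the expected value (confidence interval) from uncertainty about an individual new observation (prediction interval). For simple linear regression with $n$ observations, the standard textbook intervals at a new value $x_0$ are shown below, where $\hat{y}_0 = b_0 + b_1 x_0$ is the fitted value, $s$ is the residual standard error, and $t_{n-2,\,1-\alpha/2}$ is the Student $t$ quantile with $n-2$ degrees of freedom:

```latex
\text{CI:}\quad \hat{y}_0 \pm t_{n-2,\,1-\alpha/2}\; s\,\sqrt{\frac{1}{n} + \frac{(x_0-\bar{x})^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}}

\text{PI:}\quad \hat{y}_0 \pm t_{n-2,\,1-\alpha/2}\; s\,\sqrt{1 + \frac{1}{n} + \frac{(x_0-\bar{x})^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}}
```

The extra $1$ under the square root is the variance of a single new observation around its mean (in units of $s^2$). That is why the prediction ribbon is always wider than the confidence ribbon and does not shrink to zero as $n$ grows.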

Here is how to change the color of the ribbons when using modelbased:

plot(ci, ribbon = list(fill = "lightblue")) +
  geom_ribbon(aes(x = disp, ymin = CI_low, ymax = CI_high), data = pi, fill = "lightblue", alpha = .4) +
  theme_minimal()

Created on 2021-08-18 by the reprex package (v2.0.0)
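To see where the numbers behind these ribbons come from, the sketch below rebuilds `predict()`'s 95% intervals by hand from the textbook simple-regression formulas and checks that they agree. The new `disp` values in `x0` are hypothetical, chosen only for illustration.

```r
# Reproduce predict()'s intervals by hand for a simple linear regression
m <- lm(mpg ~ disp, data = mtcars)

x <- mtcars$disp
n <- length(x)
s <- summary(m)$sigma                  # residual standard error
t_crit <- qt(0.975, df = n - 2)        # two-sided 95% critical value
x0 <- c(100, 200, 300)                 # hypothetical new disp values
y0 <- unname(coef(m)[1] + coef(m)[2] * x0)  # fitted values at x0

# Leverage-like term shared by both intervals
h <- 1 / n + (x0 - mean(x))^2 / sum((x - mean(x))^2)
se_mean <- s * sqrt(h)                 # uncertainty of the fitted mean
se_pred <- s * sqrt(1 + h)             # adds the noise of one new observation

ci_manual <- cbind(fit = y0, lwr = y0 - t_crit * se_mean, upr = y0 + t_crit * se_mean)
pi_manual <- cbind(fit = y0, lwr = y0 - t_crit * se_pred, upr = y0 + t_crit * se_pred)

ci_predict <- predict(m, newdata = data.frame(disp = x0), interval = "confidence")
pi_predict <- predict(m, newdata = data.frame(disp = x0), interval = "prediction")

print(all.equal(ci_manual, ci_predict, check.attributes = FALSE))
print(all.equal(pi_manual, pi_predict, check.attributes = FALSE))
```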

