ggplot cannot draw a smoothed gam with facet_wrap and the group aesthetic

Problem description

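I am trying to use ggplot with a group aesthetic together with facet_wrap. However, when one group has too few data points, geom_smooth fails for all of the lines in that facet panel.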

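library(ggplot2)
# pd is not defined in the original post; a small position_dodge() is assumed here
pd <- position_dodge(width = 0.1)
plot1 <- ggplot(data=df1, 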
                aes(x=Year, y=Mean, group=Group2, linetype=Group2, shape=Group2)) +  
  geom_errorbar(aes(ymin=Mean-SE, ymax=Mean+SE), width=0.2) +  
  geom_smooth(method = "gam", se=F, formula = y ~ s(x, k=3), size = 1, colour="black") + 
  geom_point(position=pd, size=2, fill="white") +  
  scale_x_continuous(limits=c(min(df1$Year-0.1), max(df1$Year+0.1)), 
                     breaks=seq(min(df1$Year),max(df1$Year),5)) +  
  facet_wrap(~Group1, scales = "free", ncol=2) +  
  theme_bw() + 
  theme(axis.text.x = element_text(),
        axis.title.x = element_blank(), 
        strip.background = element_blank(), 
        axis.line.x = element_line(colour="black"),
        axis.line.y = element_line(colour="black"), 
        panel.grid.minor = element_blank(), 
        panel.grid.major = element_blank(),
        panel.border = element_blank(), 
        panel.background = element_blank(),
        legend.position="top",
        legend.title = element_blank())
plot(plot1)

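This produces the plot shown below. The data here is only summary data, to keep things simple. It is as if an error prevents ggplot from computing any of the smooths for that particular facet.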

Data

Year    Group1      Group2      Mean        SE
2011    Factor A    Factor C    30.62089116 3.672624771
2011    Factor A    Factor D    54.99066324 2.822405771
2011    Factor B    Factor C    30.48859003 3.748388489
2011    Factor B    Factor D    45.70410611 4.284244405
2017    Factor A    Factor C    33.68256601 4.030964172
2017    Factor A    Factor D    53.43496462 4.687042033
2017    Factor B    Factor C    23.08799875 5.17753488
2001    Factor A    Factor C    23.79166667 2.837795432
2001    Factor A    Factor D    23.75925926 3.688185081
2001    Factor B    Factor C    29.05555556 4.08597798
2001    Factor B    Factor D    28.13333333 7.877429079
2008    Factor A    Factor C    23.3        2.383624691
2008    Factor A    Factor D    28.83333333 2.750959429
2008    Factor B    Factor C    34.01666667 5.340999698

And the plot:

Plot 1

Clearly, there is enough data to draw a smoothed line for Factor C in the Factor B panel. Any ideas?

Tags: r, ggplot2, facet-wrap, gam

Answer


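I think this one is rather tricky. After some testing and reading through the current GitHub code for StatSmooth, I have summarised my findings as follows: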

Observations

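  1. geom_smooth() fails to draw any of the smoothed lines in a plot panel if any of the data groups has too few observations for method = "gam" AND formula = y ~ s(x, k = 3);
  2. If the plot is faceted into multiple panels, only the panel(s) containing the offending data group(s) are affected;
  3. This does not happen with formula = y ~ x (i.e. the default formula);
  4. This does not happen with some other methods (e.g. "lm", "glm") using the default formula, but it does happen with method = "loess";
  5. This does not happen if the data group has only 1 observation.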
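Applied to the data in the question, observation 1 suggests counting the distinct x values per group within each panel. The following is a minimal base R sketch, not part of the original answer: it retypes the Year, Group1 and Group2 columns from the question, the column names n_unique_years and too_few_for_k are made up for illustration, and it assumes that a basis with k = 3 needs at least 3 distinct x values.

```r
# Year / Group1 / Group2 columns retyped from the question's data
df1 <- data.frame(
  Year = c(2011, 2011, 2011, 2011, 2017, 2017, 2017,
           2001, 2001, 2001, 2001, 2008, 2008, 2008),
  Group1 = c("Factor A", "Factor A", "Factor B", "Factor B",
             "Factor A", "Factor A", "Factor B",
             "Factor A", "Factor A", "Factor B", "Factor B",
             "Factor A", "Factor A", "Factor B"),
  Group2 = c("Factor C", "Factor D", "Factor C", "Factor D",
             "Factor C", "Factor D", "Factor C",
             "Factor C", "Factor D", "Factor C", "Factor D",
             "Factor C", "Factor D", "Factor C")
)

# number of distinct Year values per panel (Group1) and line (Group2)
n_unique_x <- aggregate(Year ~ Group1 + Group2, data = df1,
                        FUN = function(v) length(unique(v)))
names(n_unique_x)[3] <- "n_unique_years"

# assumption: s(x, k = 3) needs at least k distinct x values
k <- 3
n_unique_x$too_few_for_k <- n_unique_x$n_unique_years < k
print(n_unique_x)
```

In this data only the Factor B / Factor D combination falls below the threshold (it has two distinct years), which is the panel where the smooths go missing.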

We can reproduce all of the above with some simplified code:

# create sample data
n <- 30
set.seed(567)
df.1 <- data.frame( # there is only 1 observation for group == B
  x = rnorm(n), y = rnorm(n),
  group = c(rep("A", n - 1), rep("B", 1)),
  facet = sample(c("X", "Y"), size = n, replace = TRUE))    
set.seed(567)
df.2 <- data.frame( # there are 2 observations for group == B
  x = rnorm(n), y = rnorm(n),
  group = c(rep("A", n - 2), rep("B", 2)),
  facet = sample(c("X", "Y"), size = n, replace = TRUE))

# create base plot
p <- ggplot(df.2, aes(x = x, y = y, color = group)) + 
  geom_point() + theme_bw()

# problem: no smoothed line at all in the entire plot
p + geom_smooth(method = "gam", formula = y ~ s(x, k = 3))

# problem: no smoothed line in the affected panel
p + facet_wrap(~ facet) + 
  geom_smooth(method = "gam", formula = y ~ s(x, k = 3))

# no problem with default formula: smoothed lines in both facet panels
p + facet_wrap(~ facet) + geom_smooth(method = "gam")

# no problem with lm / glm, but problem with loess
p + facet_wrap(~ facet) + geom_smooth(method = "lm")
p + facet_wrap(~ facet) + geom_smooth(method = "glm")
p + facet_wrap(~ facet) + geom_smooth(method = "loess")

# no problem if there's only one observation (instead of two)
p %+% df.1 + geom_smooth(method = "gam", formula = y ~ s(x, k = 3))
p %+% df.1 + facet_wrap(~ facet) + 
  geom_smooth(method = "gam", formula = y ~ s(x, k = 3))

Explanation for observations 1 & 2:

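I believe the problem lies in the last two lines of StatSmooth's compute_group function. The first line calls the model function (e.g. stats::glm, stats::loess, mgcv::gam) on the data frame for each group specified by the aes(group = ...) mapping, while the second line calls one of the wrappers around stats::predict() to get the smoothed values (and the confidence interval, if applicable) for the model.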

model <- do.call(method, c(base.args, method.args))
predictdf(model, xseq, se, level)

When the parameters method = "gam", formula = y ~ s(x, k = 3) are used on a data frame with only 2 observations, this is what happens:

library(dplyr) # for %>% and filter()
model <- do.call(mgcv::gam,
                 args = list(formula = y ~ s(x, k = 3),
                             data = df.2 %>% filter(group == "B" & facet == "X")))

Error in smooth.construct.tp.smooth.spec(object, dk$data, dk$knots) : A term has fewer unique covariate combinations than specified maximum degrees of freedom

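model, the object that was supposed to hold the result of do.call(...), has not even been created. The last line of code, predictdf(...), will throw an error because model does not exist. Without faceting, this affects all of the computations done by StatSmooth, and geom_smooth() receives no usable data with which to create any geom in its layer. With faceting, the above computations are done separately for each facet, so only the facet(s) with problematic data are affected.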
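The failure can also be seen outside ggplot2 by fitting the same formula to a large and a small data frame. This is a minimal sketch that assumes the mgcv package is installed; the data frames big and small and the helper fit_or_error are hypothetical names, not taken from the answer.

```r
library(mgcv)

set.seed(567)
big   <- data.frame(x = rnorm(28), y = rnorm(28))        # many distinct x values
small <- data.frame(x = c(-0.5, 0.8), y = c(0.1, 0.3))   # only 2 distinct x values

# hypothetical helper: return the fitted model, or the error condition if fitting fails
fit_or_error <- function(d) {
  tryCatch(gam(y ~ s(x, k = 3), data = d), error = function(e) e)
}

fit_big   <- fit_or_error(big)
fit_small <- fit_or_error(small)

class(fit_big)[1]             # class of the successful fit
inherits(fit_small, "error")  # whether the small data frame produced an error
conditionMessage(fit_small)   # the message raised by mgcv
```

Because the error is raised while the model is being built, there is no model object to hand to a prediction step for the small group.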

Explanation for observations 3 & 4:

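Adding to the above, if we do not specify a formula to replace the default y ~ x, we get a valid model object from gam / lm / glm, which can be passed to ggplot2's unexported predictdf function to obtain a data frame of predicted values: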

model <- do.call(mgcv::gam, # or stats::lm, stats::glm
                 args = list(formula = y ~ x,
                             data = df.2 %>% filter(group == "B" & facet == "X")))

result <- ggplot2:::predictdf(
  model, 
  xseq = seq(-2, 1.5, length.out = 80), # pseudo range of x-axis values
  se = FALSE, level = 0.95) # default SE / level parameters
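To see why a straight-line formula survives with two observations, here is a base R sketch of roughly the job the prediction step performs: fit a model, then predict along a sequence of x values. It is not ggplot2's implementation of predictdf, and the data frame two_pts is made up for illustration.

```r
# two observations are enough to determine a straight line
two_pts <- data.frame(x = c(-0.5, 0.8), y = c(0.1, 0.3))
model <- lm(y ~ x, data = two_pts)

# predict along a sequence of x values, as a smoothing layer would
xseq <- seq(-2, 1.5, length.out = 80)
pred <- data.frame(
  x = xseq,
  y = as.numeric(predict(model, newdata = data.frame(x = xseq)))
)
head(pred)
```

The fit passes exactly through both points, so the model object is valid and prediction returns one finite value per element of xseq.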

loess will return a valid object as well, albeit with a whole bunch of warnings. However, passing it to predictdf results in an error:

model <- do.call(stats::loess,
                 args = list(formula = y ~ x,
                             data = df.2 %>% filter(group == "B" & facet == "X")))

result <- ggplot2:::predictdf(
  model, 
  xseq = seq(-2, 1.5, length.out = 80), # pseudo range of x-axis values
  se = FALSE, level = 0.95) # default SE / level parameters

Error in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x else if (is.data.frame(newdata)) as.matrix(model.frame(delete.response(terms(object)), : NA/NaN/Inf in foreign function call (arg 5)

Explanation for observation 5:

StatSmooth's compute_group function begins with the following:

if (length(unique(data$x)) < 2) {
      # Not enough data to perform fit
      return(data.frame())
    }

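In other words, if a group has fewer than 2 unique x values (for example, only 1 observation), StatSmooth immediately returns an empty data frame. As a result, it never reaches the later parts of the code where an error could be thrown.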
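Note that the guard counts unique x values rather than rows. A small base R sketch makes the distinction concrete; too_few_x is a hypothetical stand-in for the condition quoted above, and the three data frames are made up.

```r
# hypothetical stand-in for the guard at the top of compute_group
too_few_x <- function(data) length(unique(data$x)) < 2

one_obs    <- data.frame(x = 1.2,         y = 0.3)
two_same_x <- data.frame(x = c(1.2, 1.2), y = c(0.3, 0.9))
two_diff_x <- data.frame(x = c(1.2, 2.5), y = c(0.3, 0.9))

too_few_x(one_obs)     # TRUE: a single observation is skipped
too_few_x(two_same_x)  # TRUE: two rows, but only one distinct x
too_few_x(two_diff_x)  # FALSE: passes the guard and reaches the model fit
```

Only the last case gets past the guard, which is why two observations with different x values are the ones that reach the failing model call.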

Workaround:

Having pinpointed where things go off track, we can tweak the compute_group code (see the annotated and commented-out portions):

new.compute_group <- function(
  data, scales, method = "auto", formula = y~x, se = TRUE, n = 80, span = 0.75, 
  fullrange = FALSE, xseq = NULL, level = 0.95, method.args = list(), na.rm = FALSE) {
  if (length(unique(data$x)) < 2) return(data.frame()) 
  if (is.null(data$weight)) data$weight <- 1
  if (is.null(xseq)) {
    if (is.integer(data$x)) {
      if (fullrange) {
        xseq <- scales$x$dimension()
      } else {
        xseq <- sort(unique(data$x))
      }
    } else {
      if (fullrange) {
        range <- scales$x$dimension()
      } else {
        range <- range(data$x, na.rm = TRUE)
      }
      xseq <- seq(range[1], range[2], length.out = n)
    }
  }
  if (identical(method, "loess")) method.args$span <- span 
  if (is.character(method)) method <- match.fun(method)
  base.args <- list(quote(formula), data = quote(data), weights = quote(weight))

  # if modelling fails, return empty data frame
  # model <- do.call(method, c(base.args, method.args))
  model <- try(do.call(method, c(base.args, method.args)))
  if(inherits(model, "try-error")) return(data.frame())

  # if modelling didn't fail, but prediction throws an error
  # (e.g. NA/NaN/Inf for loess), also return empty data frame
  # predictdf(model, xseq, se, level)
  pred <- try(ggplot2:::predictdf(model, xseq, se, level))
  if(inherits(pred, "try-error")) return(data.frame())
  return(pred)
}
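The effect of returning an empty data frame for a failed group can be sketched without ggplot2 by fitting each group separately and combining the pieces. This is only an illustration of the try() / inherits() pattern, assuming the mgcv package is installed; safe_smooth and the sample data are hypothetical and are not ggplot2 internals.

```r
library(mgcv)

# hypothetical per-group smoother: empty data frame if the fit fails
safe_smooth <- function(d, n = 20) {
  model <- try(gam(y ~ s(x, k = 3), data = d), silent = TRUE)
  if (inherits(model, "try-error")) return(data.frame())
  xseq <- seq(min(d$x), max(d$x), length.out = n)
  data.frame(
    x = xseq,
    y = as.numeric(predict(model, newdata = data.frame(x = xseq)))
  )
}

set.seed(567)
df <- data.frame(
  x = rnorm(30), y = rnorm(30),
  group = c(rep("A", 28), rep("B", 2))  # group B has only 2 observations
)

# fit each group on its own, then stack whatever succeeded
pieces <- lapply(split(df, df$group), safe_smooth)
sapply(pieces, nrow)
smoothed <- do.call(rbind, pieces)
```

Group B contributes no rows, while group A's smoothed values survive, which is the behaviour the modified compute_group aims for within a panel.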

Define a new stat layer that uses this version:

# same as stat_smooth() except that it uses stat = StatSmooth2, rather 
# than StatSmooth
stat_smooth_local <- function(
  mapping = NULL, data = NULL, geom = "smooth", position = "identity", ...,
  method = "auto", formula = y ~ x, se = TRUE, n = 80, span = 0.75,
  fullrange = FALSE, level = 0.95, method.args = list(), na.rm = FALSE,
  show.legend = NA, inherit.aes = TRUE) {
  layer(
    data = data, mapping = mapping, stat = StatSmooth2,
    geom = geom, position = position, show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(
      method = method, formula = formula, se = se, n = n,
      fullrange = fullrange, level = level, na.rm = na.rm,
      method.args = method.args, span = span, ...
    )
  )
}

# inherit from StatSmooth
StatSmooth2 <- ggproto(
  "StatSmooth2", ggplot2::StatSmooth,
  compute_group = new.compute_group
)

Result:

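We can run the same cases as before, replacing geom_smooth() with stat_smooth_local(), and verify that the smoothed geom layers are visible in every case (note that some will still result in error messages):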

# problem resolved: smoothed line for applicable group in the entire plot
p + stat_smooth_local(method = "gam", formula = y ~ s(x, k = 3))

# problem resolved: smoothed line for applicable group in the affected panel
p + facet_wrap(~ facet) + 
  stat_smooth_local(method = "gam", formula = y ~ s(x, k = 3))

# still no problem with default formula
p + facet_wrap(~ facet) + stat_smooth_local(method = "gam")

# still no problem with lm / glm; problem resolved for loess
p + facet_wrap(~ facet) + stat_smooth_local(method = "lm")
p + facet_wrap(~ facet) + stat_smooth_local(method = "glm")
p + facet_grid(~ facet) + stat_smooth_local(method = "loess")

# still no problem if there's only one observation (instead of two)
p %+% df.1 + stat_smooth_local(method = "gam", formula = y ~ s(x, k = 3))
p %+% df.1 + facet_wrap(~ facet) + 
  stat_smooth_local(method = "gam", formula = y ~ s(x, k = 3))

# showing one pair of contrasts here
cowplot::plot_grid(
  p + facet_wrap(~ facet) + ggtitle("Before") +
    geom_smooth(method = "gam", formula = y ~ s(x, k = 3)),
  p + facet_wrap(~ facet) + ggtitle("After") +
    stat_smooth_local(method = "gam", formula = y ~ s(x, k = 3)),
  nrow = 2
)

Plot

