R geom_histogram position="identity" inconsistent

Problem description

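I am working in R and trying to create a panel of plots, each containing two overlaid histograms: a red histogram underneath a blue one. The red histogram is built from the same dataset in every plot, so it should look the same across the whole panel. I have found that this is not the case: the red histogram differs from plot to plot even though its data is exactly the same. Is there a way to fix this? Is something missing from my code that causes the inconsistency?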

This is the code I am using to create the plots:

  library(data.table)
  library(ggplot2)
  library(cowplot)  # provides plot_grid()
  
  test<-rnorm(1000)
  test<-as.data.table(test)
  test[, type:="Sample"]
  setnames(test, old="test", new="value")
  
  test_2<-rnorm(750)
  test_2<-as.data.table(test_2)
  test_2[, type:="Sub Sample"]
  setnames(test_2, old="test_2", new="value")
  test_2_final<-rbind(test, test_2, fill=TRUE)
  
  
  test_3<-rnorm(500)
  test_3<-as.data.table(test_3)
  test_3[, type:="Sub Sample"]
  setnames(test_3, old="test_3", new="value")
  test_3_final<-rbind(test, test_3, fill=TRUE)
  
  test_4<-rnorm(250)
  test_4<-as.data.table(test_4)
  test_4[, type:="Sub Sample"]
  setnames(test_4, old="test_4", new="value")
  test_4_final<-rbind(test, test_4, fill=TRUE)
  
  test_5<-rnorm(100)
  test_5<-as.data.table(test_5)
  test_5[, type:="Sub Sample"]
  setnames(test_5, old="test_5", new="value")
  test_5_final<-rbind(test, test_5, fill=TRUE)
  
  test_6<-rnorm(50)
  test_6<-as.data.table(test_6)
  test_6[, type:="Sub Sample"]
  setnames(test_6, old="test_6", new="value")
  test_6_final<-rbind(test, test_6, fill=TRUE)
  
  draws_750_p<-ggplot(data = test_2_final, aes(x=value, fill=type, color=type)) + geom_histogram(position="identity", alpha = 0.2, bins=30) + theme(plot.title = element_text(hjust = 0.5, size=10, face="plain"))
  draws_500_p<-ggplot(data = test_3_final, aes(x=value, fill=type, color=type)) + geom_histogram(position="identity", alpha = 0.2, bins=30) + theme(plot.title = element_text(hjust = 0.5, size=10, face="plain"))
  draws_250_p<-ggplot(data = test_4_final, aes(x=value, fill=type, color=type)) + geom_histogram(position="identity", alpha = 0.2, bins=30) + theme(plot.title = element_text(hjust = 0.5, size=10, face="plain"))
  draws_100_p<-ggplot(data = test_5_final, aes(x=value, fill=type, color=type)) + geom_histogram(position="identity", alpha = 0.2, bins=30) + theme(plot.title = element_text(hjust = 0.5, size=10, face="plain"))
  draws_50_p<-ggplot(data = test_6_final, aes(x=value, fill=type, color=type)) + geom_histogram(position="identity", alpha = 0.2, bins=30) + theme(plot.title = element_text(hjust = 0.5, size=10, face="plain"))
  
  
  full_plot<-plot_grid(draws_750_p, draws_500_p, draws_250_p, draws_100_p, draws_50_p, ncol = 3, nrow = 2)

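Here is a picture of the strange result I am describing. Notice how the distribution of the red histogram differs even though its underlying data is exactly the same in every plot (in this example, it is most obvious in the draws_250_p plot on the right).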


Tags: r, ggplot2, histogram

Solution


As I mentioned in a comment, the issue is that the bins being used are different for each plot, so the same value can end up in a different bin. By default, ggplot2 guesses reasonable bin boundaries from the number of bins specified and the range of the data being plotted. Since the sub samples differ from plot to plot (and may start earlier or end later than the main sample), the resulting boundaries differ too.
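To see the effect described above, here is a minimal sketch that compares the bin edges ggplot2 computes for the same "Sample" data in two different plots. The data frames, the helper names (`make_plot`, `sample_bin_edges`) and the artificial extreme value of 10 are assumptions made for illustration only; the extreme value is there so that the second sub sample is guaranteed to widen the overall data range.

```r
library(ggplot2)

set.seed(1)
sample_df <- data.frame(type = "Sample", value = rnorm(1000))

# Two hypothetical sub samples. The second one contains a single extreme
# value (10) so that it definitely extends the range of the plotted data.
sub_a <- data.frame(type = "Sub Sample", value = rnorm(100))
sub_b <- data.frame(type = "Sub Sample", value = c(rnorm(99), 10))

# Same layer settings as in the question: 30 bins, no explicit boundaries.
make_plot <- function(df) {
  ggplot(df, aes(x = value, fill = type)) +
    geom_histogram(position = "identity", alpha = 0.2, bins = 30)
}

# Left edges of the bins computed for the "Sample" group
# (group 1, because "Sample" sorts before "Sub Sample").
sample_bin_edges <- function(p) {
  built <- layer_data(p, 1)
  built$xmin[built$group == 1]
}

edges_a <- sample_bin_edges(make_plot(rbind(sample_df, sub_a)))
edges_b <- sample_bin_edges(make_plot(rbind(sample_df, sub_b)))

head(edges_a)
head(edges_b)

# Are the bin edges for the identical "Sample" data the same in both plots?
isTRUE(all.equal(edges_a, edges_b))
```

The final comparison is expected to report that the edges differ: the "Sample" values are identical in both plots, but they are counted into different bins because the overall range of the data changed.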

The solution is to specify the bin boundaries directly so they are the same in every plot. Here is an example of specifying the bin boundaries implicitly using a combination of binwidth and boundary. I have also taken the liberty of combining all of the values into a single dataframe so that they can be plotted at once using facet_wrap, which has the advantage of aligning the axes of the individual facets and labelling them with the size of the subsample. The crucial point is in the call to geom_histogram, though. You can hopefully see that the red distributions are the same in each facet now.

library(tidyverse)

test <- tibble(type = "Sample", value = rnorm(1000))

add_sub_sample <- function(n, df) {
  sub_sample <- tibble(type = "Sub Sample", value = rnorm(n))
  df %>%
    rbind(sub_sample) %>%
    mutate(sub_sample_n = n)
}

test_final <- c(750, 500, 250, 100, 50) %>%
  map(add_sub_sample, test) %>%
  bind_rows()

ggplot(test_final, aes(x = value, fill = type, colour = type)) +
  geom_histogram(position = "identity", alpha = 0.2, binwidth = 0.2, boundary = 0) +
  facet_wrap(~sub_sample_n) +
  theme(plot.title = element_text(hjust = 0.5, size=10, face="plain"))

Created on 2021-07-14 by the reprex package (v1.0.0)
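If you would rather keep separate plot objects as in the question, a similar effect can probably be achieved by passing one shared vector of breaks to every `geom_histogram` call through its `breaks` argument. The following is a sketch under assumptions: the 0.25 bin width, the helper names (`make_plot`, `sample_counts`) and the use of only two sub samples are illustrative choices, not part of the original code.

```r
library(ggplot2)

set.seed(42)
sample_df <- data.frame(type = "Sample", value = rnorm(1000))
sub_750 <- data.frame(type = "Sub Sample", value = rnorm(750))
sub_50  <- data.frame(type = "Sub Sample", value = rnorm(50))

# One set of breaks that covers every value that will be plotted.
# The bin width of 0.25 is an arbitrary choice for this sketch.
all_values <- c(sample_df$value, sub_750$value, sub_50$value)
shared_breaks <- seq(floor(min(all_values)), ceiling(max(all_values)), by = 0.25)

# Every plot uses the same explicit breaks instead of bins = 30.
make_plot <- function(sub_df) {
  ggplot(rbind(sample_df, sub_df), aes(x = value, fill = type, colour = type)) +
    geom_histogram(position = "identity", alpha = 0.2, breaks = shared_breaks)
}

p_750 <- make_plot(sub_750)
p_50  <- make_plot(sub_50)

# Per-bin counts computed for the "Sample" group (group 1) in a plot.
sample_counts <- function(p) {
  built <- layer_data(p, 1)
  built$count[built$group == 1]
}

# With shared breaks, the "Sample" counts should match across plots.
identical(sample_counts(p_750), sample_counts(p_50))
```

The resulting plot objects can then be combined with `plot_grid` as in the question. Note that values falling outside the supplied breaks are not counted, which is why the breaks here are derived from the range of all the data.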

