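r - How to change a histogram's y-axis from counts to frequency and standardize it across two datasets
Question
I have two datasets of fish lengths and want to create side-by-side histograms to compare them. My problem is scaling the y-axis and the bin size so that the two plots are comparable. I want to show the % frequency of the data rather than counts. I am also having trouble plotting them side by side, since they come from two different sources. Can facet_grid or facet_wrap be used for this?
Any help would be greatly appreciated!
Edit
I used this code, which only gives a basic histogram with counts: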
ggplot(snook, aes(sl)) +geom_histogram(binwidth = 20, color="black", fill= "light blue")+
ggtitle("All Snook")+
labs(x="Standard Length(mm)", y="Counts")+
theme_bw() + theme(panel.border = element_blank(), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(), axis.line = element_line(colour = "black"))
Here is the result of using the code provided by SimeonL below:
opar <- par(mfrow = c(1,2))
hist(snook$sl, breaks = seq(0, 1000, length = 50), freq = T, main = "All Snook", xlab = "Length (mm)", ylim = c(0, 50), las = 1)
hist(gut_Snook$SL, breaks = seq(0, 1000, length = 50), freq = T, main = "Culled Snook", xlab = "Length (mm)", ylim = c(0, 50), las = 1)
par(opar)
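This is close, but it still appears to use counts on the y-axis rather than % frequency.
Answer
Two options in base R:
- Use hist and relabel the y-axis as percentages (approxfun maps a percentage in 0-100 to the corresponding count, so the bars are still drawn as counts but the ticks read as % of cases):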
set.seed(23)
df1 <- data.frame(f_size = rnorm(120, 20, 15))
x.1 <- approxfun(c(0, 100), c(0, nrow(df1)))
df2 <- data.frame(f_size = rnorm(70, 5, 5))
x.2 <- approxfun(c(0, 100), c(0, nrow(df2)))
opar <- par(mfrow = c(1,2))
hist(df1$f_size, breaks = seq(-20, 70, length = 40), freq = T, main = "", xlab = "df1_size",
ylim = x.1(c(0, 25)), las = 1, yaxt = "n", ylab = "% Cases")
axis(2, at = x.1(seq(0, 25, 5)), labels = seq(0, 25, 5), las = 1)
hist(df2$f_size, breaks = seq(-20, 70, length = 40), freq = T, main = "", xlab = "df2_size",
ylim = x.2(c(0, 25)), las = 1, yaxt = "n", ylab = "")
axis(2, at = x.2(seq(0, 25, 5)), labels = seq(0, 25, 5), las = 1)
par(opar)
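A shorter variant of the same idea, offered as a sketch rather than as part of the original answer: compute each histogram with plot = FALSE, overwrite its counts with percentages, and plot the resulting object. The helper name pct_hist is made up for illustration, and the breaks are derived from the pooled range with pretty() (an assumption, chosen so that both panels share identical bins and no value falls outside them).

```r
# Simulated lengths, same setup as in the example above
set.seed(23)
df1 <- data.frame(f_size = rnorm(120, 20, 15))
df2 <- data.frame(f_size = rnorm(70, 5, 5))

# Shared, equally spaced breaks covering both datasets
breaks <- pretty(range(c(df1$f_size, df2$f_size)), n = 30)

# Hypothetical helper: bin x and replace the counts with % of cases
pct_hist <- function(x, breaks) {
  h <- hist(x, breaks = breaks, plot = FALSE)
  h$counts <- 100 * h$counts / sum(h$counts)
  h
}

h1 <- pct_hist(df1$f_size, breaks)
h2 <- pct_hist(df2$f_size, breaks)

# Common y-axis limit so the two panels are directly comparable
ymax <- max(h1$counts, h2$counts)

opar <- par(mfrow = c(1, 2))
plot(h1, main = "", xlab = "df1_size", ylab = "% Cases", ylim = c(0, ymax), las = 1)
plot(h2, main = "", xlab = "df2_size", ylab = "", ylim = c(0, ymax), las = 1)
par(opar)
```

Because the bar heights themselves are percentages here, no custom axis() call is needed, and the y-axis limit can be taken from the data instead of being fixed at 25.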
- Compute the percentages per bin first and use barplot:
breaks <- seq(-20, 70, length = 40)
df1.perc <- aggregate(df1$f_size, by = list(cut(df1$f_size, breaks, labels = F)), FUN = function(x) (length(x)/nrow(df1))*100)
df2.perc <- aggregate(df2$f_size, by = list(cut(df2$f_size, breaks, labels = F)), FUN = function(x) (length(x)/nrow(df2))*100)
opar <- par(mfrow = c(1,2))
bp <- barplot(height = merge(data.frame(Group.1 = 1:length(breaks)), df1.perc, all.x = T)$x,
xlab = "df1_size", ylab = "% Cases", ylim = c(0, 25), las = 1)
axis(1, at = approx(breaks, bp, xout = seq(-40, 70, by = 10))$y, labels = seq(-40, 70, by = 10))
bp <- barplot(height = merge(data.frame(Group.1 = 1:length(breaks)), df2.perc, all.x = T)$x,
xlab = "df2_size", ylab = "", ylim = c(0, 25), las = 1)
axis(1, at = approx(breaks, bp, xout = seq(-40, 70, by = 10))$y, labels = seq(-40, 70, by = 10))
par(opar)
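On the facet_wrap part of the question: the original answer stays in base R, so the following is only a sketch of how it might look in ggplot2. It assumes ggplot2 3.3.0 or later (for after_stat) and uses simulated stand-in data; the data frame names all_fish, culled_fish and both, and the column name source, are hypothetical. The idea is to stack the two datasets into one data frame with a column identifying the source, then map y to density * width * 100, which is the percentage of cases per bin within each panel.

```r
library(ggplot2)

set.seed(23)
# Hypothetical stand-ins for the two length datasets (values in mm)
all_fish    <- data.frame(sl = rnorm(120, 400, 100), source = "All Snook")
culled_fish <- data.frame(sl = rnorm(70, 300, 60),   source = "Culled Snook")

# Stack both sources into one long data frame
both <- rbind(all_fish, culled_fish)

p <- ggplot(both, aes(x = sl)) +
  # density * width is the proportion of cases in each bin,
  # computed separately per panel; times 100 gives percent
  geom_histogram(aes(y = after_stat(density * width * 100)),
                 binwidth = 20, boundary = 0,
                 color = "black", fill = "light blue") +
  facet_wrap(~ source) +
  labs(x = "Standard Length (mm)", y = "% Frequency") +
  theme_bw()

print(p)
```

With the real data, the stacked frame could presumably be built the same way, for example rbind(data.frame(sl = snook$sl, source = "All Snook"), data.frame(sl = gut_Snook$SL, source = "Culled Snook")). By default facet_wrap gives both panels the same x and y scales, and the fixed binwidth gives them the same bins, so the two histograms are directly comparable.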