r - How to compare datasets with ggplot2 geom_density()
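Question
This is an extension of a question I asked earlier:
How to extract the density value from ggplot in r
This fruit dataset is actually the data for country A, and I now have another dataset for country B. I want to compare their values. However, the density plots (y-axis) for the fruit Apple differ between the two countries: the highest density is about 0.8 for country A and about 0.4 for country B.
Q. Country B has a similar curve, but its highest density value on the y-axis is only 0.4. So how do I compare them?
Code for a minimal example: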
library(ggplot2)
set.seed(1234)
df = data.frame(
  fruits = factor(rep(c("Orange", "Apple", "Pears", "Banana"), each = 200)),
  weight = round(c(rnorm(200, mean = 55, sd = 5),
                   rnorm(200, mean = 65, sd = 5),
                   rnorm(200, mean = 70, sd = 5),
                   rnorm(200, mean = 75, sd = 5)))
)
dim(df) #[1] 800 2
ggplot(df, aes(x = weight)) +
  geom_density() +
  facet_grid(fruits ~ ., scales = "free", space = "free")
g = ggplot(df, aes(x = weight)) +
  geom_density() +
  facet_grid(fruits ~ ., scales = "free", space = "free")
p = ggplot_build(g)
sp = split(p$data[[1]][c("x", "density")], p$data[[1]]$PANEL)
apple_df = sp[[1]]
sum(apple_df$density) # this is equal to 10.43877 but I want it to be one
Answer
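Suppose you have two data frames for two different countries, df_c1 and df_c2. The idea is to combine the two data frames and add a column that identifies the country.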
library(dplyr)
library(ggplot2)
df_c1 = data.frame(
  fruits = factor(rep(c("Orange", "Apple", "Pears", "Banana"), each = 200)),
  weight = round(c(rnorm(200, mean = 55, sd = 5),
                   rnorm(200, mean = 65, sd = 5),
                   rnorm(200, mean = 70, sd = 5),
                   rnorm(200, mean = 75, sd = 5)))
)
df_c2 = data.frame(
  fruits = factor(rep(c("Orange", "Apple", "Pears", "Banana"), each = 200)),
  weight = round(c(rnorm(200, mean = 20, sd = 3),
                   rnorm(200, mean = 35, sd = 6),
                   rnorm(200, mean = 40, sd = 2),
                   rnorm(200, mean = 15, sd = 4)))
)
df <- rbind(
  df_c1 %>% mutate(country = "country 1"),
  df_c2 %>% mutate(country = "country 2")
)
df %>%
  ggplot() +
  geom_density(aes(x = weight, color = country)) +
  facet_grid(fruits ~ ., scales = "free", space = "free")
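If the goal is to compare the shapes of the curves rather than their absolute heights, one option is to plot the `scaled` variable computed by `stat_density()`, which rescales each density estimate to a maximum of 1. The following is a minimal sketch, assuming ggplot2 3.3.0 or later (for `after_stat()`); the data frame `demo` is a made-up stand-in for the data above, not part of the original example.

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-in data: two countries with different spread
demo <- data.frame(
  weight  = c(rnorm(200, mean = 65, sd = 5), rnorm(200, mean = 35, sd = 10)),
  country = rep(c("country 1", "country 2"), each = 200)
)

# after_stat(scaled) is the density estimate rescaled to a maximum of 1
p_scaled <- ggplot(demo, aes(x = weight, color = country)) +
  geom_density(aes(y = after_stat(scaled)))

# Inspect the computed values: the curve of each group peaks at 1
built <- ggplot_build(p_scaled)$data[[1]]
peak_by_group <- tapply(built$y, built$group, max)
peak_by_group
```

Note that the rescaled curves no longer integrate to 1, so this view is useful for comparing location and shape only, not probabilities.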
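Area under the curve
Another way to work with the distributions is to compute them first with the density function and then plot the resulting values.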
# reframe() (dplyr >= 1.1.0) allows several rows per group;
# on older dplyr versions, use summarise() in its place
dens1 <- df_c1 %>%
  group_by(fruits) %>%
  reframe(x = density(weight)$x, y = density(weight)$y) %>%
  mutate(country = "country 1")
dens2 <- df_c2 %>%
  group_by(fruits) %>%
  reframe(x = density(weight)$x, y = density(weight)$y) %>%
  mutate(country = "country 2")
df_dens <- rbind(dens1, dens2)
Now plot with ggplot, this time using geom_line:
df_dens %>%
  ggplot() +
  geom_line(aes(x, y, color = country)) +
  facet_grid(fruits ~ ., scales = "free", space = "free")
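If you want to measure the area under the curve, define the differential (the spacing between consecutive x values).
We select a single curve, for example fruits == "Apple" and country == "country 1":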
df_single_curve <- df_dens %>%
  filter(country == "country 1" & fruits == "Apple")
# differential
xx <- df_single_curve$x
dx <- xx[2L] - xx[1L]
yy <- df_single_curve$y
# integral
I <- sum(yy) * dx
I
# [1] 1.000965
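The same check can be run for several curves at once, which also shows why the peak heights differ: every density integrates to about 1, so a wider distribution must have a lower peak. The following is a sketch under stated assumptions; the data frame `demo` and the helper `trapz_area()` are hypothetical names introduced here for illustration, and the trapezoidal rule is swapped in for the simple sum used above.

```r
set.seed(1234)
# Hypothetical stand-in data: same sample size, different spread
demo <- data.frame(
  fruits = rep(c("Apple", "Orange"), each = 200),
  weight = c(rnorm(200, mean = 65, sd = 5), rnorm(200, mean = 55, sd = 10))
)

# Trapezoidal rule: interval width times the mean of neighbouring heights
trapz_area <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

by_fruit <- split(demo$weight, demo$fruits)

# Area under each estimated density curve
areas <- sapply(by_fruit, function(w) {
  d <- density(w)
  trapz_area(d$x, d$y)
})

# Maximum height of each estimated density curve
peaks <- sapply(by_fruit, function(w) max(density(w)$y))

areas  # each close to 1
peaks  # the wider distribution has the lower peak
```

Because both areas are close to 1, the y-axis values of two density curves are directly comparable as densities; a lower peak reflects a larger spread, not less data.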