r - How to subset with ggplot based on facet aggregates?
Problem description
I have a dataset with 16 groups available for facets. That is too many, so I'd like to keep only the most important groups, determined by what percentage of a certain total falls in each group. For example, I'd like to keep only groups that represent 30% or more of the total of Var1.
To illustrate, if I run the following code, R correctly outputs the two species whose Petal.Length sum represents at least 30% of the total Petal.Length in the dataset (ignore that it's a meaningless statistic in this case).
library(tidyverse)
iris %>%
  group_by(Species) %>%
  summarise(t_length = sum(Petal.Length),
            p_length = round(100 * t_length / sum(.$Petal.Length))) %>%
  filter(p_length >= 30)
So what I'd like to do is have ggplot facet by only the groups that meet the specified condition. In my dataset, just 5 of the 16 groups capture over 90% of the interesting observations, so I don't need the other 11 groups in the facet grid.
This is my attempt, and the output is all 3 species, where it should only be the same 2 from the table above:
iris.sub <- ggplot(subset(iris, round(100*sum(Petal.Length)/sum(iris$Petal.Length)) >= 30), aes(x = ' ', y = Petal.Length)) +
  geom_point(stat = 'summary', fun.y = 'mean') +
  geom_errorbar(stat = 'summary', fun.data = 'mean_se',
                width = 0, fun.args = list(mult = 1.96)) +
  facet_grid( . ~ Species ) +
  theme_bw()
iris.sub
Solution
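A plain row-wise filter is not affected by group_by. For example, if you have a data frame grouped by a column var1 and you filter for rows where column x > 50, the fact that an observation belongs to a certain group doesn't change whether its value is greater than 50. Grouping only matters when the condition itself contains a summary such as sum(). In your attempt the data passed to subset is not grouped at all, so the share is computed over the whole data frame, comes out as 100%, and every row is kept.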
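To make the distinction concrete, here is a minimal sketch on iris. The object names (by_constant, by_constant_grouped, all_rows, by_group_share) are made up for illustration. It compares a filter against a constant, with and without grouping, and then a filter whose condition contains sum(), which dplyr evaluates once per group when the data is grouped.

```r
library(dplyr)

# Comparison against a constant: grouping makes no difference to which rows pass.
by_constant <- iris %>%
  filter(Petal.Length > 5)
by_constant_grouped <- iris %>%
  group_by(Species) %>%
  filter(Petal.Length > 5) %>%
  ungroup()
nrow(by_constant) == nrow(by_constant_grouped)

# Overall total, computed once outside the pipeline.
total_length <- sum(iris$Petal.Length)

# Ungrouped: sum() covers the whole data frame, so the condition is a
# single TRUE and every row is kept (this mirrors the subset() attempt).
all_rows <- iris %>%
  filter(sum(Petal.Length) / total_length >= 0.3)

# Grouped: sum() is evaluated per species, so whole species are kept or dropped.
by_group_share <- iris %>%
  group_by(Species) %>%
  filter(sum(Petal.Length) / total_length >= 0.3) %>%
  ungroup()

unique(as.character(by_group_share$Species))
```

The last line should list the same two species as the summary table in the question.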
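Here are two approaches using some dplyr functions. The first calculates each group's contribution to the total petal length, extracts the species that pass the threshold, and keeps them as a vector. You then filter the data frame to observations of those species and plot.
The second does the calculations and the plotting in one block. The advantage is that you don't have to save a variable for the species you are keeping. The disadvantage is that doing summary math inside a mutate call instead of summarise is messy, and if you aren't careful about exactly what you need to add up (speaking from experience), it can lead to mistakes.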
library(tidyverse)

# Approach 1: find the species holding at least 30% of total petal length
major_categories <- iris %>%
  group_by(Species) %>%
  summarise(group_Petal.Length = sum(Petal.Length)) %>%
  mutate(share_Petal.Length = group_Petal.Length / sum(group_Petal.Length)) %>%
  filter(share_Petal.Length >= 0.3) %>%
  pull(Species)

# Keep only those species, then plot mean and 1.96 * SE per facet
iris %>%
  filter(Species %in% major_categories) %>%
  ggplot(aes(x = 1, y = Petal.Length)) +
  # `fun` replaces the deprecated `fun.y` (ggplot2 >= 3.3.0)
  geom_point(stat = "summary", fun = "mean") +
  geom_errorbar(stat = "summary", fun.data = "mean_se",
                width = 0, fun.args = list(mult = 1.96)) +
  facet_grid(. ~ Species) +
  theme_bw()
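# Approach 2: compute the shares with mutate() and plot in one pipeline
iris %>%
  group_by(Species) %>%
  # per-species total, repeated on every row of that species
  mutate(group_Petal.Length = sum(Petal.Length)) %>%
  ungroup() %>%
  # divide by the overall total; sum(unique(group_Petal.Length)) would
  # undercount if two groups happened to have identical totals
  mutate(share_Petal.Length = group_Petal.Length / sum(Petal.Length)) %>%
  filter(share_Petal.Length >= 0.3) %>%
  ggplot(aes(x = 1, y = Petal.Length)) +
  # `fun` replaces the deprecated `fun.y` (ggplot2 >= 3.3.0)
  geom_point(stat = "summary", fun = "mean") +
  geom_errorbar(stat = "summary", fun.data = "mean_se",
                width = 0, fun.args = list(mult = 1.96)) +
  facet_grid(. ~ Species) +
  theme_bw()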
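Also note that if you don't have any real values on the x-axis (here it's just a dummy value), you might as well skip the faceting and put Species on the x-axis. I'm not sure whether that still applies to your larger dataset.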
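As a rough sketch of that alternative, the version below drops facet_grid and maps Species to x instead. The names major_species and p are made up for illustration, the 30% threshold is carried over from the question, and the fun argument assumes ggplot2 3.3.0 or later (older versions use fun.y).

```r
library(dplyr)
library(ggplot2)

# Keep only species holding at least 30% of total petal length.
# sum(Petal.Length) is per species (grouped); sum(iris$Petal.Length) is the overall total.
major_species <- iris %>%
  group_by(Species) %>%
  filter(sum(Petal.Length) / sum(iris$Petal.Length) >= 0.3) %>%
  ungroup()

# Species on the x-axis instead of one facet per species.
p <- ggplot(major_species, aes(x = Species, y = Petal.Length)) +
  geom_point(stat = "summary", fun = "mean") +
  geom_errorbar(stat = "summary", fun.data = "mean_se",
                width = 0, fun.args = list(mult = 1.96)) +
  theme_bw()
p
```

With 5 groups out of 16 this puts all the means on a shared axis, which may be easier to compare than separate panels.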