r - dplyr: summarise multiple columns based on a condition
Problem description
I have a dataset like this:
df.in <-structure(list(id = c(1, 1, 2, 3), x1 = c(0, 1, NA, 0), x2 = c("Lorem ipsum dolor sit amet",
"dolore eu fugiat nulla pariatur", "Sed ut perspiciatis unde omnis",
"Nemo enim ipsam voluptatem"), x3 = c("Donec ullamcorper elit quis risus",
"Donec ullamcorper elit quis risus", "Curabitur euismod", "Mauris felis orci"
)), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"
))
> df.in
# A tibble: 4 x 4
     id    x1 x2                              x3
  <dbl> <dbl> <chr>                           <chr>
1     1     0 Lorem ipsum dolor sit amet      Donec ullamcorper elit quis risus
2     1     1 dolore eu fugiat nulla pariatur Donec ullamcorper elit quis risus
3     2    NA Sed ut perspiciatis unde omnis  Curabitur euismod
4     3     0 Nemo enim ipsam voluptatem      Mauris felis orci
I am trying to use dplyr::group_by() to get this:
df.out <- structure(list(id = c(1, 2, 3), x1 = c(1, NA, 0), x2 = c("dolore eu fugiat nulla pariatur",
"Sed ut perspiciatis unde omnis", "Nemo enim ipsam voluptatem"
), x3 = c("Donec ullamcorper elit quis risus", "Curabitur euismod",
"Mauris felis orci")), row.names = c(NA, -3L), class = c("tbl_df",
"tbl", "data.frame"))
> df.out
# A tibble: 3 x 4
     id    x1 x2                              x3
  <dbl> <dbl> <chr>                           <chr>
1     1     1 dolore eu fugiat nulla pariatur Donec ullamcorper elit quis risus
2     2    NA Sed ut perspiciatis unde omnis  Curabitur euismod
3     3     0 Nemo enim ipsam voluptatem      Mauris felis orci
I can do:
df.in %>%
  group_by(id) %>%
  summarise(x1 = max(x1))
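But how can I:
- summarise x2 and x3, keeping the values from the row where max(x1) occurs?
- apply the same logic to several x columns? Is there a way to do this with summarize_all?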
Solution
We can create a condition with max in summarise_at:
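library(dplyr)
df.in %>%
  group_by(id) %>%
  # Single-row groups keep their value as is; otherwise keep the value
  # from the row where x1 is largest. Columns are selected by name and a
  # ~ lambda replaces funs(), which is deprecated in current dplyr.
  summarise_at(vars(x2, x3), ~ if (n() == 1) . else .[x1 == max(x1, na.rm = TRUE)])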
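Instead of summarise_at, we can also use filter or slice. With filter:
df.in %>%
  group_by(id) %>%
  # keep the only row of a single-row group, or the row(s) where x1 is at its maximum
  filter((n() == 1) | (x1 == max(x1, na.rm = TRUE)))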
Or with slice:
df.in %>%
  group_by(id) %>%
  # which() returns the matching row positions; [1] keeps only the first one
  slice(which(n() == 1 | (x1 == max(x1, na.rm = TRUE)))[1])
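As a further variation on the filter/slice idea, the same "keep the whole row where x1 is largest" logic can be sketched by sorting and then de-duplicating on id. This is not part of the original answer, only a minimal sketch: it rebuilds the question's data as a plain data.frame so it runs on its own, and it relies on arrange() sorting NA values last, so a group whose only x1 is NA still keeps its row.

```r
library(dplyr)

# Same values as df.in in the question, rebuilt here so the block is self-contained
df.in <- data.frame(
  id = c(1, 1, 2, 3),
  x1 = c(0, 1, NA, 0),
  x2 = c("Lorem ipsum dolor sit amet", "dolore eu fugiat nulla pariatur",
         "Sed ut perspiciatis unde omnis", "Nemo enim ipsam voluptatem"),
  x3 = c("Donec ullamcorper elit quis risus", "Donec ullamcorper elit quis risus",
         "Curabitur euismod", "Mauris felis orci"),
  stringsAsFactors = FALSE
)

df.out <- df.in %>%
  arrange(id, desc(x1)) %>%        # within each id, largest x1 first; NA sorts last
  distinct(id, .keep_all = TRUE)   # keep the first row per id, with all its columns

df.out
```

Because whole rows are kept, every other column (x2, x3, and any further x columns) follows along without being listed. If several rows tie for the maximum, only the first after sorting is kept.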