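r - Remove duplicates based on multiple columns, but keep the "most" complete version of each duplicate (the one with the fewest NAs)
Question
I have a data frame that looks like this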
Month  Day  Year  Color  Weather  Location  Transporation  ID
Jan    Tue  2020  Blue   Warm     Hospital  NA             1
Jan    Tue  2020  Blue   Warm     NA        NA             1
Jan    Tue  2020  Blue   NA       NA        NA             1
Feb    Thu  2020  Red    NA       NA        NA             2
Feb    Thu  2020  Red    Warm     NA        NA             2
Feb    Thu  2020  Red    Warm     Garden    Run            2
Mar    Thu  2020  Red    Cold     Desk      Bus            3
I want it to look like this
Month  Day  Year  Color  Weather  Location  Transporation  ID
Jan    Tue  2020  Blue   Warm     Hospital  NA             1
Feb    Thu  2020  Red    Warm     Garden    Run            2
Mar    Thu  2020  Red    Cold     Desk      Bus            3
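Basically, I want to decide whether a row is a duplicate by looking at three columns, c(ID, Month, Color). Once the duplicates are identified, I want to drop the rows with the most NAs, that is, the "least complete" ones, because they have fewer columns filled in.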
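Answer
After grouping by the columns of interest, we can order each remaining column on is.na() so that non-NA values come first, and then take the first element of each column.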
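library(dplyr)

dat %>%
  # one group per Month/Day/Year combination
  group_by(Month, Day, Year) %>%
  # in every other column, move non-NA values to the front and keep the first
  summarise(across(everything(), ~ first(.[order(is.na(.))])), .groups = 'drop')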
-output
# A tibble: 3 x 8
Month Day Year Color Weather Location Transporation ID
<chr> <chr> <dbl> <chr> <chr> <chr> <chr> <dbl>
1 Feb Thu 2020 Red Warm Garden Run 2
2 Jan Tue 2020 Blue Warm Hospital <NA> 1
3 Mar Thu 2020 Red Cold Desk Bus 3
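To make the `order(is.na(.))` idiom more concrete, here is a small sketch on a standalone vector. The vector names (`x`, `all_missing`, `first_non_na`) are made up for illustration; only base R is used.

```r
# A column as it might look inside one group: one NA followed by two values.
x <- c(NA, "Warm", "Warm")

# is.na() gives a logical vector; FALSE sorts before TRUE,
# so order() lists the positions of the non-NA values first.
idx <- order(is.na(x))

# Taking the first reordered element yields the first non-NA value.
first_non_na <- x[idx][1]

# If every value in the group is NA, the result is simply NA.
all_missing <- c(NA_character_, NA_character_)
first_of_missing <- all_missing[order(is.na(all_missing))][1]
```

Because this is applied to each column separately, the summarised row is assembled column by column rather than copied from a single input row.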
data
dat <- structure(list(Month = c("Jan", "Jan", "Jan", "Feb", "Feb", "Feb",
"Mar"), Day = c("Tue", "Tue", "Tue", "Thu", "Thu", "Thu", "Thu"
), Year = c(2020, 2020, 2020, 2020, 2020, 2020, 2020), Color = c("Blue",
"Blue", "Blue", "Red", "Red", "Red", "Red"), Weather = c("Warm",
"Warm", NA, NA, "Warm", "Warm", "Cold"), Location = c("Hospital",
NA, NA, NA, NA, "Garden", "Desk"), Transporation = c(NA, NA,
NA, NA, NA, "Run", "Bus"), ID = c(1, 1, 1, 2, 2, 2, 3)), class = "data.frame", row.names = c(NA,
-7L))
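The grouped summarise picks the first non-NA value per column, so in other data it could combine values that come from different rows of a group. If an entire original row should be kept instead, and the duplicates should be defined by c(ID, Month, Color) as the question states, one possible sketch is to count the NAs per row and keep the row with the fewest in each group. The helper column `n_na` and the result name `most_complete` are hypothetical, and `slice_min()` assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Same sample data as above, rebuilt so the snippet runs on its own.
dat <- data.frame(
  Month = c("Jan", "Jan", "Jan", "Feb", "Feb", "Feb", "Mar"),
  Day = c("Tue", "Tue", "Tue", "Thu", "Thu", "Thu", "Thu"),
  Year = 2020,
  Color = c("Blue", "Blue", "Blue", "Red", "Red", "Red", "Red"),
  Weather = c("Warm", "Warm", NA, NA, "Warm", "Warm", "Cold"),
  Location = c("Hospital", NA, NA, NA, NA, "Garden", "Desk"),
  Transporation = c(NA, NA, NA, NA, NA, "Run", "Bus"),
  ID = c(1, 1, 1, 2, 2, 2, 3)
)

most_complete <- dat %>%
  # n_na (hypothetical helper): number of missing values in each row
  mutate(n_na = rowSums(is.na(dat))) %>%
  # rows count as duplicates when ID, Month and Color all match
  group_by(ID, Month, Color) %>%
  # keep exactly one row per group: the one with the fewest NAs
  slice_min(n_na, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(-n_na)
```

With `with_ties = FALSE`, rows that tie on the NA count are resolved by keeping the first one in the group, which may or may not be the desired tie-break.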