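r - Filter duplicates based on conditions
Problem description
Merging two data frames left me with a messy result that contains duplicates, and I want to decide which rows to keep based on specified criteria.
The data looks like this (only the duplicates are shown):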
structure(list(date = structure(c(2347, 2347, 2347, 2347, 2347, 2347, 2347, 2347, 6962, 6962, 16442, 16442, 16442, 16442), class = "Date"),
country = c("United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom",
"United Kingdom", "Greece", "Greece", "France", "France", "France", "France"),
city = c("Belfast", "Belfast", "Belfast", "Belfast", "Belfast", "Belfast", "Belfast", "Belfast", "Athens", "Athens", "Paris", "Paris", "Paris", "Paris"),
diff_categories = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
diff_num1 = c(-1, -4, 0, -3, 3, 0, -1, -4, 0, 1, 0, 12, -12, 0),
diff_num2 = c(NA, NA, NA, NA, NA, NA, NA, NA, 0, 0, 1, 11, -10, 0),
df1_id = c("df1_197606050002", "df1_197606050002", "df1_197606050003", "df1_197606050003","df1_197606050004", "df1_197606050004", "df1_197606050006",
"df1_197606050006","df1_198901230001", "df1_198901230001", "df1_201501070001", "df1_201501070001","df1_201501070002", "df1_201501070002"),
df2_id = c("df2_101", "df2_102", "df2_101", "df2_102", "df2_101", "df2_102", "df2_101", "df2_102", "df2_216", "df2_219", "df2_510", "df2_511", "df2_510", "df2_511")),
row.names = c(NA, -14L), class = c("tbl_df", "tbl", "data.frame"))
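I now want to keep only one row for each df1_id, and decide which row to keep based on the following criteria (in descending order of priority; most important first):
- diff_categories must be FALSE
- diff_num1 should be as small as possible
- diff_num2 should be as small as possible
- if rows are still tied, keep the first.
Can anyone point out how best to implement this logic?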
Solution
Would this work:
library(dplyr)
df %>%
group_by(df1_id) %>%
  filter(diff_categories == FALSE & diff_num1 == min(diff_num1) & diff_num2 == min(diff_num2))
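One caveat with a single filter() condition joined by & is that it applies the criteria simultaneously rather than in priority order. Comparing against min() of a column that is all NA within a group also yields NA, so filter() drops every row of that group. Below is a minimal sketch of an alternative that sorts within each group and keeps the top row. It makes some assumptions: "as small as possible" is read as the smallest signed value (wrap the columns in abs() if the smallest magnitude is meant), and `dupes` is a hypothetical name for a small subset of the question's data, used only to keep the example self-contained.

```r
library(dplyr)

# Hypothetical subset of the question's data, reduced to three df1_id groups.
dupes <- tibble(
  df1_id = c("df1_197606050002", "df1_197606050002",
             "df1_198901230001", "df1_198901230001",
             "df1_201501070001", "df1_201501070001"),
  df2_id = c("df2_101", "df2_102", "df2_216", "df2_219", "df2_510", "df2_511"),
  diff_categories = rep(FALSE, 6),
  diff_num1 = c(-1, -4, 0, 1, 0, 12),
  diff_num2 = c(NA, NA, 0, 0, 1, 11)
)

deduped <- dupes %>%
  filter(!diff_categories) %>%   # criterion 1: diff_categories must be FALSE
  group_by(df1_id) %>%
  # criteria 2 and 3: sort by diff_num1, then diff_num2; arrange() puts NA last
  arrange(diff_num1, diff_num2, .by_group = TRUE) %>%
  slice(1) %>%                   # criterion 4: keep the first row of each group
  ungroup()

print(deduped)
```

Because the sort key is applied column by column, diff_num2 only breaks ties left over from diff_num1, which matches the stated priority order. Groups where diff_num2 is entirely NA still keep one row, since NA values are sorted to the end instead of being filtered out.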