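Problem description

I have a messy result from merging two data frames, and I want to decide which of the duplicate rows to keep based on specified criteria.

The data looks like this (only the duplicates are shown):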

df <- structure(list(date = structure(c(2347, 2347, 2347, 2347, 2347, 2347, 2347, 2347, 6962, 6962, 16442, 16442, 16442, 16442), class = "Date"),
               country = c("United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom", 
                           "United Kingdom", "Greece", "Greece", "France", "France", "France", "France"), 
               city = c("Belfast", "Belfast", "Belfast", "Belfast", "Belfast", "Belfast", "Belfast", "Belfast", "Athens",  "Athens", "Paris", "Paris", "Paris", "Paris"), 
               diff_categories = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE), 
               diff_num1 = c(-1, -4, 0, -3, 3, 0, -1, -4, 0, 1, 0, 12, -12, 0), 
               diff_num2 = c(NA, NA, NA, NA, NA, NA, NA, NA, 0, 0, 1, 11, -10, 0), 
               df1_id = c("df1_197606050002", "df1_197606050002", "df1_197606050003", "df1_197606050003","df1_197606050004", "df1_197606050004", "df1_197606050006", 
                          "df1_197606050006","df1_198901230001", "df1_198901230001", "df1_201501070001", "df1_201501070001","df1_201501070002", "df1_201501070002"),
               df2_id = c("df2_101", "df2_102", "df2_101", "df2_102", "df2_101", "df2_102", "df2_101", "df2_102", "df2_216", "df2_219", "df2_510",  "df2_511",  "df2_510", "df2_511")), 
          row.names = c(NA, -14L), class = c("tbl_df", "tbl", "data.frame"))

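I now want to keep only one row for each instance of df1_id, and decide which row to keep based on the following criteria (in descending order of priority; most important first):

Can anyone point out how best to implement this logic?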

Tags: r, tidyverse

Solution

Would this work:

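library(dplyr)

# Within each df1_id group, keep only the rows where diff_categories is TRUE
# and both diff_num1 and diff_num2 equal their group minimum
df %>% 
   group_by(df1_id) %>% 
   filter(diff_categories == TRUE & diff_num1 == min(diff_num1) & diff_num2 == min(diff_num2))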
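
If the goal is exactly one row per df1_id, one possible alternative is to sort the rows within each group by priority and keep the first one, so that lower-priority columns only break ties. The following is a minimal sketch, not a definitive answer: the criteria list is not shown in the question, so the ordering used here (diff_categories TRUE first, then lowest diff_num1, then lowest diff_num2) is an assumption that mirrors the filter above, and the `toy` data frame is a hypothetical miniature of the real data.

```r
library(dplyr)

# Hypothetical miniature of the merged data (values are illustrative only)
toy <- tibble(
  df1_id          = c("a", "a", "b", "b"),
  df2_id          = c("x1", "x2", "y1", "y2"),
  diff_categories = c(FALSE, FALSE, FALSE, TRUE),
  diff_num1       = c(-1, -4, 0, 3),
  diff_num2       = c(NA, NA, 1, 0)
)

one_per_id <- toy %>%
  group_by(df1_id) %>%
  # Assumed priority order: diff_categories TRUE first, then lowest
  # diff_num1, then lowest diff_num2 (arrange() sorts NA values last)
  arrange(desc(diff_categories), diff_num1, diff_num2, .by_group = TRUE) %>%
  # Keep the top-ranked row of each group
  slice(1) %>%
  ungroup()
```

Unlike a filter that requires every condition to hold at once, this always returns a row for each df1_id, even when no row has diff_categories equal to TRUE or when diff_num2 is NA for the whole group. Swap the columns inside arrange() to match the actual criteria.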
