首页 > 解决方案 > 如何根据特定条件删除行并连接两个数据框?

问题描述

这是一组 2 个数据帧。

id <- c(1,2,3,4)
id2 <- c(5,6,7,8)
list <- c("list1","list2","list3","list4")
progress <- c("A", "A", "B", "C")
grade <- c("A", NA, "B", "C")
df1 <- data.frame(id, id2, list, progress, grade)
df1

id <- c(1,2,3,5)
id2 <- c(5,6,7,9)
list <- c("list1","list2","list5","list6")
progress <- c("B", "B", "A", "D")
grade2 <- c("B", NA, "B", "D")
df2 <- data.frame(id, id2, list, progress, grade2)
df2

我希望以这样的方式结合df1起来,df2

a) 对于 列 list,如果 和 的值重复idid2则 的对应值也list应该匹配。否则应返回该值NA。此条件不适用于 和 的唯一idid2

b) 对于列progress,如果 和 的值重复idid2则必须取第一次出现的值。

c) 对于列gradeand grade2,如果 and 的值重复idid2那么NA在这种情况下必须删除。

预期输出如下:-

   #id id2  list   progress grade grade2
   #1   5   list1        A     A   B
   #2   6   list2        A    NA   NA
   #3   7   NA           B     B   B
   #4   8   list4        C     C   NA
   #5   9   list6        D    NA   D

标签: rdataframedplyr

解决方案


由于您的初始数据结构,这个答案相当复杂,但这是我使用以下工具的解决方案dplyr

library(dplyr)
# Bind the rows of the two dataframes together
bind_rows(df1, df2) %>%
    # a) For each pair of id and id2...
    group_by(id, id2) %>% 
    # ...when there is more than one list, set to NA, otherwise, take the value
    mutate(list = case_when(length(unique(list)) > 1 ~ NA_character_, 
                            TRUE ~ unique(list))) %>% 
    # b) Take the first occurring progress value (still for each id, id2 pair)
    mutate(progress = progress[1]) %>% 
    ungroup() %>% 
    # Keep distinct pairs
    distinct(id, id2, list, progress) %>% 
    # c)
    # Create a smaller data set of the non-NA grade for the id, id2 pairs
    # Joint it onto the larger data set
    left_join(
        bind_rows(df1, df2) %>% 
        select(id, id2, grade) %>% 
        na.omit(),
        by = c("id", "id2")
    ) %>% 
    # c continued)
    # Create a smaller data set of the non-NA grade2 for the id, id2 pairs
    # Joint it onto the larger data set
    left_join(
        bind_rows(df1, df2) %>% 
        select(id, id2, grade2) %>% 
        na.omit(),
        by = c("id", "id2")
    )   

推荐阅读