if statement in dplyr after group_by, do() and more chains/multiple conditions

Problem description

I hope you are all doing well.

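I have a dataset with many columns, and I am trying to remove duplicates based on several conditions. Below is an example that demonstrates my problem. For each ID, all columns are checked, and if all columns are identical, the most recent row is kept. If two rows are otherwise identical but their comments differ, I check whether a row's comment is "Add comment for down/upgrading client". If all the rows have that same comment, keep the first row; otherwise keep the most recent row that does not have that comment.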

I have been trying the following:

## data frame
library(dplyr)

ID <- c("H1", "H1", " H1", " H2", "H2", "H3", "H3", " H3", "H4")
rating <- c("C", "C", "C+", "D", "C", "C", "C+", "C+", "C")
Commnets <- c("Add comment for down/upgrading client", "updated",
              "Add comment for down/upgrading client",
              "Add comment for down/upgrading client",
              "Add comment for down/upgrading client",
              "down", "down",
              "Add comment for down/upgrading client",
              "Add comment for down/upgrading client")
Date <- c("2018-12-10", "2018-12-10", "2018-11-10",
          "2018-11-10", "2018-11-10",
          "2018-10-10", "2018-10-02", "2018-10-02", "2020-09-03")
df <- data.frame(ID, rating, Commnets, Date, stringsAsFactors = FALSE)
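Some of the ID values above carry a leading space (" H1", " H2", " H3"), which looks unintentional. group_by() compares strings exactly, so " H1" and "H1" would form separate groups. A quick base R check, assuming the leading spaces are noise rather than meaningful IDs; raw_ids and clean_ids are illustrative names only:

```r
# IDs exactly as they appear in the sample data above
ID <- c("H1", "H1", " H1", " H2", "H2", "H3", "H3", " H3", "H4")

raw_ids <- unique(ID)             # " H1" and "H1" are counted as different IDs
clean_ids <- unique(trimws(ID))   # trimws() strips the stray leading spaces

print(raw_ids)
print(clean_ids)
```

This is why the solution trims ID before grouping.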

df$Date <- as.Date(df$Date)
df <- df %>%
  group_by(ID, rating, Date) %>%
  arrange(desc(Date)) %>%      # in each group, arrange in descending order by Date
  filter(row_number() == 1)    # this solves the first problem

df$Date <- as.Date(df$Date)
df <- df %>%
  group_by(ID, rating, Date) %>%
  arrange(desc(Date)) %>%  # I think I need do() here, but I'm not sure how
  # if every comment in the group is the default one, keep the first row;
  # otherwise drop the rows that carry the default comment
  filter(if (all(Commnets == "Add comment for down/upgrading client")) row_number() == 1
         else Commnets != "Add comment for down/upgrading client")

Tags: r

Solution


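You can arrange the data by Date in descending order and count the number of unique Commnets for each ID, rating and Date. If the comment is the same throughout the group, select the first row; if the comments differ, select the last row. Because the data is also sorted by Commnets, and "Add comment for down/upgrading client" sorts before the other comments in this data ("down", "updated"), that last row is one that does not carry the default comment.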

library(dplyr)

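df %>%
  mutate(ID = trimws(ID),                        # strip stray spaces such as " H1"
         Date = as.Date(Date)) %>%
  arrange(ID, rating, Commnets, desc(Date)) %>%  # sort comments alphabetically within ID/rating
  group_by(ID, rating, Date) %>%
  # one distinct comment: keep the first row; several: keep the last one
  slice(if (n_distinct(Commnets) == 1) 1L else n())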

#  ID    rating Commnets                              Date      
#  <chr> <chr>  <chr>                                 <date>    
#1 H1    C      updated                               2018-12-10
#2 H1    C+     Add comment for down/upgrading client 2018-11-10
#3 H2    C      Add comment for down/upgrading client 2018-11-10
#4 H2    D      Add comment for down/upgrading client 2018-11-10
#5 H3    C      down                                  2018-10-10
#6 H3    C+     down                                  2018-10-02
#7 H4    C      Add comment for down/upgrading client 2020-09-03
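The approach above relies on "Add comment for down/upgrading client" sorting before every other comment. If other comments could sort earlier (for example one starting with "A"), a variant that tests for the default comment directly may be safer. A minimal sketch, assuming the same column names as the question; default_comment and result are illustrative names, not part of the original answer:

```r
library(dplyr)

# Sample data from the question (some IDs carry a leading space)
df <- data.frame(
  ID = c("H1", "H1", " H1", " H2", "H2", "H3", "H3", " H3", "H4"),
  rating = c("C", "C", "C+", "D", "C", "C", "C+", "C+", "C"),
  Commnets = c("Add comment for down/upgrading client", "updated",
               "Add comment for down/upgrading client",
               "Add comment for down/upgrading client",
               "Add comment for down/upgrading client",
               "down", "down",
               "Add comment for down/upgrading client",
               "Add comment for down/upgrading client"),
  Date = c("2018-12-10", "2018-12-10", "2018-11-10", "2018-11-10", "2018-11-10",
           "2018-10-10", "2018-10-02", "2018-10-02", "2020-09-03"),
  stringsAsFactors = FALSE
)

default_comment <- "Add comment for down/upgrading client"

result <- df %>%
  mutate(ID = trimws(ID), Date = as.Date(Date)) %>%
  group_by(ID, rating, Date) %>%
  # All rows carry the default comment: keep the first row.
  # Otherwise: keep the last row whose comment is not the default one.
  slice(if (all(Commnets == default_comment)) 1L
        else tail(which(Commnets != default_comment), 1L)) %>%
  ungroup() %>%
  arrange(ID, rating)

print(result)
```

Unlike a filter() on the comment, slice() here returns exactly one row per ID/rating/Date group, and the choice no longer depends on alphabetical order.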
