Concatenating a string field with dplyr when some values are NA

Problem description

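I have a data frame that includes a comment field. Some locations have a single row with no comment (NA in the comment field). Other locations have more than one row, each of which may or may not contain a comment.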

The data looks like this (although with many more fields):

input <- data.frame(
  stringsAsFactors = FALSE,
  Location = c(1L, 1L, 1L, 2L, 2L, 3L, 4L),
  Comment = c("This is a comment", NA, "This is another comment",
              "This is a comment", NA, "This is a comment", NA)
)
Location  Comment
1         This is a comment
1         NA
1         This is another comment
2         This is a comment
2         NA
3         This is a comment
4         NA

I can concatenate the comments with group_by and summarise as follows:

library(dplyr)

output <- input %>%
  group_by(Location) %>%
  summarise(Comment = paste(Comment, collapse = " | "))

But this converts the NA values to the string "NA":

Location  Comment
1         "This is a comment | NA | This is another comment"
2         "This is a comment | NA"
3         "This is a comment"
4         "NA"

What I really want is output that excludes NA from the final comment, unless the only comment for a location is NA:

outputDesired <- data.frame(
  stringsAsFactors = FALSE,
  Location = c(1L, 2L, 3L, 4L),
  Comment = c("This is a comment | This is another comment",
              "This is a comment", "This is a comment", NA)
)
Location  Comment
1         This is a comment | This is another comment
2         This is a comment
3         This is a comment
4         NA

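I can easily convert the "NA" text at Location 4 to an actual NA value, and I am thinking of removing " | NA" wherever it is present, but I could use some help putting this into a case_when statement, something like the following: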

output <- input %>%
  group_by(Location) %>%
  summarise(Comment = paste(Comment, collapse = " | ")) %>%
  mutate(Comment = case_when(
    Comment == "NA" ~ NA,
    Comment ... (contains " | NA") ~ (remove pattern)
  ))
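
A minimal sketch of how the pseudocode above might be completed, assuming base R's gsub/sub are acceptable inside the pipeline and that no genuine comment consists solely of the text "NA" (such a comment would be removed too). The regular expression is an illustration, not a tested general-purpose pattern:

```r
library(dplyr)

input <- data.frame(
  stringsAsFactors = FALSE,
  Location = c(1L, 1L, 1L, 2L, 2L, 3L, 4L),
  Comment = c("This is a comment", NA, "This is another comment",
              "This is a comment", NA, "This is a comment", NA)
)

output <- input %>%
  group_by(Location) %>%
  summarise(Comment = paste(Comment, collapse = " | ")) %>%
  mutate(
    # Remove each "NA" token along with the separator in front of it
    # (or the start of the string when it is the first token).
    Comment = gsub("(^| \\| )NA(?= \\| |$)", "", Comment, perl = TRUE),
    # A leading "NA" leaves its trailing separator behind; strip it.
    Comment = sub("^ \\| ", "", Comment),
    # Groups that held only NA are now empty strings; restore a real NA.
    Comment = case_when(
      Comment == "" ~ NA_character_,
      TRUE ~ Comment
    )
  )
```

Patching the string after pasting is fragile, which is why dropping the NA values before concatenating is usually the simpler route.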

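Ideally, though, it would be better to ignore the NA comments in the first place while still keeping every location in the final output.

Note that in real life this is part of a larger dplyr pipeline, so I would prefer a tidyverse solution, but I am happy to explore other options.

Any ideas?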

Tags: r, dplyr, concatenation, na

Solution


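You can use na.omit to drop the NA values before pasting, and na_if to turn the resulting empty string (for groups that contain only NA) back into NA.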

library(dplyr)

input %>%
  group_by(Location) %>%
  summarise(Comment = na_if(paste0(na.omit(Comment), collapse = '|'), ''))

#  Location Comment                                  
#     <int> <chr>                                    
#1        1 This is a comment|This is another comment
#2        2 This is a comment                        
#3        3 This is a comment                        
#4        4 NA                                  
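
To make the three steps explicit, here is a sketch that wraps them in a small helper. The name collapse_comments and its sep argument are hypothetical, introduced only for illustration; the default separator " | " is chosen to match the spacing in outputDesired from the question:

```r
library(dplyr)

input <- data.frame(
  stringsAsFactors = FALSE,
  Location = c(1L, 1L, 1L, 2L, 2L, 3L, 4L),
  Comment = c("This is a comment", NA, "This is another comment",
              "This is a comment", NA, "This is a comment", NA)
)

# Hypothetical helper (not part of dplyr): collapse the non-missing values.
collapse_comments <- function(x, sep = " | ") {
  kept <- na.omit(x)                      # drop the NA entries
  joined <- paste0(kept, collapse = sep)  # "" when nothing is left
  na_if(joined, "")                       # turn "" back into a real NA
}

output <- input %>%
  group_by(Location) %>%
  summarise(Comment = collapse_comments(Comment))
```

Because na_if compares against the empty string, a group whose only non-missing comment is itself "" would also come back as NA.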
