How to return rows in one data frame that partially match rows in another data frame (string matching)

Problem description

I want to return all rows in list2 that contain a string from list1.

list1 <- tibble(name = c("the setosa is pretty", "the versicolor is the best", "the mazda is not a flower"))

list2 <- tibble(name = c("the setosa is pretty and the best flower", "the versicolor is the best and a red flower", "the mazda is a great car"))

For example, the code should return "the setosa is pretty and the best flower" from list2, because it contains the phrase "the setosa is pretty" from list1. I tried:

grepl(list1$name, list2$name)

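But I get the following warning: "Warning message: In grepl(commonPhrasesNPSLessthan6$value, dfNPSLessthan6$nps_comment) : argument 'pattern' has length > 1 and only the first element will be used".

I would appreciate some help. Thanks!

Edit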

list1 <- structure(list(value = c("it would not let me", "to go back and change", 
"i was not able to", "there is no way to", "to pay for a credit"
), n = c(15L, 14L, 12L, 11L, 9L)), row.names = c(NA, -5L), class = c("tbl_df", 
"tbl", "data.frame"))

list2 <- structure(list(comment = c("it would not let me go back and change things", 
"There is no way to back up without starting allover.", "Could not link blah blah account. ", 
"i really just want to speak to someone - and, now that I'm at the very end of the process-", 
"i felt that some of the information that was asked to provide wasn't necessary", 
"i was not able to to go back and make changes")), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

Tags: r, pattern-matching, stringr

Solution


Edit based on the new data:

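library(dplyr)  # provides %>% and filter()

# Collapse all phrases into one regex alternation ("a|b|c") and keep matching rows
list2 %>% 
  filter(stringr::str_detect(comment, paste0(list1$value, collapse = "|")))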
# A tibble: 2 x 1
  comment                                      
  <chr>                                        
1 it would not let me go back and change things
2 i was not able to to go back and make changes
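
The result above does not include the comment starting with "There is no way to", because the match is case-sensitive and the corresponding phrase in list1 is all lower case. If that row should be returned as well, one possible sketch wraps the combined pattern in stringr::regex() with ignore_case = TRUE. The data below is a shortened copy of the question's data, and the names pattern and hits are only illustrative.

```r
library(dplyr)
library(stringr)

# Shortened copies of the question's data (the n column is dropped)
list1 <- tibble(value = c("it would not let me", "to go back and change",
                          "i was not able to", "there is no way to",
                          "to pay for a credit"))
list2 <- tibble(comment = c("it would not let me go back and change things",
                            "There is no way to back up without starting allover.",
                            "Could not link blah blah account. ",
                            "i was not able to to go back and make changes"))

# Join all phrases into a single alternation: "phrase1|phrase2|..."
pattern <- paste0(list1$value, collapse = "|")

# ignore_case = TRUE lets "there is no way to" match "There is no way to"
hits <- list2 %>%
  filter(str_detect(comment, regex(pattern, ignore_case = TRUE)))
```

Because the phrases are used as regular expressions, a phrase that contains characters such as "." or "(" would need to be escaped first.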

Original

A stringr option:

list2[stringr::str_detect(list2$name,list1$name),]
# A tibble: 2 x 1
  name                                       
  <chr>                                      
1 the setosa is pretty and the best flower   
2 the versicolor is the best and a red flower
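
One caveat worth checking: str_detect() is vectorised over both the string and the pattern, so the call above compares each row of list2 only with the phrase in the same position of list1. It gives the expected rows here because the two columns happen to be aligned. The sketch below uses the same phrases with the comments reordered (an assumption made purely for illustration) to show the difference from an "any phrase" match.

```r
library(stringr)

phrases  <- c("the setosa is pretty",
              "the versicolor is the best",
              "the mazda is not a flower")
# Same comments as in the question, but in a different order
comments <- c("the versicolor is the best and a red flower",
              "the setosa is pretty and the best flower",
              "the mazda is a great car")

# Element-wise: comments[i] is tested against phrases[i] only
pairwise <- str_detect(comments, phrases)

# Any phrase: every comment is tested against "phrase1|phrase2|phrase3"
any_match <- str_detect(comments, paste0(phrases, collapse = "|"))
```

With the reordered data, pairwise finds no matches at all, while any_match still flags the first two comments.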

A base-only solution:

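# grep() returns the matching row numbers of list2 for each phrase;
# pool them so the index refers to rows of list2, not to phrases in list1
list2[sort(unique(unlist(lapply(list1$name, grep, list2$name)))), ]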
# A tibble: 2 x 1
  name                                       
  <chr>                                      
1 the setosa is pretty and the best flower   
2 the versicolor is the best and a red flower
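
As a variation in base R, the check can be written per comment with grepl() and fixed = TRUE, which treats each phrase as a literal string instead of a regular expression and does not require the two inputs to have the same length. This is only a sketch: the vectors phrases and comments are shortened from the question's second data set, and the names keep and matched are illustrative.

```r
phrases  <- c("it would not let me", "i was not able to", "to pay for a credit")
comments <- c("it would not let me go back and change things",
              "Could not link blah blah account. ",
              "i was not able to to go back and make changes",
              "i felt that some of the information was not necessary")

# For each comment, test every phrase as a literal string (fixed = TRUE)
# and keep the comment if at least one phrase occurs in it.
keep <- vapply(
  comments,
  function(txt) any(vapply(phrases, grepl, logical(1), x = txt, fixed = TRUE)),
  logical(1),
  USE.NAMES = FALSE
)

matched <- comments[keep]
```

The logical vector keep has one element per comment, so it can also be used to subset a data frame, for example list2[keep, ] when comments is taken from list2$comment.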
