How to use R to extract occurrences of a vector of strings in another vector of strings?

Problem description

I have a vector of strings like this:

strings <- tibble(string = c("apple, orange, plum, tomato",
                             "plum, beat, pear, cactus",
                             "centipede, toothpick, pear, fruit"))

I have a vector of fruits:

fruits <- tibble(fruit =c("apple", "orange", "plum", "pear"))

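What I want is a data.frame/tibble containing the original strings data.frame plus a second list or character column with all the fruits found in that original column. Something like this: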

strings <- tibble(string = c("apple, orange, plum, tomato",
                             "plum, beat, pear, cactus",
                             "centipede, toothpick, pear, fruit"),
                   match = c("apple, orange, plum",
                             "plum, pear",
                             "pear")
                  )

I have tried str_extract(strings, fruits), and I get a list in which everything is blank, along with this warning:

Warning message:
In stri_detect_regex(string, pattern, opts_regex = opts(pattern)):
longer object length is not a multiple of shorter object length

I also tried str_extract_all(strings, paste0(fruits, collapse = "|")), and I got the same warning message.

I have looked at "Find matches of a vector of a strings in another vector of strings", but that does not seem to help.

Any help would be greatly appreciated.

Tags: r, regex, stringr, stringi

Solution


Here is an example using purrr:

library(tidyverse)

strings <- tibble(string = c("apple, orange, plum, tomato",
                         "plum, beat, pear, cactus",
                         "centipede, toothpick, pear, fruit"))

fruits <- tibble(fruit = c("apple", "orange", "plum", "pear"))

# Try every pattern against one string and keep only the patterns that matched
extract_if_exists <- function(string_to_parse, pattern){
  extraction <- stringi::stri_extract_all_regex(string_to_parse, pattern)
  extraction <- unlist(extraction[!(is.na(extraction))])
  return(extraction)
}

strings %>%
  # one character vector of matched fruits per row
  mutate(matches = map(string, extract_if_exists, fruits$fruit)) %>%
  # collapse each vector of matches (not the original string) into one value
  mutate(matches = map_chr(matches, paste, collapse = ", "))
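As a point of comparison, the attempts in the question passed the tibbles `strings` and `fruits` themselves rather than their columns `strings$string` and `fruits$fruit`, which is likely why they produced the length warning. A minimal sketch of a vectorized alternative is shown below. It assumes the fruit names contain no regex metacharacters, and the `\\b` word boundaries and the names `pattern` and `result` are additions made here for illustration, not part of the original answer.

```r
library(tibble)
library(dplyr)
library(purrr)
library(stringr)

strings <- tibble(string = c("apple, orange, plum, tomato",
                             "plum, beat, pear, cactus",
                             "centipede, toothpick, pear, fruit"))
fruits <- tibble(fruit = c("apple", "orange", "plum", "pear"))

# Build a single alternation pattern from the fruit column (not the tibble).
# The \\b anchors are an assumption added here so that "apple" would not
# match inside a longer word such as "pineapple".
pattern <- paste0("\\b(", paste(fruits$fruit, collapse = "|"), ")\\b")

# str_extract_all() returns one character vector of matches per row;
# paste(collapse = ", ") turns each vector into a single string
# (an empty string when a row has no match).
result <- strings %>%
  mutate(match = map_chr(str_extract_all(string, pattern),
                         paste, collapse = ", "))

print(result)
```

With this approach the matches appear in the order they occur in each string, whereas the purrr example above lists them in the order of `fruits$fruit`. For the sample data both orders coincide.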
