Creating a new column from a factor column using multiple conditions

Problem description

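I want to create a new column from an existing factor column with multiple levels, where part of each level name repeats across levels. Let me illustrate with an example: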

factorA <- c("paul173643738","paul827484","george39585496","george7848658946","john2354674","john346","ringo384934","ringo24653")
df <- data.frame(factorA)

Here is my attempt:

library(dplyr)
    df <- mutate(
           df,factorB = case_when(
           matches(factorA,"paul.") ~ "paul",
           matches(factorA,"george.") ~ "george",
           matches(factorA,"john.") ~ "john",
           matches(factorA,"ringo.") ~ "ringo",
           TRUE ~ "NA"))

This gives me: Error in mutate_impl(.data, dots) : Evaluation error: is_string(match) is not TRUE. I assume this happens because I did not correctly specify how R should look for the string fragments I want.

The result should look like this:

           factorA  factorB
1    paul173643738  paul
2       paul827484  paul 
3   george39585496  george
4 george7848658946  george
5      john2354674  john
6          john346  john
7      ringo384934  ringo
8       ringo24653  ringo

I am sure this has been asked before, but I could not find an answer that fits my needs. Any help would be greatly appreciated.

Tags: r, regex, dplyr, gsub, grepl

Solution

Use str_detect() from stringr, which returns a logical vector that case_when() can test:

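library(dplyr)    # mutate(), case_when(), %>%
library(stringr)  # str_detect()

df %>%
  mutate(factorB = case_when(
    # each pattern matches the name followed by at least one more character
    str_detect(factorA, "paul.") ~ "paul",
    str_detect(factorA, "george.") ~ "george",
    str_detect(factorA, "john.") ~ "john",
    str_detect(factorA, "ringo.") ~ "ringo"
    # rows matching none of the patterns get NA
  ))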
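
As a side note on the error in the question: matches() is a column-selection helper whose first argument must be a single pattern string, so passing the factorA vector to it fails the is_string(match) check. A function such as grepl() or str_detect() returns one TRUE/FALSE per row instead, which is what case_when() needs. The following is a minimal base R sketch of that idea, plus a shortcut that may fit this data. It assumes every value is a name followed only by digits; the variable is_paul is introduced here purely for illustration.

```r
# Same example data as in the question.
factorA <- c("paul173643738", "paul827484", "george39585496", "george7848658946",
             "john2354674", "john346", "ringo384934", "ringo24653")
df <- data.frame(factorA)

# grepl(pattern, x) returns one TRUE/FALSE per element, the kind of
# condition case_when() expects on the left-hand side of each formula.
is_paul <- grepl("^paul", df$factorA)

# Assumption: each value is a name followed only by digits, so removing
# the trailing run of digits leaves just the name.
df$factorB <- sub("[0-9]+$", "", df$factorA)

print(df)
```

If the values do not all follow the name-then-digits shape, the explicit case_when() approach above is the safer choice, since the sub() shortcut would leave unexpected values unchanged rather than mapping them to NA.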
