How to create a data dictionary to search for patterns and give clean titles

Question

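I am trying to clean up the strings in one column of my R data frame, but I have run into a problem: I am nesting too many ifelse statements (R tells me the maximum is 50). Put simply, I want to detect a string in one column and write a clean output to another column, but there are more than 50 titles. What is the alternative to this approach?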

For example:

ifelse(str_detect(titles$program_name, "¿sabías que...?") == TRUE, "¿sabías que...?",
ifelse(str_detect(titles$program_name, "12 corazones|12 hearts|2 hearts") == TRUE, "12 Corazones",
ifelse(str_detect(titles$program_name, " al lado del otro muro") == TRUE, "Al Otro Lado Del Muro",
ifelse(str_detect(titles$program_name, "a toda gloria") == TRUE, "A Toda Gloria", 
ifelse(str_detect(titles$program_name, "acceso vip") == TRUE, "Acceso VIP",
ifelse(str_detect(titles$program_name, "afv") == TRUE, "AFV",
ifelse(str_detect(titles$program_name, "al rojo vivo") == TRUE, "Al Rojo Vivo",
ifelse(str_detect(titles$program_name, "alma awards") == TRUE, "ALMA Awards",
ifelse(str_detect(titles$program_name, "amar es primavera") == TRUE, "Amar es primavera",
ifelse(str_detect(titles$program_name, "anónima|anonima|anonymized") == TRUE, "Anonima",
ifelse(str_detect(titles$program_name, "astrologia") == TRUE, "Astrologia",
ifelse(str_detect(titles$program_name, "billboard") == TRUE, "Premios Billboard",
ifelse(str_detect(titles$program_name, "bajo el mismo cielo") == TRUE, "Bajo El Mismo Cielo",
ifelse(str_detect(titles$program_name, "bella calamidades") == TRUE, "Bella Calamidades",
ifelse(str_detect(titles$program_name, "betty") == TRUE, "Betty en NY",
ifelse(str_detect(titles$program_name, "buscando mi ritmo") == TRUE, "Buscando Mi Ritmo",


### this continues on for more than 50 titles

What can I do to avoid the error?

Tags: r, regex, if-statement, dplyr

Answer


You can loop over the patterns, as long as each element of S matches only one element of P:

P = c("apple", "orange")
S = c("red apples", "green apple", "oranges", "some orange")

foo = function(p, s){
    s2 = s
    for (x in p) {
        s2 = replace(s2, grepl(x, s2), x)
    }
    return(s2)
}

foo(P, S)
#> [1] "apple"  "apple"  "orange" "orange"

Or, with a single regular expression:

mypattern = paste0(".*(", paste(P, collapse = "|"), ").*")
mypattern
#> [1] ".*(apple|orange).*"

gsub(mypattern, "\\1", S)
#> [1] "apple"  "apple"  "orange" "orange"
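
Both snippets above replace each string with the pattern that matched it, whereas the question asks for a clean title that differs from the pattern. A minimal sketch of how the loop could be adapted, assuming the patterns and titles are kept in a named character vector: title_lookup, clean_titles and program_name are hypothetical names, and the entries are copied from the question.

```r
# Hypothetical lookup table: names are regex patterns, values are clean titles.
# Entries are copied from the question; extend the vector as needed.
title_lookup <- c(
  "12 corazones|12 hearts|2 hearts" = "12 Corazones",
  "a toda gloria"                   = "A Toda Gloria",
  "acceso vip"                      = "Acceso VIP",
  "al rojo vivo"                    = "Al Rojo Vivo"
)

# clean_titles() is an illustrative helper, not part of any package.
clean_titles <- function(x, lookup) {
  out <- rep(NA_character_, length(x))  # stays NA where no pattern matches
  for (i in seq_along(lookup)) {
    # Fill only entries that are still unmatched, so the first matching
    # pattern wins, as it would in a nested ifelse() chain.
    hit <- is.na(out) & grepl(names(lookup)[i], x)
    out[hit] <- lookup[[i]]
  }
  out
}

program_name <- c("12 hearts ep. 4", "acceso vip 2019", "unknown show")
clean_titles(program_name, title_lookup)
#> [1] "12 Corazones" "Acceso VIP"   NA
```

Because the loop runs once per pattern rather than nesting calls, the 50-level limit no longer applies. Unmatched rows come back as NA, which makes gaps in the dictionary easy to spot.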
