Extend `str_to...` to include abbreviations

Problem description

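I want to create column headers in a particular case, but exclude abbreviations from the case conversion. `snakecase::to_any_case()` is a good function for this, but if a string contains a special character (e.g. `#`), that character gets removed.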

Say I have a string (or, more concretely, a table with these columns):

library(stringr)
x <- c("nyc", "buffalo", "la", "raleigh", "richmond")
str_to_title(x)
#> [1] "Nyc"      "Buffalo"  "La"       "Raleigh"  "Richmond"

But "NYC" and "LA" are abbreviations, and I want them to be fully uppercase, i.e.:

"NYC"      "Buffalo"  "LA"       "Raleigh"  "Richmond"

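Of course, this question is about more than changing the case of table headers: letting `str_to...` handle abbreviations could be useful in other situations as well.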

Created on 2021-10-05 by the reprex package (v2.0.1)

Tags: r, stringr

Solution


`stringr` is a general-purpose package for working with strings, and it can easily be extended.

  
  #' Converts case of a string to a class, but keeps the abbreviations as listed.
  #'
  #' @param string A string (for example names of a data frame).
  #' @param str_to a function that emphasizes the case from stringr: str_to_title, str_to_sentence, str_to_upper, str_to_lower. Need to select one.
  #' @param abbreviations character vector with (uppercase) abbreviations.
  #'
  #' @return A character vector.
  #'
  #' @author Malte Grosser, \email{malte.grosser@@gmail.com}
  #' @keywords utilities
#'
str_to_with_abbreviations <-
  function(string,
           str_to = c(str_to_title, str_to_sentence, str_to_upper, str_to_lower),
           abbreviations = NULL
  ){
    if(is.null(abbreviations)){
      warning("Abbreviations is NULL. why are you doing a more complicated function? Just using glue::glue_data{str_to}.")
    }
    
    #To replace, convert both abbreviations and string to `str_to`
    abbrevations_to_x <- abbreviations %>% str_to()
    string_to_x <- string %>% str_to()
    
    #Are abbreviations in the string? If not, just return!
    abbreviations_in_string <- any(string_to_x %in% abbrevations_to_x)
    
    if(abbreviations_in_string & !is.null(abbreviations)){
      #Anything better than a for loop?
      for(j in 1:length(abbreviations)){
        string_to_x <- stringr::str_replace_all(
          string_to_x,
          abbrevations_to_x[j],
          abbreviations[j]
        )
      }
    }
    
    return(string_to_x)
  }
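The function above replaces the abbreviations one at a time in a `for` loop. A possible loop-free sketch, assuming the abbreviations contain no regex metacharacters, passes all pattern/replacement pairs to `stringr::str_replace_all()` at once as a named vector (names are the patterns, values the replacements). It also matches case-insensitively on whole words, so the abbreviations do not need to be converted with `str_to` first. The name `str_to_keep_abbr()` is hypothetical and not part of stringr.

```r
library(stringr)

# Hypothetical loop-free variant of str_to_with_abbreviations().
# Assumes `abbreviations` contain no regex metacharacters.
str_to_keep_abbr <- function(string, str_to = str_to_title,
                             abbreviations = character()) {
  converted <- str_to(string)
  if (length(abbreviations) == 0) {
    return(converted)
  }
  # One case-insensitive, whole-word pattern per abbreviation, e.g. the
  # regex (?i)(?<![\p{L}\p{N}])NYC(?![\p{L}\p{N}])
  patterns <- paste0("(?i)(?<![\\p{L}\\p{N}])", abbreviations, "(?![\\p{L}\\p{N}])")
  # Named vector: str_replace_all() applies each pattern -> replacement pair
  str_replace_all(converted, setNames(abbreviations, patterns))
}

str_to_keep_abbr(c("nyc", "buffalo", "la", "raleigh", "richmond"),
                 abbreviations = c("NYC", "LA"))
#> [1] "NYC"      "Buffalo"  "LA"       "Raleigh"  "Richmond"
```

Because matching is case-insensitive, the same call also works when `str_to` is `str_to_sentence` or `str_to_lower`, where the abbreviation may appear in lowercase after conversion.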

`str_to_with_abbreviations()` can then be applied to the strings from the question to get the desired output:

str_to_with_abbreviations(
  c("nyc", "buffalo", "la", "raleigh", "richmond"),
  abbreviations = c("NYC", "LA"),
  str_to = str_to_title
)
#> [1] "NYC"      "Buffalo"  "LA"       "Raleigh"  "Richmond"

The function is not perfect, but it is a starting point for most use cases.
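If every element is either entirely an abbreviation or not (as with the city names in the question), a simpler sketch needs no regular expressions at all: convert everything, then overwrite the exact matches with the abbreviation as written. This assumes whole-element matching is enough; an abbreviation inside a longer string such as `"nyc population"` is not handled here.

```r
library(stringr)

x <- c("nyc", "buffalo", "la", "raleigh", "richmond")
abbreviations <- c("NYC", "LA")

# Title-case everything first
out <- str_to_title(x)
# Position of each element in `abbreviations`, ignoring case (NA if absent)
hit <- match(toupper(x), toupper(abbreviations))
# Put the matched elements back in the spelling given in `abbreviations`
out[!is.na(hit)] <- abbreviations[hit[!is.na(hit)]]
out
#> [1] "NYC"      "Buffalo"  "LA"       "Raleigh"  "Richmond"
```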

