r - If a string contains a certain character, fill an empty cell in the same row with a value

Suppose I have the following data frame:

# S/N    a      b
# 1    L1-S2 <blank>
# 2    T1-T3 <blank>   
# 3    T1-L2 <blank>

How can I turn the above data frame into this:

# S/N    a      b
# 1    L1-S2    LS
# 2    T1-T3    T
# 3    T1-L2    TL

I am thinking of writing a loop where,

for each x in column a,

If first character in x == L AND 4th character in x == S, 
    fill the corresponding cell in b with LS

and so on...

However, I am not sure how to implement this, or whether there is a more elegant way to do it.

Tags: r

We can extract the uppercase letters and remove the duplicated letters

library(stringr)
library(dplyr)
df1 %>%
   mutate(b = str_replace(str_replace(a, "^([A-Z])\\d+-([A-Z])\\d+", 
         "\\1\\2"), "(.)\\1+", "\\1"))

-output

#  S_N     a  b
#1   1 L1-S2 LS
#2   2 T1-T3  T
#3   3 T1-L2 TL
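
To make the nested call easier to follow, here is a minimal sketch that runs the two `str_replace` steps one at a time on a plain character vector. The names `a`, `letters_only` and `b` are only illustrative and are not part of the original answer.

```r
library(stringr)

# Same values as column a of df1
a <- c("L1-S2", "T1-T3", "T1-L2")

# Step 1: keep the two captured letters, dropping the digits and the hyphen
letters_only <- str_replace(a, "^([A-Z])\\d+-([A-Z])\\d+", "\\1\\2")
letters_only
# [1] "LS" "TT" "TL"

# Step 2: collapse a run of the same character into a single character
b <- str_replace(letters_only, "(.)\\1+", "\\1")
b
# [1] "LS" "T"  "TL"
```

Note that the second pattern only collapses letters that sit next to each other, which is enough here because each value has exactly two letters.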

Or another option is to extract the uppercase letters with str_extract_all, loop over the list with map, and paste the unique elements together

library(purrr)
df1 %>%
     mutate(b = str_extract_all(a, "[A-Z]") %>%
             map_chr(~ str_c(unique(.x), collapse="")))

Or use the base R equivalent of the first tidyverse option

df1$b <-  sub("(.)\\1+", "\\1", gsub("[0-9-]+", "", df1$a))

Or with strsplit

df1$b <- sapply(strsplit(df1$a, "[0-9-]+"),
         function(x) paste(unique(x), collapse=""))
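
As a rough illustration of why this works, the sketch below looks at the intermediate result of `strsplit` before the pieces are pasted back together. It uses a standalone vector `a` and a helper name `parts`, both chosen here for illustration only.

```r
# Same values as column a of df1
a <- c("L1-S2", "T1-T3", "T1-L2")

# Splitting on runs of digits and hyphens leaves one vector of letters per row
parts <- strsplit(a, "[0-9-]+")
parts[[2]]
# [1] "T" "T"

# unique() drops the repeated letter before the pieces are pasted together
b <- sapply(parts, function(x) paste(unique(x), collapse = ""))
b
# [1] "LS" "T"  "TL"
```

Because `unique` works on the whole vector of letters, this variant would also drop a repeated letter that is not adjacent to its first occurrence.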

df1 <- structure(list(S_N = 1:3, a = c("L1-S2", "T1-T3", "T1-L2"), 
    b = c(NA, NA, NA)), class = "data.frame", row.names = c(NA, -3L))