Substring with a condition in R

Problem description

I have a data frame in R:

first <- c("robbin", "Santa", "beta", "tom" )
Last <-  c("greek", "alpha", "gamma", "angel")
Primaryphone <- c("9988776655","(123)456","(789)6543210","88")
Cellphone <- c("7896000001","1234567890","8877665","7654")
df<-data.frame(first,Last,Primaryphone,Cellphone)

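I am trying to reproduce the following output:

Take the first two letters of the last name.

Only consider phone numbers with 10 digits; any number that does not have 10 digits should be ignored.

Append the last 4 digits of the primary phone, followed by the last 4 digits of the cell phone, for whichever of those columns has 10 characters.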

Output
gr66550001
al7890
ga3210

Tags: r

Solution

You can try the following:

library(dplyr)

df %>%
  # Remove opening and closing round brackets from both phone columns
  mutate(across(c(Primaryphone, Cellphone), ~ gsub('[()]', '', .)),
         # Blank out any value that does not have exactly 10 characters
         across(c(Primaryphone, Cellphone), ~ replace(., nchar(.) != 10, ''))) %>%
  # Keep only rows with 10 characters in Primaryphone or Cellphone
  filter(nchar(Primaryphone) == 10 | nchar(Cellphone) == 10) %>%
  # Build the output: first 2 letters of Last + last 4 digits of each phone
  # (substring() on a blanked value returns '', so it adds nothing)
  mutate(output = paste0(substring(Last, 1, 2),
                         substring(Primaryphone, nchar(Primaryphone) - 3),
                         substring(Cellphone, nchar(Cellphone) - 3)))

#   first  Last Primaryphone  Cellphone     output
#1 robbin greek   9988776655 7896000001 gr66550001
#2  Santa alpha              1234567890     al7890
#3   beta gamma   7896543210                ga3210
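
For comparison, the same rules can be sketched in base R without dplyr. This is only an illustrative alternative, not part of the original answer: the helper name `last4_if_10` is hypothetical, and it assumes that every non-digit character (not just round brackets) should be stripped before counting digits.

```r
# Hypothetical helper (an assumption, not from the original answer):
# returns the last 4 digits of a phone number when it has exactly
# 10 digits after stripping non-digit characters, otherwise "".
last4_if_10 <- function(x) {
  digits <- gsub("[^0-9]", "", x)  # assumption: drop everything except digits
  ifelse(nchar(digits) == 10, substring(digits, 7, 10), "")
}

# Same data as in the question
df <- data.frame(
  first = c("robbin", "Santa", "beta", "tom"),
  Last = c("greek", "alpha", "gamma", "angel"),
  Primaryphone = c("9988776655", "(123)456", "(789)6543210", "88"),
  Cellphone = c("7896000001", "1234567890", "8877665", "7654"),
  stringsAsFactors = FALSE
)

primary_part <- last4_if_10(df$Primaryphone)
cell_part <- last4_if_10(df$Cellphone)

# Keep only rows where at least one phone number has 10 digits
keep <- primary_part != "" | cell_part != ""

output <- paste0(substring(df$Last, 1, 2), primary_part, cell_part)[keep]
print(output)
```

With the sample data this should produce the same three strings as the dplyr version. Keeping the digit check in one small function makes it easy to change the rule in a single place, for example if other separators such as dashes or spaces should not be stripped.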
