How to eliminate matched patterns based on a condition

Question

I have a data frame with 3 columns, as shown below.
Input data frame

A = c("(0_22),(0_25),(1_29)","(1_34),(1_38),(0_40)","(0_07),(0_09),(0_10),(0_13)","(1_47),(1_49),(1_53),(1_57)")
zero = c(5,NA,6,NA)
one = c(NA,4,NA,10)
df = data.frame(A,zero,one)

Output data frame

A = c("(0_22),(0_25),(1_29)","(1_34),(1_38),(0_40)","(0_07),(0_09),(0_10),(0_13)","(1_47),(1_49),(1_53),(1_57)")
zero = c(5,NA,6,NA)
one = c(NA,4,NA,10)
required_val = c("(1_29)","(0_40)","","")
df = data.frame(A,zero,one,required_val)

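How can I derive the column "required_val" from the variable "A" based on the variables "zero" and "one"?

That is, if "zero" is greater than 0, remove every substring of the form (0_...);
if "one" is greater than 0, remove every substring of the form (1_...).

Tags: r, regex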

Solution

This is basically a pattern-matching problem:

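library(magrittr) # for %<>% and %>%; avoids repeating the long subscripts below
# work on a character copy so A stays intact (think this is what you wanted)
df$required_val <- as.character(df$A)

# rows where each counter is present and positive
has_zero <- !is.na(df$zero) & df$zero > 0
has_one  <- !is.na(df$one) & df$one > 0

# get rid of the (0_...) bits:
df$required_val[has_zero] %<>%
      {gsub("\\(0_[^)]*\\)", "", .)}

# and the (1_...) bits:
df$required_val[has_one] %<>%
      {gsub("\\(1_[^)]*\\)", "", .)}

# now tidy up the leftover commas (this was trickiest!):
# drop trailing ones, drop leading ones, then collapse runs into one
df$required_val %<>%
      {gsub(",+$", "", .)} %>%
      {gsub("^,+", "", .)} %>%
      {gsub(",+", ",", .)}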
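For comparison, here is a minimal base R sketch of the same idea without magrittr. The helper name drop_tokens and the vectors has_zero and has_one are made up for this illustration and are not part of the answer above. The sketch assumes every token has the form (digit_...) and that tokens are separated by single commas, as in the sample data.

```r
# Hypothetical helper (not from the original answer): remove every
# "(<digit>_...)" token from x, then tidy the commas left behind.
drop_tokens <- function(x, digit) {
  x <- gsub(paste0("\\(", digit, "_[^)]*\\)"), "", x)  # remove the tokens
  x <- gsub("^,+|,+$", "", x)                            # trim commas at either end
  gsub(",+", ",", x)                                     # collapse runs of commas
}

# Sample data from the question
A    <- c("(0_22),(0_25),(1_29)", "(1_34),(1_38),(0_40)",
          "(0_07),(0_09),(0_10),(0_13)", "(1_47),(1_49),(1_53),(1_57)")
zero <- c(5, NA, 6, NA)
one  <- c(NA, 4, NA, 10)
df   <- data.frame(A, zero, one, stringsAsFactors = FALSE)

# `%in% TRUE` turns the NA results of the comparisons into FALSE
has_zero <- (df$zero > 0) %in% TRUE
has_one  <- (df$one > 0) %in% TRUE

# Start from a copy of A and strip tokens only in the flagged rows
df$required_val <- df$A
df$required_val[has_zero] <- drop_tokens(df$required_val[has_zero], 0)
df$required_val[has_one]  <- drop_tokens(df$required_val[has_one], 1)
```

On the sample data, df$required_val comes out as "(1_29)", "(0_40)", "", "", which matches the output requested in the question, and column A is left unchanged.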
