How to extract a specific string and its corresponding value?

Problem description

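My data frame "B" has a column "Checks" whose rows contain free-text entries. Each entry mentions a variable 'abc' followed by a value for it. The entries were typed manually and are not consistent from row to row. I need to extract only 'abc' and then its value.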

> B$Checks

    rows    Checks
    [1] there was no problem  reported measures abc-96 xyz 450 327bbb11869 xyz 113 aaa 4 poc 470 b 3 surveyor issue
    [2] abc(107 to 109) xyz 115 jbo xyz 104 optim
    [3] problemm with caller abc 95 19468 4g xyz 103 91960 1 Remarks new loc reqd is problem
    [4] abc_107 xyz 116 dor problem 
    [5] surevy done , no approximation issues abc 103 xyz 109 crux xyz 104 
    [6] ping test ok abc(86 rxlevel 84
    [7] field is clean , can be used to buiild the required set up abc-86 xyz 94 Digital DSL  No Building class Residential Building Type Multi
    [8] abc 89 xyz 99 so as the user has no problem , check ping test

Expected output

    rows    Variable    Value
    [1]     abc         96
    [2]     abc         107
    [3]     abc         95
    [4]     abc         107
    [5]     abc         103
    [6]     abc         86
    [7]     abc         86
    [8]     abc         89

I tried the following, based on answers to similar questions.

Using str_match

library(stringr)
m1 <- str_match(B$checks, "abc.*?([0-200.]{1,})")  # value is between 0 to 200

This produced something like the following:

    row var value
1   abc-96 xyz 450  0
2   abc(10  10
3   abc 95 1    1
4   abc_10  10
5   abc 10  10
6   NA  NA
7   NA  NA
8   NA  NA

Then I tried the following:

B$Checks <- gsub("-", " ", B$Checks)
B$Checks <- gsub("/", " ", B$Checks)
B$Checks <- gsub("_", " ", B$Checks)
B$Checks <- gsub(":", " ", B$Checks)
B$Checks <- gsub("\\)", " ", B$Checks)
B$Checks <- gsub("\\(", " ", B$Checks)
B$Checks <- gsub(".*abc", "abc", B$Checks)        # drop everything before "abc"
B$Checks <- gsub("[[:punct:]]", " ", B$Checks)
regexp <- "[[:digit:]]+"
m <- str_extract(B$Checks, regexp)                # first run of digits in each row
m <- as.data.frame(m)

and was able to get the expected output,

but now I am looking for the following:

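1) A simpler set of commands, or a simpler way, to extract the expected output

2) A way to capture values that are given as a range. For example, I want the input row below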

rows    Checks
[2] abc(107 to 109) xyz 115 jbo xyz 104 optim

as

Output >

rows    Variable    Value1 Value2
 [2]     abc        107   109

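I need solutions for both 1) and 2), because I am working with a larger data set that follows the same pattern and contains many mixed variable-value combinations.

Thanks in advance.

Tags: r, regex, stringr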

Solution


You need to capture the digits and use a lookbehind to require that abc comes before them:

Value <- sub(".*(?<=abc)(\\D+)?(\\d*)\\D?.*", "\\2", str, perl=TRUE)
# Value
#[1] "96"  "107" "95"  "107" "103" "86"  "86"  "89"
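
As a possible alternative to the lookbehind, here is a minimal base R sketch that matches "abc", skips any non-alphanumeric separators, and captures the digits that follow. The vector `checks` and the pattern are illustrative assumptions, not part of the original answer. Unlike `sub()`, which returns the input unchanged when nothing matches, this version yields NA for rows without "abc".

```r
# Hypothetical sample covering the separators seen in the question.
checks <- c("reported measures abc-96 xyz 450",
            "abc(107 to 109) xyz 115",
            "abc_107 xyz 116",
            "ping test ok abc(86 rxlevel 84",
            "no variable in this row 42")

# "abc", optional non-alphanumeric separators, then capture the digits.
pattern <- "abc[^A-Za-z0-9]*(\\d+)"

# regexec() records the full match and each capture group per element;
# regmatches() turns those positions into strings (character(0) if no match).
hits <- regmatches(checks, regexec(pattern, checks))

# Take the first capture group, or NA when the row has no match.
Value <- vapply(hits,
                function(h) if (length(h) == 2L) as.numeric(h[2]) else NA_real_,
                numeric(1))

result <- data.frame(Variable = "abc", Value = Value)
print(result)
```

Because the separator class excludes letters, a row such as "abc xyz 450" would give NA rather than picking up the value of another variable. Whether that is the desired behavior depends on the data.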

Then you can put the values in a data.frame:

B <- data.frame(Variable="abc", Value=as.numeric(Value))
head(B, 3)
#  Variable Value
#1      abc    96
#2      abc   107
#3      abc    95
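
The code above does not cover the second part of the question (values given as a range such as "107 to 109"). One way this might be handled is sketched below with `strcapture()` from the utils package (available since R 3.4.0). The sample vector `checks`, the pattern, and the assumption that ranges are always written with the word "to" are illustrative, not taken from the original answer.

```r
# Hypothetical sample: one range, one single value, one row without "abc".
checks <- c("abc(107 to 109) xyz 115 jbo xyz 104 optim",
            "abc_107 xyz 116 dor problem",
            "ping test ok rxlevel 84")

# First number after "abc", then optionally "to" and a second number.
# (?: ... ) is a non-capturing group, so perl = TRUE is used.
pattern <- "abc[^A-Za-z0-9]*(\\d+)(?:\\s*to\\s*(\\d+))?"

# strcapture() fills one column per capture group and converts each
# column to the type given in `proto`.
ranges <- strcapture(pattern, checks,
                     proto = data.frame(Value1 = numeric(), Value2 = numeric()),
                     perl = TRUE)

result <- data.frame(Variable = "abc", ranges)
print(result)
```

Rows that carry a single value should end up with NA in Value2, and rows without "abc" with NA in both columns, so the result keeps one row per input row.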

Data

str <- c("there was no problem  reported measures abc-96 xyz 450 327bbb11869 xyz 113 aaa 4 poc 470 b 3 surveyor issue", 
"abc(107 to 109) xyz 115 jio xyz 104 optim", "problemm with caller abc 95 19468 4g xyz 103 91960 1 Remarks new loc reqd is problem", 
"abc_107 xyz 116 dor problem", "surevy done , no approximation issues abc 103 xyz 109 crux xyz 104 ", 
"ping test ok abc(86 rxlevel 84", "field is clean , can be used to buiild the required set up abc-86 xyz 94 Digital DSL  No Building class Residential Building Type Multi", 
"abc 89 xyz 99 so as the user has no problem , check ping test")
