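r - How to extract a specific string and its corresponding value?
Problem description
My data frame "B" has a column "Checks" whose rows contain free-text statements. Each statement includes a variable 'abc' followed by a corresponding value. The entries were typed in manually and are not formatted consistently. I need to extract only "abc" and then its value.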
> B$Checks
rows Checks
[1] there was no problem reported measures abc-96 xyz 450 327bbb11869 xyz 113 aaa 4 poc 470 b 3 surveyor issue
[2] abc(107 to 109) xyz 115 jbo xyz 104 optim
[3] problemm with caller abc 95 19468 4g xyz 103 91960 1 Remarks new loc reqd is problem
[4] abc_107 xyz 116 dor problem
[5] surevy done , no approximation issues abc 103 xyz 109 crux xyz 104
[6] ping test ok abc(86 rxlevel 84
[7] field is clean , can be used to buiild the required set up abc-86 xyz 94 Digital DSL No Building class Residential Building Type Multi
[8] abc 89 xyz 99 so as the user has no problem , check ping test
Expected output
rows Variable Value
[1] abc 96
[2] abc 107
[3] abc 95
[4] abc 107
[5] abc 103
[6] abc 86
[7] abc 86
[8] abc 89
I tried the following, based on answers to similar questions.
Using str_match
library(stringr)
m1 <- str_match(B$checks, "abc.*?([0-200.]{1,})") # value is between 0 to 200
This produced something like the following
row var value
1 abc-96 xyz 450 0
2 abc(10 10
3 abc 95 1 1
4 abc_10 10
5 abc 10 10
6 NA NA
7 NA NA
8 NA NA
Then I tried the following
B$Checks <- gsub("-", " ", B$Checks)
B$Checks <- gsub("/", " ", B$Checks)
B$Checks <- gsub("_", " ", B$Checks)
B$Checks <- gsub(":", " ", B$Checks)
B$Checks <- gsub(")", " ", B$Checks)
B$Checks <- gsub("((((", " ", B$Checks)
B$Checks <- gsub(".*abc", "abc", B$Checks)
B$Checks <- gsub("[[:punct:]]", " ", B$Checks)
regexp <- "[[:digit:]]+"
m <- str_extract(B$Checks, regexp)
m <- as.data.frame(m)
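and was able to get the expected output,
but now I am looking for the following:
1) a simpler set of commands or a simpler approach for extracting the expected output
2) a way to get values that are given as a range; for example, I want the input row below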
rows Checks
[2] abc(107 to 109) xyz 115 jbo xyz 104 optim
to be returned as the following output:
rows Variable Value1 Value2
[2] abc 107 109
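I need solutions for both 1) and 2), because I am working with a much larger dataset that follows the same pattern and contains many mixed variable-value combinations.
Thanks in advance.
Solution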
You need to capture the digits and use a lookbehind to require that abc comes before them:
Value <- sub(".*(?<=abc)(\\D+)?(\\d*)\\D?.*", "\\2", str, perl=TRUE)
# Value
#[1] "96" "107" "95" "107" "103" "86" "86" "89"
You can then put the values in a data.frame:
B <- data.frame(Variable="abc", Value=as.numeric(Value))
head(B, 3)
# Variable Value
#1 abc 96
#2 abc 107
#3 abc 95
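The second part of the question (values given as a range, such as "abc(107 to 109)") is not covered by the code above. The following is only a sketch of one possible approach in base R, under a few assumptions that are not from the original answer: only non-alphanumeric characters may separate abc from its number, a range is always written as "<number> to <number>", and the names `checks`, `pattern` and `get_group` are made up for this illustration. The sample rows are copied from the question.

```r
# Sample rows taken from the question (rows 2, 4, 6 and 8)
checks <- c(
  "abc(107 to 109) xyz 115 jbo xyz 104 optim",
  "abc_107 xyz 116 dor problem",
  "ping test ok abc(86 rxlevel 84",
  "abc 89 xyz 99 so as the user has no problem , check ping test"
)

# "abc", any run of non-alphanumeric separators, the first number,
# then optionally "to" and a second number (assumed range format)
pattern <- "abc[^[:alnum:]]*(\\d+)(?:\\s*to\\s*(\\d+))?"

# One element per row: full match, group 1, group 2
# (character(0) when the row has no match at all)
m <- regmatches(checks, regexec(pattern, checks, perl = TRUE))

# Hypothetical helper: return capture group `idx` as a number,
# or NA when the group (or the whole match) is missing
get_group <- function(hit, idx) {
  if (length(hit) >= idx && nzchar(hit[idx])) as.numeric(hit[idx]) else NA_real_
}

out <- data.frame(
  Variable = "abc",
  Value1   = vapply(m, get_group, numeric(1), idx = 2),
  Value2   = vapply(m, get_group, numeric(1), idx = 3)
)
print(out)
```

Rows without a range get NA in Value2, and rows that do not mention abc at all would get NA in both columns rather than a coercion warning. If the real data uses other range separators (for example "107-109"), the "to" part of the pattern would need to be adjusted.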
Data
str <- c("there was no problem reported measures abc-96 xyz 450 327bbb11869 xyz 113 aaa 4 poc 470 b 3 surveyor issue",
"abc(107 to 109) xyz 115 jio xyz 104 optim", "problemm with caller abc 95 19468 4g xyz 103 91960 1 Remarks new loc reqd is problem",
"abc_107 xyz 116 dor problem", "surevy done , no approximation issues abc 103 xyz 109 crux xyz 104 ",
"ping test ok abc(86 rxlevel 84", "field is clean , can be used to buiild the required set up abc-86 xyz 94 Digital DSL No Building class Residential Building Type Multi",
"abc 89 xyz 99 so as the user has no problem , check ping test")