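r - Match strings before a special character
Problem description
I am trying to match strings across two columns and return the non-matches, comparing only the part before ":". Entries such as x2x or y67y should not be returned, because x is still x and y is still y.
I don't want to match on the ":decimal" part. If x2y appears in both columns, it counts as a match (regardless of whether the decimals after the special character differ).
Input: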
input <- structure(list(x = structure(c(1L, 2L, 3L, 3L), .Label = c("A",
"B", "C"), class = "factor"), y = structure(c(2L, 3L, 1L, 4L), .Label = c("A",
"B", "C", "D"), class = "factor"), x_val = c("x2x:0.12345,y67h:0.06732,d7j:0.032647",
"x2y:0.26345,y67y:0.28320,d7r:0.043647", "x2y:0.23435,y67y:0.28310,d7r:0.043547",
"x2y:0.23435,y67y:0.28330,d7r:0.043247"), y_val = c("x2y:0.33134,y67y:0.3131,d7r:0.23443",
"x2y:0.34311,y67y:0.14142,d7r:0.31431", "x2x:0.34314,y67h:0.14141,d7j:0.453145",
"x67b:0.31411,g72v:0.3134,b8c:0.89234")), row.names = c(NA, -4L
), class = "data.frame")
Output:
output <- structure(list(x = structure(c(1L, 2L, 3L, 3L), .Label = c("A",
"B", "C"), class = "factor"), y = structure(c(2L, 3L, 1L, 4L), .Label = c("A",
"B", "C", "D"), class = "factor"), x_val = c("x2x:0.12345,y67h:0.06732,d7j:0.032647",
"x2y:0.26345,y67y:0.28320,d7r:0.043647", "x2y:0.23435,y67y:0.28310,d7r:0.043547",
"x2y:0.23435,y67y:0.28330,d7r:0.043247"), y_val = c("x2y:0.33134,y67y:0.3131,d7r:0.23443",
"x2y:0.34311,y67y:0.14142,d7r:0.31431", "x2x:0.34314,y67h:0.14141,d7j:0.453145",
"x67b:0.31411,g72v:0.3134,b8c:0.89234"), diff_x = c("y67h:0.06732,d7j:0.03264",
NA, "x2y:0.23435,d7r:0.043547", "x2y:0.23435,y67y:0.28330,d7r:0.043247"
), diff_y = c("x2y:0.33134,d7r:0.23443", NA, "y67h:0.14141,d7j:0.453145",
"x67b:0.31411,g72v:0.3134,b8c:0.89234")), row.names = c(NA, -4L
), class = "data.frame")
I run into problems when I only want to match up to the ":" character. The following code is taken from this question: https://stackoverflow.com/a/55285959/5150629.
library(dplyr)
library(purrr)
input %>% mutate(diff_x = map2_chr(strsplit(x_val, split = ", "),
                                   strsplit(y_val, split = ", "),
                                   ~paste(grep('([a-z])(?>\\d+)(?!\\1)', setdiff(.x, .y),
                                               value = TRUE, perl = TRUE),
                                          collapse = ", ")) %>%
                   replace(. == "", NA),
                 diff_y = map2_chr(strsplit(x_val, split = ", "),
                                   strsplit(y_val, split = ", "),
                                   ~paste(grep('([a-z])(?>\\d+)(?!\\1)', setdiff(.y, .x),
                                               value = TRUE, perl = TRUE),
                                          collapse = ", ")) %>%
                   replace(. == "", NA))
Can anyone help? Thanks!
Solution
I modified my answer at https://stackoverflow.com/a/55285959/5150629 to fit this question:
library(dplyr)
library(purrr)
# `input` is the data frame defined in the question
input %>%
  mutate(
    diff_x = map2_chr(
      strsplit(x_val, split = ","),
      strsplit(y_val, split = ","),
      ~ {
        # compare only the part before ":" (keys present in x but not in y)
        setdiff(sub(":.+$", "", .x), sub(":.+$", "", .y)) %>%
          # drop keys such as x2x whose letter is unchanged
          grep('([a-z])(?>\\d+)(?!\\1)', ., value = TRUE, perl = TRUE) %>%
          # look up the original "key:value" strings
          sapply(grep, .x, value = TRUE) %>%
          paste(collapse = ", ") %>%
          replace(. == "", NA)
      }
    ),
    diff_y = map2_chr(
      strsplit(x_val, split = ","),
      strsplit(y_val, split = ","),
      ~ {
        # same steps with the roles of x and y swapped
        setdiff(sub(":.+$", "", .y), sub(":.+$", "", .x)) %>%
          grep('([a-z])(?>\\d+)(?!\\1)', ., value = TRUE, perl = TRUE) %>%
          sapply(grep, .y, value = TRUE) %>%
          paste(collapse = ", ") %>%
          replace(. == "", NA)
      }
    )
  )
Output:
x y x_val y_val diff_x
1 A B x2x:0.12345,y67h:0.06732,d7j:0.032647 x2y:0.33134,y67y:0.3131,d7r:0.23443 y67h:0.06732, d7j:0.032647
2 B C x2y:0.26345,y67y:0.28320,d7r:0.043647 x2y:0.34311,y67y:0.14142,d7r:0.31431 <NA>
3 C A x2y:0.23435,y67y:0.28310,d7r:0.043547 x2x:0.34314,y67h:0.14141,d7j:0.453145 x2y:0.23435, d7r:0.043547
4 C D x2y:0.23435,y67y:0.28330,d7r:0.043247 x67b:0.31411,g72v:0.3134,b8c:0.89234 x2y:0.23435, d7r:0.043247
diff_y
1 x2y:0.33134, d7r:0.23443
2 <NA>
3 y67h:0.14141, d7j:0.453145
4 x67b:0.31411, g72v:0.3134, b8c:0.89234
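To make the pipeline easier to follow, here is a minimal base R sketch that applies the same four steps to row 1 of the example data only. The intermediate variable names (x_keys, only_x, changed, hits) are introduced here for illustration and do not appear in the answer; lapply plus unlist stands in for the answer's sapply call.

```r
# Row 1 of the example data, split into individual "key:value" entries
x <- strsplit("x2x:0.12345,y67h:0.06732,d7j:0.032647", split = ",")[[1]]
y <- strsplit("x2y:0.33134,y67y:0.3131,d7r:0.23443", split = ",")[[1]]

# Step 1: drop everything from ":" onwards so that only the keys are compared
x_keys <- sub(":.+$", "", x)   # "x2x" "y67h" "d7j"
y_keys <- sub(":.+$", "", y)   # "x2y" "y67y" "d7r"

# Step 2: keys that occur in x but not in y
only_x <- setdiff(x_keys, y_keys)   # "x2x" "y67h" "d7j"

# Step 3: keep keys whose letter after the digits differs from the letter
# before them; "x2x" is dropped because x is still x
changed <- grep("([a-z])(?>\\d+)(?!\\1)", only_x, value = TRUE, perl = TRUE)
# "y67h" "d7j"

# Step 4: look up the original "key:value" entries and collapse them
hits <- unlist(lapply(changed, grep, x, value = TRUE))
diff_x <- paste(hits, collapse = ", ")
diff_x   # "y67h:0.06732, d7j:0.032647"
```

The result agrees with the diff_x value shown for row 1 in the output above.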
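Notes:
- Since we are only interested in comparing the first part of strings of the form x1y:000000, I added sub(":.+$", "", .x) to each map2_chr input argument to remove the :000000 part first.
- setdiff and the following grep step then work as expected: they return the non-matches and exclude strings of the form x1x.
- sapply(grep, .x, value = TRUE) takes the vector of non-matches from the first grep and searches for their corresponding original strings (in x1y:000000 form).
- paste collapses the vector of non-matches into a single comma-separated list.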
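The steps described in these notes are repeated for diff_x and diff_y, so they could be factored into one function. The sketch below is one possible way to do that, not the answer's own code: diff_entries is a hypothetical helper name, and it deliberately swaps the sapply(grep, ...) lookup for an exact comparison of keys with %in%, so a key is never treated as a regular expression and cannot partially match a longer key. Unlike setdiff, this variant keeps duplicate entries if the input contains any.

```r
# Hypothetical helper (not part of the original answer): returns the entries
# of `a` whose key is absent from `b` and whose letter changes across the digits.
diff_entries <- function(a, b) {
  a_keys <- sub(":.+$", "", a)   # part before ":" in each entry of a
  b_keys <- sub(":.+$", "", b)   # part before ":" in each entry of b
  keep <- !(a_keys %in% b_keys) &
    grepl("([a-z])(?>\\d+)(?!\\1)", a_keys, perl = TRUE)
  if (!any(keep)) return(NA_character_)
  paste(a[keep], collapse = ", ")
}

# First two rows of the example data
x_val <- c("x2x:0.12345,y67h:0.06732,d7j:0.032647",
           "x2y:0.26345,y67y:0.28320,d7r:0.043647")
y_val <- c("x2y:0.33134,y67y:0.3131,d7r:0.23443",
           "x2y:0.34311,y67y:0.14142,d7r:0.31431")

xs <- strsplit(x_val, split = ",")
ys <- strsplit(y_val, split = ",")

# Apply the helper row by row, once in each direction
diff_x <- unname(mapply(diff_entries, xs, ys))
diff_y <- unname(mapply(diff_entries, ys, xs))

diff_x   # "y67h:0.06732, d7j:0.032647" NA
diff_y   # "x2y:0.33134, d7r:0.23443"   NA
```

With purrr loaded, the same helper should also work inside mutate as map2_chr(strsplit(x_val, split = ","), strsplit(y_val, split = ","), diff_entries), since it always returns a single string or NA_character_.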