r - Regex and str_remove_all in R - only remove words if multiple conditions are met
Problem description
I am trying to remove all instances of a country name based on the following conditions:
Country name not at beginning of string
Country name does not follow 'of '
So if I take a fictional string: Australia National Australia Bank of Australia
I only want to remove the second instance of Australia, the one between 'National' and 'Bank'.
I am using str_remove_all to pass a collapsed string of country names to a vector of company names.
library(dplyr)
library(stringr)
# x is the character vector of company names
country <- data.frame(name = c("Australia", "Singapore", "Malaysia")) %>%
  mutate(name_regex = paste0("((?<!^)\\b", name, "\\b", "|(?<!of\\s)\\b", name, "\\b)"))
country_remove <- str_c(country$name_regex, collapse = "|")
str_remove_all(x, regex(country_remove, ignore_case = TRUE))
(?<!^)\bAustralia\b # select all instances not at beginning
(?<!of\s)\bAustralia\b # select all instances not following 'of '
When I try and combine these together, it ends up just removing everything.
Thanks in advance!
Solution
You should build the regex like this:
library(stringr)
country <- data.frame(name = c("Australia", "Singapore", "Malaysia"))
name_regex <- paste0("\\b(?<!of\\s)(?<!^)(?:", paste(country$name, collapse = "|"), ")\\b")
s <- "Australia National Australia Bank of Australia"
str_remove_all(s, regex(name_regex, ignore_case = TRUE))
## => [1] "Australia National  Bank of Australia"
## (two spaces remain where the name was removed; str_squish() collapses them)
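Since the question applies the pattern to a vector of company names, here is a minimal sketch of that case. The company names are made up for illustration, and the `str_squish()` call is an addition not in the original answer; it tidies up the whitespace left behind by each removal.

```r
library(stringr)

# Hypothetical company names, invented for this example
companies <- c(
  "Australia National Australia Bank of Australia",
  "Singapore Airlines Singapore",
  "Bank of Malaysia"
)
countries <- c("Australia", "Singapore", "Malaysia")

# Same pattern as above: not after "of ", not at the start of the string
name_regex <- paste0("\\b(?<!of\\s)(?<!^)(?:", paste(countries, collapse = "|"), ")\\b")

# str_remove_all() is vectorised over the input;
# str_squish() trims the ends and collapses repeated spaces
cleaned <- str_squish(str_remove_all(companies, regex(name_regex, ignore_case = TRUE)))
cleaned
## => [1] "Australia National Bank of Australia" "Singapore Airlines"
## => [3] "Bank of Malaysia"
```

A leading country name and one that follows 'of ' are both kept, while any other occurrence is dropped.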
The pattern looks like
\b(?<!of\s)(?<!^)(?:Australia|Singapore|Malaysia)\b
See the regex demo online.
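Details
- \b - a word boundary
- (?<!of\s) - no 'of' followed by a whitespace character is allowed immediately to the left of the current location
- (?<!^) - the current location must not be the start of the string
- (?:Australia|Singapore|Malaysia) - any one of the alternatives
- \b - a word boundary.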
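To illustrate why the alternation in the question removed every occurrence while the chained lookbehinds do not, the two patterns can be compared side by side. This is only a sketch using a single country name; the variable names `either` and `both` are made up for the example.

```r
library(stringr)

s <- "Australia National Australia Bank of Australia"

# Alternation from the question: a match only has to satisfy ONE condition,
# so every occurrence passes at least one branch
either <- "(?<!^)\\bAustralia\\b|(?<!of\\s)\\bAustralia\\b"

# Chained lookbehinds: BOTH conditions must hold at the same position
both <- "\\b(?<!of\\s)(?<!^)Australia\\b"

n_either <- str_count(s, regex(either, ignore_case = TRUE))
n_both   <- str_count(s, regex(both, ignore_case = TRUE))

n_either
## => [1] 3
n_both
## => [1] 1
```

The first occurrence is not preceded by 'of ', and the last one is not at the start of the string, so each slips through one branch of the alternation. Placing the two lookbehinds next to each other turns the "or" into an "and".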