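r - How to remove words containing any non-alphabetic characters (other than hyphens and apostrophes) in R
Problem description
I need to remove (or replace with a space) every word in a string in R that has a non-alphabetic character in it, other than a hyphen or an apostrophe. Can anyone help? Thanks.
For example: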
str = "he@llo wor*ld i'm using state-of-the-art technologies it's i4u"
Expected output:
" i'm using state-of-the-art technologies it's "
I tried the following regular expression:
lines <- c("i'm",
'gas-lighting',
"i'm gas-lighting",
"i-love-you",
"i@u",
"b2b",
"i'm gas-lighting u i@u b2b")
gsub("\\w+[^a-z'-]+\\w+", " ", lines)
[1] "i'm" "gas-lighting" "i' -lighting" "i-love-you" " "
" " "i' - "
Is the problem the spaces between the words? I tried to exclude whitespace from the match:
gsub("\\w+[^a-z\\s'-]+\\w+", " ", lines)
[1] "i'm" "gas-lighting" "i' -lighting" "i-love-you" " "
" " "i' - "
Shouldn't that skip the spaces? I expected the following strings:
[1] "i'm" "gas-lighting" "i'm gas-lighting" "i-love-you" " "
" " "i'm gas-lighting u "
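A likely reason the second attempt gives the same result as the first: with R's default regex engine (TRE), an escape such as `\s` does not appear to be recognized inside a bracket expression, so `[^a-z\\s'-]` excludes a literal backslash and the letter "s" rather than whitespace. The sketch below shows two ways to keep whitespace out of the match, assuming that is the only goal; the variable names are illustrative and not from the question.

```r
x <- c("i'm gas-lighting", "i@u", "b2b")

# Option 1: default engine, using the POSIX class [:space:] inside the brackets
posix_out <- gsub("\\w+[^a-z[:space:]'-]+\\w+", " ", x)

# Option 2: PCRE (perl = TRUE), where \\s is understood inside [...]
pcre_out <- gsub("\\w+[^a-z\\s'-]+\\w+", " ", x, perl = TRUE)

print(posix_out)
print(pcre_out)
```

Either way, this pattern still requires word characters on both sides of the offending character, so a word that begins or ends with a symbol is not matched.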
Update 2: OK, this works well so far.
> lines <- c("i'm",
+ 'gas-lighting',
+ "i'm gas-lighting",
+ "i-love-you",
+ "i@u",
+ "b2b",
+ "i'm gas-lighting u and you and you i@u b2b",
+ " he@llo wor$ld how*are&you ")
>
> # split a string at spaces then remove the words
> # that contain any non-alphabetic characters (except "-", "'")
> # then paste them together (separate them with spaces)
> unlist(lapply(lines, function(line){
+ words <- unlist(strsplit(line, "\\s+"))
+ words <- words[!grepl("[^a-z'-]", words, perl=TRUE)]
+ paste(words, collapse=" ")}))
[1] "i'm" "gas-lighting"
[3] "i'm gas-lighting" "i-love-you"
[5] "" ""
[7] "i'm gas-lighting u and you and you" ""
Update 1: So far, I am using the following regular expressions.
> # replace word at the beginning of a string
> lines <- gsub("^\\s*\\w*[^a-z'-]+\\w*", " ", lines); lines
[1] "i'm" "gas-lighting" "i'm gas-lighting" "i-love-you"
[5] " " " " "i'm gas-lighting u i@u "
> # replace word at the end of a string
> lines <- gsub("\\s[a-z]+[^a-z'-]+\\w*$", " ", lines); lines
[1] "i'm" "gas-lighting" "i'm gas-lighting" "i-love-you"
[5] " " " " "i'm gas-lighting u i@u "
> # replace words between spaces
> gsub("\\s\\w*[^a-z'-]+\\w*\\s", " ", lines)
[1] "i'm" "gas-lighting" "i'm gas-lighting" "i-love-you" " "
[6] " " "i'm gas-lighting u "
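The three passes above (start of string, end of string, between spaces) can probably be folded into one by matching whole whitespace-delimited tokens with lookarounds. This is only a sketch, assuming PCRE (`perl = TRUE`) is acceptable and that a "word" means any run of non-whitespace characters; `drop_bad_words` is a hypothetical helper name introduced here, not something from the question.

```r
# Hypothetical helper: delete every whitespace-delimited token that
# contains at least one character other than a letter, "'" or "-".
drop_bad_words <- function(x) {
  # (?<!\\S)                 token starts at string start or after whitespace
  # \\S*[^A-Za-z'\\s-]\\S*   token containing at least one disallowed character
  # (?!\\S)                  token ends at string end or before whitespace
  gsub("(?<!\\S)\\S*[^A-Za-z'\\s-]\\S*(?!\\S)", "", x, perl = TRUE)
}

lines <- c("i'm", "i@u", "b2b", "i'm gas-lighting u i@u b2b")
cleaned <- drop_bad_words(lines)

# Optionally collapse and trim the whitespace left behind
tidy <- trimws(gsub("\\s+", " ", cleaned))
print(tidy)
```

To replace the offending words with a space instead of deleting them, pass " " as the replacement and skip the tidy-up step.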
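Solution
A variant of Harro Cyranka's answer, using grepl. Here break_1 is the word list built in that answer (not shown here), presumably by splitting the string on spaces.
# keep only the words made up entirely of letters, apostrophes and hyphens
paste0(sapply(break_1, function(x) x[!grepl("[^A-Za-z'-]", x)]), collapse = " ")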
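For a self-contained check of the split-and-filter idea on the string from the question, the following is a minimal sketch. It assumes words are separated by whitespace and that uppercase letters should also count as alphabetic; `keep_clean_words` is a hypothetical name introduced for illustration.

```r
# Hypothetical helper: split on whitespace, keep only tokens made up
# entirely of letters, apostrophes and hyphens, then rejoin with spaces.
keep_clean_words <- function(x) {
  vapply(strsplit(x, "\\s+"), function(words) {
    ok <- nzchar(words) & !grepl("[^A-Za-z'-]", words)
    paste(words[ok], collapse = " ")
  }, character(1))
}

str <- "he@llo wor*ld i'm using state-of-the-art technologies it's i4u"
result <- keep_clean_words(str)
print(result)
```

Unlike the expected output in the question, this drops the offending words outright instead of leaving a space in their place, so the result has no leading or trailing whitespace.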