How to remove everything in a string that contains ".com", but only when it is a link?

Question

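How can I get the expected output from the text in the example below? The .com links should be removed, while ordinary words such as "recommend" and "apple.commerce" should stay.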

x<-c("Commerce recommend erkanexample.com.tr. This site erkanexample.com. erkandeneme.com is widely. The company name is apple.commerce is coma. spread")
x<-gsub("(.com)\\S+", "",x)
x
[1] "Commerce r erkanexample This site erkanexample erkandeneme.com is widely. The name is apple is spread"
# expected
[1] "Commerce recommend This site. is widely. The company name is apple.commerce is coma. spread"
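The attempt above removes too much because the "." in "(.com)" is unescaped. In a regular expression it matches any character, so "recommend" (through "ecom") and " company" (through the space before it) also qualify. In addition, "\\S+" requires at least one more non-space character after "com", which is why "erkandeneme.com" survived. The sketch below uses a hypothetical words vector taken from the example and compares the unescaped pattern with an escaped one.

```r
# Hypothetical sample of words taken from the example text
words <- c("recommend", "erkanexample.com", "erkanexample.com.tr.", "apple.commerce")

# Unescaped: "." is a wildcard, so "ecom" inside "recommend" already matches
unescaped <- grepl(".com", words)
unescaped   # TRUE for all four words

# Escaped: "\\." matches only a literal dot, but "apple.commerce" still matches
escaped <- grepl("\\.com", words)
escaped     # FALSE, TRUE, TRUE, TRUE
```

Escaping the dot fixes "recommend". Telling links apart from "apple.commerce" still needs an extra condition on what follows ".com".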

Tags: r, string, gsub

Solution


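The stringr package provides functions for basic string operations. Split the text into words, drop every word in which ".com" either ends the word or is followed by another ".", and join the remaining words back together: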

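library(stringr)
library(dplyr)   # only used here for the %>% pipe

x %>% 
  str_split(" ") %>%                               # split on single spaces (returns a list)
  unlist() %>%                                     # flatten the list into a vector of words
  str_subset("\\.com($|\\.)", negate = TRUE) %>%   # drop words where ".com" ends the word or is followed by "."
  str_c(collapse = " ")                            # rejoin the remaining words with spaces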

This gives:

"Commerce recommend This site is widely. The company name is apple.commerce is coma. spread" 

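The same split, filter and rejoin idea can also be sketched in base R without stringr or dplyr. This is an alternative written for illustration, not the answerer's code. Like the answer, it assumes that words are separated by single spaces.

```r
x <- "Commerce recommend erkanexample.com.tr. This site erkanexample.com. erkandeneme.com is widely. The company name is apple.commerce is coma. spread"

# Split into words on single spaces; strsplit() returns a list, so take its first element
words <- strsplit(x, " ", fixed = TRUE)[[1]]

# Keep words where a literal ".com" is NOT followed by end-of-word or another "."
keep <- !grepl("\\.com($|\\.)", words)

# Rejoin the kept words with single spaces
result <- paste(words[keep], collapse = " ")
result
```

The grepl() pattern is the same one the answer passes to str_subset(). Only the splitting and joining functions differ.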
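Edit: the version below also keeps the period that ended a removed link (for example "site." in the expected output). Links ending in "." are replaced by a bare ".", which is then attached to the preceding word: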

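x %>% 
  str_split(" ") %>% 
  unlist() %>%
  str_subset("\\.com$", negate = TRUE) %>%   # drop links with no trailing period, e.g. "erkandeneme.com"
  str_replace(".*\\.com.*\\.$", ".") %>%      # replace links that end in "." with a bare "."
  str_c(collapse = " ") %>%
  str_replace_all(" \\.", "\\.")              # attach each bare "." to the preceding word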

This gives:

"Commerce recommend. This site. is widely. The company name is apple.commerce is coma. spread"

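If a single regular expression is preferred, the edited pipeline's result can also be reproduced with one base R gsub() call that uses a Perl-style lookahead. This is a sketch under stated assumptions, not the answerer's method. A link is taken to be a run of non-space characters containing a literal ".com", followed only by optional parts such as ".tr" and an optional final period. It must also be preceded by a space, so a link that is the very first word would not be removed.

```r
x <- "Commerce recommend erkanexample.com.tr. This site erkanexample.com. erkandeneme.com is widely. The company name is apple.commerce is coma. spread"

# " \\S*\\.com"   : a space, then a word containing a literal ".com"
# "(?:\\.\\w+)*"  : optional extra domain parts such as ".tr"
# "(\\.?)"        : an optional sentence-ending period, captured as group 1
# "(?=\\s|$)"     : the link must end at a space or at the end of the string
pattern <- " \\S*\\.com(?:\\.\\w+)*(\\.?)(?=\\s|$)"

# Replace each link and its leading space with the captured period, if any
result <- gsub(pattern, "\\1", x, perl = TRUE)
result
```

Because the lookahead requires a space or the end of the string right after the link, "apple.commerce" is left alone. "coma." contains no literal ".com", so it is also kept.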