Conditionally pasting strings together

Question

My data is a vector in which sentences have been chopped into pieces.

y <- c("G'day", "world and everybody", "else.", "How's life?", "Hope", "you're", "doing just", "fine.")

I want to put these sentences back together.

Expected result:

y
[1] "G'day world and everybody else."
[2] "How's life?"
[3] "Hope you're doing just fine."

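One "rule" for a sentence is that it begins with a capital letter. Based on this rule, here is what I have tried so far (the result is not satisfactory):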

unlist(strsplit(paste0(y[which(grepl("^[A-Z]", y))], " ", y[which(grepl("^[a-z]", y))], collapse = ","), ","))
[1] "G'day world and everybody" "How's life? else."         "Hope you're"               "G'day doing just"         
[5] "How's life? fine."
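
For context, a minimal sketch of why this attempt scrambles the pieces: paste0 pairs its arguments element by element and silently recycles the shorter vector, so the three capitalized fragments are simply repeated against the five lowercase ones, regardless of where each fragment sat in y. The names starts and rest are illustrative and not part of the original attempt.

```r
y <- c("G'day", "world and everybody", "else.", "How's life?",
       "Hope", "you're", "doing just", "fine.")

# Illustrative names: fragments that begin with an upper- or lowercase letter
starts <- y[grepl("^[A-Z]", y)]  # 3 fragments
rest   <- y[grepl("^[a-z]", y)]  # 5 fragments

# The lengths differ, so paste0 recycles `starts` (1, 2, 3, 1, 2)
# and the original positions in y are lost.
length(starts)
#> [1] 3
length(rest)
#> [1] 5
paired <- paste0(starts, " ", rest)
paired[4]
#> [1] "G'day doing just"
```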

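Edit

I came up with this solution, which comes close to the expected result (the output below drops the final periods, leaves a double space after "Hope", and changes the sentence order), but it looks ugly: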

y1 <- c(paste0(y[grepl("^[A-Z].*[^.?]$", y, perl = TRUE)], " ", unlist(strsplit(paste0(y[which(grepl("^[a-z]", y))], collapse = " "), "\\."))), y[grepl("^[A-Z].*[.?]$", y, perl = TRUE)])

y1
[1] "G'day world and everybody else" "Hope  you're doing just fine"   "How's life?"

Is there a better solution?

Edit 2

This is also a nice solution:

library(stringr)
str_extract_all(paste(y, collapse = " "), "[A-Z][^.?]*(\\.|\\?)")
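
If stringr is not available, the same idea can be sketched in base R with gregexpr and regmatches. This is only an illustration under the same assumptions as the pattern above: every sentence starts with a capital letter, ends with "." or "?", and contains no other periods or question marks (an abbreviation such as "e.g." would cut a sentence short). The names joined and sentences are illustrative.

```r
y <- c("G'day", "world and everybody", "else.", "How's life?",
       "Hope", "you're", "doing just", "fine.")

# Glue all fragments into one string, then pull out every run that
# starts with a capital letter and ends at the first "." or "?".
joined <- paste(y, collapse = " ")
sentences <- regmatches(joined, gregexpr("[A-Z][^.?]*[.?]", joined))[[1]]
sentences
#> [1] "G'day world and everybody else." "How's life?"
#> [3] "Hope you're doing just fine."
```

Unlike str_extract_all, which returns a list with one element per input string, the [[1]] here unwraps the result into a plain character vector.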

Tags: r

Solution

I would use a gsub to insert a newline before each capital letter that follows a space, and then split at the newlines:

unlist(strsplit(gsub(" ([A-Z])", "\n\\1", paste(y, collapse = " ")), "\n"))
#> [1] "G'day world and everybody else." "How's life?"                    
#> [3] "Hope you're doing just fine."

Created on 2020-05-28 by the reprex package (v0.3.0)
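
As an alternative that applies the question's rule directly to the fragments rather than to the joined string, here is a hedged sketch: mark each fragment that begins with a capital letter, turn those marks into group numbers with cumsum, and paste each group together. It assumes that a capital letter at the start of a fragment always opens a new sentence, so a fragment that happens to begin with a proper noun would be split off incorrectly. The names starts_sentence, group and sentences are illustrative.

```r
y <- c("G'day", "world and everybody", "else.", "How's life?",
       "Hope", "you're", "doing just", "fine.")

# TRUE where a fragment begins a new sentence (starts with a capital letter)
starts_sentence <- grepl("^[A-Z]", y)

# Running count of sentence starts: 1 1 1 2 3 3 3 3
group <- cumsum(starts_sentence)

# Paste the fragments of each group back together with single spaces
sentences <- unname(vapply(split(y, group), paste, character(1), collapse = " "))
sentences
#> [1] "G'day world and everybody else." "How's life?"
#> [3] "Hope you're doing just fine."
```

The gsub approach above splits at every capital letter that follows a space, including one in the middle of a fragment, whereas this sketch only splits between fragments. Which behavior is preferable depends on the data.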

