R: grouping strings between known marker strings

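Problem description

I have a long character vector with a specific structure. I want to group the strings so that this structure becomes explicit. An example will make this clear.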

chr_vec <- c("Random Title", "Start", "dsf", "sdvf", "Stop", "Start", "dsf", "sdvf", "Stop", "Start", "dsf", "sdvf", "Stop", "Another Random Title", "Start", "erg", "vdf", "vfd", "efw", "Stop",
             "Start", "erg", "vdf", "vfd", "efw", "Stop", "Start", "erg", "vdf", "vfd", "efw", "Stop")

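So I have random titles, and the words between Start and Stop (both inclusive) should be grouped together. The random titles should be kept so that I know which title each block belongs to. The result would look like this: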

result <- list("Random Title" = list(c("Start", "dsf", "sdvf", "Stop"), c("Start", "dsf", "sdvf", "Stop")),
               "Another Random Title" = list(c("Start", "erg", "vdf", "vfd", "efw", "Stop"), c("Start", "erg", "vdf", "vfd", "efw", "Stop"), c("Start", "erg", "vdf", "vfd", "efw", "Stop")))
result
$`Random Title`
$`Random Title`[[1]]
[1] "Start" "dsf"   "sdvf"  "Stop" 

$`Random Title`[[2]]
[1] "Start" "dsf"   "sdvf"  "Stop" 


$`Another Random Title`
$`Another Random Title`[[1]]
[1] "Start" "erg"   "vdf"   "vfd"   "efw"   "Stop" 

$`Another Random Title`[[2]]
[1] "Start" "erg"   "vdf"   "vfd"   "efw"   "Stop" 

$`Another Random Title`[[3]]
[1] "Start" "erg"   "vdf"   "vfd"   "efw"   "Stop" 

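I do not know in advance how many strings sit between Start and Stop. The titles are random. My data does not have to be a vector. I tried this with a tibble and cumsum, but it failed because I need to keep those titles.

My attempt: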

library(tibble)  # tibble()
library(dplyr)   # mutate(), %>%
res <- tibble(text = chr_vec) %>%
  mutate(group = cumsum(text == "Start"))

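This almost works, but the titles break the approach. Each title is assigned to the group of the preceding Start block (or to group 0 for the first title) instead of being recognized as a title.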
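To make the problem concrete, here is a minimal base R sketch of what the cumsum grouping does to the titles. The input is a shortened, illustrative version of chr_vec, not the full vector from the question.

```r
# Shortened illustrative input (an assumption, not the full vector above)
chr_vec <- c("Random Title", "Start", "dsf", "Stop",
             "Another Random Title", "Start", "erg", "Stop")

# Running count of "Start" markers seen so far
group <- cumsum(chr_vec == "Start")

# "Random Title" lands in group 0, and "Another Random Title" lands in
# group 1, the same group as the block that precedes it
data.frame(text = chr_vec, group = group)
```

The counter only changes at a Start, so a title between two blocks inherits the number of the block before it.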

Tags: r, list

Solution


A solution in base R

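# Positions of the exact markers (== avoids partial matches inside titles)
t1 <- which(chr_vec == "Start")
t2 <- which(chr_vec == "Stop")
# One index sequence per Start-Stop block. Map() always returns a list,
# whereas mapply() would simplify to a matrix if all blocks had equal length.
sek <- Map(seq, t1, t2)

j <- 1          # position of the current block under its title
lst <- list()
for (i in seq_along(sek)) {

  if (i == 1) {
    # Assumes the vector begins with a title
    tit <- chr_vec[1]
  } else {
    # A gap between the previous "Stop" and this "Start" means a new title
    if ((head(sek[[i]], 1) - tail(sek[[i - 1]], 1)) != 1) {
      tit <- chr_vec[head(sek[[i]], 1) - 1]
      j <- 1
    }
  }

  lst[[tit]][[j]] <- chr_vec[sek[[i]]]
  j <- j + 1
}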

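This yields the following. Note that the sample chr_vec contains three Start-Stop blocks under "Random Title", so three are returned: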

$`Random Title`
$`Random Title`[[1]]
[1] "Start" "dsf"   "sdvf"  "Stop" 

$`Random Title`[[2]]
[1] "Start" "dsf"   "sdvf"  "Stop" 

$`Random Title`[[3]]
[1] "Start" "dsf"   "sdvf"  "Stop" 


$`Another Random Title`
$`Another Random Title`[[1]]
[1] "Start" "erg"   "vdf"   "vfd"   "efw"   "Stop" 

$`Another Random Title`[[2]]
[1] "Start" "erg"   "vdf"   "vfd"   "efw"   "Stop" 

$`Another Random Title`[[3]]
[1] "Start" "erg"   "vdf"   "vfd"   "efw"   "Stop"
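As an alternative to the loop above, the same grouping can be sketched without explicit iteration by using cumsum and split. This is only a sketch under stated assumptions, not the original answer's method. The helper name group_blocks and its parameters are hypothetical. It assumes that blocks are well formed (every "Start" has a later "Stop", no nesting), that a title precedes the first block, and that every element outside a block is a title.

```r
chr_vec <- c("Random Title", "Start", "dsf", "sdvf", "Stop", "Start", "dsf", "sdvf", "Stop",
             "Start", "dsf", "sdvf", "Stop", "Another Random Title",
             "Start", "erg", "vdf", "vfd", "efw", "Stop",
             "Start", "erg", "vdf", "vfd", "efw", "Stop",
             "Start", "erg", "vdf", "vfd", "efw", "Stop")

# Hypothetical helper (not part of the original answer)
group_blocks <- function(x, start_marker = "Start", stop_marker = "Stop") {
  is_start <- x == start_marker
  is_stop  <- x == stop_marker

  # TRUE from each start marker through its stop marker, FALSE elsewhere
  in_block <- (cumsum(is_start) - cumsum(is_stop) + is_stop) > 0

  is_title <- !in_block          # assumption: anything outside a block is a title
  title_id <- cumsum(is_title)   # index of the most recent title
  block_id <- cumsum(is_start)   # index of the most recent block

  # One character vector per block
  blocks <- unname(split(x[in_block], block_id[in_block]))

  # Title of each block, with titles kept in order of first appearance
  block_title <- x[is_title][title_id[is_start]]
  split(blocks, factor(block_title, levels = unique(block_title)))
}

grouped <- group_blocks(chr_vec)
str(grouped)
```

One design consequence to be aware of: because blocks are grouped by title text, two separate titles with identical text would be merged into one list entry.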
