How to extract specific rows from a data frame

Problem description

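Suppose I have a fairly large data frame, roughly one million rows.

I want to extract the rows of the data frame that lie between BSM and ENDBSM (markers included, as in the output example below). How can I do this efficiently?

My idea was to first flag the rows I need to extract with 1, using the following loop, but it takes forever.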

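# Flag rows with 1 from each "BSM" up to (but not including) the next "ENDBSM".
chkSTR <- 0
for (i in seq_len(nrow(rData))) {

  if (rData$Data[i] == "BSM") {
    chkSTR <- 1   # entering a block
  }

  if (rData$Data[i] == "ENDBSM") {
    chkSTR <- 0   # leaving a block; this row itself is flagged 0
  }

  rData$BOOL[i] <- chkSTR

}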

Example input data frame

rData = data.frame(
  Data = c(1,"BSM","a",3,3,"ENDBSM",1,3,1,"BSM","b",3,3,"ENDBSM",1,2,1,"BSM","c",2,3,"ENDBSM",1,2)
)


Output example

rData = data.frame(
  Data = c("BSM","a",3,3,"ENDBSM","BSM","b",3,3,"ENDBSM","BSM","c",2,3,"ENDBSM")
)

Tags: r

Solution


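As mentioned in the comments, the number of "BSM" and "ENDBSM" markers is the same and "BSM" always comes first. We can therefore use mapply to build a sequence of row indices between each "BSM" and its matching "ENDBSM", and subset the data frame with those indices.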

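# SIMPLIFY = FALSE always returns a list, even when the blocks differ in length;
# unlist() then flattens it into a single vector of row indices.
rData[unlist(mapply(`:`, which(rData$Data == "BSM"),
                         which(rData$Data == "ENDBSM"),
                         SIMPLIFY = FALSE)), , drop = FALSE]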
#    Data
#2     BSM
#3       a
#4       3
#5       3
#6  ENDBSM
#10    BSM
#11      b
#12      3
#13      3
#14 ENDBSM
#18    BSM
#19      c
#20      2
#21      3
#22 ENDBSM
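
The flag column that the question tried to build with a loop can also be computed without one. The following is only a minimal sketch, not part of the original answer: it assumes the markers are properly paired and never nested, and the names `is_start`, `is_end`, `inside` and `result` are introduced here purely for illustration.

```r
rData <- data.frame(
  Data = c(1, "BSM", "a", 3, 3, "ENDBSM", 1, 3, 1, "BSM", "b", 3, 3, "ENDBSM",
           1, 2, 1, "BSM", "c", 2, 3, "ENDBSM", 1, 2)
)

is_start <- rData$Data == "BSM"
is_end   <- rData$Data == "ENDBSM"

# cumsum(is_start) counts the blocks opened so far, cumsum(is_end) the blocks closed so far.
# A row is inside a block while more blocks have been opened than closed.
# "| is_end" adds the ENDBSM row itself, where the two counts have just become equal.
inside <- (cumsum(is_start) > cumsum(is_end)) | is_end

# Keep only the flagged rows; drop = FALSE keeps the result a data frame.
result <- rData[inside, , drop = FALSE]
```

Because `cumsum` and the comparisons are vectorized, this avoids modifying the data frame once per row, which is likely what makes the loop in the question slow on a million rows.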
