Replacing a for loop with apply for a faster application

Problem description

I have a dataset like the following:

Node  Descr  Node1  Descr1  Node2  Descr2
A1    AAA1   B1     BBB1    C1     CCC1
A2    AAA2   B2     BBB2    C2     CCC2
A3    AAA3                  C3     CCC3
A4    AAA4   B4     BBB4    C4     CCC4

Whenever a Node/Descr pair is blank, I want it filled with the preceding Node/Descr pair from the same row, like this:

Node  Descr  Node1  Descr1  Node2  Descr2
A1    AAA1   B1     BBB1    C1     CCC1
A2    AAA2   B2     BBB2    C2     CCC2
A3    AAA3   A3     AAA3    C3     CCC3
A4    AAA4   B4     BBB4    C4     CCC4

I can currently do this with the for loop below, but my data is large, so scanning the whole data frame and fixing it takes forever. Any suggestions?

for (j in 8:20) {
  for (i in 1:nrow(old_data)) {
    if (is.na(old_data[i, j]) && !is.na(old_data[i, j + 2]) && !is.na(old_data[i, j - 2])) {
      old_data[i, j]     <- old_data[i, j - 2]
      old_data[i, j + 1] <- old_data[i, j - 1]
    }
  }
}

Tags: r, for-loop, apply, lapply

Solution


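Using an apply loop is not necessarily faster than a for loop; in some cases it is slower. You can, however, drop one of the two for loops (the one over rows) and vectorize: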

df <- data.frame(
    Node = c("A1", "A2", "A3", "A4"), 
    Descr = c("AAA1", "AAA2", "AAA3", "AAA4"), 
    Node1 = c("B1", "B2", NA, "B4"), 
    Descr1 = c("BBB1", "BBB2", NA, "BBB4"), 
    Node2 = c("C1", "C2", "C3", "C4"),
    Descr2 = c("CCC1", "CCC2", "CCC3", "CCC4"),
    Node3 = c(NA, "D2", "D3", "D4"),
    Descr3 = c(NA, "DDD2", "DDD3", "DDD4"),
    Node4 = c(NA, "E2", "E3", "E4"),
    Descr4 = c(NA, "EEE2", "EEE3", "EEE4"),
    stringsAsFactors = FALSE  # keep strings as character (R < 4.0.0 defaulted to factors)
)

for(i in seq(from = 3, to = ncol(df), by = 2)){
    # if the Descr column is not necessarily NA when its complementary Node 
    # column is NA, then you'll need to split this into two if-statements
    if(any(is.na(df[,i]))){
        df[,i][which(is.na(df[,i]))] <- df[,i-2][which(is.na(df[,i]))]
        df[,i+1][which(is.na(df[,i+1]))] <- df[,i-1][which(is.na(df[,i+1]))]
    }
}

df

  Node Descr Node1 Descr1 Node2 Descr2 Node3 Descr3 Node4 Descr4
1   A1  AAA1    B1   BBB1    C1   CCC1    C1   CCC1    C1   CCC1
2   A2  AAA2    B2   BBB2    C2   CCC2    D2   DDD2    E2   EEE2
3   A3  AAA3    A3   AAA3    C3   CCC3    D3   DDD3    E3   EEE3
4   A4  AAA4    B4   BBB4    C4   CCC4    D4   DDD4    E4   EEE4
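Note that this version also forward-fills trailing NAs: in row 1, Node3/Descr3 and Node4/Descr4 become C1/CCC1, whereas the loop in the question only fills a gap when the next Node is present. If you need that exact rule, here is a minimal sketch that keeps the question's condition but still fills a whole column at a time. `fill_gaps()` and its `node_cols` argument are hypothetical names, and the sketch assumes every Descr column sits immediately to the right of its Node column and that the columns are character rather than factor.

```r
# Hypothetical helper: fill an NA Node/Descr pair from the previous pair,
# but only when both the previous and the next Node are present (the
# condition used in the question's loop). Assumes each Descr column sits
# directly to the right of its Node column.
fill_gaps <- function(df, node_cols) {
  # node_cols: positions of the Node columns, in left-to-right order.
  # The first and last Node columns have no neighbour on one side, so skip them.
  for (k in seq_along(node_cols)[-c(1, length(node_cols))]) {
    cur  <- node_cols[k]
    prev <- node_cols[k - 1]
    nxt  <- node_cols[k + 1]
    # One logical vector covering all rows at once
    gap <- is.na(df[[cur]]) & !is.na(df[[prev]]) & !is.na(df[[nxt]])
    df[[cur]][gap]     <- df[[prev]][gap]      # Node from the previous Node
    df[[cur + 1]][gap] <- df[[prev + 1]][gap]  # Descr from the previous Descr
  }
  df
}

df <- data.frame(
  Node   = c("A1", "A2", "A3", "A4"),
  Descr  = c("AAA1", "AAA2", "AAA3", "AAA4"),
  Node1  = c("B1", "B2", NA, "B4"),
  Descr1 = c("BBB1", "BBB2", NA, "BBB4"),
  Node2  = c("C1", "C2", "C3", "C4"),
  Descr2 = c("CCC1", "CCC2", "CCC3", "CCC4"),
  Node3  = c(NA, "D2", "D3", "D4"),
  Descr3 = c(NA, "DDD2", "DDD3", "DDD4"),
  Node4  = c(NA, "E2", "E3", "E4"),
  Descr4 = c(NA, "EEE2", "EEE3", "EEE4"),
  stringsAsFactors = FALSE
)

out <- fill_gaps(df, node_cols = seq(1, ncol(df), by = 2))
out[3, c("Node1", "Descr1")]  # interior gap filled: A3 / AAA3
out[1, c("Node3", "Descr3")]  # trailing NAs left alone: NA / NA
```

For your own data, pass the positions of the Node columns, in left-to-right order, as `node_cols`.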


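If you have many rows, this should be much faster: only the loop over columns remains, and each column is filled for all rows in a single vectorized assignment.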
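To check the speed-up on your own machine, you can time the approaches on a larger synthetic data frame. This sketch is only an illustration: `make_data()`, `fill_by_row()`, `fill_with_apply()` and `fill_by_column()` are hypothetical helpers, the data size and NA rate are arbitrary, and it assumes a Node and its Descr are always NA together, so one NA mask can serve both columns. All three helpers apply the same fill rule as the answer's loop, and the script checks that they return the same columns. Timings depend on your machine and data, so none are quoted here.

```r
# Hypothetical test data: n rows and `pairs` Node/Descr pairs. Apart from the
# first pair, each pair is blanked (Node and Descr both NA) with prob. na_rate.
make_data <- function(n, pairs = 5, na_rate = 0.1, seed = 1) {
  set.seed(seed)
  cols <- list()
  for (p in seq_len(pairs)) {
    suffix <- if (p == 1) "" else p - 1
    node  <- paste0(LETTERS[p], seq_len(n))
    descr <- paste0(strrep(LETTERS[p], 3), seq_len(n))
    if (p > 1) {
      blank <- runif(n) < na_rate
      node[blank]  <- NA
      descr[blank] <- NA
    }
    cols[[paste0("Node", suffix)]]  <- node
    cols[[paste0("Descr", suffix)]] <- descr
  }
  as.data.frame(cols, stringsAsFactors = FALSE)
}

# Cell by cell, as in the question (but with the answer's fill rule)
fill_by_row <- function(df) {
  for (j in seq(3, ncol(df), by = 2)) {
    for (i in seq_len(nrow(df))) {
      if (is.na(df[i, j])) {
        df[i, j]     <- df[i, j - 2]
        df[i, j + 1] <- df[i, j - 1]
      }
    }
  }
  df
}

# Row by row with apply(): the function is still called once per row
fill_with_apply <- function(df) {
  m <- as.matrix(df)  # all columns are character, so this is a character matrix
  filled <- t(apply(m, 1, function(row) {
    for (j in seq(3, length(row), by = 2)) {
      if (is.na(row[j])) row[c(j, j + 1)] <- row[c(j - 2, j - 1)]
    }
    row
  }))
  as.data.frame(filled, stringsAsFactors = FALSE)
}

# Column by column: one vectorized assignment fills every row at once
fill_by_column <- function(df) {
  for (j in seq(3, ncol(df), by = 2)) {
    na <- is.na(df[[j]])
    df[[j]][na]     <- df[[j - 2]][na]
    df[[j + 1]][na] <- df[[j - 1]][na]
  }
  df
}

big <- make_data(5000)  # increase n to see how each approach scales
t_row   <- system.time(res_row   <- fill_by_row(big))[["elapsed"]]
t_apply <- system.time(res_apply <- fill_with_apply(big))[["elapsed"]]
t_col   <- system.time(res_col   <- fill_by_column(big))[["elapsed"]]

# Compare column contents only (as.list drops row names, whose storage may differ)
stopifnot(identical(as.list(res_row), as.list(res_col)),
          identical(as.list(res_apply), as.list(res_col)))
print(c(row_loop = t_row, apply = t_apply, column_loop = t_col))
```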

