Optimizing a slow for loop

Problem description

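Hi, I have the following for loop that I'm trying to make more efficient, because it has to run on a much larger dataset, but I don't know how to do it or where to look (`purrr`? `apply`?).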

[EDIT] This function must do the following:

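  1. Replicate `df_1` 123 times (`third_arg`) into a single data frame; let's call it `Res`.
  2. Once there is a match on variable `a` between `Res` and `df_2` (at `df_2$a + 1`), replace variable `c` in `Res` with variable `d` from `df_2` for the matching values (`Res$c == df_2$c`).
  3. Propagate this change to `Res$c` down to the end of the data frame, and so on for each subsequent match on `a` between the 2 data frames.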
    library(tidyverse)
    
    df_1 <- tibble::tribble(
              ~a,  ~b,    ~c,
              1L, "a", "aaa",
              1L, "a", "bbb",
              1L, "a", "ccc",
              1L, "b", "ddd",
              1L, "b", "eee",
              1L, "b", "fff"
              )
    
    df_2 <- tibble::tribble(
               ~a,  ~b,    ~c,    ~d,
              23L, "a", "aaa", "jjj",
              56L, "b", "ddd", "kkk",
              79L, "b", "fff", "mmm"
              )
    
    third_arg <- 123
    
    
    my_function <- function(df_1, df_2, third_arg){
      temp1 = df_2$a
      Res = df_1
      temp2 = c()
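      for (i in seq(2, third_arg)){
        # block of df_1 with a = i (nrow(df_1) rows instead of a hard-coded 6)
        temp = cbind(a = rep(i, nrow(df_1)), df_1[, -1])
        # if a == i - 1 occurs in df_2, remember the c values to replace
        if ((i - 1) %in% temp1){
          sub = df_2[df_2$a == (i - 1), ]
          for (j in sub$c){
            temp2 = c(temp2, j)
          }
        }
        # apply every replacement collected so far, so it propagates downwards
        if (length(temp2) > 0){
          for (k in temp2){
            temp$c[temp$c == k] = df_2$d[df_2$c == k]
          }
        }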
        Res = rbind(Res, temp)
      }
      Res
    }
    
    
    my_function(df_1, df_2, third_arg)
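A likely contributor to the slowness is `Res = rbind(Res, temp)` inside the loop: each call builds a new data frame holding everything accumulated so far, so the total amount of copying grows roughly quadratically with `third_arg`. A minimal sketch of the usual alternative keeps the same replacement rules but stores each per-`i` block in a preallocated list and binds them once with `do.call(rbind, ...)`. The name `my_function_list` is hypothetical, the sketch assumes each value of `df_2$c` occurs at most once, and the example data are base-R copies of the question's tibbles so it runs without the tidyverse:

```r
# Hypothetical variant of my_function(): same replacement rules, but blocks are
# collected in a preallocated list and bound once at the end.
# Assumes each value of df_2$c occurs at most once.
my_function_list <- function(df_1, df_2, third_arg) {
  n <- nrow(df_1)
  blocks <- vector("list", third_arg)
  blocks[[1]] <- df_1                      # first block is df_1 as-is
  active <- character(0)                   # c values whose replacement applies
  for (i in seq_len(third_arg)[-1]) {
    temp <- df_1
    temp$a <- rep(i, n)
    # when a reaches df_2$a + 1, that df_2 row's c value starts being replaced
    active <- c(active, df_2$c[df_2$a == i - 1])
    for (k in active) {
      temp$c[temp$c == k] <- df_2$d[df_2$c == k]
    }
    blocks[[i]] <- temp
  }
  do.call(rbind, blocks)                   # one bind instead of third_arg - 1
}

# Base-R copies of the question's example data
df_1 <- data.frame(a = 1L, b = rep(c("a", "b"), each = 3),
                   c = c("aaa", "bbb", "ccc", "ddd", "eee", "fff"),
                   stringsAsFactors = FALSE)
df_2 <- data.frame(a = c(23L, 56L, 79L), b = c("a", "b", "b"),
                   c = c("aaa", "ddd", "fff"), d = c("jjj", "kkk", "mmm"),
                   stringsAsFactors = FALSE)

out <- my_function_list(df_1, df_2, 123)
```

Using `seq_len(third_arg)[-1]` instead of `seq(2, third_arg)` also avoids iterating over `c(2, 1)` when `third_arg` is 1.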

[EDIT 2] After some research I've made progress. I now need a way to put this into a function that works iteratively for any `nrow(df_2)` (here `1 + 1 + 1`):

df <- df_1 %>% slice(rep(row_number(), 123)) %>%
  mutate(a = rep(1:123, each = nrow(df_1)))

final_list <- list()

final_list[[1]] <- df %>%
  mutate(c = if_else(a > df_2$a[1] & 
                        c == df_2$c[1], df_2$d[1], c))

final_list[[1 + 1]] <- final_list[[1]] %>%
  mutate(c = if_else(a > df_2$a[1 + 1] & 
                       c == df_2$c[1 + 1], df_2$d[1 + 1], c))

final_list[[1 + 1 + 1]] <- final_list[[1 + 1]] %>%
  mutate(c = if_else(a > df_2$a[1 + 1 + 1] & 
                       c == df_2$c[1 + 1 + 1], df_2$d[1 + 1 + 1], c))

final_list[[nrow(df_2)]]
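The hand-written steps above form a left fold over the rows of `df_2`: each step takes the previous result and applies one row's replacement. Base R's `Reduce()` (or `purrr::reduce()` / `purrr::accumulate()`, since `purrr` was already on the list of places to look) can therefore generalise them to any `nrow(df_2)`. A minimal sketch, assuming the replicated frame `df` is built as in the question; the helper name `apply_df2_rows` is hypothetical, and the example data are base-R copies of the question's tibbles:

```r
# Hypothetical helper: fold the single-row replacement over every row of df_2,
# so the same code works for any nrow(df_2).
apply_df2_rows <- function(df, df_2, accumulate = FALSE) {
  step <- function(acc, i) {
    hit <- acc$a > df_2$a[i] & acc$c == df_2$c[i]   # same test as the if_else()
    acc$c[hit] <- df_2$d[i]
    acc
  }
  # With accumulate = TRUE, Reduce() returns df followed by every intermediate
  # result, i.e. list(df, final_list[[1]], ..., final_list[[nrow(df_2)]]).
  Reduce(step, seq_len(nrow(df_2)), df, accumulate = accumulate, simplify = FALSE)
}

# Base-R copies of the question's example data
df_1 <- data.frame(a = 1L, b = rep(c("a", "b"), each = 3),
                   c = c("aaa", "bbb", "ccc", "ddd", "eee", "fff"),
                   stringsAsFactors = FALSE)
df_2 <- data.frame(a = c(23L, 56L, 79L), b = c("a", "b", "b"),
                   c = c("aaa", "ddd", "fff"), d = c("jjj", "kkk", "mmm"),
                   stringsAsFactors = FALSE)

# Same replicated frame as `df` in the question (123 copies, a = 1..123)
df <- df_1[rep(seq_len(nrow(df_1)), 123), ]
df$a <- rep(1:123, each = nrow(df_1))

res <- apply_df2_rows(df, df_2)                       # final result only
steps <- apply_df2_rows(df, df_2, accumulate = TRUE)  # nrow(df_2) + 1 elements
```

`steps[-1]` plays the role of `final_list`, and `res` corresponds to `final_list[[nrow(df_2)]]`.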

Tags: r, tidyverse

Solution


A working solution, about 3 times faster. Unfortunately, I couldn't get rid of the for loop completely... if anyone sees a better way, let me know.

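my_new_function <- function(df_1, df_2, third_arg){
  
  # stack third_arg copies of df_1 and number them a = 1..third_arg
  df <- df_1 %>% slice(rep(row_number(), third_arg)) %>%
    mutate(a = rep(1:third_arg, each = nrow(df_1)))
  
  # one vectorised replacement per row of df_2 (the loop runs over df_2, not over a)
  for (i in seq_len(nrow(df_2))) {
    df <- df %>%
      mutate(c = if_else(a > df_2$a[i] & 
                           c == df_2$c[i], df_2$d[i], c))
  }
  
  df
}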
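If each value of `df_2$c` occurs at most once and no `df_2$d` value is itself one of the `df_2$c` keys (both hold for the example data), the remaining loop over `df_2` can be dropped as well: `match()` finds, for every stacked row, the `df_2` row with the same `c`, and one vectorised comparison with that row's `a` decides whether to replace. A minimal base-R sketch under those assumptions (the function name `replicate_and_replace` is hypothetical; a dplyr version could use `left_join()` on `c` instead of `match()`):

```r
# Hypothetical loop-free variant. Assumes each df_2$c value occurs at most once
# and that no df_2$d value is also a df_2$c key (no chained replacements).
replicate_and_replace <- function(df_1, df_2, third_arg) {
  n <- nrow(df_1)
  # Step 1: third_arg stacked copies of df_1, numbered a = 1..third_arg
  res <- df_1[rep(seq_len(n), third_arg), , drop = FALSE]
  res$a <- rep(seq_len(third_arg), each = n)
  # Steps 2-3: for each row, the df_2 row with the same c (NA if none) ...
  idx <- match(res$c, df_2$c)
  # ... and whether this row's a lies past that df_2 row's a
  hit <- !is.na(idx) & res$a > df_2$a[idx]
  res$c[hit] <- df_2$d[idx[hit]]
  rownames(res) <- NULL
  res
}

# Base-R copies of the question's example data
df_1 <- data.frame(a = 1L, b = rep(c("a", "b"), each = 3),
                   c = c("aaa", "bbb", "ccc", "ddd", "eee", "fff"),
                   stringsAsFactors = FALSE)
df_2 <- data.frame(a = c(23L, 56L, 79L), b = c("a", "b", "b"),
                   c = c("aaa", "ddd", "fff"), d = c("jjj", "kkk", "mmm"),
                   stringsAsFactors = FALSE)

out <- replicate_and_replace(df_1, df_2, 123)
```

If `df_2` can contain chained replacements or repeated `c` values, the sequential loop in the answer above is the safer choice, because it applies the rows of `df_2` one after another.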
