R apply and filter are very slow; alternatives?

Problem description

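I wrote a function to process a data frame and I call it through apply. For each combination of values in the "first" and "second" columns, I want to make sure I don't already have a row with the reversed pair ("second", "first"). However, the function I wrote takes a very long time to run. I think the problem is that I create a subset of the data with dplyr's filter on every call. I'm fairly new to this and not sure how to speed it up. Any suggestions would be great!

I'm running this on a personal computer with 6 GB of RAM and a 2.20 GHz CPU. The data frame has 6,939,380 observations of 7 variables. Here is a subset of my data.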

    first <- c("Q1","Q2","Q3","Q4","Q5")
    second <- c("Q6","Q7","Q8","Q9","Q10")
    third_Q <- c("Q11","Q12","Q13","Q14","Q15")
    third_filter <- c("yes","yes","no","yes","maybe")
    combo1 <- c("Q1_-_Q6","Q2_-_Q7","Q3_-_Q8","Q4_-_Q9","Q5_-_Q10")
    combo2 <- c("Q6_-_Q1","Q7_-_Q2","Q8_-_Q3","Q9_-_Q4","Q10_-_Q5")
    row <- c(1,2,3,4,5)

    temp2 <- data.frame(first,second,third_Q,third_filter,combo1,combo2,row)

Here is the function I wrote and am trying to run on this data frame.


    library(dplyr)  # provides %>% and filter()

    fun3 <- function(x){
      #print the row # you are working on
      print(paste("row",as.numeric(x['row']),sep="_"))

      #get the row number you are looking for, the 3rd question, the 3rd filter, and the combo of questions you are asking
      rownum <- as.numeric(x['row'])
      q3 <- as.character(x['third_Q'])
      q3_filter <- as.character(x['third_filter'])
      combo <- as.character(x['combo1'])

      #subset the data to only look at rows that are above the row number you are analyzing, and the 3rd question & filter you are asking
      su <- (temp2 %>% filter(row<rownum & third_Q==q3 & third_filter==q3_filter))$combo2

      #compare the combo of the row you are working on to the 'combo2' for all of the rows in your subset
      result <- ifelse(combo %in% su,"discard","keep")

      return(result)
    }

    temp2$keep2 <- apply(temp2,1,fun3)

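I'm using the print(...) part of the function to watch how fast it runs. The output is quite slow (a new row number is printed roughly every 2 seconds). Given that I have 6,939,380 rows, I'd like to speed this up.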

Tags: r

Solution
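One possible way to avoid calling filter once per row is to replace the row-by-row loop with a single vectorized lookup. The sketch below is not from the original post. It uses base R's match instead of dplyr, and it assumes that the character "|" never appears in third_Q, third_filter, combo1 or combo2. The function name flag_reversed and the toy data are hypothetical and only illustrate the idea: build one key per row from (third_Q, third_filter, combo1), build a second key from (third_Q, third_filter, combo2), and find the earliest row whose second key equals the current row's first key.

```r
# Hypothetical helper (not from the original post): flags a row as "discard"
# when an earlier row with the same third_Q and third_filter has a combo2
# equal to this row's combo1.
flag_reversed <- function(df, sep = "|") {
  # Assumption: `sep` does not occur in any of the pasted columns
  key_fwd <- paste(df$third_Q, df$third_filter, df$combo1, sep = sep)
  key_rev <- paste(df$third_Q, df$third_filter, df$combo2, sep = sep)

  # Sort the lookup table by row number so that match(), which returns the
  # first matching position, yields the smallest matching row number
  ord <- order(df$row)
  hit <- match(key_fwd, key_rev[ord])
  earliest <- df$row[ord][hit]  # NA where no reversed pair exists

  # Discard only if the reversed pair appears strictly above the current row
  ifelse(!is.na(earliest) & earliest < df$row, "discard", "keep")
}

# Toy data: row 3 is the reverse of row 1 with the same third_Q and
# third_filter; row 4 is also reversed but has a different third_filter
temp2 <- data.frame(
  first        = c("Q1", "Q2", "Q6", "Q6"),
  second       = c("Q6", "Q7", "Q1", "Q1"),
  third_Q      = c("Q11", "Q12", "Q11", "Q11"),
  third_filter = c("yes", "yes", "yes", "no"),
  row          = 1:4,
  stringsAsFactors = FALSE
)
temp2$combo1 <- paste(temp2$first, temp2$second, sep = "_-_")
temp2$combo2 <- paste(temp2$second, temp2$first, sep = "_-_")

temp2$keep2 <- flag_reversed(temp2)
print(temp2$keep2)  # row 3 is "discard"; rows 1, 2 and 4 are "keep"
```

Because match hashes the lookup table once, the whole data frame is handled in a few vectorized passes rather than one filter call per row. Whether this fits in 6 GB of RAM for 6,939,380 rows has not been tested here, so it is worth trying on a sample of the data first.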

