Processing R data frame rows without a loop in a memory-efficient way

Problem description

My data frame data1 has more than 1.5 million rows and is structured like this:

data1 <- data.frame(NEW_UPC=c
                IRI_KEY=c(1073521,1073521,1073521,1073525,1073525,1073525,1078106,1078106,1078106,1078107,1078107,1078107,1073521,1073521,1073521,1073525,1073525,1073525,1078106,1078106,1078106,1073521,1073521,1073521,1073525,1073525,1073525,1078106,1078106,1078106,1073521,1073521,1073525,1073525,1078106,1078106,1073521,1073521,1073521,1073525,1073525,1073525,1078106,1078106,1078106),
                WEEK = c(1229,1230,1232,1218,1224,1229,1282,1285,1287,1229,1230,1232,1229,1230,1232,1218,1224,1229,1282,1285,1287,1229,1230,1232,1217,1221,1227,1270,1272,1273,1273,1274,1270,1272,1217,1221,1229,1230,1232,1218,1224,1229,1282,1285,1287),
                END=c(1232,1232,1232,1229,1229,1229,1287,1287,1287,1232,1232,1232,1232,1232,1232,1229,1229,1229,1287,1287,1287,1232,1232,1232,1227,1227,1227,1273,1273,1273,1274,1274,1272,1272,1221,1221,1232,1232,1232,1229,1229,1229,1287,1287,1287))

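I need to insert a column Exit.time based on the values in the columns WEEK and END and a cutoff value (1287). Exit.time should be 0 or 1 according to the following logic: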

If WEEK = 1287, then Exit.time = 0.

If WEEK is not equal to 1287 but WEEK = END, then Exit.time = 1; otherwise Exit.time = 0.
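The two rules can be written down almost literally as a nested ifelse(), which already works on whole vectors instead of one row at a time. This is a minimal sketch: the helper name exit_time_rule and its cutoff argument are hypothetical, and the sample vectors are a handful of WEEK/END pairs copied from the dummy data above.

```r
# Hypothetical helper (not from the original post): encodes the two rules
# from the question, evaluated element-wise over whole vectors.
exit_time_rule <- function(week, end, cutoff = 1287) {
  # Rule 1: the cutoff week itself is never an exit (0)
  # Rule 2: otherwise it is an exit (1) only when WEEK equals END
  ifelse(week == cutoff, 0, ifelse(week == end, 1, 0))
}

# A few WEEK/END pairs taken from the dummy data
week <- c(1229, 1230, 1232, 1282, 1287, 1273)
end  <- c(1232, 1232, 1232, 1287, 1287, 1273)
exit_time_rule(week, end)
```

For these pairs the result is 0 0 1 0 0 1: only the rows where WEEK equals END and WEEK is not 1287 are flagged.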

To do this, I tried the following for loop, which does what I need on the dummy dataset above.

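# One if/else chain per row; every assignment of the form
# data1$Exit.time[i] <- ... modifies the data frame inside the loop
for (i in seq_len(nrow(data1))) {
  if (data1$WEEK[i] == 1287) {
    data1$Exit.time[i] <- 0
  } else if (data1$WEEK[i] == data1$END[i]) {
    data1$Exit.time[i] <- 1
  } else {
    data1$Exit.time[i] <- 0
  }
}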
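If a loop is still wanted, a common way to speed it up is to keep the data frame out of the loop body: pull the columns into plain vectors once, write results into a preallocated vector, and assign the column a single time at the end. Element-wise assignment into a data frame column goes through $<-.data.frame on every iteration, which is generally much slower than indexing a plain vector. This is a sketch, not a benchmark; it uses only the first nine WEEK/END rows of the dummy data, because the NEW_UPC values are not shown in the question, and the variable names exit_flag, week and end are illustrative.

```r
# Subset of the dummy data (WEEK and END only)
data1 <- data.frame(
  WEEK = c(1229, 1230, 1232, 1218, 1224, 1229, 1282, 1285, 1287),
  END  = c(1232, 1232, 1232, 1229, 1229, 1229, 1287, 1287, 1287)
)

week <- data1$WEEK                 # extract the columns once, outside the loop
end  <- data1$END
exit_flag <- numeric(nrow(data1))  # preallocated, all zeros

for (i in seq_len(nrow(data1))) {
  # zero is already the default, so only the "exit" case needs an assignment
  if (week[i] != 1287 && week[i] == end[i]) {
    exit_flag[i] <- 1
  }
}

data1$Exit.time <- exit_flag       # single column assignment; row order unchanged
data1$Exit.time
```

The vectorized one-liner in the answer is still the simpler choice; this version only shows where the per-row cost in the original loop comes from.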

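The problem is that when I run this loop on my real dataset, I still have no output even after an hour. Given the size of the dataset, I suspect the loop is inefficient. Is there another way to do this? I would prefer to keep the row order of data1 because I need to do some merge operations later.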

Tags: r, for-loop, dataframe

Solution


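Since you need Exit.time to be 1 when (WEEK == END) & WEEK != 1287 and 0 otherwise, you can call as.numeric on the result of (WEEK == END) & WEEK != 1287, which turns TRUE into 1 and FALSE into 0. Because ==, != and & are vectorized in R, this evaluates every row in a single pass without an explicit loop, and the row order of data1 is left unchanged.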

data1$Exit.time <- with(data1, as.numeric(WEEK != 1287 & WEEK == END))
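To see the one-liner at work, here it is applied to the first twelve rows of the dummy data. This sketch keeps only WEEK and END, since the NEW_UPC values are not given in the question; the short as.numeric(c(TRUE, FALSE)) call just illustrates the logical-to-number coercion the answer relies on.

```r
# First 12 rows of the dummy data (WEEK and END only)
data1 <- data.frame(
  WEEK = c(1229, 1230, 1232, 1218, 1224, 1229, 1282, 1285, 1287, 1229, 1230, 1232),
  END  = c(1232, 1232, 1232, 1229, 1229, 1229, 1287, 1287, 1287, 1232, 1232, 1232)
)

# Logical to numeric coercion: TRUE becomes 1, FALSE becomes 0
as.numeric(c(TRUE, FALSE))

# The comparisons and & produce one logical vector for all rows at once
data1$Exit.time <- with(data1, as.numeric(WEEK != 1287 & WEEK == END))
data1$Exit.time
```

The Exit.time values line up with the existing rows in their original order, so the column can be used directly in later merge steps.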
