Using conditions from other columns and the numeric index of the looped column in a loop/lapply/mutate

Problem description

I have a data frame like this:

> df
   V1 V2 V3 V4 V5 V6
 1  1  1  2 NA  1  0
 2  0  0  2  1 NA  1
 3  1  0  2  1  1 NA
 4  0  1  2  0  0 NA
 5  1  0  2  1  1 NA
 6  0  0  2 NA  1  1
 7  0  1  2 NA  1 NA
 8  0  0  2 NA  1 NA
 9  1  0  2  1  1  1
10  0  1  2  1  1 NA

The dput output is below (edit: corrected):

df <- structure(list(
  V1 = c(1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L),
  V2 = c(1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L),
  V3 = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
  V4 = c(NA, 1L, 1L, 0L, 1L, NA, NA, NA, 1L, 1L),
  V5 = c(1L, NA, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L),
  V6 = c(0L, 1L, NA, NA, NA, 1L, NA, NA, 1L, NA)
), row.names = c(NA, -10L), class = "data.frame")

I am looking for code that leaves V1:V3 unchanged. To V4:V6 I want to apply an if_else statement along these lines:

if_else(df$V1 == 0 & df$V2 == 1 & "index of loop columns" > df$V3, 1, "do nothing")

For example, in rows 4, 7 and 10 of V6, NA would change to 1, because the following condition is true:

if_else(df$V1 == 0 & df$V2 == 1 & numerical index [3] > df$V3 [2], 1, df$V6)

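All other rows should stay unchanged, and so should V4 and V5, because their indices are 1 and 2 and are therefore never greater than V3.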

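I ran into dead ends with both a for loop and lapply, because I don't know how to pass the numeric index of the current column to the > operator in my code. I would appreciate any suggestions. Thanks!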

Tags: r, missing-data

Solution


I think this works. It's a bit hard to tell, because the dput() in your question doesn't match the printed data...

df <- structure(list(V1 = c(1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L), 
  V2 = c(1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L), V3 = c(2L, 
  2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), V4 = c(NA, 1L, 1L, NA, 
  1L, NA, NA, NA, 1L, 1L), V5 = c(1L, NA, 1L, 1L, NA, 1L, 1L, 
  1L, 1L, 1L), V6 = c(NA, 1L, NA, NA, NA, 1L, NA, NA, 1L, NA
  )), class = "data.frame", row.names = c(NA, -10L))

df
#    V1 V2 V3 V4 V5 V6
# 1   1  1  2 NA  1 NA
# 2   0  0  2  1 NA  1
# 3   1  0  2  1  1 NA
# 4   0  1  2 NA  1 NA
# 5   1  0  2  1 NA NA
# 6   0  0  2 NA  1  1
# 7   0  1  2 NA  1 NA
# 8   0  0  2 NA  1 NA
# 9   1  0  2  1  1  1
# 10  0  1  2  1  1 NA

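library(dplyr)
cols_to_loop <- c("V4", "V5", "V6")

# i is the position of the column within cols_to_loop (1 = V4, 2 = V5, 3 = V6)
for (i in seq_along(cols_to_loop)) {
  # all_of() selects the column named in the character vector;
  # rows that fail the condition keep their current value (.x)
  df <- mutate(df, across(all_of(cols_to_loop[i]),
                          ~ if_else(V1 == 0 & V2 == 1 & i > V3, 1L, .x)))
}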

df
#    V1 V2 V3 V4 V5 V6
# 1   1  1  2 NA  1 NA
# 2   0  0  2  1 NA  1
# 3   1  0  2  1  1 NA
# 4   0  1  2 NA  1  1
# 5   1  0  2  1 NA NA
# 6   0  0  2 NA  1  1
# 7   0  1  2 NA  1  1
# 8   0  0  2 NA  1 NA
# 9   1  0  2  1  1  1
# 10  0  1  2  1  1  1
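
If you would rather not depend on dplyr, the same rule can probably be written in base R. This is only a sketch under the same assumptions as above: the data is the version of `df` used in this answer, and `hit` is just an illustrative name for the rows that satisfy the condition.

```r
# Same data as in the answer above
df <- structure(list(
  V1 = c(1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L),
  V2 = c(1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L),
  V3 = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
  V4 = c(NA, 1L, 1L, NA, 1L, NA, NA, NA, 1L, 1L),
  V5 = c(1L, NA, 1L, 1L, NA, 1L, 1L, 1L, 1L, 1L),
  V6 = c(NA, 1L, NA, NA, NA, 1L, NA, NA, 1L, NA)
), class = "data.frame", row.names = c(NA, -10L))

cols_to_loop <- c("V4", "V5", "V6")

for (i in seq_along(cols_to_loop)) {
  # Rows where V1 == 0, V2 == 1 and the column's position i exceeds V3;
  # which() drops any NA comparisons so they are never overwritten
  hit <- which(df$V1 == 0 & df$V2 == 1 & i > df$V3)
  # Overwrite only those rows; every other value is left untouched
  df[[cols_to_loop[i]]][hit] <- 1L
}

df$V6
```

Because V3 is 2 in every row, only the third column (V6) can satisfy `i > V3`, so V4 and V5 come out unchanged and V6 is set to 1 in rows 4, 7 and 10.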
