Subtracting sequentially from the column with the max value in R

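Problem description

I am still learning how to write loops and if-else statements in R. I can do this process the long way, but I will be applying it to large datasets, so I need to handle it with a loop or if-else.

My data look something like the example data frame below. One of the columns holds the column number of the maximum value within each row: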

     x1   x2   x3   x4   x5   x6   x7 max_index max_val
1  56.1 56.8 99.4 44.6 50.4 74.9 17.7         3    99.4
2   9.1 46.1 74.2 64.3 62.3 68.8 85.7         7    85.7
3  83.3 84.5 18.4 93.2 17.6 69.7 23.4         4    93.2
4  94.0  9.7 46.8 25.0 96.9 69.2 94.8         5    96.9
5  21.5 64.1 89.1 87.7 59.7 88.0 73.5         3    89.1
6  53.0 94.9 87.2 19.6 55.9 48.5 82.9         2    94.9
7  52.2 79.1 20.6  9.9 18.3 21.5 92.5         7    92.5
8  42.5 33.0 36.9 45.0 43.9  7.6 45.3         7    45.3
9  89.3 20.6 41.7 74.8 67.4 21.0 49.1         1    89.3
10 21.2 92.6 86.3 76.3 68.6 44.8  8.8         2    92.6

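What I want to do is take the maximum and the 3 consecutive columns that follow it, and subtract each from the one before, like this: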

j1 <- max.col(df[,1:7], "first")
df$max_index <- j1
df$max_val <- df[cbind(1:nrow(df), j1)]

i1 <- j1 + 1
i2 <- i1 + 1
i3 <- i2 + 1

value <- df[cbind(1:nrow(df), j1)]
value1 <- df[cbind(1:nrow(df), i1)]
value2 <- df[cbind(1:nrow(df), i2)]
value3 <- df[cbind(1:nrow(df), i3)]

df$max_val <- value
df$max.up1 <- value1
df$max.up2 <- value2
df$max.up3 <- value3

df_x1 <- df$max_val - df$max.up1
df_x2 <- df$max.up1 - df$max.up2
df_x3 <- df$max.up2 - df$max.up3
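
Note that `j1 + 1`, `j1 + 2` and `j1 + 3` run past `x7` whenever the maximum sits in one of the last three `x` columns, so the matrix lookups above either read non-`x` columns or fail with a subscript error. Below is a minimal sketch of one way to guard against that, assuming out-of-range positions should simply become `NA`. The helper name `value_at` is hypothetical, and the small data frame reuses rows 1, 2 and 10 of the sample data.

```r
# Rows 1, 2 and 10 of the sample data (x columns only)
df <- data.frame(
  x1 = c(56.1,  9.1, 21.2), x2 = c(56.8, 46.1, 92.6),
  x3 = c(99.4, 74.2, 86.3), x4 = c(44.6, 64.3, 76.3),
  x5 = c(50.4, 62.3, 68.6), x6 = c(74.9, 68.8, 44.8),
  x7 = c(17.7, 85.7,  8.8)
)
m  <- as.matrix(df[, 1:7])
j1 <- max.col(m, "first")   # column of the row maximum

# Hypothetical helper: value `offset` columns to the right of the maximum,
# or NA when that position falls beyond the last x column.
value_at <- function(m, j, offset) {
  idx <- j + offset
  idx[idx > ncol(m)] <- NA            # NA rows in an index matrix yield NA
  m[cbind(seq_len(nrow(m)), idx)]
}

max_up1 <- value_at(m, j1, 1)
max_up2 <- value_at(m, j1, 2)
max_up3 <- value_at(m, j1, 3)
```

Working on a plain matrix of the `x` columns also keeps `max_index` and `max_val` out of the lookups.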

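After that, I want to check whether all 3 outputs (df_x1, df_x2, df_x3) are positive, and add a column showing TRUE if they are and FALSE if they are not.

I would like my final data frame to look like this: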

     x1   x2   x3   x4   x5   x6   x7 max_index max_val t.or.f
1  56.1 56.8 99.4 44.6 50.4 74.9 17.7         3    99.4   FALSE
2   9.1 46.1 74.2 64.3 62.3 68.8 85.7         7    85.7   NA
3  83.3 84.5 18.4 93.2 17.6 69.7 23.4         4    93.2   FALSE
4  94.0  9.7 46.8 25.0 96.9 69.2 94.8         5    96.9   NA
5  21.5 64.1 89.1 87.7 59.7 88.0 73.5         3    89.1   FALSE
6  53.0 94.9 87.2 19.6 55.9 48.5 82.9         2    94.9   FALSE
7  52.2 79.1 20.6  9.9 18.3 21.5 92.5         7    92.5   FALSE
8  42.5 33.0 36.9 45.0 43.9  7.6 45.3         7    45.3   FALSE
9  89.3 20.6 41.7 74.8 67.4 21.0 49.1         1    89.3   FALSE
10 21.2 92.6 86.3 76.3 68.6 44.8  8.8         2    92.6   TRUE

How can I simplify my code? Thanks!

Tags: r, loops, if-statement, sequence

Solution

Here is a solution that uses data.table and reshapes the data into long format:

library(data.table)

dt.m <- read.table(text = "
x1   x2   x3   x4   x5   x6   x7 max_index max_val
1  56.1 56.8 99.4 44.6 50.4 74.9 17.7         3    99.4
2   9.1 46.1 74.2 64.3 62.3 68.8 85.7         7    85.7
3  83.3 84.5 18.4 93.2 17.6 69.7 23.4         4    93.2
4  94.0  9.7 46.8 25.0 96.9 69.2 94.8         5    96.9
5  21.5 64.1 89.1 87.7 59.7 88.0 73.5         3    89.1
6  53.0 94.9 87.2 19.6 55.9 48.5 82.9         2    94.9
7  52.2 79.1 20.6  9.9 18.3 21.5 92.5         7    92.5
8  42.5 33.0 36.9 45.0 43.9  7.6 45.3         7    45.3
9  89.3 20.6 41.7 74.8 67.4 21.0 49.1         1    89.3
10 21.2 92.6 86.3 76.3 68.6 44.8  8.8         2    92.6", header = TRUE)

dt.m <- data.table(dt.m)
dt.m[, row.id := 1:.N]

# melt to long format to make it easy to work with; max_index and max_val
# are dropped because they are neither id nor measure variables
dt <- melt(data = dt.m, measure.vars = 1:7, id.vars = "row.id")

# replicate max.val and max.index which are already provided in example
dt[, max.val := max(value), by = row.id]
# which.max() returns the first maximum, so tied rows still get one index
dt[, max.index := which.max(value), by = row.id]

dt[, x.index := 1:.N, by = row.id]

# filter to values after the max value
out <- dt[x.index >= max.index]
# keep max value and 3 values post max value 
out <- out[, post.max.index := 1:.N, by = row.id][post.max.index <= 4]
out <- out[order(row.id, x.index)]
# lag within each row so values never leak across row.id groups
out[, previous.x := shift(value), by = row.id]
out[, change.x :=  previous.x - value]
out <- out[max.index != x.index]

# check if all values are positive
res <- out[, .(all.next.positive = all(change.x > 0)), by = row.id]
# add result to the original data
dt.m <- merge(dt.m, res, by = "row.id", all.x = TRUE)
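
For comparison, here is a minimal base R sketch of the same check without reshaping. It assumes that rows with fewer than three columns after the maximum should be `NA`, which is only one reading of the desired output in the question. That differs from the data.table code above, which compares whatever columns remain after the maximum. The names `m`, `vals` and `diffs` are illustrative.

```r
df <- read.table(text = "
x1   x2   x3   x4   x5   x6   x7
1  56.1 56.8 99.4 44.6 50.4 74.9 17.7
2   9.1 46.1 74.2 64.3 62.3 68.8 85.7
3  83.3 84.5 18.4 93.2 17.6 69.7 23.4
4  94.0  9.7 46.8 25.0 96.9 69.2 94.8
5  21.5 64.1 89.1 87.7 59.7 88.0 73.5
6  53.0 94.9 87.2 19.6 55.9 48.5 82.9
7  52.2 79.1 20.6  9.9 18.3 21.5 92.5
8  42.5 33.0 36.9 45.0 43.9  7.6 45.3
9  89.3 20.6 41.7 74.8 67.4 21.0 49.1
10 21.2 92.6 86.3 76.3 68.6 44.8  8.8", header = TRUE)

m  <- as.matrix(df)
n  <- nrow(m)
j1 <- max.col(m, "first")   # column of the row maximum

# vals[, k + 1] holds the value k columns to the right of the maximum,
# or NA when that position is past the last column
vals <- sapply(0:3, function(k) {
  idx <- j1 + k
  idx[idx > ncol(m)] <- NA
  m[cbind(seq_len(n), idx)]
})

# Successive differences: max - up1, up1 - up2, up2 - up3
diffs <- vals[, 1:3] - vals[, 2:4]

# TRUE only when all three differences are positive;
# rowSums() propagates NA when any difference is missing
df$t.or.f <- unname(rowSums(diffs > 0) == 3)
```

Under this assumption rows 2, 4, 7 and 8 come out as `NA`. The desired table in the question lists FALSE for rows 7 and 8, so the NA handling would need adjusting if that is what is intended.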
