Selecting a set of columns in a data frame for use in a for loop in R

Problem description

Suppose I have a data frame with column names like these...

[1] "subject"            "box"               "date"      "Total_Lever_Press"   "Total_CS_Mg_Ent"  
[6] "Total_NCS_Mg_Ent"   "K1"                "K2"                "K3"                "K4"               
[11] "K5"                "K6"                "K7"                "K8"                "K9"               
[16] "K10"               "K11"               "K12"               "K13"               "K14"              
[21] "K15"               "K16"               "K17"               "K18"               "K19"              
[26] "K20"               "K21"               "K22"               "K23"               "K24"              
[31] "K25"               "L1"                "L2"                "L3"                "L4"               
[36] "L5"                "L6"                "L7"                "L8"                "L9"               
[41] "L10"               "L11"               "L12"               "L13"               "L14"              
[46] "L15"               "L16"               "L17"               "L18"               "L19"              
[51] "L20"               "L21"               "L22"               "L23"               "L24"              
[56] "L25"

How can I select only columns K1 through K25 to run a for-loop calculation on?

I tried using grep to select a start column and an end column, like this...

number_of_trials <- 25
k_array <- grep("K1":number_of_trials, data)

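But this obviously doesn't work.

Edit: I would like the number of trials to be a final parameter so that each user can change it. So the selection could be K1 through number_of_trials or L1 through number_of_trials.

Edit 2: I didn't mention the loop I want to run. I want to look across all K columns (for each row) and count how many of them are greater than zero, put that count in a new column, and then divide it by number_of_trials.

Edit 3: As an example, the first row might look like this...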

K1 K2 K3 K4 K5 K6 K7 K8 K9 K10 K11 K12 K13 K14 K15
0  3   4  1  3  0  0  0  0  1   5   6   7   8   0

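So the calculation would be 9/15 = 0.6 (because 9 of the columns in that row are nonzero). I just want this calculation applied to every row of the data frame, across the selected columns.

Edit 4: Example loop...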

cols <- grep('K\\d+', names(data))
number_of_trials <- 25
data$probability <- 0
for (row in 1:nrow(data)) {
  data[row,"probability"] <- count(data[row,cols] > 0)/number_of_trials
}

Edit 5: Table with the expected result

K1 K2 K3 K4 K5 K6 K7 K8 K9 K10 K11 K12 K13 K14 K15 Probability
0  3   4  1  3  0  0  0  0  1   5   6   7   8   0     0.6
0  2   3  1  3  0  0  0  0  1   0   0   0   0   0     0.33
0  3   2  1  3  0  0  0  0  1   5   6   7   5   0     0.6
0  3   1  1  3  0  0  0  0  1   5   6   7   8   1     0.66
0  0   0  3  2  0  0  0  1  0   0   0   0   0   0     0.2

Original table...

K1 K2 K3 K4 K5 K6 K7 K8 K9 K10 K11 K12 K13 K14 K15 
0  3   4  1  3  0  0  0  0  1   5   6   7   8   0     
0  2   3  1  3  0  0  0  0  1   0   0   0   0   0     
0  3   2  1  3  0  0  0  0  1   5   6   7   5   0     
0  3   1  1  3  0  0  0  0  1   5   6   7   8   1     
0  0   0  3  2  0  0  0  1  0   0   0   0   0   0 

So the probabilities are 9/15 = 0.6, 5/15 = 0.33, 9/15 = 0.6, 10/15 = 0.66, 3/15 = 0.2

Answer:

data <- type.convert(data, as.is = TRUE)
cols <- grep('K\\d+', names(data))
data$Probability <- rowMeans(data[cols] > 0)
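The `type.convert()` call matters when the trial columns were read in as text. A minimal sketch of what it does, assuming a hypothetical data frame `raw` whose numeric columns arrived as character strings (the names and values here are made up for illustration):

```r
# Hypothetical data frame whose numeric columns were read in as text
raw <- data.frame(subject = c("a", "b"),
                  K1 = c("0", "3"),
                  K2 = c("10", "0"),
                  stringsAsFactors = FALSE)

# Convert each column to its most natural type; as.is = TRUE keeps
# genuine text columns as character instead of turning them into factors
converted <- type.convert(raw, as.is = TRUE)

sapply(raw, class)        # all three columns are "character"
sapply(converted, class)  # subject stays "character"; K1 and K2 become "integer"
```

After the conversion, `data[cols] > 0` is a numeric comparison rather than a string comparison.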

Tags: r

Solution


In base R, you can do:

cols <- grep('K\\d+', names(data))
data$Probability <- rowMeans(data[cols] > 0)
data

#  K1 K2 K3 K4 K5 K6 K7 K8 K9 K10 K11 K12 K13 K14 K15 Probability
#1  0  3  4  1  3  0  0  0  0   1   5   6   7   8   0       0.600
#2  0  2  3  1  3  0  0  0  0   1   0   0   0   0   0       0.333
#3  0  3  2  1  3  0  0  0  0   1   5   6   7   5   0       0.600
#4  0  3  1  1  3  0  0  0  0   1   5   6   7   8   1       0.667
#5  0  0  0  3  2  0  0  0  1   0   0   0   0   0   0       0.200

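This returns, for each row, the proportion of values greater than 0. Because rowMeans divides by the number of selected columns, the result equals count / number_of_trials whenever cols contains exactly the trial columns.

Data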
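The question also asks for `number_of_trials` to be a user-adjustable parameter and for the same logic to work on either the K or the L columns. One possible sketch, not taken from the original answer, builds the column names explicitly with `paste0()` instead of matching a pattern. The variable `prefix` and the toy data below are assumptions made for illustration:

```r
# Toy data: three K trials and three L trials per row (hypothetical values)
data <- data.frame(
  subject = c("s1", "s2"),
  K1 = c(0L, 2L), K2 = c(3L, 0L), K3 = c(4L, 0L),
  L1 = c(1L, 0L), L2 = c(0L, 0L), L3 = c(0L, 5L)
)

number_of_trials <- 3   # user-adjustable
prefix <- "K"           # hypothetical parameter; set to "L" for the L columns

# Build the exact column names, e.g. "K1", "K2", "K3"
cols <- paste0(prefix, seq_len(number_of_trials))

# Fail early if any requested column is missing from the data frame
stopifnot(all(cols %in% names(data)))

# Count values > 0 per row, then divide by the number of trials
data$Probability <- rowSums(data[cols] > 0) / number_of_trials
data$Probability
```

Selecting by name this way avoids accidentally matching other columns that happen to contain "K" followed by digits, and it only uses the first `number_of_trials` columns even if more exist.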

data <- structure(list(K1 = c(0L, 0L, 0L, 0L, 0L), K2 = c(3L, 2L, 3L, 
3L, 0L), K3 = 4:0, K4 = c(1L, 1L, 1L, 1L, 3L), K5 = c(3L, 3L, 
3L, 3L, 2L), K6 = c(0L, 0L, 0L, 0L, 0L), K7 = c(0L, 0L, 0L, 0L, 
0L), K8 = c(0L, 0L, 0L, 0L, 0L), K9 = c(0L, 0L, 0L, 0L, 1L), 
    K10 = c(1L, 1L, 1L, 1L, 0L), K11 = c(5L, 0L, 5L, 5L, 0L), 
    K12 = c(6L, 0L, 6L, 6L, 0L), K13 = c(7L, 0L, 7L, 7L, 0L), 
    K14 = c(8L, 0L, 5L, 8L, 0L), K15 = c(0L, 0L, 0L, 1L, 0L)), row.names = c(NA, 
-5L), class = "data.frame")
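For readers who prefer the explicit for loop from the question, the following is a minimal sketch of a working version. It assumes the only problem in the original loop was `count()`, which is not a base R function for counting TRUE values (the functions of that name in dplyr and plyr tabulate data frames instead); `sum()` on a logical vector does that job. The small data frame here is made up so the block runs on its own:

```r
# Small hypothetical data frame with three trial columns
data <- data.frame(K1 = c(0L, 0L), K2 = c(3L, 0L), K3 = c(4L, 1L))

# Anchored pattern so only names of the form K<digits> are matched
cols <- grep("^K\\d+$", names(data))
number_of_trials <- length(cols)

data$probability <- 0
for (row in seq_len(nrow(data))) {
  # sum() counts the TRUE values in the logical comparison
  data[row, "probability"] <- sum(data[row, cols] > 0) / number_of_trials
}
data
```

The vectorised `rowMeans()` form gives the same numbers without a loop and is usually the more idiomatic choice in R.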
