首页 > 解决方案 > 在 R 的 for 循环中嵌入脚本

问题描述

我在 R 中有一个数据框,看起来像这样:

 library(tibble)
 sample <- tribble(~subj, ~session,
            "A", 1,
            "A", 2, 
            "A", 3,
            "B", 1,
            "B", 2,
            "C", 1,
            "C", 2,
            "C", 3,
            "C", 4)

正如您从这个示例中看到的,每个主题都有多个会话,但并非所有主题都具有相同数量的会话。我的真实数据集中有 94 行(5 个主题,每个主题在 15 到 20 个不同的会话之间)。

我有另一个脚本,它采用我的主要数据集(一组语言数据,每个会话中每个主题都有详细的语音特征,有近 200,000 行)并按主题和会话过滤,以创建一个距离矩阵,显示不同单词之间的欧几里得距离。由于实际原因,我无法在此处复制它,但在这里创建了一个示例脚本:

 library(tibble)
 data <- tribble(~subj, ~session, ~Target, ~S1C1_target,           # S1C1 = syllable 1, consonant 1
                 ~S1C1_T.Sonorant, ~S1C1_T.Consonantal,            # _T. = target consonant of S1C1
                 ~S1C1_T.Voice, ~S1C1_T.Nasal, ~S1C1_T.Degree,     # .Voice/.Nasal/etc are phonetic 
                                                                   # properties of the target word
            "A", 1, "electricity", "i", 0, 0, 0, 0, 0,
            "A", 1, "hectic", "h", 0.8, 0, 1, 0, 0,
            "A", 1, "pillow", "p", -1, 1, -1, 0, 0,
            "A", 2, "hello", "h", -0.5, 1, 0, -1, 0,
            "A", 2, "cup", "k", 0.8, 0, 1, 0, 0,
            "A", 2, "exam", "e", 0, 0, 0, 0, 0,
            "B", 1, "wug", "w", 0.8, 0, 1, 0, 0,
            "B", 1, "wug", "w", 0.8, 0, 1, 0, 0,
            "B", 1, "hug", "h", 0.8, 0, 1, 0, 0,
            "B", 2, "wug", "w", -0.5, 1, 0, -1, 0,
            "B", 2, "well", "w", 0.8, 0, 1, 0, 0,
            "B", 2, "what", "w", 0.8, 0, 1, 0, 0)

我想首先为每个会话中的每个主题创建一个数据子集。有时参与者在 中具有多个相同单词的标记Target,因此我在这里也为重复迭代创建了一个平均值:

 matrixA1 <- data %>%                          # name the data after the subj and session name/number
   filter(subj == "A" & session == 1) %>%  
   dplyr::select(-subj, -session) %>%          # leave only the numeric values + `Target`
   group_by(Target) %>% 
   summarize_all(.funs = list(mean))           # Average across targets with more than one token



 ##### Calculate Euclidean distance between each phonetic property of each S1C1 target consonant

 ones <- rep(1,nrow(matrixA1))  # count repeated rows

 Son.mat.S1C1_T <- matrixA1$S1C1_T.Sonorant %*% t(ones) - ones %*% t(matrixA1$S1C1_T.Sonorant)
 rownames(Son.mat.S1C1_T) <- matrixA1$Target
 colnames(Son.mat.S1C1_T) <- matrixA1$Target
 colnames(Son.mat.S1C1_T) <- paste(colnames(Son.mat.S1C1_T), "Son.S1C1_T", sep = "_")

 Son.mat.S1C1_T <- Son.mat.S1C1_T^2

 Con.mat.S1C1_T <- matrixA1$S1C1_T.Consonantal %*% t(ones) - ones %*% t(matrixA1$S1C1_T.Consonantal)
 rownames(Con.mat.S1C1_T) <- matrixA1$Target
 colnames(Con.mat.S1C1_T) <- matrixA1$Target
 colnames(Con.mat.S1C1_T) <- paste(colnames(Con.mat.S1C1_T), "Con.S1C1_T", sep = "_")

 Con.mat.S1C1_T <- Con.mat.S1C1_T^2

 Voice.mat.S1C1_T <- matrixA1$S1C1_T.Voice %*% t(ones) - ones %*% t(matrixA1$S1C1_T.Voice)
 rownames(Voice.mat.S1C1_T) <- matrixA1$Target
 colnames(Voice.mat.S1C1_T) <- matrixA1$Target
 colnames(Voice.mat.S1C1_T) <- paste(colnames(Voice.mat.S1C1_T), "Voice.S1C1_T", sep = "_")

 Voice.mat.S1C1_T <- Voice.mat.S1C1_T^2

 Nasal.mat.S1C1_T <- matrixA1$S1C1_T.Nasal %*% t(ones) - ones %*% t(matrixA1$S1C1_T.Nasal)
 rownames(Nasal.mat.S1C1_T) <- matrixA1$Target
 colnames(Nasal.mat.S1C1_T) <- matrixA1$Target
 colnames(Nasal.mat.S1C1_T) <- paste(colnames(Nasal.mat.S1C1_T), "Nasal.S1C1_T", sep = "_")

 S1C1.1A <- Son.mat.S1C1_T +
   Con.mat.S1C1_T +
   Voice.mat.S1C1_T +
   Nasal.mat.S1C1_T

 colnames(S1C1.1A) = gsub("_Son.S1C1_T", "", colnames(S1C1.1A))

这将创建一个看起来像这样的矩阵:

             electricity hectic pillow
 electricity    0.00      1.64   3.00
 hectic         1.64      0.00   8.24
 pillow         3.00      8.24   0.00

如您所见,这段代码已经相当大了,而真正的代码要长得多。我知道某种循环将是处理它的最佳方法,但我不知道如何运行它。我想做的是:

  1. 对于 中的每一行sample,创建一个数据框,其名称中包含subjsession作为标识符
  2. 对于这些数据帧中的每一个,运行上面的脚本 from #####,为每个主题和每个会话创建一个矩阵,如上图所示。

为此,我认为最好的方法是将脚本嵌入到一个 for 循环中,并指定它应该为sample.

标签: rfor-looptibble

解决方案


在我看来,您不需要参考您的sample数据框,因为有关subj和组合的信息session都在您的data. 如果不是这样,请告诉我。否则,这是我的方法。

首先,在根据主题-会话组合对数据进行分组之后,无需手动过滤 和 的每个组合的数据,而是一次性过滤您的subj数据sessionsummarize在此之前,给每个组合一个idwith group_indices

data_summ <- data %>%                          
  mutate(id = group_indices(., subj, session)) %>%
  group_by(subj, session, Target) %>% 
  summarize_all(.funs = list(mean))

现在,您可以使用列表方法来提高透明度。首先将汇总数据拆分为数据框列表,每个主题会话一个id

data_list <- data_summ %>%
  split(., f = .$id)

现在你可以得到第一个数据帧data_list[[1]],第二个data_list[[2]],依此类推。这允许您遍历列表并为每个列表元素计算矩阵。我已经简化了你的一些代码——例如,你不需要重新命名你的四个矩阵中的每一个(基于S1C1_T.ConsonantalS1C1_T.Consonantal...)。我建议您将所有结果存储在一个名为mat_list.

mat_list = list()

for (i in 1:length(data_list)) {

  element <- data_list[[i]]

  ones <- rep(1, nrow(element))

  sonorant_vec <- element$S1C1_T.Sonorant
  sonorant_mat <- (sonorant_vec %*% t(ones) - ones %*% t(sonorant_vec))^2

  consonantal_vec <- element$S1C1_T.Consonantal
  consonantal_mat <- (consonantal_vec %*% t(ones) - ones %*% t(consonantal_vec))^2

  voice_vec <- element$S1C1_T.Voice
  voice_mat <- (voice_vec %*% t(ones) - ones %*% t(voice_vec))^2

  nasal_vec <- element$S1C1_T.Nasal
  nasal_mat <- (nasal_vec %*% t(ones) - ones %*% t(nasal_vec))^2

  all_mat <- sonorant_mat + consonantal_mat + voice_mat + nasal_mat
  rownames(all_mat) <- element$Target
  colnames(all_mat) <- element$Target

  mat_list[[i]] <- all_mat
}

等等:

[[1]]
            electricity hectic pillow
electricity        0.00   1.64   3.00
hectic             1.64   0.00   8.24
pillow             3.00   8.24   0.00

[[2]]
       cup exam hello
cup   0.00 1.64  4.69
exam  1.64 0.00  2.25
hello 4.69 2.25  0.00

[[3]]
    hug wug
hug   0   0
wug   0   0

[[4]]
     well what  wug
well 0.00 0.00 4.69
what 0.00 0.00 4.69
wug  4.69 4.69 0.00

编辑:如果您想避免 for 循环,可以将循环内的块放入一个函数中,然后lapply将其放入data_list

lapply(data_list, FUN = function(element) {

  ones <- rep(1, nrow(element))

  sonorant_vec <- element$S1C1_T.Sonorant
  sonorant_mat <- (sonorant_vec %*% t(ones) - ones %*% t(sonorant_vec))^2

  consonantal_vec <- element$S1C1_T.Consonantal
  consonantal_mat <- (consonantal_vec %*% t(ones) - ones %*% t(consonantal_vec))^2

  voice_vec <- element$S1C1_T.Voice
  voice_mat <- (voice_vec %*% t(ones) - ones %*% t(voice_vec))^2

  nasal_vec <- element$S1C1_T.Nasal
  nasal_mat <- (nasal_vec %*% t(ones) - ones %*% t(nasal_vec))^2

  all_mat <- sonorant_mat + consonantal_mat + voice_mat + nasal_mat
  rownames(all_mat) <- element$Target
  colnames(all_mat) <- element$Target

  return(all_mat)
})

编辑 2

要根据主题-会话组合名称命名列表元素,您可以执行以下操作:

data_summ <- data %>%                          
  group_by(subj, session, Target) %>% 
  summarize_all(.funs = list(mean)) %>%
  mutate(subj_session = paste(subj, session))

然后根据这个新的subj_session标识符拆分数据:

data_list <- data_summ %>%
  split(., f = .$subj_session)

推荐阅读