Looping over dplyr functions?

Question

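I have a large dataset and want to look at the relationships between various libraries (owners) and the materials they hold (objects). So far I have been doing this by hand, using spread() and gather() to compute the overlap between each pair of owners.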

Is there a way to loop this kind of function so that I don't have to repeat it n times?

Here is a sample of the data (already spread), together with the code that gathers it back.

library(dplyr)
library(tidyr)  # gather() comes from tidyr, not dplyr

object <- c(1:10)
A <- sample(0:1, 10, replace = TRUE)
B <- sample(0:1, 10, replace = TRUE)
C <- sample(0:1, 10, replace = TRUE)
D <- sample(0:1, 10, replace = TRUE)

df <- data.frame(object, A, B, C, D)

dfA <- df %>% filter(A == 1)
dfA$owner1 <- "A"
dfA <- dfA %>% gather(owner2, overlap, A:D, factor_key = TRUE)
dfA <- dfA %>% filter(overlap != 0)

dfB <- df %>% filter(B == 1)
dfB$owner1 <- "B"
dfB <- dfB %>% gather(owner2, overlap, A:D, factor_key = TRUE)
dfB <- dfB %>% filter(overlap != 0)

dfC <- df %>% filter(C == 1)
dfC$owner1 <- "C"
dfC <- dfC %>% gather(owner2, overlap, A:D, factor_key = TRUE)
dfC <- dfC %>% filter(overlap != 0)

dfD <- df %>% filter(D == 1)
dfD$owner1 <- "D"
dfD <- dfD %>% gather(owner2, overlap, A:D, factor_key = TRUE)
dfD <- dfD %>% filter(overlap != 0)

df2 <- rbind(dfA, dfB, dfC, dfD)

The result lets you count how many objects each owner holds (the rows where that owner is both owner1 and owner2), as well as the overlap between owners.

Thanks.

Tags: r, loops, for-loop

Answer


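I came up with the following using the tidyverse (pmap() comes from purrr; mutate() and bind_rows() come from dplyr). It is a bit dense, so I have added comments: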

pmap(df, \(...) {                           # iterate through rows
  c(...)[-1] %>%                            # drop first column
    .[. == 1] %>%                           # columns with value 1
    names() %>%                             # owners (A, B, ..)
    (\(xs) expand.grid(xs, xs)) %>%         # generate pairs of owners
    mutate(object = c(...)[1], .before = 1) # add object column
}) %>%
  bind_rows()                               # combine

This returns (abbreviated):

  object Var1 Var2
1      1    A    A
2      1    C    A
3      1    A    C
4      1    C    C
5      2    A    A
6      2    B    A
...
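
To get from these pairs to the counts the question asks for, one option is to tally them per owner pair. The following is only a sketch under some assumptions: the seed, the column names owner1/owner2 (in place of Var1/Var2) and the name n_objects are illustrative choices, not part of the original answer.

```r
library(dplyr)
library(purrr)

set.seed(42)  # assumption: fixed seed so the sample data is reproducible
df <- data.frame(
  object = 1:10,
  A = sample(0:1, 10, replace = TRUE),
  B = sample(0:1, 10, replace = TRUE),
  C = sample(0:1, 10, replace = TRUE),
  D = sample(0:1, 10, replace = TRUE)
)

# One data frame of owner pairs per object, then stacked
pairs <- pmap(df, function(...) {
  row <- c(...)                                   # named vector: object, A, B, C, D
  owners <- names(row[-1])[row[-1] == 1]          # owners holding this object
  expand.grid(owner1 = owners, owner2 = owners,   # all pairs, including (A, A)
              stringsAsFactors = FALSE) %>%
    mutate(object = row[["object"]], .before = 1)
}) %>%
  bind_rows()

# Rows with owner1 == owner2 give the holdings of one owner;
# the remaining rows give the overlap between two different owners
overlap_counts <- count(pairs, owner1, owner2, name = "n_objects")
print(overlap_counts)
```

Pairs with no shared objects simply do not appear in overlap_counts, so a missing row should be read as an overlap of zero.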

For R < 4.1, which does not have the \(...) lambda shorthand:

pmap(df, function(...) {
  c(...)[-1] %>%
    .[. == 1] %>%
    names() %>%
    (function(xs) expand.grid(xs, xs)) %>%
    mutate(object = c(...)[1], .before = 1)
}) %>%
  bind_rows()
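
If you would rather keep the filter/gather steps from the question, a more literal way to avoid the repetition is to wrap them in a function and apply it to each owner name. This is a minimal sketch rather than a definitive implementation: the helper name overlap_for_owner and the seed are hypothetical, and it assumes every column other than object is an owner.

```r
library(dplyr)
library(tidyr)

set.seed(42)  # assumption: fixed seed so the sample data is reproducible
df <- data.frame(
  object = 1:10,
  A = sample(0:1, 10, replace = TRUE),
  B = sample(0:1, 10, replace = TRUE),
  C = sample(0:1, 10, replace = TRUE),
  D = sample(0:1, 10, replace = TRUE)
)

# Assumption: every column except `object` is an owner
owners <- setdiff(names(df), "object")

# Hypothetical helper: the question's four repeated steps for one owner
overlap_for_owner <- function(data, owner) {
  data %>%
    filter(.data[[owner]] == 1) %>%            # objects held by this owner
    mutate(owner1 = owner) %>%                 # label the rows with the owner
    gather(owner2, overlap, -object, -owner1,  # back to long format
           factor_key = TRUE) %>%
    filter(overlap != 0)                       # keep only shared objects
}

# Apply the helper to each owner and stack the results
df2 <- lapply(owners, function(o) overlap_for_owner(df, o)) %>%
  bind_rows()
```

The result should have the same columns as the df2 built by hand in the question (object, owner1, owner2, overlap), and adding a fifth owner column needs no further code.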
