Merging multiple data frames in a list into another data frame by different columns

Problem description

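In my code, I use multiple left_join calls to merge separate data frames into a data frame that I am processing in a dplyr chain. I import the data frames that are to be merged into a list, then use lapply directly on that list to do some preparation for the merge.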

So far, I have used list2env(list, envir = .GlobalEnv) to create separate data frames from the list, and then merged each one separately with left_join by its unique column, like this:

Test data:

The list:

testlist <- structure(list(df2 = structure(list(x = structure(c(2L, 1L, 3L
), .Label = c("A", "B", "C"), class = "factor"), a = c(-0.331543943439452, 
0.0588350184156617, 1.03657229544754)), .Names = c("x", "a"), row.names = c(NA, 
-3L), class = "data.frame"), df3 = structure(list(z = structure(c(3L, 
2L, 1L), .Label = c("K", "L", "M"), class = "factor"), b = c(-0.897094152848114, 
0.97612075490695, 0.650264147064918)), .Names = c("z", "b"), row.names = c(NA, 
-3L), class = "data.frame")), .Names = c("df2", "df3"))

To create the separate data frames:

list2env(testlist, envir = .GlobalEnv)

The data frame:

test_df <- structure(list(x = structure(1:3, .Label = c("A", "B", "C"), class = "factor"), 
    y = 1:3, z = structure(1:3, .Label = c("K", "L", "M"), class = "factor")), .Names = c("x", 
"y", "z"), row.names = c(NA, -3L), class = "data.frame")

The join:

library(dplyr)

test_df %>%
    left_join(., df2, by = "x") %>%
    left_join(., df3, by = "z")

(Note that my list holds about eight data frames, each with 2 to 3 columns. For simplicity, I have included only two data frames in this list.)

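Every data frame has its own separate "by" column. What I would like to know is whether there is a simpler way to do this, for example by merging with the whole list directly, detecting automatically which columns match, and merging on them for each data frame, rather than running left_join eight separate times?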

Edit

I tried running the following code, as suggested by @akrun:

out <- test
for(i in seq_along(table_list)) {
  nm1 <- intersect(names(out), names(table_list[[i]]))
  out <- merge(out, table_list[[i]], by = nm1)
}
out

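Here test is the data frame to merge into, and table_list is the list of data frames. This works for these small test data frames, but on my real data it seems to duplicate individual rows, so the result has more rows than the original data frame.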
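
One possible cause of the extra rows, which the question does not confirm, is a lookup table whose key values are not unique: every repeated key multiplies the matching rows of the target data frame. The sketch below counts repeated key combinations per lookup table. The helper name count_dup_keys and the toy data are hypothetical and only illustrate the check.

```r
# Hypothetical helper: for each lookup table, count rows whose key
# combination (the columns shared with `base`) has already appeared.
count_dup_keys <- function(base, dfs) {
  vapply(dfs, function(df) {
    keys <- intersect(names(base), names(df))
    sum(duplicated(df[keys]))
  }, integer(1))
}

# Toy data (made up for illustration): df2 repeats the key "A".
base_df <- data.frame(x = c("A", "B"), z = c("K", "L"),
                      stringsAsFactors = FALSE)
lookups <- list(
  df2 = data.frame(x = c("A", "A", "B"), a = 1:3, stringsAsFactors = FALSE),
  df3 = data.frame(z = c("K", "L"), b = 4:5, stringsAsFactors = FALSE)
)

dup_counts <- count_dup_keys(base_df, lookups)
```

A non-zero count for a table suggests that joining it will add rows; whether that applies to the real data here is an assumption to verify.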

A more complex example data frame:

test_df <- structure(list(x = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L
), .Label = c("A", "B", "C", "D"), class = "factor"), y = c(1, 
2, 3, 4, 1, 2, 3, 4), z = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 
1L, 2L), .Label = c("K", "L", "M"), class = "factor")), .Names = c("x", 
"y", "z"), row.names = c(NA, -8L), class = "data.frame")

Tags: r

Solution


Using the more complex test_df, why not use reduce from purrr together with left_join from dplyr? Here my_list is the list of data frames from the question. I have included the messages and the warning produced by the joins as comments in the code below.

library(dplyr)
library(purrr)

all_dfs <- reduce(my_list, left_join, .init = test_df)

# (warning) messages from using left_join
# Joining, by = "x"
# Joining, by = "z"
# Warning message:
# Column `x` joining factors with different levels, coercing to character vector 

all_dfs

  x y z           a          b
1 A 1 K  0.05883502  0.6502641
2 B 2 L -0.33154394  0.9761208
3 C 3 M  1.03657230 -0.8970942
4 D 4 K          NA  0.6502641
5 A 1 L  0.05883502  0.9761208
6 B 2 M -0.33154394 -0.8970942
7 C 3 K  1.03657230  0.6502641
8 D 4 L          NA  0.9761208
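
For comparison, the same idea can be sketched in base R without purrr or dplyr, with the join columns computed explicitly instead of being guessed at join time. This is a minimal sketch rather than the answer's method: the helper name join_all and the toy values are hypothetical, and it assumes each lookup table shares at least one column name with the accumulated result.

```r
# Hypothetical helper: left-join every data frame in `dfs` onto `base`,
# joining each one on the column names it shares with the running result.
join_all <- function(base, dfs) {
  Reduce(function(acc, df) {
    keys <- intersect(names(acc), names(df))
    stopifnot(length(keys) > 0)  # stop if there is nothing to join on
    # all.x = TRUE keeps unmatched rows of `acc`, like a left join
    merge(acc, df, by = keys, all.x = TRUE, sort = FALSE)
  }, dfs, init = base)
}

# Toy data (made-up values); character keys avoid the factor-level warning.
toy_df <- data.frame(x = c("A", "B", "C", "D"), y = 1:4,
                     z = c("K", "L", "M", "K"), stringsAsFactors = FALSE)
toy_list <- list(
  df2 = data.frame(x = c("B", "A", "C"), a = c(10, 20, 30),
                   stringsAsFactors = FALSE),
  df3 = data.frame(z = c("M", "L", "K"), b = c(1, 2, 3),
                   stringsAsFactors = FALSE)
)

out <- join_all(toy_df, toy_list)
```

Note that merge places the key columns first, so the column order (and possibly the row order) can differ from what left_join returns.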
