Merging an odd number of data frames with suffixes

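Problem description

I am trying to merge a list of data frames and have come across many different answers in this community, such as "R - reduce with merge and more than 2 suffixes (or: how to merge multiple dataframes and keep track of columns)". After working through those answers, the approach works for an even number of data frames but not for an odd number.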

# Copy mtcars and turn its row names into a key column
myDF <- cbind(typecar = rownames(mtcars), mtcars)
rownames(myDF) <- NULL
df1 <- myDF
df2 <- myDF
df3 <- myDF
df4 <- myDF

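# res must exist before the loop; starting from the first data frame is assumed here
res <- list.df[[1]]
for (i in head(seq_along(list.df), -1)) {
  res <- merge(res, list.df[[i + 1]], all = TRUE,
               suffixes = sfx[i:(i + 1)], by = "typecar")
}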

The code above works as expected for an even number of data frames, as shown below:

list.df <- list(df1, df2, df3,df4)
sfx <- c(".df1", ".df2", ".df3", ".df4")

But with an odd number, the last suffix, .df3, is not added:

list.df <- list(df1, df2, df3)
sfx <- c(".df1", ".df2", ".df3")

The column names then look like this:

 [1] "typecar"  "mpg.df1"  "cyl.df1"  "disp.df1" "hp.df1"   "drat.df1" "wt.df1"   "qsec.df1" "vs.df1"   "am.df1"   "gear.df1" "carb.df1" "mpg.df2" 
[14] "cyl.df2"  "disp.df2" "hp.df2"   "drat.df2" "wt.df2"   "qsec.df2" "vs.df2"   "am.df2"   "gear.df2" "carb.df2" "mpg"      "cyl"      "disp"    
[27] "hp"       "drat"     "wt"       "qsec"     "vs"       "am"       "gear"     "carb"  

What I want is:

 [1] "typecar"  "mpg.df1"  "cyl.df1"  "disp.df1" "hp.df1"   "drat.df1" "wt.df1"   "qsec.df1" "vs.df1"   "am.df1"   "gear.df1" "carb.df1" "mpg.df2" 
[14] "cyl.df2"  "disp.df2" "hp.df2"   "drat.df2" "wt.df2"   "qsec.df2" "vs.df2"   "am.df2"   "gear.df2" "carb.df2" "mpg.df3"  "cyl.df3"  "disp.df3"
[27] "hp.df3"   "drat.df3" "wt.df3"   "qsec.df3" "vs.df3"   "am.df3"   "gear.df3" "carb.df3"

I tried the dplyr joins as well, but the result is the same. I also came across https://github.com/tidyverse/dplyr/issues/1296. Is there any way to handle an odd number of data frames?

Tags: r, join, merge, dplyr, purrr

Solution


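A simpler option is to rename the columns of each list element before the merge, using the corresponding list name (that is, the object name) as the suffix for every column except the by column.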

list.df <- Map(function(x, nm) {
  # Every column except the key gets ".<list name>" appended
  i1 <- names(x) != 'typecar'
  names(x)[i1] <- paste0(names(x)[i1], ".", nm)
  x
}, list.df, names(list.df))

Then we use Reduce with merge:

out <- Reduce(function(...) merge(..., by = 'typecar', all = TRUE), list.df)
names(out)
#[1] "typecar"  "mpg.df1"  "cyl.df1"  "disp.df1" "hp.df1"   "drat.df1" "wt.df1"   "qsec.df1" "vs.df1"   "am.df1"   "gear.df1" "carb.df1"
#[13] "mpg.df2"  "cyl.df2"  "disp.df2" "hp.df2"   "drat.df2" "wt.df2"   "qsec.df2" "vs.df2"   "am.df2"   "gear.df2" "carb.df2" "mpg.df3" 
#[25] "cyl.df3"  "disp.df3" "hp.df3"   "drat.df3" "wt.df3"   "qsec.df3" "vs.df3"   "am.df3"   "gear.df3" "carb.df3"
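
As a side note on why the loop in the question treats odd and even counts differently: merge applies its suffixes only to non-key column names that occur in both inputs. The sketch below illustrates this with toy data frames; a, b, c3, d4 and their columns are made up for illustration and are not part of the original post.

```r
# Toy frames that share the key "id" and the value column "x"
a  <- data.frame(id = 1:2, x = c(10, 20))
b  <- data.frame(id = 1:2, x = c(30, 40))
c3 <- data.frame(id = 1:2, x = c(50, 60))
d4 <- data.frame(id = 1:2, x = c(70, 80))

# Step 1: "x" clashes, so both suffixes are applied
ab <- merge(a, b, by = "id", suffixes = c(".a", ".b"))
names(ab)    # "id" "x.a" "x.b"

# Step 2: "x.a" and "x.b" do not clash with "x", so the suffixes are ignored
abc <- merge(ab, c3, by = "id", suffixes = c(".b", ".c"))
names(abc)   # "id" "x.a" "x.b" "x"

# Step 3: the leftover "x" now clashes with the new "x", so both get a suffix
abcd <- merge(abc, d4, by = "id", suffixes = c(".c", ".d"))
names(abcd)  # "id" "x.a" "x.b" "x.c" "x.d"
```

With an odd number of data frames the loop ends on a step like step 2, which leaves the last set of columns without a suffix. Renaming the columns up front avoids relying on name clashes at all.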

Data

list.df <- mget(paste0('df', 1:3))
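
Since the question also mentions dplyr joins, the same rename-first idea could be sketched with dplyr and purrr. This is an illustrative variant, not part of the original answer; it assumes dplyr (1.0.0 or later, for rename_with) and purrr are installed, and the name dfs is made up here.

```r
library(dplyr)
library(purrr)

# Same data as in the question, stored as a named list
myDF <- cbind(typecar = rownames(mtcars), mtcars)
rownames(myDF) <- NULL
dfs <- list(df1 = myDF, df2 = myDF, df3 = myDF)

# Suffix every non-key column with the list name
renamed <- imap(dfs, function(x, nm) {
  rename_with(x, function(col) paste0(col, ".", nm), .cols = -typecar)
})

# Join all data frames on the key, one after another
out <- reduce(renamed, full_join, by = "typecar")
names(out)
```

Because the columns are already unique before joining, the suffix argument of the join functions never comes into play, so the number of data frames no longer matters.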
