How do I merge two data frames and keep only the columns whose contents differ?

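Problem description

I have two data frames with the same number of rows and a different number of columns. The column names also differ, but some of the columns may have the same contents.

For example, df1: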

df1<- data.frame("a"=c("0","1","0","1","0","0","0"),
                "b"=c("1","1","1","1","1","0","0"),
                "c"=c("1","1","0","0","1","0","0"),
                "d"=c("1","1","1","1","1","1","1"))

df2:

df2<- data.frame("e"=c("1","1","0","1","0","0","0"),
                "f"=c("1","1","1","1","1","0","0"),
                "g"=c("0","0","0","0","1","0","0"),
                "h"=c("0","0","0","0","1","1","1"))

As you can see, column "b" of df1 and column "f" of df2 are equal. The result I want is therefore a new data frame like this:

df3 <- data.frame("a"=c("0","1","0","1","0","0","0"),
                  "c"=c("1","1","0","0","1","0","0"),
                  "d"=c("1","1","1","1","1","1","1"),
                  "e"=c("1","1","0","1","0","0","0"),
                  "g"=c("0","0","0","0","1","0","0"),
                  "h"=c("0","0","0","0","1","1","1"))

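Note: columns "b" and "f" (which have the same contents) are not in the new df3. I have searched online but did not find an example. I think the main difficulty is that the merge is done by content rather than by column name.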

Tags: r, dataframe, merge

Solution

Here is a more tidyverse-style solution.

library(dplyr)
library(tidyr)
library(tibble) # provides rownames_to_column()
# based on Ronak's sapply approach
matches <- as.data.frame(sapply(df1, function(x) sapply(df2, function(y) identical(x, y)))) %>%
  rownames_to_column(var = "df2") %>%
  pivot_longer(-df2, names_to = "df1") %>% # pivot longer
  filter(value) # keep only the matches

# programmatically build list of names to remove
vars_remove <- c(matches$df1, matches$df2) # will remove var names that are matches
df1 %>% bind_cols(df2) %>%
  select(-any_of(vars_remove))

  a c d e g h
1 0 1 1 1 0 0
2 1 1 1 1 0 0
3 0 0 1 0 0 0
4 1 0 1 1 0 0
5 0 1 1 0 1 1
6 0 0 1 0 0 1
7 0 0 1 0 0 1
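
For comparison, the same idea can be sketched in base R without any packages. This is only a minimal sketch, not part of the original answer: the helper names `dup_in_df1` and `dup_in_df2` are made up for illustration, and it assumes that "equal" means `identical()` returns TRUE for the two columns (same type and same values in the same order).

```r
# Example data copied from the question
df1 <- data.frame("a" = c("0","1","0","1","0","0","0"),
                  "b" = c("1","1","1","1","1","0","0"),
                  "c" = c("1","1","0","0","1","0","0"),
                  "d" = c("1","1","1","1","1","1","1"))
df2 <- data.frame("e" = c("1","1","0","1","0","0","0"),
                  "f" = c("1","1","1","1","1","0","0"),
                  "g" = c("0","0","0","0","1","0","0"),
                  "h" = c("0","0","0","0","1","1","1"))

# Hypothetical helper vectors: TRUE for every column whose contents
# are identical to some column of the other data frame
dup_in_df1 <- sapply(df1, function(x) any(sapply(df2, function(y) identical(x, y))))
dup_in_df2 <- sapply(df2, function(y) any(sapply(df1, function(x) identical(x, y))))

# Keep only the unmatched columns from each side and bind them together
df3 <- cbind(df1[, !dup_in_df1, drop = FALSE],
             df2[, !dup_in_df2, drop = FALSE])
print(names(df3))
```

`drop = FALSE` keeps each subset a data frame even if only one column survives, so `cbind()` still preserves the column names.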
