Extracting common values across more than one column between multiple R data frames

Problem description

Imagine I have these 4 data frames:

abc_df

abc_ID .  abc_classification
a      .     neutral
b      .     deletereous
c      .     benign

def_df

def_ID .  def_classification
f      .     neutral
a      .     neutral
c      .     benign

ghi_df

ghi_ID  .   ghi_classification
f       .     deletereous
c       .     benign
k       .     neutral

vmk_df

vmk_ID  .  vmk_classification
c       .     benign
k       .     deletereous
a       .     neutral

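As you can see, the columns "dfname_ID" and "dfname_classification" are not adjacent (the dot stands for another column in the data frame) and have different names in each data frame. I would therefore like to use column indices rather than column names to extract the rows that are common to all data frames for these 2 columns.

The output should look like this: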

ID  .   classification
c   .    benign

I am trying to use intersect and lapply(mget(c('abc_df', 'def_df', 'ghi_df', 'vmk_df'))), but I don't know how to write the correct command. Do you know how I can solve this?

Tags: r

Solution


One way is with purrr. The explicit conversion to character may not be needed, since intersect would coerce the columns anyway:

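library(purrr)
library(dplyr)     # mutate_if() and the data-frame method of intersect()
library(magrittr)  # set_colnames()

# positions of the columns to compare
COLUMNS = c(1,2,3)

list(abc_df,def_df,ghi_df,vmk_df) %>%
  # select by position and convert factor columns to character
  map(~mutate_if(.x[,COLUMNS],is.factor, as.character)) %>%
  # give every data frame the same column names
  map(~set_colnames(.x,c("id",".","classification"))) %>%
  # keep only the rows present in every data frame
  reduce(intersect)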

  id . classification
1  c .         benign
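
For comparison, the same idea can be sketched in base R without any packages. This is only an illustration under a few assumptions: the toy data frames below re-create the tables from the question, the middle column name `other` is a placeholder, and `cols <- c(1, 3)` picks just the ID and classification columns by position so that the middle column is ignored.

```r
# Toy data mirroring the question; `other` is a placeholder for the middle column
abc_df <- data.frame(abc_ID = c("a", "b", "c"), other = ".",
                     abc_classification = c("neutral", "deletereous", "benign"),
                     stringsAsFactors = FALSE)
def_df <- data.frame(def_ID = c("f", "a", "c"), other = ".",
                     def_classification = c("neutral", "neutral", "benign"),
                     stringsAsFactors = FALSE)
ghi_df <- data.frame(ghi_ID = c("f", "c", "k"), other = ".",
                     ghi_classification = c("deletereous", "benign", "neutral"),
                     stringsAsFactors = FALSE)
vmk_df <- data.frame(vmk_ID = c("c", "k", "a"), other = ".",
                     vmk_classification = c("benign", "deletereous", "neutral"),
                     stringsAsFactors = FALSE)

cols <- c(1, 3)  # assumed positions of the ID and classification columns

dfs <- list(abc_df, def_df, ghi_df, vmk_df)

# Select by position, give every data frame the same names, drop duplicate rows
picked <- lapply(dfs, function(d) {
  unique(setNames(d[, cols], c("ID", "classification")))
})

# merge() joins on all shared column names by default, so with identical
# names in every data frame it keeps only rows present in both inputs
common <- Reduce(merge, picked)
print(common)
```

Because every data frame is renamed to the same two columns before merging, each `merge()` call acts as an inner join on the ID and classification pair, and `Reduce()` chains it across the whole list.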

Your data:

abc_df = structure(list(abc_ID = structure(1:3, .Label = c("a", "b", "c"
), class = "factor"), . = structure(c(1L, 1L, 1L), .Label = ".", class = "factor"), 
    abc_classification = structure(3:1, .Label = c("benign", 
    "deletereous", "neutral"), class = "factor")), class = "data.frame", row.names = c(NA, -3L))

def_df = structure(list(def_ID = structure(c(3L, 1L, 2L), .Label = c("a", 
"c", "f"), class = "factor"), . = structure(c(1L, 1L, 1L), .Label = ".", class = "factor"), 
    def_classification = structure(c(2L, 2L, 1L), .Label = c("benign", 
    "neutral"), class = "factor")), class = "data.frame", row.names = c(NA, -3L))

ghi_df = structure(list(ghi_ID = structure(c(2L, 1L, 3L), .Label = c("c", 
"f", "k"), class = "factor"), . = structure(c(1L, 1L, 1L), .Label = ".", class = "factor"), 
    ghi_classification = structure(c(2L, 1L, 3L), .Label = c("benign", 
    "deletereous", "neutral"), class = "factor")), class = "data.frame", row.names = c(NA, -3L))

vmk_df = structure(list(vmk_ID = structure(c(2L, 3L, 1L), .Label = c("a", 
"c", "k"), class = "factor"), . = structure(c(1L, 1L, 1L), .Label = ".", class = "factor"), 
    vmk_classification = structure(1:3, .Label = c("benign", 
    "deletereous", "neutral"), class = "factor")), class = "data.frame", row.names = c(NA, -3L))
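
Since the question mentions mget and intersect, here is a rough sketch along those lines in base R. It is not the answer's method, only one possible reading of the asker's attempt: each row is collapsed into a single key string built from the chosen column positions, and base intersect is applied to those keys. The helper `make_df`, the tab separator, and `cols <- c(1, 3)` are assumptions made for illustration; the separator must not occur in the data.

```r
# Hypothetical helper that builds a data frame shaped like those in the question
make_df <- function(prefix, ids, cls) {
  d <- data.frame(ids, ".", cls, stringsAsFactors = TRUE)
  names(d) <- c(paste0(prefix, "_ID"), ".", paste0(prefix, "_classification"))
  d
}

abc_df <- make_df("abc", c("a", "b", "c"), c("neutral", "deletereous", "benign"))
def_df <- make_df("def", c("f", "a", "c"), c("neutral", "neutral", "benign"))
ghi_df <- make_df("ghi", c("f", "c", "k"), c("deletereous", "benign", "neutral"))
vmk_df <- make_df("vmk", c("c", "k", "a"), c("benign", "deletereous", "neutral"))

cols <- c(1, 3)  # assumed positions of the ID and classification columns

# Fetch the data frames by name, as in the question
dfs <- mget(c("abc_df", "def_df", "ghi_df", "vmk_df"), envir = environment())

# One key string per row, built from the chosen columns
# (factors are converted to character first)
keys <- lapply(dfs, function(d) {
  do.call(paste, c(lapply(d[cols], as.character), sep = "\t"))
})

# Keys that occur in every data frame
common_keys <- Reduce(intersect, keys)

# Recover the matching rows from the first data frame
first <- dfs[[1]]
result <- unique(first[keys[[1]] %in% common_keys, cols])
names(result) <- c("ID", "classification")
print(result)
```

Working on key strings keeps the comparison independent of column names and factor levels, at the cost of having to pick a separator that cannot appear in the values.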
