R: merging on a partial match

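Problem description

There are many answers on this topic, but I haven't found one that covers the problem I'm dealing with.

I have two data frames: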

df1 and df2 were shown as images in the original post; the answer reconstructs both in its data section.

setA <- read.table("df1.txt",sep="\t", header=TRUE)
setB <- read.table("df2.txt",sep="\t", header=TRUE)

So I want to match rows by the value of a column:

library(data.table)
setC <- merge(setA, setB, by = "name", all.x = FALSE)

This gives the output df3, which was shown as an image in the original post.

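This is because df1 also contains the value 1, but in a cell where it is separated from another value by ";". How can I get the desired output?

Thanks!!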

Tags: r, dataframe, merge

Solution


In future, please call dput(df1) and dput(df2) and paste the console output into your question.
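As a minimal sketch of what that looks like in practice: dput() prints an R expression that recreates an object, so readers can paste it straight into their own session. The data frame `example_df` below is a hypothetical stand-in, not the asker's actual data.

```r
# Hypothetical small data frame standing in for the asker's data
example_df <- data.frame(name = c("1;7", "3"),
                         last = c("p", "q"),
                         stringsAsFactors = FALSE)

# dput() prints an R expression that rebuilds the object;
# this printed text is what should be pasted into a question
dput(example_df)

# Round trip: capture the printed expression and evaluate it again
rebuilt <- eval(parse(text = capture.output(dput(example_df))))

# Compare the rebuilt object with the original
all.equal(example_df, rebuilt)
```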

A base R solution to the two-part problem:

# First, split the "1;7" row into separate keys, one list element per row:

name_split <- strsplit(as.character(df1$name), ";")

# Build a lookup from each individual key to its row's value of last.
# This assumes the values of last uniquely identify each row of df1:

df_ro <- data.frame(name = unlist(name_split),
                    last = rep(df1$last, lengths(name_split)),
                    stringsAsFactors = FALSE)

# Left join the remaining columns of df1 onto the lookup by last,
# so the other columns do not have to be named individually:

df1_ro <- merge(df1[, names(df1) != "name"], df_ro, by = "last", all.x = TRUE)

# Then perform an inner join on name, renaming the columns of df2
# with a ".x" suffix to prevent a name collision:

df3 <- merge(df1_ro, setNames(df2, paste0(names(df2), ".x")),
             by.x = "name", by.y = "name.x")

# Alternatively, an inner join on all shared columns (this returns no
# rows here, because the values in last and colour differ between the two):

df3_all <- merge(df1_ro, df2, by = intersect(names(df1_ro), names(df2)))

Data:

df1 <- data.frame(name = c("1;7", "3", "4", "5"),
                  last = c("p", "q", "r", "s"),
                  colour = c("a", "s", "d", "f"), stringsAsFactors = FALSE)
df2 <- data.frame(name = c("1", "2", "3", "4"),
                  last = c("a", "b", "c", "d"),
                  colour = c("p", "q", "r", "s"), stringsAsFactors = FALSE)
