How to merge two data frames in R to get the desired output?

Problem description

I have two data frames:

d1.Kids   <- c("Jack", "Jill", "Jillian", "John", "James")
d1.States <- c("CA", "MA", "DE", "HI", "PA")

d1 <- data.frame(d1.Kids, d1.States)

d1

   d1.Kids d1.States
1    Jack        CA
2    Jill        MA
3 Jillian        DE
4    John        HI
5   James        PA

d2.Ages <- c(10, 7, 12, 30)
d2.Kids <- c("Jill", "Jillian", "Jack", "Mary")

d2 <- data.frame(d2.Kids, d2.Ages)
d2
   d2.Kids d2.Ages
1    Jill      10
2 Jillian       7
3    Jack      12
4    Mary      30

When I merge the two data frames, I get the following result:

merge(d1,d2)

Result:

 d1.Kids d1.States d2.Kids d2.Ages
1     Jack        CA    Jill      10
2     Jill        MA    Jill      10
3  Jillian        DE    Jill      10
4     John        HI    Jill      10
5    James        PA    Jill      10
6     Jack        CA Jillian       7
7     Jill        MA Jillian       7
8  Jillian        DE Jillian       7
9     John        HI Jillian       7
10   James        PA Jillian       7
11    Jack        CA    Jack      12
12    Jill        MA    Jack      12
13 Jillian        DE    Jack      12
14    John        HI    Jack      12
15   James        PA    Jack      12
16    Jack        CA    Mary      30
17    Jill        MA    Mary      30
18 Jillian        DE    Mary      30
19    John        HI    Mary      30
20   James        PA    Mary      30

I want to get this result instead:

     kids ages states
1    jack   12     CA
2    jill   10     MA
3 jillian    7     DE
4    john   NA     HI
5   james   NA     PA
6    Mary   30     NA

Tags: r, merge

Solution


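Without by, merge() falls back to by = intersect(names(d1), names(d2)). Because the two data frames share no column names, that intersection is empty and merge() returns a cross join (the Cartesian product, 5 × 4 = 20 rows). Avoid this by naming the key columns explicitly. Since the key columns have different names, use by.x and by.y, and set all = TRUE so that kids who appear in only one data frame are kept, with NA in the columns that come from the other one: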

out <- merge(d1,d2, by.x = 'd1.Kids', by.y = 'd2.Kids', all = TRUE)
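A minimal sketch to see the difference between the two calls side by side. It rebuilds the example data with stringsAsFactors = FALSE (an assumption added here so the key columns are plain character vectors on any R version), then compares the row counts of the default merge and the keyed full outer join.

```r
# Rebuild the example data; stringsAsFactors = FALSE keeps the keys as character
d1 <- data.frame(d1.Kids   = c("Jack", "Jill", "Jillian", "John", "James"),
                 d1.States = c("CA", "MA", "DE", "HI", "PA"),
                 stringsAsFactors = FALSE)
d2 <- data.frame(d2.Kids = c("Jill", "Jillian", "Jack", "Mary"),
                 d2.Ages = c(10, 7, 12, 30),
                 stringsAsFactors = FALSE)

# No shared column names, so the default 'by' is empty
print(intersect(names(d1), names(d2)))   # character(0)

# Default merge: Cartesian product, 5 * 4 = 20 rows
cross <- merge(d1, d2)
print(nrow(cross))

# Keyed full outer join: one row per distinct kid across both tables,
# sorted by the key column (merge() sorts by default)
out <- merge(d1, d2, by.x = "d1.Kids", by.y = "d2.Kids", all = TRUE)
print(nrow(out))
print(out)
```

Note that the key column in the result keeps the by.x name (d1.Kids), which is why the prefixes still need cleaning up afterwards.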

Then rename the columns of out by stripping the prefix (everything up to and including the first dot):

names(out) <- sub("^[^.]+\\.", "", names(out))
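Putting the steps together, a sketch of one way to get the exact layout shown in the question. The extra steps are assumptions based on the desired output, not part of the answer above: headers are lowercased and put in the order kids, ages, states, and rows are put back in d1 order followed by kids found only in d2 (merge() itself sorts by the key). The kids' names keep their original capitalization.

```r
d1 <- data.frame(d1.Kids   = c("Jack", "Jill", "Jillian", "John", "James"),
                 d1.States = c("CA", "MA", "DE", "HI", "PA"),
                 stringsAsFactors = FALSE)
d2 <- data.frame(d2.Kids = c("Jill", "Jillian", "Jack", "Mary"),
                 d2.Ages = c(10, 7, 12, 30),
                 stringsAsFactors = FALSE)

# Full outer join on the differently named key columns
out <- merge(d1, d2, by.x = "d1.Kids", by.y = "d2.Kids", all = TRUE)

# "^[^.]+\\." matches everything up to and including the first dot,
# so "d1.Kids" -> "Kids", "d1.States" -> "States", "d2.Ages" -> "Ages"
names(out) <- sub("^[^.]+\\.", "", names(out))

# Assumption: lowercase headers in the order shown in the desired output
names(out) <- tolower(names(out))
out <- out[, c("kids", "ages", "states")]

# Assumption: d1's row order first, then kids that appear only in d2.
# match() returns the first position of each kid in the combined key vector.
out <- out[order(match(out$kids, c(d1$d1.Kids, d2$d2.Kids))), ]
rownames(out) <- NULL
print(out)
```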
