Combine two data frames by two common variables

Problem description

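I have two data frames (df1 and df2) and want to merge them into a single data frame with the columns "GEO", "POP", "Value" and "Mean". Rows that have no match in the other data frame should be assigned NA.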

> df1
              GEO    Value            POP
1         Belgium   986494    Adolescents
2         Denmark   542496    Adolescents
3         Finland   472801    Adolescents
4          France  6568728    Adolescents
5         Germany  6177477    Adolescents
6           Italy  4564035    Adolescents
7     Netherlands  1608971    Adolescents
8           Spain  3550102    Adolescents
9  United Kingdom  5815087    Adolescents
10        Belgium  6910856         Adults
11        Denmark  3423077         Adults
12        Finland  3318043         Adults
13         France 39536853         Adults
14        Germany 50839124         Adults
15          Italy 37609721         Adults
16    Netherlands 10467463         Adults
17          Spain 29722963         Adults
18 United Kingdom 39436511         Adults

> df2
              GEO            POP    Mean
1         Belgium    Adolescents 1221.75
2         Denmark    Adolescents 2669.66
3         Finland    Adolescents 1378.44
4          France    Adolescents 2293.82
5         Germany    Adolescents 2412.83
6           Italy    Adolescents 1282.08
7     Netherlands    Adolescents 1431.87
8           Spain    Adolescents 5410.47
9  United Kingdom    Adolescents 1026.75
10        Belgium         Adults 1567.43
11        Denmark         Adults 4241.10
12        Finland         Adults 3938.95
13         France         Adults 3231.94
14        Germany         Adults 1840.54
15          Italy         Adults 1337.15
16    Netherlands         Adults 4157.15
17          Spain         Adults 3897.04
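To make the question reproducible, the two printed data frames can be rebuilt roughly as follows. The values are copied from the console output shown in the question; the helper vector `geo` is just an illustrative name, and `stringsAsFactors = FALSE` is set explicitly so the result is the same on R versions before 4.0.

```r
# Country names in the order they appear in both printouts
geo <- c("Belgium", "Denmark", "Finland", "France", "Germany",
         "Italy", "Netherlands", "Spain", "United Kingdom")

# df1: 9 Adolescents rows followed by 9 Adults rows
df1 <- data.frame(
  GEO   = rep(geo, times = 2),
  Value = c(986494, 542496, 472801, 6568728, 6177477, 4564035,
            1608971, 3550102, 5815087,
            6910856, 3423077, 3318043, 39536853, 50839124, 37609721,
            10467463, 29722963, 39436511),
  POP   = rep(c("Adolescents", "Adults"), each = 9),
  stringsAsFactors = FALSE
)

# df2: same layout, but there is no "United Kingdom" / "Adults" row,
# so it has 9 Adolescents rows and only 8 Adults rows
df2 <- data.frame(
  GEO  = c(geo, geo[-9]),
  POP  = rep(c("Adolescents", "Adults"), times = c(9, 8)),
  Mean = c(1221.75, 2669.66, 1378.44, 2293.82, 2412.83, 1282.08,
           1431.87, 5410.47, 1026.75,
           1567.43, 4241.10, 3938.95, 3231.94, 1840.54, 1337.15,
           4157.15, 3897.04),
  stringsAsFactors = FALSE
)
```

The missing "United Kingdom" / "Adults" row in df2 is exactly the case that should end up with NA in the merged result.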

I need to combine them into one df! I tried a few dplyr functions:

bind_rows(df1,df2)
intersect(df1, df2)

Error: not compatible: 
- Cols in y but not x: `Value`. 
- Cols in x but not y: `Mean`. 
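A note on these two attempts: the error most likely comes from intersect(), which compares whole rows and therefore requires both data frames to have the same columns. bind_rows() does not fail, but it solves a different problem: it appends the rows of the second data frame below those of the first and fills every column missing from one input with NA. The sketch below uses a two-row subset of the question's data to show that Value and Mean never end up on the same row.

```r
library(dplyr)

# Two-row subset of the question's df1 and df2
df1 <- data.frame(GEO = c("Belgium", "Denmark"),
                  Value = c(986494, 542496),
                  POP = c("Adolescents", "Adolescents"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(GEO = c("Belgium", "Denmark"),
                  POP = c("Adolescents", "Adolescents"),
                  Mean = c(1221.75, 2669.66),
                  stringsAsFactors = FALSE)

# bind_rows() stacks the inputs: 2 + 2 = 4 rows. Rows from df1 get NA
# in Mean, rows from df2 get NA in Value.
stacked <- bind_rows(df1, df2)
stacked
```

Stacking is the right tool when both data frames describe different observations with (mostly) the same columns; combining columns for the same GEO/POP pair calls for a join instead.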

I also tried a join:

left_join(df1,df2, by "GEO", "POP")

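But this only seems to work with a single common column, and I have two columns (GEO and POP) that must both be taken into account in the join. Any ideas?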
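Why both keys matter can be seen on the Belgium rows alone (values taken from the question). Joining on GEO only pairs each df1 row with every df2 row for the same country, while passing a character vector to `by` matches each GEO/POP combination exactly once. Recent dplyr versions may also print a many-to-many warning for the GEO-only join; that warning is expected here.

```r
library(dplyr)

# Belgium rows only, copied from the question's df1 and df2
df1 <- data.frame(GEO = c("Belgium", "Belgium"),
                  Value = c(986494, 6910856),
                  POP = c("Adolescents", "Adults"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(GEO = c("Belgium", "Belgium"),
                  POP = c("Adolescents", "Adults"),
                  Mean = c(1221.75, 1567.43),
                  stringsAsFactors = FALSE)

# Key = GEO only: 2 x 2 = 4 rows, and the non-key POP columns are kept
# twice with the default suffixes as POP.x and POP.y
by_geo <- left_join(df1, df2, by = "GEO")

# Key = GEO and POP: one row per GEO/POP pair, a single POP column
by_both <- left_join(df1, df2, by = c("GEO", "POP"))
by_both
```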

Tags: r, dplyr

Solution


library(dplyr)

# Join on both key columns; rows of df1 without a match in df2 get NA in Mean
df1 %>%
    left_join(df2, by = c("GEO", "POP"))
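To check the NA behaviour the question asks for, the sketch below uses only the United Kingdom rows from the question, where df2 has no "Adults" row. It also shows base R's merge() as an alternative (a swapped-in technique, not part of the answer above): `all.x = TRUE` plays the role of left_join(), but merge() sorts by the key columns by default, so row order can differ.

```r
library(dplyr)

# United Kingdom rows only, copied from the question:
# df2 has no "United Kingdom" / "Adults" row
df1 <- data.frame(GEO = c("United Kingdom", "United Kingdom"),
                  Value = c(5815087, 39436511),
                  POP = c("Adolescents", "Adults"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(GEO = "United Kingdom",
                  POP = "Adolescents",
                  Mean = 1026.75,
                  stringsAsFactors = FALSE)

# left_join() keeps every row of df1; the unmatched Adults row gets NA in Mean
joined <- df1 %>%
    left_join(df2, by = c("GEO", "POP"))

# Base R equivalent: all.x = TRUE keeps the unmatched rows of df1;
# key columns come first in the result
merged <- merge(df1, df2, by = c("GEO", "POP"), all.x = TRUE)
joined
```

If rows that exist only in df2 should be kept as well, full_join() with the same `by` argument keeps unmatched rows from both sides and fills the missing columns with NA.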
