首页 > 解决方案 > 将对应于同一对值“A”和“B”的数据帧col“C”中的值分配/连接到第二个数据帧。R - dplyr

问题描述

我有两个数据框。

我想在第一个 dafaramdf1的新列中分配频率,

df1 <- tibble(names1 = c('architecture', 'assessment', 'build'), 
              names2 = c('build', 'data', 'data'),
              frequency = c(36,13,720))

# A tibble: 3 x 3
  names1       names2 frequency
  <chr>        <chr>      <dbl>
1 architecture build         36
2 assessment   data          13
3 build        data          720

第二个数据框中df2

df2 <- tibble(names1 = c('architecture', 'build', 'assessment','assessment', 'business'), 
              names2 = c('build','architecture', 'data', 'data', 'strategy'))

  names1       names2        
  <chr>        <chr>         
1 architecture build         
2 build        architecture  
3 assessment   data          
4 assessment   data   
5 business     strategy         

对于这个结果:

  names1       names2        frequency
  <chr>        <chr>         <dbl>
1 architecture build         36
2 build        architecture  36
3 assessment   data          13
4 assessment   data          13
5 business     strategy      0

注意:有时我有df1$names1 == df2$names1 && df1$names2 == df2$names2 df1$names1 == df2$names2 && df1$names2 == df2$names1

1 architecture build         36
2 build        architecture  36

注意:我想保留没有匹配的行

5 business     strategy      0

标签: rdataframedplyr

解决方案


这里的问题是名称列的顺序对于加入很重要,因此您必须更新数据集并应用一致的顺序。

这是一个dplyr解决方案:

library(dplyr)

df1 <- tibble(names1 = c('architecture', 'assessment', 'build'), 
              names2 = c('build', 'data', 'data'),
              frequency = c(36,13,720))

df2 <- tibble(names1 = c('architecture', 'build', 'assessment','assessment', 'business'), 
              names2 = c('build','architecture', 'data', 'data', 'strategy'))

# update df1
df1 = df1 %>% 
  rowwise() %>% 
  mutate(names = paste0(sort(c(names1, names2)), collapse = "_")) %>% 
  select(names, frequency)

# update df2
df2 = df2 %>% 
  rowwise() %>% 
  mutate(names = paste0(sort(c(names1, names2)), collapse = "_"))

# join datasets and update columns
left_join(df2, df1, by="names") %>%
  mutate(frequency = coalesce(frequency, 0)) %>%
  select(-names) %>%
  ungroup()

#   names1       names2       frequency
#   <chr>        <chr>            <dbl>
# 1 architecture build               36
# 2 build        architecture        36
# 3 assessment   data                13
# 4 assessment   data                13
# 5 business     strategy             0

推荐阅读