How can I do a complex multi-column bind of my two tables?

Problem description

I have a data frame that lists every possible "type" in its type column:

comment      type
used         enter
used         open
used         close
used         update
not_used     delete

I pull a second data frame from my database, but not every "type" necessarily appears in it. Here is an example of that table:

ID    date            type           value
a1    2020-09-01       enter          18
a1    2020-09-01       close          15
a1    2020-09-02       enter          4
a2    2020-09-01       close          10
b1    2020-09-02       update         10

As you can see, ID a1 has only two types: enter and close. a2 has only close, and b1 has only update.

I want to bind these two tables so that every "type" missing from my table gets a value of zero for each ID and date. How can I bind the two tables to get this:

comment            ID    date            type           value
used               a1    2020-09-01       enter          18
used               a1    2020-09-01       open           0
used               a1    2020-09-01       close          15
used               a1    2020-09-01       update         0
not_used           a1    2020-09-01       delete         0
used               a1    2020-09-02       enter          4
used               a1    2020-09-02       open           0
used               a1    2020-09-02       close          0
used               a1    2020-09-02       update         0
not_used           a1    2020-09-02       delete         0
used               a2    2020-09-01       enter          0
used               a2    2020-09-01       open           0
used               a2    2020-09-01       close          10
used               a2    2020-09-01       update         0
not_used           a2    2020-09-01       delete         0
used               b1    2020-09-02       enter          0
used               b1    2020-09-02       open           0
used               b1    2020-09-02       close          0
used               b1    2020-09-02       update         10
not_used           b1    2020-09-02       delete         0

As you can see, I also keep the "comment" column. How can I do this?

This does not keep the "comment" column:

library(dplyr)
library(tidyr)

df2 %>%
  mutate(type = factor(type, levels = df1$type)) %>%
  group_by(ID, date) %>%
  complete(type, fill = list(value = 0))

Tags: r, dataframe, binding

Solution


Try this:

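library(tidyverse)
# Fill in the missing types per ID/date, then bring back the comment column
new <- df2 %>%
  # Make every type in df1 a factor level so complete() can see absent ones
  mutate(type = factor(type, levels = df1$type)) %>%
  group_by(ID, date) %>%
  # Add a row for each unused level, with value set to 0
  complete(type, fill = list(value = 0)) %>%
  # Look up the comment that belongs to each type
  left_join(df1, by = "type")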

Output:

# A tibble: 20 x 5
# Groups:   ID, date [4]
   ID    date       type   value comment 
   <chr> <chr>      <chr>  <dbl> <chr>   
 1 a1    2020-09-01 enter     18 used    
 2 a1    2020-09-01 open       0 used    
 3 a1    2020-09-01 close     15 used    
 4 a1    2020-09-01 update     0 used    
 5 a1    2020-09-01 delete     0 not_used
 6 a1    2020-09-02 enter      4 used    
 7 a1    2020-09-02 open       0 used    
 8 a1    2020-09-02 close      0 used    
 9 a1    2020-09-02 update     0 used    
10 a1    2020-09-02 delete     0 not_used
11 a2    2020-09-01 enter      0 used    
12 a2    2020-09-01 open       0 used    
13 a2    2020-09-01 close     10 used    
14 a2    2020-09-01 update     0 used    
15 a2    2020-09-01 delete     0 not_used
16 b1    2020-09-02 enter      0 used    
17 b1    2020-09-02 open       0 used    
18 b1    2020-09-02 close      0 used    
19 b1    2020-09-02 update    10 used    
20 b1    2020-09-02 delete     0 not_used
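
For anyone who wants to run this end to end, here is a self-contained sketch of an alternative route to the same table. It is not the answer's method: it builds the full ID/date/type grid with a cross join and then attaches the known values. It assumes dplyr 1.1.0 or later (for cross_join()). df1 and df2 are rebuilt from the tables in the question, and the names keys and result are only illustrative.

```r
library(dplyr)

# Lookup table of every possible type (values copied from the question)
df1 <- data.frame(
  comment = c("used", "used", "used", "used", "not_used"),
  type    = c("enter", "open", "close", "update", "delete"),
  stringsAsFactors = FALSE
)

# Data as it comes from the database (values copied from the question)
df2 <- data.frame(
  ID    = c("a1", "a1", "a1", "a2", "b1"),
  date  = c("2020-09-01", "2020-09-01", "2020-09-02", "2020-09-01", "2020-09-02"),
  type  = c("enter", "close", "enter", "close", "update"),
  value = c(18, 15, 4, 10, 10),
  stringsAsFactors = FALSE
)

# One row per ID/date combination that actually occurs in df2
keys <- distinct(df2, ID, date)

result <- keys %>%
  # Pair every ID/date with every row of df1 (assumes dplyr >= 1.1.0)
  cross_join(df1) %>%
  # Attach the values that exist; combinations without a match get NA
  left_join(df2, by = c("ID", "date", "type")) %>%
  # Replace the NAs introduced by the join with 0
  mutate(value = coalesce(value, 0)) %>%
  # Put the columns in the order shown in the question
  select(comment, ID, date, type, value)

print(result)
```

Because the grid comes from df1 itself, type stays a character column and the comment column rides along without a second join. On older dplyr versions, the factor plus complete() approach above avoids the need for cross_join().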
