Merge 2 data frames with common columns and add an indicator where the common value is not NA

Question

I have 2 data frames:

df_1:

   Date        time_series_1  time_series_2
1  01-01-2019  NA             10
2  02-01-2019  5              NA
3  03-01-2019  10             NA
4  04-01-2019  20             6

df_2:

   Date        time_series_1  time_series_2  time_series_3
1  01-01-2019  NA             10             10
2  02-01-2019  5              NA             87
3  03-01-2019  10             NA             45
4  04-01-2019  20             6              221

Both data frames share the columns time_series_1 and time_series_2. (Every column in df_1 is also present in df_2.)

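My goal is to merge these 2 data frames, show the merged data frame in long format, and add an indicator that is 1 when a value belongs to df_1 and is not NA on that date, and 0 otherwise.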

The desired output would be DF_LONG_MERGED:

   Date        variable       value  indicator
1  01-01-2019  time_series_1  NA     0
2  01-01-2019  time_series_2  10     1
3  01-01-2019  time_series_3  10     0
4  02-01-2019  time_series_1  5      1
5  02-01-2019  time_series_2  NA     0
6  02-01-2019  time_series_3  87     0
7  03-01-2019  time_series_1  10     1
8  03-01-2019  time_series_2  NA     0
9  03-01-2019  time_series_3  45     0
10 04-01-2019  time_series_1  20     1
11 04-01-2019  time_series_2  6      1
12 04-01-2019  time_series_3  221    0

Any suggestions on how to add this indicator?

Tags: r, dplyr

Solution

Does this work:

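library(dplyr)
library(tidyr)

# Long form of df_1: flag every non-NA value with 1, NA values with 0
df_1 %>%
  pivot_longer(-Date, names_to = 'variable') %>%
  mutate(indicator = case_when(!is.na(value) ~ 1, TRUE ~ 0)) %>%
  right_join(df_2 %>% pivot_longer(-Date, names_to = 'variable')) %>%
  # Rows that exist only in df_2 (time_series_3) have no match, so set them to 0
  mutate(indicator = replace_na(indicator, 0)) %>%
  arrange(Date)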
Joining, by = c("Date", "variable", "value")
# A tibble: 12 x 4
   Date       variable      value indicator
   <chr>      <chr>         <int>     <dbl>
 1 01-01-2019 time_series_1    NA         0
 2 01-01-2019 time_series_2    10         1
 3 01-01-2019 time_series_3    10         0
 4 02-01-2019 time_series_1     5         1
 5 02-01-2019 time_series_2    NA         0
 6 02-01-2019 time_series_3    87         0
 7 03-01-2019 time_series_1    10         1
 8 03-01-2019 time_series_2    NA         0
 9 03-01-2019 time_series_3    45         0
10 04-01-2019 time_series_1    20         1
11 04-01-2019 time_series_2     6         1
12 04-01-2019 time_series_3   221         0
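
The join above matches on Date, variable, and value, so it relies on the values in df_1 and df_2 being identical. A possible variant, sketched here under the assumption that the indicator should depend only on whether df_1 has a non-NA value for that date and series, joins on Date and variable alone. The sample data is rebuilt from the tables in the question, and the names df_1_long, value_df_1, and result are hypothetical helpers introduced for illustration.

```r
library(dplyr)
library(tidyr)

# Sample data reconstructed from the tables in the question
df_1 <- data.frame(
  Date = c("01-01-2019", "02-01-2019", "03-01-2019", "04-01-2019"),
  time_series_1 = c(NA, 5L, 10L, 20L),
  time_series_2 = c(10L, NA, NA, 6L)
)
df_2 <- df_1 %>% mutate(time_series_3 = c(10L, 87L, 45L, 221L))

# Long form of df_1, with its values stored under a separate column name
# (value_df_1 is a hypothetical name) so they do not clash with df_2's values
df_1_long <- df_1 %>%
  pivot_longer(-Date, names_to = "variable", values_to = "value_df_1")

result <- df_2 %>%
  pivot_longer(-Date, names_to = "variable") %>%
  # Match only on the keys, not on the value itself
  left_join(df_1_long, by = c("Date", "variable")) %>%
  # 1 when df_1 has a non-NA value for this date and series, otherwise 0
  mutate(indicator = as.integer(!is.na(value_df_1))) %>%
  select(-value_df_1) %>%
  # Date is a character column here; sorting it as text is only safe
  # because all sample dates fall in the same month
  arrange(Date, variable)

print(result)
```

Because the keys are passed explicitly through by, no "Joining, by = ..." message is printed. For real data it would probably be safer to convert Date to a proper date type before sorting.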
