Creating dummy variables in windows around a cutoff date in R

Problem description

I have a data frame (df) that looks like this:

library(dplyr)
library(lubridate)

       id gender              education     e-week
1  100236      0 Bachelor or equivalent 2012-01-22
2  100237      0    Secondary education 2010-03-14
3  100248      0    Master and doctoral 2010-04-25
4  100257      0    Master and doctoral 2012-01-22
5  100271      0 Bachelor or equivalent 2011-05-22
6  100285      0      Primary education 2012-01-15
7  100303      0    Master and doctoral 2013-01-13
8  100305      0    Secondary education 2011-09-25
9  100316      0    Secondary education 2012-12-30
10 100354      0    Secondary education 2010-08-22

The real data set is much longer. I derived the "week" variable from the original date:

df <- df %>%
  mutate(., e_week = floor_date(date_exit, unit = "week"))
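
As a small illustration of this step, floor_date() with unit = "week" moves each date back to the start of its week, which is Sunday under lubridate's default settings. The date_exit values below are made up for the example and are not taken from the real data.

```r
library(lubridate)

# Hypothetical exit dates (both fall on a Wednesday)
date_exit <- as.Date(c("2012-01-25", "2010-03-17"))

# Snap each date back to the first day of its week
e_week <- floor_date(date_exit, unit = "week")
print(e_week)
```

If weeks should start on Monday instead, floor_date() accepts a week_start argument.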

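The next step is to create dummy variables for different time "windows" relative to the date of interest. At first I created them manually, like this: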

df <- df %>%
  mutate(.,treshold_1week =ifelse(e_week %within% 
                                     interval(start = as.Date('2009-05-17') - weeks(x = 1), 
                                              end = '2009-05-17'),
                                   1, 0 ))

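This covers the 1 week before the date of interest. I did the same thing manually for 2, 3, 4, 5, and 6 weeks before and after the date of interest. Now I want to widen the window to 40 weeks before and after the date of interest. Is there a faster, more efficient way to do this without writing a new ifelse() call for each dummy variable?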

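The challenge is that I want a new dummy variable for each week closer to the date of interest. So I am looking for 40 dummy variables that represent progressively shorter time intervals, i.e.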

treshold_40weeks, treshold_39weeks, treshold_38weeks, and so on.

Tags: r, dplyr, data-manipulation

Solution


Using dplyr and purrr:

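library(dplyr)
library(purrr)
library(lubridate)

# Example data: 300 weekly dates starting on 2008-01-01
data <- tibble(e_week = seq(as.Date("2008-01-01"), by = "7 days", length.out = 300))

# Window widths in weeks
week <- seq(1, 40, by = 1)

# Add one dummy column flagging dates within x weeks before the cutoff
generate_dummy <- function(x, df) {
  df %>%
    mutate("threshod_{x}week" := ifelse(e_week %within% 
        interval(start = as.Date('2009-05-17') - weeks(x), 
          end = '2009-05-17'),
      1, 0 ))
}

# Build one data frame per window, then join them back together on e_week
reduce(map(week, generate_dummy, df = data), .f = left_join, by = "e_week")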

Output (summary of the result)

     e_week           threshod_1week     threshod_2week     threshod_3week
 Min.   :2008-01-01   Min.   :0.000000   Min.   :0.000000   Min.   :0.00  
 1st Qu.:2009-06-07   1st Qu.:0.000000   1st Qu.:0.000000   1st Qu.:0.00  
 Median :2010-11-12   Median :0.000000   Median :0.000000   Median :0.00  
 Mean   :2010-11-12   Mean   :0.003333   Mean   :0.006667   Mean   :0.01  
 3rd Qu.:2012-04-18   3rd Qu.:0.000000   3rd Qu.:0.000000   3rd Qu.:0.00  
 Max.   :2013-09-24   Max.   :1.000000   Max.   :1.000000   Max.   :1.00  
 threshod_4week    threshod_5week    threshod_6week threshod_7week   
 Min.   :0.00000   Min.   :0.00000   Min.   :0.00   Min.   :0.00000  
 1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.00   1st Qu.:0.00000  
 Median :0.00000   Median :0.00000   Median :0.00   Median :0.00000  
 Mean   :0.01333   Mean   :0.01667   Mean   :0.02   Mean   :0.02333  
 3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:0.00   3rd Qu.:0.00000  
 Max.   :1.00000   Max.   :1.00000   Max.   :1.00   Max.   :1.00000  
 threshod_8week    threshod_9week threshod_10week   threshod_11week  
 Min.   :0.00000   Min.   :0.00   Min.   :0.00000   Min.   :0.00000  
 1st Qu.:0.00000   1st Qu.:0.00   1st Qu.:0.00000   1st Qu.:0.00000  
 Median :0.00000   Median :0.00   Median :0.00000   Median :0.00000  
 Mean   :0.02667   Mean   :0.03   Mean   :0.03333   Mean   :0.03667  
 3rd Qu.:0.00000   3rd Qu.:0.00   3rd Qu.:0.00000   3rd Qu.:0.00000  
 Max.   :1.00000   Max.   :1.00   Max.   :1.00000   Max.   :1.00000  
 threshod_12week threshod_13week   threshod_14week   threshod_15week
 Min.   :0.00    Min.   :0.00000   Min.   :0.00000   Min.   :0.00   
 1st Qu.:0.00    1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.00   
 Median :0.00    Median :0.00000   Median :0.00000   Median :0.00   
 Mean   :0.04    Mean   :0.04333   Mean   :0.04667   Mean   :0.05   
 3rd Qu.:0.00    3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:0.00   
 Max.   :1.00    Max.   :1.00000   Max.   :1.00000   Max.   :1.00   
 threshod_16week   threshod_17week   threshod_18week threshod_19week  
 Min.   :0.00000   Min.   :0.00000   Min.   :0.00    Min.   :0.00000  
 1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.00    1st Qu.:0.00000  
 Median :0.00000   Median :0.00000   Median :0.00    Median :0.00000  
 Mean   :0.05333   Mean   :0.05667   Mean   :0.06    Mean   :0.06333  
 3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:0.00    3rd Qu.:0.00000  
 Max.   :1.00000   Max.   :1.00000   Max.   :1.00    Max.   :1.00000  
 threshod_20week   threshod_21week threshod_22week   threshod_23week  
 Min.   :0.00000   Min.   :0.00    Min.   :0.00000   Min.   :0.00000  
 1st Qu.:0.00000   1st Qu.:0.00    1st Qu.:0.00000   1st Qu.:0.00000  
 Median :0.00000   Median :0.00    Median :0.00000   Median :0.00000  
 Mean   :0.06667   Mean   :0.07    Mean   :0.07333   Mean   :0.07667  
 3rd Qu.:0.00000   3rd Qu.:0.00    3rd Qu.:0.00000   3rd Qu.:0.00000  
 Max.   :1.00000   Max.   :1.00    Max.   :1.00000   Max.   :1.00000  
 threshod_24week threshod_25week   threshod_26week   threshod_27week
 Min.   :0.00    Min.   :0.00000   Min.   :0.00000   Min.   :0.00   
 1st Qu.:0.00    1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.00   
 Median :0.00    Median :0.00000   Median :0.00000   Median :0.00   
 Mean   :0.08    Mean   :0.08333   Mean   :0.08667   Mean   :0.09   
 3rd Qu.:0.00    3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:0.00   
 Max.   :1.00    Max.   :1.00000   Max.   :1.00000   Max.   :1.00   
 threshod_28week   threshod_29week   threshod_30week threshod_31week 
 Min.   :0.00000   Min.   :0.00000   Min.   :0.0     Min.   :0.0000  
 1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.0     1st Qu.:0.0000  
 Median :0.00000   Median :0.00000   Median :0.0     Median :0.0000  
 Mean   :0.09333   Mean   :0.09667   Mean   :0.1     Mean   :0.1033  
 3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:0.0     3rd Qu.:0.0000  
 Max.   :1.00000   Max.   :1.00000   Max.   :1.0     Max.   :1.0000  
 threshod_32week  threshod_33week threshod_34week  threshod_35week 
 Min.   :0.0000   Min.   :0.00    Min.   :0.0000   Min.   :0.0000  
 1st Qu.:0.0000   1st Qu.:0.00    1st Qu.:0.0000   1st Qu.:0.0000  
 Median :0.0000   Median :0.00    Median :0.0000   Median :0.0000  
 Mean   :0.1067   Mean   :0.11    Mean   :0.1133   Mean   :0.1167  
 3rd Qu.:0.0000   3rd Qu.:0.00    3rd Qu.:0.0000   3rd Qu.:0.0000  
 Max.   :1.0000   Max.   :1.00    Max.   :1.0000   Max.   :1.0000  
 threshod_36week threshod_37week  threshod_38week  threshod_39week
 Min.   :0.00    Min.   :0.0000   Min.   :0.0000   Min.   :0.00   
 1st Qu.:0.00    1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.00   
 Median :0.00    Median :0.0000   Median :0.0000   Median :0.00   
 Mean   :0.12    Mean   :0.1233   Mean   :0.1267   Mean   :0.13   
 3rd Qu.:0.00    3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.00   
 Max.   :1.00    Max.   :1.0000   Max.   :1.0000   Max.   :1.00   
 threshod_40week 
 Min.   :0.0000  
 1st Qu.:0.0000  
 Median :0.0000  
 Mean   :0.1333  
 3rd Qu.:0.0000  
 Max.   :1.0000
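
The join in the code above matches rows on e_week, which is unique in the example data. In a data set like the one in the question, where several ids can share a week, that join would duplicate rows. A minimal sketch of a join-free variant follows; it also covers the weeks after the cutoff, which the question asks about. The helper name add_window_dummies, the before_/after_ column prefixes, and the four-row data frame are assumptions made for illustration and are not part of the original answer.

```r
library(dplyr)
library(purrr)

cutoff <- as.Date("2009-05-17")

# Hypothetical data: ids 1 and 2 share the same week.
df <- tibble(
  id     = c(1, 2, 3, 4),
  e_week = as.Date(c("2009-05-10", "2009-05-10", "2009-03-01", "2009-06-07"))
)

# Hypothetical helper: append one 0/1 column per window width.
# side = "before" flags [cutoff - k weeks, cutoff];
# side = "after"  flags [cutoff, cutoff + k weeks] (both ends inclusive).
add_window_dummies <- function(df, cutoff, n_weeks = 1:40, side = "before") {
  days <- as.numeric(df$e_week - cutoff)   # signed distance in days
  if (side == "before") days <- -days      # measure backwards from the cutoff
  dummies <- map(
    set_names(n_weeks, paste0(side, "_", n_weeks, "week")),
    function(k) as.integer(days >= 0 & days <= 7 * k)
  )
  bind_cols(df, as_tibble(dummies))
}

result <- df %>%
  add_window_dummies(cutoff, side = "before") %>%
  add_window_dummies(cutoff, side = "after")
```

Because each dummy is computed directly from e_week, the result keeps one row per input row and leaves the other columns untouched. A date exactly equal to the cutoff is flagged on both sides under this definition.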
