r - Creating dummy variables in windows around a cutoff date in R
Problem description
I have a data frame (df) that looks like this:
library(dplyr)
library(lubridate)
       id gender              education     e_week
1  100236      0 Bachelor or equivalent 2012-01-22
2  100237      0    Secondary education 2010-03-14
3  100248      0    Master and doctoral 2010-04-25
4  100257      0    Master and doctoral 2012-01-22
5  100271      0 Bachelor or equivalent 2011-05-22
6  100285      0      Primary education 2012-01-15
7  100303      0    Master and doctoral 2013-01-13
8  100305      0    Secondary education 2011-09-25
9  100316      0    Secondary education 2012-12-30
10 100354      0    Secondary education 2010-08-22
The real dataset is much longer. I derived the "week" variable from the original date:
df <- df %>%
  mutate(e_week = floor_date(date_exit, unit = "week"))
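The next step is to create dummy variables for different time "windows" around the date of interest. At first I created them by hand, like this:
df <- df %>%
  mutate(treshold_1week = ifelse(
    e_week %within% interval(start = as.Date('2009-05-17') - weeks(x = 1),
                             end = as.Date('2009-05-17')),
    1, 0
  ))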
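That covers the 1 week before the date of interest. I did the same by hand for 2, 3, 4, 5 and 6 weeks before and after the date of interest. Now I want to extend the windows to 40 weeks before and after the date of interest. Is there a faster, more efficient way to do this without writing a new ifelse() call for each dummy variable?
The challenge is that I want a new dummy variable for each week approaching the date of interest. So I am looking for 40 dummy variables that represent progressively shorter intervals, i.e. treshold_40weeks, treshold_39weeks, treshold_38weeks, and so on.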
Solution
Using dplyr and purrr:
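library(dplyr)
library(purrr)
library(lubridate)

# Example data: 300 weekly dates starting 2008-01-01
data <- tibble(e_week = seq(as.Date("2008-01-01"), by = "7 days", length.out = 300))
week <- seq(1, 40, by = 1)

# Add one dummy column flagging dates within x weeks before the cutoff date
generate_dummy <- function(x, df) {
  df %>%
    mutate("threshod_{x}week" := ifelse(
      e_week %within% interval(start = as.Date('2009-05-17') - weeks(x),
                               end = as.Date('2009-05-17')),
      1, 0
    ))
}

# Build one data frame per window width, then join them all on e_week
reduce(map(week, generate_dummy, df = data), .f = left_join, by = "e_week")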
Output (summary of the result)
e_week threshod_1week threshod_2week threshod_3week
Min. :2008-01-01 Min. :0.000000 Min. :0.000000 Min. :0.00
1st Qu.:2009-06-07 1st Qu.:0.000000 1st Qu.:0.000000 1st Qu.:0.00
Median :2010-11-12 Median :0.000000 Median :0.000000 Median :0.00
Mean :2010-11-12 Mean :0.003333 Mean :0.006667 Mean :0.01
3rd Qu.:2012-04-18 3rd Qu.:0.000000 3rd Qu.:0.000000 3rd Qu.:0.00
Max. :2013-09-24 Max. :1.000000 Max. :1.000000 Max. :1.00
threshod_4week threshod_5week threshod_6week threshod_7week
Min. :0.00000 Min. :0.00000 Min. :0.00 Min. :0.00000
1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00 1st Qu.:0.00000
Median :0.00000 Median :0.00000 Median :0.00 Median :0.00000
Mean :0.01333 Mean :0.01667 Mean :0.02 Mean :0.02333
3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00 3rd Qu.:0.00000
Max. :1.00000 Max. :1.00000 Max. :1.00 Max. :1.00000
threshod_8week threshod_9week threshod_10week threshod_11week
Min. :0.00000 Min. :0.00 Min. :0.00000 Min. :0.00000
1st Qu.:0.00000 1st Qu.:0.00 1st Qu.:0.00000 1st Qu.:0.00000
Median :0.00000 Median :0.00 Median :0.00000 Median :0.00000
Mean :0.02667 Mean :0.03 Mean :0.03333 Mean :0.03667
3rd Qu.:0.00000 3rd Qu.:0.00 3rd Qu.:0.00000 3rd Qu.:0.00000
Max. :1.00000 Max. :1.00 Max. :1.00000 Max. :1.00000
threshod_12week threshod_13week threshod_14week threshod_15week
Min. :0.00 Min. :0.00000 Min. :0.00000 Min. :0.00
1st Qu.:0.00 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00
Median :0.00 Median :0.00000 Median :0.00000 Median :0.00
Mean :0.04 Mean :0.04333 Mean :0.04667 Mean :0.05
3rd Qu.:0.00 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00
Max. :1.00 Max. :1.00000 Max. :1.00000 Max. :1.00
threshod_16week threshod_17week threshod_18week threshod_19week
Min. :0.00000 Min. :0.00000 Min. :0.00 Min. :0.00000
1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00 1st Qu.:0.00000
Median :0.00000 Median :0.00000 Median :0.00 Median :0.00000
Mean :0.05333 Mean :0.05667 Mean :0.06 Mean :0.06333
3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00 3rd Qu.:0.00000
Max. :1.00000 Max. :1.00000 Max. :1.00 Max. :1.00000
threshod_20week threshod_21week threshod_22week threshod_23week
Min. :0.00000 Min. :0.00 Min. :0.00000 Min. :0.00000
1st Qu.:0.00000 1st Qu.:0.00 1st Qu.:0.00000 1st Qu.:0.00000
Median :0.00000 Median :0.00 Median :0.00000 Median :0.00000
Mean :0.06667 Mean :0.07 Mean :0.07333 Mean :0.07667
3rd Qu.:0.00000 3rd Qu.:0.00 3rd Qu.:0.00000 3rd Qu.:0.00000
Max. :1.00000 Max. :1.00 Max. :1.00000 Max. :1.00000
threshod_24week threshod_25week threshod_26week threshod_27week
Min. :0.00 Min. :0.00000 Min. :0.00000 Min. :0.00
1st Qu.:0.00 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00
Median :0.00 Median :0.00000 Median :0.00000 Median :0.00
Mean :0.08 Mean :0.08333 Mean :0.08667 Mean :0.09
3rd Qu.:0.00 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00
Max. :1.00 Max. :1.00000 Max. :1.00000 Max. :1.00
threshod_28week threshod_29week threshod_30week threshod_31week
Min. :0.00000 Min. :0.00000 Min. :0.0 Min. :0.0000
1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.0 1st Qu.:0.0000
Median :0.00000 Median :0.00000 Median :0.0 Median :0.0000
Mean :0.09333 Mean :0.09667 Mean :0.1 Mean :0.1033
3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.0 3rd Qu.:0.0000
Max. :1.00000 Max. :1.00000 Max. :1.0 Max. :1.0000
threshod_32week threshod_33week threshod_34week threshod_35week
Min. :0.0000 Min. :0.00 Min. :0.0000 Min. :0.0000
1st Qu.:0.0000 1st Qu.:0.00 1st Qu.:0.0000 1st Qu.:0.0000
Median :0.0000 Median :0.00 Median :0.0000 Median :0.0000
Mean :0.1067 Mean :0.11 Mean :0.1133 Mean :0.1167
3rd Qu.:0.0000 3rd Qu.:0.00 3rd Qu.:0.0000 3rd Qu.:0.0000
Max. :1.0000 Max. :1.00 Max. :1.0000 Max. :1.0000
threshod_36week threshod_37week threshod_38week threshod_39week
Min. :0.00 Min. :0.0000 Min. :0.0000 Min. :0.00
1st Qu.:0.00 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00
Median :0.00 Median :0.0000 Median :0.0000 Median :0.00
Mean :0.12 Mean :0.1233 Mean :0.1267 Mean :0.13
3rd Qu.:0.00 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.00
Max. :1.00 Max. :1.0000 Max. :1.0000 Max. :1.00
threshod_40week
Min. :0.0000
1st Qu.:0.0000
Median :0.0000
Mean :0.1333
3rd Qu.:0.0000
Max. :1.0000
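The joins above are keyed on e_week, which works for the example data because every week appears exactly once. In a data frame like the one in the question, where several ids can share the same week, joining on e_week alone would likely multiply rows. As a rough sketch of an alternative that skips the joins, the dummies can be computed as plain vectors and bound onto the data column-wise. The helper name add_window_dummies, its arguments, and the toy rows are assumptions made for illustration; the column prefix is copied from the code above, and 7 * w days is used in place of weeks(w), which covers the same span for Date values.

```r
library(dplyr)
library(purrr)

# Hypothetical helper: adds one 0/1 column per window width (in weeks)
# for the period that ends at `cutoff` (both ends inclusive).
add_window_dummies <- function(df, cutoff, widths, prefix = "threshod_") {
  dummies <- map(widths, function(w) {
    # For Date columns, 7 * w days spans the same period as weeks(w)
    as.integer(df$e_week >= cutoff - 7 * w & df$e_week <= cutoff)
  })
  names(dummies) <- paste0(prefix, widths, "week")
  # Row order is unchanged, so the new columns line up with df
  bind_cols(df, as_tibble(dummies))
}

# Made-up rows; two ids share a week on purpose
toy <- tibble(
  id = 1:5,
  e_week = as.Date(c("2009-05-17", "2009-05-10", "2009-05-10",
                     "2009-03-01", "2009-05-24"))
)

result <- add_window_dummies(toy, as.Date("2009-05-17"), 1:40)
```

Windows after the date of interest could be built the same way by flipping the condition to e_week >= cutoff & e_week <= cutoff + 7 * w and passing a different prefix.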