r - 按时间窗口条件组替换缺失值
问题描述
我按 分组subject_id
。
变量 acpa 在 所在的行上测量row_type == BIOBANK
。我想替换<NA>
其他行上的 ,但只能在给定的时间窗口中,使用该index_date
变量。
这是我的桌子:
subject_id index_date row_type acpa
1: 155 2010-05-12 BASELINE_info <NA>
2: 155 2010-05-12 BASELINE_info <NA>
3: 155 2010-05-12 BIOBANK N
4: 155 2010-05-12 PHYSICAL_exam <NA>
5: 155 2010-09-29 FOLLOW_UP <NA>
6: 155 2011-11-30 FOLLOW_UP <NA>
7: 155 2013-06-01 FOLLOW_UP <NA>
8: 155 2014-06-01 FOLLOW_UP <NA>
9: 155 2015-06-01 FOLLOW_UP <NA>
10: 155 2016-08-31 FOLLOW_UP <NA>
11: 568 2012-04-07 BASELINE_info <NA>
12: 568 2012-04-07 BASELINE_info <NA>
13: 568 2012-04-07 FOLLOW_UP <NA>
14: 568 2012-04-19 BIOBANK H
15: 568 2012-04-19 PHYSICAL_exam <NA>
16: 568 2013-06-01 FOLLOW_UP <NA>
17: 568 2013-12-12 BIOBANK H
18: 568 2013-12-12 PHYSICAL_exam <NA>
19: 568 2014-04-27 FOLLOW_UP <NA>
20: 568 2014-05-22 RA_DIAG <NA>
21: 568 2014-05-28 PHYSICAL_exam <NA>
22: 568 2018-11-06 BIOBANK L
23: 568 2018-11-06 PHYSICAL_exam <NA>
24: 568 2018-11-27 FOLLOW_UP <NA>
25: 568 2019-06-20 FOLLOW_UP <NA>
subject_id index_date row_type acpa
例如,在第 3 行,acpa
=“N”。我想执行诸如应用之类的操作na.locf()
,但仅适用于对应的index_date
.
library(date.table)
library(zoo)
TABLE[, acpa := na.locf(acpa, na.rm = F), by = subject_id]
我怎么能建立一个rule
论据na.locf()
所以它只能填补我的时间窗口中的空白?
我想要的输出如下所示:
subject_id index_date row_type acpa
1: 155 2010-05-12 BASELINE_info N
2: 155 2010-05-12 BASELINE_info N
3: 155 2010-05-12 BIOBANK N
4: 155 2010-05-12 PHYSICAL_exam N
5: 155 2010-09-29 FOLLOW_UP N
6: 155 2011-11-30 FOLLOW_UP <NA>
7: 155 2013-06-01 FOLLOW_UP <NA>
8: 155 2014-06-01 FOLLOW_UP <NA>
9: 155 2015-06-01 FOLLOW_UP <NA>
10: 155 2016-08-31 FOLLOW_UP <NA>
11: 568 2012-04-07 BASELINE_info H
12: 568 2012-04-07 BASELINE_info H
13: 568 2012-04-07 FOLLOW_UP H
14: 568 2012-04-19 BIOBANK H
15: 568 2012-04-19 PHYSICAL_exam H
16: 568 2013-06-01 FOLLOW_UP H
17: 568 2013-12-12 BIOBANK H
18: 568 2013-12-12 PHYSICAL_exam H
19: 568 2014-04-27 FOLLOW_UP H
20: 568 2014-05-22 RA_DIAG H
21: 568 2014-05-28 PHYSICAL_exam H
22: 568 2018-11-06 BIOBANK L
23: 568 2018-11-06 PHYSICAL_exam L
24: 568 2018-11-27 FOLLOW_UP L
25: 568 2019-06-20 FOLLOW_UP L
subject_id index_date row_type acpa
如果由于时间窗口重叠而发生冲突,我会保留最后一个(之前的)非 NA 的值。
解决方案
我的数据经理同事提出了这个建议:
从前瞻性预测开始。
DT[!is.na(acpa), variable_t := index_date] # variable_t copied as the index date of the variable to impute
DT[, acpa_c := na.locf(acpa, na.rm = F), by = subject_id] # locf for the variable in a new variable
DT[, variable_t := na.locf(variable_t, na.rm = F), by = subject_id] # locf of the variable_time
DT[, delay := as.numeric(difftime(index_date, variable_t ,units = "days"))] # calculating on each row the delay
DT[delay > 365, acpa_c := NA] # cutting off values above my threshold
然后,向后投影也是如此。
DT[, acpa_c := na.locf(acpa_c , na.rm = F, fromLast = T),by = subject_id] ### na.lofb, in the corrected variable
DT[!is.na(acpa), variable_t := index_date] # reset of the temporary variable_time, so we can projet it backward
DT[, variable_t := na.locf(ratio_t, na.rm = F, fromLast = T),by = subject_id] # locb of the variable_time
DT[, delay := as.numeric(difftime(index_date, variable_t , units = "days"))] # delay
DT[delay < -180, acpa_c := NA] # cutting off values above my threshold
如果一切正常,则用新值替换。
DT[, acpa := acpa_c]
DT[, acpa_c := NULL]
推荐阅读
- ruby-on-rails - 消息:失败/错误:需要 File.expand_path('../../config/environment', __FILE__)
- javascript - 无法通过 Angular 组件中的 Promises 赋值
- angular - Angular MatTable - 视口内的中心可扩展行
- java - 模拟静态方法时出现内容类型错误
- excel - 将处理限制到某些单元格
- reactjs - 有人可以告诉我反应 Js 中子元素的这种循环实现有什么问题吗
- javascript - 如何修复缺少的 jsx 表达式容器围绕文字字符串?
- python - Django模型解耦
- c - 检查变量是否是某物的值的方法
- python - 我希望能够将一个 CSV 中的值与另一个 CSV 中的一组标称值进行比较