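r - How to replace NaN values with the previous non-NaN value within a group
Problem description
I need to replace NaN values with the previous non-NaN value within each group.
Here is an example: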
+-------+------------+-------+
| ts_id | date | value |
+-------+------------+-------+
| 2 | 01/10/2014 | 18 |
| 2 | 01/11/2014 | 15 |
| 2 | 01/12/2014 | NaN |
| 2 | 01/01/2015 | NaN |
| 2 | 01/02/2015 | NaN |
| 3 | 01/03/2015 | 19 |
| 3 | 01/04/2015 | 20 |
| 3 | 01/10/2015 | 12 |
| 3 | 01/11/2015 | 17 |
| 3 | 01/12/2015 | NaN |
| 3 | 01/01/2016 | NaN |
| 3 | 01/08/2016 | 7 |
| 3 | 01/09/2016 | NaN |
| 3 | 01/10/2016 | NaN |
| 3 | 01/11/2016 | NaN |
| 3 | 01/12/2016 | NaN |
| 3 | 01/01/2017 | NaN |
+-------+------------+-------+
Data:
data <- structure(list(ts_id = c(2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3), date = structure(c(16344, 16375, 16405, 16436,
16467, 16495, 16526, 16709, 16740, 16770, 16801, 17014, 17045,
17075, 17106, 17136, 17167), class = "Date"), value = c(18, 15,
NaN, NaN, NaN, 19, 20, 12, 17, NaN, NaN, 7, NaN, NaN, NaN, NaN,
NaN)), row.names = c(NA, -17L), vars = "ts_id", drop = TRUE, indices = list(
0:16), group_sizes = 17L, biggest_group_size = 17L, labels = structure(list(
ts_id = 3L), row.names = c(NA, -1L), class = "data.frame", vars = "ts_id", drop = TRUE), class = "data.frame")
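Within each group (identified by ts_id), a NaN value can occur on any given date. I need to replace each NaN with the most recent non-NaN value from the same group.
The result should look like this: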
+-------+------------+-------+
| ts_id | date | value |
+-------+------------+-------+
| 2 | 01/10/2014 | 18 |
| 2 | 01/11/2014 | 15 |
| 2 | 01/12/2014 | 15 |
| 2 | 01/01/2015 | 15 |
| 2 | 01/02/2015 | 15 |
| 3 | 01/03/2015 | 19 |
| 3 | 01/04/2015 | 20 |
| 3 | 01/10/2015 | 12 |
| 3 | 01/11/2015 | 17 |
| 3 | 01/12/2015 | 17 |
| 3 | 01/01/2016 | 17 |
| 3 | 01/08/2016 | 7 |
| 3 | 01/09/2016 | 7 |
| 3 | 01/10/2016 | 7 |
| 3 | 01/11/2016 | 7 |
| 3 | 01/12/2016 | 7 |
| 3 | 01/01/2017 | 7 |
+-------+------------+-------+
Thanks in advance.
Solution
You can use this:
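library(dplyr)
library(zoo) # for the na.locf function
data %>%
  group_by(ts_id) %>% # fill within each ts_id only, never across groups
  # na.locf carries the last non-missing value forward (NaN counts as missing);
  # na.rm = FALSE keeps leading missing values so the length matches the group
  mutate(value = na.locf(value, na.rm = FALSE))
# First six rows of the result, as printed by head():
# # A tibble: 6 x 3
# # Groups: ts_id [2]
#   ts_id date       value
#   <dbl> <date>     <dbl>
# 1     2 2014-10-01    18
# 2     2 2014-11-01    15
# 3     2 2014-12-01    15
# 4     2 2015-01-01    15
# 5     2 2015-02-01    15
# 6     3 2015-03-01    19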
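If adding zoo and dplyr is not an option, the same forward fill can probably be done in base R. The following is only a minimal sketch, not the answer's method: `locf` is a hypothetical helper written for this illustration, and `toy` is a cut-down copy of the question's data with the date column left out. It assumes the rows are already sorted by date within each ts_id.

```r
# Hypothetical helper (not part of zoo): carry the last non-missing value forward.
locf <- function(x) {
  ok  <- !is.na(x)        # is.na() is TRUE for both NA and NaN
  idx <- cumsum(ok)       # how many non-missing values have been seen so far
  c(NA, x[ok])[idx + 1]   # idx == 0 (a leading gap) maps to NA
}

# Same ts_id / value columns as in the question (dates omitted for brevity).
toy <- data.frame(
  ts_id = c(rep(2, 5), rep(3, 12)),
  value = c(18, 15, NaN, NaN, NaN,
            19, 20, 12, 17, NaN, NaN, 7, NaN, NaN, NaN, NaN, NaN)
)

# ave() applies locf() separately within each ts_id and keeps the row order.
toy$value <- ave(toy$value, toy$ts_id, FUN = locf)
print(toy)
```

As with na.locf(..., na.rm = FALSE), a group that starts with a missing value keeps NA in those leading positions, because there is no earlier value to carry forward.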