r - How to fill NA values from the next row in R?
Problem description
I want to fill the NAs with values from the next row. Here is the dataset:
df <- structure(list(timestamp = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L,
10L, 11L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label = c("2019-07-07 00:00:00",
"2019-07-07 00:00:01", "2019-07-07 00:00:02", "2019-07-07 00:00:03",
"2019-07-07 00:00:04", "2019-07-07 00:00:05", "2019-07-07 00:00:06",
"2019-07-07 00:00:07", "2019-07-07 00:00:08", "2019-07-07 00:00:09",
"2019-07-07 00:00:10"), class = "factor"), source = structure(c(NA, NA, NA,
1L, NA, NA, 1L, NA, NA, NA, NA, NA, 2L, NA, 2L, NA, NA, 2L, NA, NA, 2L, NA),
.Label = c("USER_A", "USER_B"), class = "factor"), value = c(NA, NA, NA, 1L,
NA, NA, 1L, NA, NA, NA, NA, NA, 1L, NA, 1L, NA, NA, 2L, NA, NA, 3L, NA)),
class = "data.frame", row.names = c(NA, -22L))
timestamp source value
1 2019-07-07 00:00:00 <NA> NA
2 2019-07-07 00:00:01 <NA> NA
3 2019-07-07 00:00:02 <NA> NA
4 2019-07-07 00:00:03 USER_A 1
5 2019-07-07 00:00:04 <NA> NA
6 2019-07-07 00:00:05 <NA> NA
7 2019-07-07 00:00:06 USER_A 1
8 2019-07-07 00:00:07 <NA> NA
9 2019-07-07 00:00:08 <NA> NA
10 2019-07-07 00:00:09 <NA> NA
11 2019-07-07 00:00:10 <NA> NA
12 2019-07-07 00:00:00 <NA> NA
13 2019-07-07 00:00:01 USER_B 1
14 2019-07-07 00:00:02 <NA> NA
15 2019-07-07 00:00:03 USER_B 1
16 2019-07-07 00:00:04 <NA> NA
17 2019-07-07 00:00:05 <NA> NA
18 2019-07-07 00:00:06 USER_B 2
19 2019-07-07 00:00:07 <NA> NA
20 2019-07-07 00:00:08 <NA> NA
21 2019-07-07 00:00:09 USER_B 3
22 2019-07-07 00:00:10 <NA> NA
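The table cycles through the same timestamps once per source: each source (A and B) has a fixed block of rows (in this case 00:00:00 to 00:00:10).
This is the expected result: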
timestamp source value
1 2019-07-07 00:00:00 <NA> NA
2 2019-07-07 00:00:01 <NA> NA
3 2019-07-07 00:00:02 <NA> NA
4 2019-07-07 00:00:03 USER_A 1
5 2019-07-07 00:00:04 USER_A 1
6 2019-07-07 00:00:05 USER_A 1
7 2019-07-07 00:00:06 USER_A 1
8 2019-07-07 00:00:07 <NA> NA
9 2019-07-07 00:00:08 <NA> NA
10 2019-07-07 00:00:09 <NA> NA
11 2019-07-07 00:00:10 <NA> NA
12 2019-07-07 00:00:00 <NA> NA
13 2019-07-07 00:00:01 USER_B 1
14 2019-07-07 00:00:02 USER_B 1
15 2019-07-07 00:00:03 USER_B 1
16 2019-07-07 00:00:04 USER_B 2
17 2019-07-07 00:00:05 USER_B 2
18 2019-07-07 00:00:06 USER_B 2
19 2019-07-07 00:00:07 USER_B 3
20 2019-07-07 00:00:08 USER_B 3
21 2019-07-07 00:00:09 USER_B 3
22 2019-07-07 00:00:10 <NA> NA
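The source and value in rows 5 and 6 are replaced with those from row 7 (USER_A). The USER_B rows are filled the same way, each from the next non-missing row. NAs before the first and after the last non-missing row of each source block stay NA.
How can I do this in R?
Solution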
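Here is one way using dplyr. Since each source has a fixed block of n rows, we first create a group (group1) for every n rows and add a new column group2 that is 1 only between the min and max indices of the non-NA values in that group, and 0 elsewhere. Then we group_by group1 and group2 and fill the missing values within each group from the next non-missing value (.direction = "up"), which is what the expected output requires; with fill's default direction ("down"), rows 16-17 and 19-20 would instead get the previous values 1 and 2.
library(dplyr)
n <- 11  # rows per source block (00:00:00 to 00:00:10)
df %>%
  # group1: one group per block of n rows, i.e. one per source
  group_by(group1 = gl(n()/n, n)) %>%
  # group2: 1 from the first to the last non-NA source in the block, 0 elsewhere
  mutate(group2 = 0,
         group2 = replace(group2, min(which(!is.na(source))) :
                                  max(which(!is.na(source))), 1)) %>%
  group_by(group1, group2) %>%
  # fill each NA from the next non-missing row in the same group
  tidyr::fill(source, value, .direction = "up") %>%
  ungroup() %>%
  select(-group1, -group2)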
# A tibble: 22 x 3
# timestamp source value
# <fct> <fct> <int>
# 1 2019-07-07 00:00:00 NA NA
# 2 2019-07-07 00:00:01 NA NA
# 3 2019-07-07 00:00:02 NA NA
# 4 2019-07-07 00:00:03 USER_A 1
# 5 2019-07-07 00:00:04 USER_A 1
# 6 2019-07-07 00:00:05 USER_A 1
# 7 2019-07-07 00:00:06 USER_A 1
# 8 2019-07-07 00:00:07 NA NA
# 9 2019-07-07 00:00:08 NA NA
#10 2019-07-07 00:00:09 NA NA
# … with 12 more rows
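If you would rather avoid dplyr and tidyr, the same rule can be sketched in base R. This is a minimal sketch, not part of the original answer: it assumes each source occupies a fixed block of 11 rows, rebuilds the example data with data.frame() (timestamp kept as character for brevity), and uses a hypothetical helper fill_next_within() that fills each NA from the next non-NA value, but only between the first and last non-NA positions of a block.

```r
# Example data rebuilt from the question (timestamp kept as character here).
df <- data.frame(
  timestamp = rep(sprintf("2019-07-07 00:00:%02d", 0:10), 2),
  source = factor(c(NA, NA, NA, "USER_A", NA, NA, "USER_A", NA, NA, NA, NA,
                    NA, "USER_B", NA, "USER_B", NA, NA, "USER_B", NA, NA, "USER_B", NA)),
  value = c(NA, NA, NA, 1L, NA, NA, 1L, NA, NA, NA, NA,
            NA, 1L, NA, 1L, NA, NA, 2L, NA, NA, 3L, NA)
)

# Hypothetical helper: fill each NA from the next non-NA element, but only
# between the first and last non-NA positions; leading/trailing NAs stay NA.
fill_next_within <- function(x) {
  idx <- which(!is.na(x))
  if (length(idx) == 0) return(x)        # nothing to fill in this block
  span <- seq(min(idx), max(idx))
  # count of non-NA positions strictly before each row, + 1 = next non-NA
  nxt <- idx[findInterval(span, idx, left.open = TRUE) + 1]
  x[span] <- x[nxt]
  x
}

rows_per_source <- 11                    # assumed fixed block size per source
block <- (seq_len(nrow(df)) - 1) %/% rows_per_source
for (b in unique(block)) {
  rows <- which(block == b)
  df$source[rows] <- fill_next_within(df$source[rows])
  df$value[rows]  <- fill_next_within(df$value[rows])
}
print(df)
```

findInterval() with left.open = TRUE counts the non-NA positions strictly before each row, so adding 1 picks the first non-NA position at or after it. Rows outside the first-to-last non-NA span of a block are never touched, which keeps rows 1-3, 8-12 and 22 as NA.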