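r - Conditionally replace NA with values from other rows
Question
I have a large dataset in which one variable has a relatively large number of missing values. Because I know that this variable depends on temporal and spatial aspects, I can easily impute the missing values by taking the value from another row with exactly matching temporal and spatial values. Assume the data is generated like this: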
temporal <- c("Monday", "Monday", "Tuesday", "Tuesday","Wednesday", "Wednesday", "Thursday", "Thursday", "Friday", "Friday","Monday", "Monday", "Tuesday", "Tuesday","Wednesday", "Wednesday", "Thursday", "Thursday", "Friday", "Friday")
spatial <- c("North", "South","North", "South","North", "South","North", "South","North", "South", "North", "South","North", "South","North", "South","North", "South","North", "South")
value <- c(NA,2,3,4,5,6,7,NA,9,10,1,NA,3,4,5,6,7,8,9,NA)
df <- as.data.frame(cbind(temporal, spatial, value))
This gives the following data frame:
temporal spatial value
1 Monday North NA
2 Monday South 2
3 Tuesday North 3
4 Tuesday South 4
5 Wednesday North 5
6 Wednesday South 6
7 Thursday North 7
8 Thursday South NA
9 Friday North 9
10 Friday South 10
11 Monday North 1
12 Monday South NA
13 Tuesday North 3
14 Tuesday South 4
15 Wednesday North 5
16 Wednesday South 6
17 Thursday North 7
18 Thursday South 8
19 Friday North 9
20 Friday South NA
In this case, I want to replace each value that is NA with the value from another row that has matching values on spatial and temporal.
The final result should therefore look like this:
temporal spatial value
1 Monday North 1
2 Monday South 2
3 Tuesday North 3
4 Tuesday South 4
5 Wednesday North 5
6 Wednesday South 6
7 Thursday North 7
8 Thursday South 8
9 Friday North 9
10 Friday South 10
11 Monday North 1
12 Monday South 2
13 Tuesday North 3
14 Tuesday South 4
15 Wednesday North 5
16 Wednesday South 6
17 Thursday North 7
18 Thursday South 8
19 Friday North 9
20 Friday South 10
I tried to do this with the group_by function from the tidyverse, as follows:
library(tidyverse)
df <- df %>%
group_by(temporal, spatial) %>%
mutate(value, unique(value[is.na(value)]))
But I get the following error message:
Error: Problem with `mutate()` input `..2`.
x Input `..2` can't be recycled to size 2.
i Input `..2` is `unique(value[is.na(value)])`.
i Input `..2` must be size 2 or 1, not 0.
i The error occurred in group 1: temporal = "Friday", spatial = "North"
Am I approaching this the right way? If so, why doesn't my code work the way (I believe) it should? If not, what approach would be appropriate?
Thanks! :)
Answer
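Here is a dplyr approach. We group by temporal and spatial, then arrange by temporal, spatial and value, because NA values are automatically sorted below any non-NA values. Then we use mutate to set value to the first element of value in each group. Note that arrange changes the order of the rows.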
library(dplyr)
df %>%
group_by(temporal, spatial) %>%
arrange(temporal, spatial, value) %>%
mutate(value = value[1])
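If reordering the rows is undesirable, the same idea can be sketched without arrange by picking the first non-missing element directly. This is only a sketch under two assumptions: all non-missing values within a temporal/spatial group agree, and the data is built with data.frame() rather than as.data.frame(cbind(...)), because cbind() coerces every column to character. The names df_num and filled are made up for this illustration.

```r
library(dplyr)

# Same example data as in the question, but built with data.frame()
# so that `value` stays numeric instead of being coerced to character.
days <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
df_num <- data.frame(
  temporal = rep(rep(days, each = 2), times = 2),
  spatial  = rep(c("North", "South"), times = 10),
  value    = c(NA, 2, 3, 4, 5, 6, 7, NA, 9, 10, 1, NA, 3, 4, 5, 6, 7, 8, 9, NA)
)

filled <- df_num %>%
  group_by(temporal, spatial) %>%
  # Take the first non-missing value in each group; rows keep their order.
  # A group with no observed value at all would simply stay NA.
  mutate(value = value[!is.na(value)][1]) %>%
  ungroup()
```

Like value[1] after sorting, this overwrites every row in a group with a single value, so it is only appropriate when the group is known to have one true value.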
A cleaner approach is tidyr::fill, which preserves the row order:
library(tidyverse)
df %>%
group_by(temporal, spatial) %>%
fill(value, .direction = "downup")
# A tibble: 20 x 3
# Groups: temporal, spatial [10]
temporal spatial value
<chr> <chr> <chr>
1 Monday North 1
2 Monday South 2
3 Tuesday North 3
4 Tuesday South 4
5 Wednesday North 5
6 Wednesday South 6
7 Thursday North 7
8 Thursday South 8
9 Friday North 9
10 Friday South 10
11 Monday North 1
12 Monday South 2
13 Tuesday North 3
14 Tuesday South 4
15 Wednesday North 5
16 Wednesday South 6
17 Thursday North 7
18 Thursday South 8
19 Friday North 9
20 Friday South 10
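As for why the attempt in the question fails: value[is.na(value)] keeps only the missing entries, so in a group without any NA (such as Friday / North, the group named in the error message) it has length 0 and cannot be recycled to the group size of 2. In addition, mutate(value, ...) passes two unnamed inputs instead of assigning to value, which is why the message refers to input ..2. The length problem can be illustrated in base R; the vector below is a made-up stand-in for one group, not taken from the actual data frame.

```r
# A group with no missing values, e.g. Friday / North in the example data.
group_value <- c(9, 9)

# Subsetting with is.na() keeps only the NA entries, so nothing is selected here.
only_missing <- unique(group_value[is.na(group_value)])
length(only_missing)   # 0

# Subsetting with !is.na() keeps the observed entries instead.
observed <- unique(group_value[!is.na(group_value)])
length(observed)       # 1
```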