Conditionally replacing NA with values from other rows

Problem description

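I have a large dataset in which one variable has a relatively large number of missing values. Because I know that this variable depends on a temporal and a spatial aspect, I can easily impute each missing value by taking the value from another row whose temporal and spatial values match exactly. Suppose the data are generated as follows: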

temporal <- c("Monday", "Monday", "Tuesday", "Tuesday","Wednesday", "Wednesday", "Thursday", "Thursday", "Friday", "Friday","Monday", "Monday", "Tuesday", "Tuesday","Wednesday", "Wednesday", "Thursday", "Thursday", "Friday", "Friday")
spatial <- c("North", "South","North", "South","North", "South","North", "South","North", "South", "North", "South","North", "South","North", "South","North", "South","North", "South")
value <- c(NA,2,3,4,5,6,7,NA,9,10,1,NA,3,4,5,6,7,8,9,NA)

# Note: cbind() builds a character matrix, so value is stored as character here
df <- as.data.frame(cbind(temporal, spatial, value))

This gives the following data frame:

    temporal spatial value
1     Monday   North    NA
2     Monday   South     2
3    Tuesday   North     3
4    Tuesday   South     4
5  Wednesday   North     5
6  Wednesday   South     6
7   Thursday   North     7
8   Thursday   South    NA
9     Friday   North     9
10    Friday   South    10
11    Monday   North     1
12    Monday   South    NA
13   Tuesday   North     3
14   Tuesday   South     4
15 Wednesday   North     5
16 Wednesday   South     6
17  Thursday   North     7
18  Thursday   South     8
19    Friday   North     9
20    Friday   South    NA

In this case, I want to replace each value == NA with the value from another row that has matching values on spatial and temporal.

So the final result should look like this:

    temporal spatial value
1     Monday   North     1
2     Monday   South     2
3    Tuesday   North     3
4    Tuesday   South     4
5  Wednesday   North     5
6  Wednesday   South     6
7   Thursday   North     7
8   Thursday   South     8
9     Friday   North     9
10    Friday   South    10
11    Monday   North     1
12    Monday   South     2
13   Tuesday   North     3
14   Tuesday   South     4
15 Wednesday   North     5
16 Wednesday   South     6
17  Thursday   North     7
18  Thursday   South     8
19    Friday   North     9
20    Friday   South    10

I tried to do this with the group_by function from tidyverse, as follows:

library(tidyverse)
df <- df %>%
  group_by(temporal, spatial) %>%
  mutate(value, unique(value[is.na(value)]))

But I get the following error message:

Error: Problem with `mutate()` input `..2`.
x Input `..2` can't be recycled to size 2.
i Input `..2` is `unique(value[is.na(value)])`.
i Input `..2` must be size 2 or 1, not 0.
i The error occurred in group 1: temporal = "Friday", spatial = "North"

Am I approaching this the right way? If so, why doesn't my code work the way (I believe) it should? If not, what approach would be appropriate?

Thanks! :)

Tags: r, tidyverse

Solution


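Here is a dplyr approach. We group by temporal and spatial, then arrange by temporal, spatial, and value, because NA values are automatically placed after any non-NA values. Then we mutate value so that every row in a group takes the value from the group's first row.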

library(dplyr)

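df %>%
  group_by(temporal, spatial) %>%
  # sort so that, within each temporal/spatial pair, NA rows come last
  arrange(temporal, spatial, value) %>%
  # overwrite every value in the group with the value from the group's first row
  mutate(value = value[1])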
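
Note that arrange() reorders the rows. If the original row order should be kept, the sorting step can be skipped by picking the first non-NA value in each group directly. The following is only a minimal sketch under two assumptions that are not in the question: the data frame is rebuilt with data.frame() so that value stays numeric, and each temporal/spatial pair has at most one distinct observed value. The name filled is made up for illustration.

```r
library(dplyr)

# Rebuild the example data. Assumption: data.frame() is used instead of
# cbind(), so value remains numeric rather than character.
temporal <- rep(rep(c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday"),
                    each = 2), times = 2)
spatial  <- rep(c("North", "South"), times = 10)
value    <- c(NA, 2, 3, 4, 5, 6, 7, NA, 9, 10, 1, NA, 3, 4, 5, 6, 7, 8, 9, NA)
df <- data.frame(temporal, spatial, value)

filled <- df %>%
  group_by(temporal, spatial) %>%
  # first observed (non-NA) value in the group; stays NA if the group has none
  mutate(value = value[!is.na(value)][1]) %>%
  ungroup()

filled
```

Because nothing is sorted, row 1 remains Monday/North and row 20 remains Friday/South, with only the missing entries filled in.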

A more concise approach is tidyr::fill, which also preserves the original row order:

library(tidyverse)

df %>%
  group_by(temporal, spatial) %>% 
  fill(value, .direction = "downup")

# A tibble: 20 x 3
# Groups:   temporal, spatial [10]
   temporal  spatial value
   <chr>     <chr>   <chr>
 1 Monday    North   1    
 2 Monday    South   2    
 3 Tuesday   North   3    
 4 Tuesday   South   4    
 5 Wednesday North   5    
 6 Wednesday South   6    
 7 Thursday  North   7    
 8 Thursday  South   8    
 9 Friday    North   9    
10 Friday    South   10   
11 Monday    North   1    
12 Monday    South   2    
13 Tuesday   North   3    
14 Tuesday   South   4    
15 Wednesday North   5    
16 Wednesday South   6    
17 Thursday  North   7    
18 Thursday  South   8    
19 Friday    North   9    
20 Friday    South   10   
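
As for why the attempt in the question fails: value[is.na(value)] keeps only the missing entries, so in a group without any NA (such as Friday/North, the group named in the error) it is a zero-length vector that mutate() cannot recycle to the group size. The expression is also unnamed, so it would not overwrite value even if it ran. A small base R sketch of the subsetting behavior follows; the vectors x and y are hypothetical stand-ins for the values of a single group.

```r
# Hypothetical group with no missing values, e.g. Friday / North
x <- c("9", "9")

x[is.na(x)]            # nothing is NA, so nothing is selected
length(x[is.na(x)])    # zero elements cannot be recycled to a group of size 2

# Negating the test selects the observed values instead
unique(x[!is.na(x)])   # a single value, which can be recycled

# Hypothetical group with one missing value, e.g. Monday / North
y <- c(NA, "1")
y[is.na(y)]            # only the NA itself, which is no use for filling
y[!is.na(y)][1]        # the observed value to fill with
```

Selecting the non-missing entries and assigning the result by name, as in mutate(value = ...), avoids both problems.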
