Create a presence/absence column from presence records

Question

I have a data frame of presence records covering 59 taxa across 1763 samples. Here is an example data frame with fewer taxa and samples:

            sample site family n_days
1    A_1_17/06/12   A1      X      3
2    A_1_17/06/12   A1      Y      3
3    A_1_17/06/12   A1      Z      3
4  A_3_02/11/2011   A3      X      5
5  A_3_02/11/2011   A3      V      5
6  A_3_02/11/2011   A3      W      5
7  A_1_22/02/2011   A1      X      3
8  A_1_22/02/2011   A1      V      3
9  A_1_22/02/2011   A1      Z      3
10 A_3_19/11/2011   A3      U      3
11 A_3_19/11/2011   A3      Y      3
12 A_3_19/11/2011   A3      Z      3

What I want is to create an occupancy column that is 1 if the taxon is present in the sample and 0 if it is absent. Here is an example output:

           sample site n_days family occupancy
1    A_1_17/06/12   A1      3      X         1
2    A_1_17/06/12   A1      3      Y         1
3    A_1_17/06/12   A1      3      Z         1
4    A_1_17/06/12   A1      3      V         0
5    A_1_17/06/12   A1      3      W         0
6    A_1_17/06/12   A1      3      U         0
7  A_3_02/11/2011   A3      5      X         1
8  A_3_02/11/2011   A3      5      V         1
9  A_3_02/11/2011   A3      5      W         1
10 A_3_02/11/2011   A3      5      Y         0
11 A_3_02/11/2011   A3      5      Z         0
12 A_3_02/11/2011   A3      5      U         0
13 A_1_22/02/2011   A1      3      X         1
14 A_1_22/02/2011   A1      3      V         1
15 A_1_22/02/2011   A1      3      Z         1
16 A_1_22/02/2011   A1      3      Y         0
17 A_1_22/02/2011   A1      3      W         0
18 A_1_22/02/2011   A1      3      U         0
19 A_3_19/11/2011   A3      3      U         1
20 A_3_19/11/2011   A3      3      Y         1
21 A_3_19/11/2011   A3      3      Z         1
22 A_3_19/11/2011   A3      3      X         0
23 A_3_19/11/2011   A3      3      V         0
24 A_3_19/11/2011   A3      3      W         0
    

Any suggestions would be greatly appreciated.

Tags: r, dplyr

Solution


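Create an occupancy column with the value 1, then use complete() to generate every sample/family combination, setting occupancy to 0 in the rows it adds. Those new rows have missing site and n_days values, so use fill() within each sample to copy them from the existing rows.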

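library(dplyr)
library(tidyr)

df %>%
  # mark every recorded (sample, family) pair as present
  mutate(occupancy = 1) %>%
  # add the missing sample/family pairs, with occupancy set to 0
  complete(sample, family, fill = list(occupancy = 0)) %>%
  group_by(sample) %>%
  # copy site and n_days into the newly added rows of each sample
  fill(site, n_days, .direction = 'updown') %>%
  ungroup()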

#   sample         family site  n_days occupancy
#   <chr>          <chr>  <chr>  <int>     <dbl>
# 1 A_1_17/06/12   U      A1         3         0
# 2 A_1_17/06/12   V      A1         3         0
# 3 A_1_17/06/12   W      A1         3         0
# 4 A_1_17/06/12   X      A1         3         1
# 5 A_1_17/06/12   Y      A1         3         1
# 6 A_1_17/06/12   Z      A1         3         1
# 7 A_1_22/02/2011 U      A1         3         0
# 8 A_1_22/02/2011 V      A1         3         1
# 9 A_1_22/02/2011 W      A1         3         0
#10 A_1_22/02/2011 X      A1         3         1
# … with 14 more rows
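
As a possible variation on the answer above, the fill() step can be avoided by wrapping the per-sample columns in tidyr's nesting() helper inside complete(). This is only a sketch, and it assumes that site and n_days are constant within each sample, as they are in the example data. The data frame is rebuilt here from the question's example so that the snippet runs on its own; the name result is just illustrative.

```r
library(dplyr)
library(tidyr)

# Example data copied from the question so the snippet is self-contained
df <- data.frame(
  sample = rep(c("A_1_17/06/12", "A_3_02/11/2011",
                 "A_1_22/02/2011", "A_3_19/11/2011"), each = 3),
  site   = rep(c("A1", "A3", "A1", "A3"), each = 3),
  family = c("X", "Y", "Z", "X", "V", "W", "X", "V", "Z", "U", "Y", "Z"),
  n_days = rep(c(3L, 5L, 3L, 3L), each = 3)
)

result <- df %>%
  mutate(occupancy = 1) %>%
  # nesting() keeps only the sample/site/n_days combinations that occur
  # in the data; each of them is then crossed with every family
  # (assumption: site and n_days do not vary within a sample)
  complete(nesting(sample, site, n_days), family,
           fill = list(occupancy = 0))
```

Because site and n_days travel together with sample, the rows that complete() adds already carry those values, so no grouping or fill() is needed afterwards. If a sample had more than one site or n_days value, this version would produce one block of families per distinct combination, which may or may not be what is wanted.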
