if_else for removing the first rows in each group - dplyr

Problem description

Here is the DF for testing:

test_df <- structure(list(plant_id = c("plant_1", "plant_1", "plant_1", "plant_1", "plant_1",
                                       "plant_2", "plant_2", "plant_2", "plant_2", "plant_2", 
                                       "plant_3", "plant_3", "plant_3", "plant_3", "plant_3",
                                       "plant_4", "plant_4", "plant_4", "plant_4", "plant_4"), 
                          skipped = c(1, 1, 0, 1, 2, 
                                      0, 1, 1, 0, 2,
                                      1, 0, 1, 2, 2, 
                                      0, 0, 1, 1, 2)), 
                     row.names = c(NA, -20L), class = "data.frame", 
                     .Names = c("plant_sp", "skipped"))

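As you can see, the skipped variable takes the values 0, 1 or 2. For each plant_id (this is the group_by variable; it is named plant_sp in the data frame once .Names is applied), when the group's first row has a 1 in the skipped column, the rows should be removed until the value in the skipped column changes.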

For example, from my DF:

   plant_sp skipped
1   plant_1       1
2   plant_1       1
3   plant_1       0
4   plant_1       1
5   plant_1       2
6   plant_2       0
7   plant_2       1
8   plant_2       1
9   plant_2       0
10  plant_2       2
11  plant_3       1
12  plant_3       0
13  plant_3       1
14  plant_3       2
15  plant_3       2
16  plant_4       0
17  plant_4       0
18  plant_4       1
19  plant_4       1
20  plant_4       2

to:

   plant_sp skipped
3   plant_1       0
4   plant_1       1
5   plant_1       2
6   plant_2       0
7   plant_2       1
8   plant_2       1
9   plant_2       0
10  plant_2       2
12  plant_3       0
13  plant_3       1
14  plant_3       2
15  plant_3       2
16  plant_4       0
17  plant_4       0
18  plant_4       1
19  plant_4       1
20  plant_4       2

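As you can see, because groups plant_1 and plant_3 start with a 1, all rows with a 1 in the skipped variable at the start of those groups were removed (rows 1, 2 and 11). All other rows stay as they are.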

A dplyr solution would be great if possible, thanks a lot!

Tags: r, dplyr, tidyverse

Solution


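Update: the original version did not satisfy the query. It has now been updated to remove only the rows containing a one that come before the first non-one entry in each group. This matches the output shown in the query.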

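To accomplish this, I created some temporary columns that identify the first row in each group that does not contain a one, and then removed all rows before it.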
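The idea can be illustrated on a single group before applying it per group. The following is only a minimal base R sketch, using the plant_1 values from the question and the same temporary names (order, notones, ignore) as the pipeline that follows:

```r
# One group's `skipped` values (plant_1 from the question)
skipped <- c(1, 1, 0, 1, 2)

# Row positions within the group
order <- seq_along(skipped)

# Position of each non-1 value; rows holding a 1 get Inf
notones <- ifelse(skipped != 1, order, Inf)

# First position that is not a 1 (Inf if every value is 1)
ignore <- min(notones)

# Keep everything from that position onward
kept <- skipped[order >= ignore]
print(kept)
#> [1] 0 1 2
```

The full pipeline performs the same steps inside group_by(), so each plant gets its own cut-off position.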

library(tidyverse)

test_df <- structure(list(plant_id = c("plant_1", "plant_1", "plant_1", "plant_1", "plant_1",
                                       "plant_2", "plant_2", "plant_2", "plant_2", "plant_2", 
                                       "plant_3", "plant_3", "plant_3", "plant_3", "plant_3",
                                       "plant_4", "plant_4", "plant_4", "plant_4", "plant_4"), 
                          skipped = c(1, 1, 0, 1, 2, 
                                      0, 1, 1, 0, 2,
                                      1, 0, 1, 2, 2, 
                                      0, 0, 1, 1, 2)), 
                     row.names = c(NA, -20L), class = "data.frame", 
                     .Names = c("plant_sp", "skipped"))

test_df <- as_tibble(test_df)

first_positions_df<- test_df %>% 
  # group by each factor we want
  group_by(plant_sp) %>%
  # label the order of the rows
  mutate(order = row_number()) %>% 
  # mark position of rows which aren't a 1 otherwise set to infinity
  mutate(notones = ifelse(skipped != 1, order, Inf)) %>% 
  # Find the first position which is not a 1
  mutate(ignore = min(notones)) %>% 
  # Remove all ones before this row
  filter(ignore <= order)

#Final result
first_positions_df %>% 
  # select only the useful columns
  select(plant_sp, skipped)
#> # A tibble: 17 x 2
#> # Groups:   plant_sp [4]
#>    plant_sp skipped
#>    <chr>      <dbl>
#>  1 plant_1        0
#>  2 plant_1        1
#>  3 plant_1        2
#>  4 plant_2        0
#>  5 plant_2        1
#>  6 plant_2        1
#>  7 plant_2        0
#>  8 plant_2        2
#>  9 plant_3        0
#> 10 plant_3        1
#> 11 plant_3        2
#> 12 plant_3        2
#> 13 plant_4        0
#> 14 plant_4        0
#> 15 plant_4        1
#> 16 plant_4        1
#> 17 plant_4        2

Created on 2021-04-04 by the reprex package (v2.0.0)
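As a possible shorter variant (not part of the original answer), the same filtering can probably be expressed without temporary columns using dplyr's cumany(), which becomes TRUE at the first TRUE value and stays TRUE afterwards. The sketch below rebuilds the question's data with rep() and paste0() for brevity; that construction and the name result are assumptions made for illustration:

```r
library(dplyr)

# Same values as the question's test_df, built more compactly
test_df <- data.frame(
  plant_sp = rep(paste0("plant_", 1:4), each = 5),
  skipped  = c(1, 1, 0, 1, 2,
               0, 1, 1, 0, 2,
               1, 0, 1, 2, 2,
               0, 0, 1, 1, 2)
)

result <- test_df %>%
  group_by(plant_sp) %>%
  # cumany() turns TRUE at the first non-1 value and stays TRUE afterwards,
  # so only the leading run of 1s in each group is dropped
  filter(cumany(skipped != 1)) %>%
  ungroup()

nrow(result)
#> [1] 17
```

A group consisting only of ones would be dropped entirely, which is also how the min()/Inf approach above behaves.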

