首页 > 解决方案 > 某个日期的出现次数超过 x 次,获取下一个可用日期

问题描述

我有一个包含 15 列的数据框,其中 1 列是参与者 ID,14 列是每位参与者可能的约会日期(节假日和周末除外):

  Included.Participant         V1         V2         V3         V4         V5   V6   V7         V8         V9        V10        V11        V12  V13  V14
1                    1 2021-03-22 2021-03-23 2021-03-24 2021-03-25 2021-03-26 <NA> <NA> 2021-03-29 2021-03-30 2021-03-31 2021-04-01 2021-04-02 <NA> <NA>
2                    2 2021-03-22 2021-03-23 2021-03-24 2021-03-25 2021-03-26 <NA> <NA> 2021-03-29 2021-03-30 2021-03-31 2021-04-01 2021-04-02 <NA> <NA>
3                    3 2021-03-22 2021-03-23 2021-03-24 2021-03-25 2021-03-26 <NA> <NA> 2021-03-29 2021-03-30 2021-03-31 2021-04-01 2021-04-02 <NA> <NA>
4                    4 2021-03-22 2021-03-23 2021-03-24 2021-03-25 2021-03-26 <NA> <NA> 2021-03-29 2021-03-30 2021-03-31 2021-04-01 2021-04-02 <NA> <NA>
5                    5 2021-03-22 2021-03-23 2021-03-24 2021-03-25 2021-03-26 <NA> <NA> 2021-03-29 2021-03-30 2021-03-31 2021-04-01 2021-04-02 <NA> <NA>
6                    6 2021-03-22 2021-03-23 2021-03-24 2021-03-25 2021-03-26 <NA> <NA> 2021-03-29 2021-03-30 2021-03-31 2021-04-01 2021-04-02 <NA> <NA>

每个日期,可以添加 3 名参与者。如果已达到最多 3 个,则应移至第四个参与者的下一个可用日期。所以在这个例子中,期望的输出是:

      Included.Participant         V1
1                    1 2021-03-22
2                    2 2021-03-22
3                    3 2021-03-22
4                    4 2021-03-23
5                    5 2021-03-23
6                    6 2021-03-23

如果没有可能的日期,则 V1 列可以留空。

我似乎无法弄清楚如何获得所需的输出。我真的希望你能帮忙

非常感谢!

输入:

structure(list(Included.y = c(1L, 2L, 3L, 4L, 7L, 8L, 9L, 10L, 
11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L), 
    V1 = structure(c(18870, NA, 18848, NA, NA, NA, NA, NA, 18806, 
    18799, 18835, 18841, NA, NA, 18912, 18954, NA, 18842, NA, 
    NA), class = "Date"), V2 = structure(c(18871, NA, 18849, 
    NA, NA, NA, NA, 18876, 18807, 18800, 18836, 18842, NA, NA, 
    18913, 18955, NA, 18843, NA, NA), class = "Date"), V3 = structure(c(18872, 
    18904, 18850, 18897, 18967, NA, 18883, 18877, 18808, 18801, 
    18837, 18843, 18890, NA, 18914, 18956, 18953, 18844, NA, 
    18869), class = "Date"), V4 = structure(c(NA, 18905, 18851, 
    18898, 18968, 18953, 18884, 18878, 18809, 18802, NA, 18844, 
    18891, 18967, NA, NA, 18954, NA, 18925, 18870), class = "Date"), 
    V5 = structure(c(NA, 18906, NA, 18899, 18969, 18954, 18885, 
    18879, NA, NA, NA, NA, 18892, 18968, NA, NA, 18955, NA, 18926, 
    18871), class = "Date"), V6 = structure(c(NA, 18907, NA, 
    18900, 18970, 18955, 18886, NA, NA, NA, NA, NA, 18893, 18969, 
    NA, NA, 18956, NA, 18927, 18872), class = "Date"), V7 = structure(c(18876, 
    NA, NA, NA, NA, 18956, NA, NA, NA, NA, 18841, NA, NA, 18970, 
    18918, 18960, NA, 18848, 18928, NA), class = "Date"), V8 = structure(c(18877, 
    NA, 18855, NA, NA, NA, NA, NA, 18813, 18806, 18842, 18848, 
    NA, NA, 18919, 18961, NA, 18849, NA, NA), class = "Date"), 
    V9 = structure(c(18878, NA, 18856, NA, NA, NA, NA, 18883, 
    18814, 18807, 18843, 18849, NA, NA, 18920, 18962, NA, 18850, 
    NA, NA), class = "Date"), V10 = structure(c(18879, 18911, 
    18857, 18904, 18974, NA, 18890, 18884, 18815, 18808, 18844, 
    18850, 18897, NA, 18921, 18963, 18960, 18851, NA, 18876), class = "Date"), 
    V11 = structure(c(NA, 18912, 18858, 18905, 18975, 18960, 
    18891, 18885, 18816, 18809, NA, 18851, 18898, 18974, NA, 
    NA, 18961, NA, 18932, 18877), class = "Date"), V12 = structure(c(NA, 
    18913, NA, 18906, 18976, 18961, 18892, 18886, NA, NA, NA, 
    NA, 18899, 18975, NA, NA, 18962, NA, 18933, 18878), class = "Date"), 
    V13 = structure(c(NA, 18914, NA, 18907, 18977, 18962, 18893, 
    NA, NA, NA, NA, NA, 18900, 18976, NA, NA, 18963, NA, 18934, 
    18879), class = "Date"), V14 = structure(c(18883, NA, NA, 
    NA, NA, 18963, NA, NA, NA, NA, 18848, NA, NA, 18977, 18925, 
    18967, NA, 18855, 18935, NA), class = "Date")), row.names = c(NA, 
20L), class = "data.frame")

标签: rdataframe

解决方案


在澄清参与者分配到不同列中的相同日期后更新了代码:

library(dplyr)
library(tidyr)
library(tibble)


df1 <- 
  df %>% 
  pivot_longer(-Included.Participant) %>% 
  select(-Included.Participant) %>% 
  mutate(name = factor(name, levels = paste0("V", 1:14), ordered = TRUE))%>% 
  group_by(value) %>%
  arrange(value, name) %>% 
  slice_head(n = 3)%>% 
  rowid_to_column(var = "Included.Participant") %>% 
  filter(Included.Participant <= 20) %>% 
  pivot_wider(names_from = name, values_from = value)

  • 输出
head(df1, 10)
#> # A tibble: 10 x 5
#>    Included.Participant V1         V2         V3         V4        
#>                   <int> <date>     <date>     <date>     <date>    
#>  1                    1 2021-03-22 NA         NA         NA        
#>  2                    2 2021-03-22 NA         NA         NA        
#>  3                    3 2021-03-22 NA         NA         NA        
#>  4                    4 2021-03-23 NA         NA         NA        
#>  5                    5 2021-03-23 NA         NA         NA        
#>  6                    6 2021-03-23 NA         NA         NA        
#>  7                    7 NA         2021-03-24 NA         NA        
#>  8                    8 NA         2021-03-24 NA         NA        
#>  9                    9 NA         2021-03-24 NA         NA        
#> 10                   10 NA         NA         2021-03-25 NA


推荐阅读