Increment a counter based on consecutive dates

Question

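I have a flood table with several stations, dates, and discharges, and I am trying to identify the dates of each flood. A flood can last several days, so I want to create an ID that numbers the floods: when a date is consecutive with the previous row it should keep the same number, and when it is not, the ID should be the previous ID + 1. The column Diffdays holds the number of days between the current row and the previous row.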

My data looks like this:

Station      Date            Diffdays   Discharge  Desired counter
Y6042010     1926-11-19      NA         89         1
Y6042010     1928-10-22      703        100        2
Y6042010     1928-10-23      1          115        2
W2022030     2000-04-03      NA         12         3
W2022030     2000-04-04      1          16         3
W2022030     2001-11-13      588        14         4

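Any ideas or suggestions? I have tried several things but couldn't find anything that works. The code below is the closest I think I have come, but it gives an error: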

# Errors as posted: Flood_counter is referenced before it has been created
Flood <- Flood %>%
  group_by(Station) %>%
  mutate(Flood_counter = ifelse(Diffdays != 1, (Flood_counter - 1) + 1, Flood_counter - 1))

Tags: r, dplyr

Answer


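Build a logical vector that is TRUE when the current date is not exactly one day after the previous date (or when there is no previous date). Then take the cumulative sum of that vector.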

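library(dplyr)

tibble(Date = as.Date(c("1926-11-19", "1928-10-22", "1928-10-23",
                        "2000-04-03", "2000-04-04", "2001-11-13"))) %>%
    mutate(
        # TRUE only for the first row, which has no previous date
        first_one = seq_along(Date) == 1,
        # TRUE when the date is exactly one day after the previous row (NA on row 1)
        sequential = Date == dplyr::lag(Date) + 1,
        # every TRUE starts a new flood, so cumsum() turns the flags into IDs
        id = cumsum(first_one | !sequential)
    )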

Result:

# A tibble: 6 x 4
  Date       first_one sequential    id
  <date>     <lgl>     <lgl>      <int>
1 1926-11-19 TRUE      NA             1
2 1928-10-22 FALSE     FALSE          2
3 1928-10-23 FALSE     TRUE           2
4 2000-04-03 FALSE     FALSE          3
5 2000-04-04 FALSE     TRUE           3
6 2001-11-13 FALSE     FALSE          4
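Because the question's data already has a Diffdays column, the same flag-then-cumsum idea can be applied to it directly, without recomputing date differences. This is a minimal base-R sketch that assumes, as in the sample, that Diffdays is NA on the first row of each station and that the rows are in the order shown. The data frame is rebuilt by hand from the table in the question.

```r
# Rebuild the question's example data (values copied from the table above).
Flood <- data.frame(
  Station   = c("Y6042010", "Y6042010", "Y6042010",
                "W2022030", "W2022030", "W2022030"),
  Date      = as.Date(c("1926-11-19", "1928-10-22", "1928-10-23",
                        "2000-04-03", "2000-04-04", "2001-11-13")),
  Diffdays  = c(NA, 703, 1, NA, 1, 588),
  Discharge = c(89, 100, 115, 12, 16, 14)
)

# A new flood starts when there is no previous row for the station (NA)
# or when the gap to the previous row is anything other than one day.
# NA | NA != 1 evaluates to TRUE, so the NA rows are flagged correctly.
new_flood <- is.na(Flood$Diffdays) | Flood$Diffdays != 1

# cumsum() on a logical vector counts the TRUEs seen so far,
# so every row of the same flood shares one number.
Flood$Flood_counter <- cumsum(new_flood)

print(Flood)
# Flood_counter is 1 2 2 3 3 4, matching the "Desired counter" column
```

In a dplyr pipeline the same expression can go inside an ungrouped mutate(), e.g. `mutate(Flood_counter = cumsum(is.na(Diffdays) | Diffdays != 1))`, so the numbering continues across stations.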

Edit: adding a group_by() makes the subsequent mutate() treat each group of rows as a separate dataset, so the count restarts within each group. For details, see R for Data Science ( https://r4ds.had.co.nz/transform.html#grouped-mutates-and-filters )

library(dplyr)

tibble(Date = as.Date(c("1926-11-19", "1928-10-22", "1928-10-23",
                        "2000-04-03", "2000-04-04", "2001-11-13")),
       station = c("A", "A", "A", "B", "B", "B")) %>%
    # seq_along() and lag() now operate within each station separately
    group_by(station) %>%
    mutate(
        first_one = seq_along(Date) == 1,
        sequential = Date == dplyr::lag(Date) + 1,
        flood_counter = cumsum(first_one | !sequential)
    )

Result:

# A tibble: 6 x 5
# Groups:   station [2]
  Date       station first_one sequential flood_counter
  <date>     <chr>   <lgl>     <lgl>              <int>
1 1926-11-19 A       TRUE      NA                     1
2 1928-10-22 A       FALSE     FALSE                  2
3 1928-10-23 A       FALSE     TRUE                   2
4 2000-04-03 B       TRUE      NA                     1
5 2000-04-04 B       FALSE     TRUE                   1
6 2001-11-13 B       FALSE     FALSE                  2
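The grouped version restarts at 1 for each station, whereas the "Desired counter" in the question keeps counting across stations (1 to 4). One way to get that is to skip the grouping and instead start a new flood whenever the station changes or the date is not exactly one day after the previous row's date. This is a base-R sketch that assumes each station's rows are contiguous and sorted by date. The variable `prev` is a hypothetical helper introduced here, not part of the original answer.

```r
Flood <- data.frame(
  Station = c("Y6042010", "Y6042010", "Y6042010",
              "W2022030", "W2022030", "W2022030"),
  Date    = as.Date(c("1926-11-19", "1928-10-22", "1928-10-23",
                      "2000-04-03", "2000-04-04", "2001-11-13"))
)

# Index of the previous row (hypothetical helper); NA for the first row,
# so Station[prev] and Date[prev] behave like dplyr::lag().
prev <- c(NA, seq_len(nrow(Flood) - 1))

# New flood: first row, a different station, or a gap other than one day.
new_flood <- is.na(prev) |
  Flood$Station != Flood$Station[prev] |
  Flood$Date != Flood$Date[prev] + 1

Flood$Flood_counter <- cumsum(new_flood)

print(Flood)
# Flood_counter is 1 2 2 3 3 4
```

Inside mutate(), the previous-row lookups correspond to `dplyr::lag(Station)` and `dplyr::lag(Date)`. Comparing the station explicitly replaces group_by(), which is what keeps the counter from restarting.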
