首页 > 解决方案 > 检查当前 (R) 前几天是否满足条件

问题描述

我有一个按日期排列的数据库表格,每小时测量一次。如果这个度量值大于 5,则满足标准 1(我已经在 dplyr 中使用了 mutate)像这样:

日期 小时 措施 符合_criteria_1
2020 年 1 月 1 日 00:00:00 7 1
2020 年 1 月 1 日 01:00:00 12 1
2020 年 1 月 1 日 02:00:00 3 0
…………
2020 年 10 月 1 日 21:00:00 2 0
2020 年 10 月 1 日 22:00:00 15 1
2020 年 10 月 1 日 23:00:00 20 1

现在,我想知道一天中有多少次满足条件。为此,我按天对“Meets_criteria_1”分组中的所有“1”求和:

日期 小时 符合_criteria_1 Sum_criteria_1
2020 年 1 月 1 日 00:00:00 1 3
2020 年 1 月 1 日 01:00:00 1 3
2020 年 1 月 1 日 02:00:00 0 3
…………
2020 年 10 月 1 日 21:00:00 0 11
2020 年 10 月 1 日 22:00:00 1 11
2020 年 10 月 1 日 23:00:00 1 11

但是,我需要第三个条件来满足标准 2,即“在前两天,“Sum_criteria_1”至少为 6”

我的问题是:我如何告诉 R 检查前两天到当前?因此,例如,如果我正在查看日期 08/01/2020,则仅当 sum_criteria_1 在 2020 年 6 月 1 日和 2020 年 7 月 1 日至少为 6 时才满足条件。

编辑:我尝试使用滞后,但我只是得到 NA:

mutate(yesterday = lag(sum_criteria_1, n=1), 
       day_before = lag(sum_criteria_1, n = 2))

任何帮助都将受到欢迎!谢谢

标签: rdateconditional-statements

解决方案


使用这样的东西

df %>% mutate(meets_criteria_1 = ifelse(Measure > 5, 1, 0)) %>%
  group_by(Date) %>%
  summarise(sum_criteria = sum(meets_criteria_1)) %>%
  mutate(criteria2 = ifelse(lag(sum_criteria) >= 6 & lag(sum_criteria,2), 1, 0)) %>%
  right_join(df, by = "Date")

样本数据

df <- structure(list(Date = c("01-01-2020", "01-01-2020", "01-01-2020", 
"01-01-2020", "01-01-2020", "01-01-2020", "01-01-2020", "01-01-2020", 
"01-01-2020", "01-01-2020", "01-01-2020", "01-01-2020", "01-01-2020", 
"01-01-2020", "01-01-2020", "01-01-2020", "01-01-2020", "01-01-2020", 
"01-01-2020", "01-01-2020", "01-01-2020", "01-01-2020", "01-01-2020", 
"01-01-2020", "02-01-2020", "02-01-2020", "02-01-2020", "02-01-2020", 
"02-01-2020", "02-01-2020", "02-01-2020", "02-01-2020", "02-01-2020", 
"02-01-2020", "02-01-2020", "02-01-2020", "02-01-2020", "02-01-2020", 
"02-01-2020", "02-01-2020", "02-01-2020", "02-01-2020", "02-01-2020", 
"02-01-2020", "02-01-2020", "02-01-2020", "02-01-2020", "02-01-2020", 
"03-01-2020", "03-01-2020", "03-01-2020", "03-01-2020", "03-01-2020", 
"03-01-2020", "03-01-2020", "03-01-2020", "03-01-2020", "03-01-2020", 
"03-01-2020", "03-01-2020", "03-01-2020", "03-01-2020", "03-01-2020", 
"03-01-2020", "03-01-2020", "03-01-2020", "03-01-2020", "03-01-2020", 
"03-01-2020", "03-01-2020", "03-01-2020", "03-01-2020", "04-01-2020", 
"04-01-2020", "04-01-2020", "04-01-2020", "04-01-2020", "04-01-2020", 
"04-01-2020", "04-01-2020", "04-01-2020", "04-01-2020", "04-01-2020", 
"04-01-2020", "04-01-2020", "04-01-2020", "04-01-2020", "04-01-2020", 
"04-01-2020", "04-01-2020", "04-01-2020", "04-01-2020", "04-01-2020", 
"04-01-2020", "04-01-2020", "04-01-2020", "05-01-2020", "05-01-2020", 
"05-01-2020", "05-01-2020", "05-01-2020", "05-01-2020", "05-01-2020", 
"05-01-2020", "05-01-2020", "05-01-2020", "05-01-2020", "05-01-2020", 
"05-01-2020", "05-01-2020", "05-01-2020", "05-01-2020", "05-01-2020", 
"05-01-2020", "05-01-2020", "05-01-2020", "05-01-2020", "05-01-2020", 
"05-01-2020", "05-01-2020"), Hour = c("00:00:00", "01:00:00", 
"02:00:00", "03:00:00", "04:00:00", "05:00:00", "06:00:00", "07:00:00", 
"08:00:00", "09:00:00", "10:00:00", "11:00:00", "12:00:00", "13:00:00", 
"14:00:00", "15:00:00", "16:00:00", "17:00:00", "18:00:00", "19:00:00", 
"20:00:00", "21:00:00", "22:00:00", "23:00:00", "00:00:00", "01:00:00", 
"02:00:00", "03:00:00", "04:00:00", "05:00:00", "06:00:00", "07:00:00", 
"08:00:00", "09:00:00", "10:00:00", "11:00:00", "12:00:00", "13:00:00", 
"14:00:00", "15:00:00", "16:00:00", "17:00:00", "18:00:00", "19:00:00", 
"20:00:00", "21:00:00", "22:00:00", "23:00:00", "00:00:00", "01:00:00", 
"02:00:00", "03:00:00", "04:00:00", "05:00:00", "06:00:00", "07:00:00", 
"08:00:00", "09:00:00", "10:00:00", "11:00:00", "12:00:00", "13:00:00", 
"14:00:00", "15:00:00", "16:00:00", "17:00:00", "18:00:00", "19:00:00", 
"20:00:00", "21:00:00", "22:00:00", "23:00:00", "00:00:00", "01:00:00", 
"02:00:00", "03:00:00", "04:00:00", "05:00:00", "06:00:00", "07:00:00", 
"08:00:00", "09:00:00", "10:00:00", "11:00:00", "12:00:00", "13:00:00", 
"14:00:00", "15:00:00", "16:00:00", "17:00:00", "18:00:00", "19:00:00", 
"20:00:00", "21:00:00", "22:00:00", "23:00:00", "00:00:00", "01:00:00", 
"02:00:00", "03:00:00", "04:00:00", "05:00:00", "06:00:00", "07:00:00", 
"08:00:00", "09:00:00", "10:00:00", "11:00:00", "12:00:00", "13:00:00", 
"14:00:00", "15:00:00", "16:00:00", "17:00:00", "18:00:00", "19:00:00", 
"20:00:00", "21:00:00", "22:00:00", "23:00:00"), Measure = c(2L, 
14L, 4L, 19L, 0L, 15L, 13L, 17L, 3L, 19L, 0L, 17L, 15L, 14L, 
8L, 7L, 13L, 14L, 4L, 18L, 18L, 14L, 8L, 1L, 3L, 12L, 18L, 7L, 
13L, 15L, 12L, 17L, 2L, 8L, 1L, 18L, 19L, 14L, 2L, 7L, 12L, 17L, 
14L, 20L, 1L, 15L, 18L, 1L, 12L, 5L, 0L, 20L, 19L, 10L, 7L, 5L, 
8L, 8L, 0L, 15L, 16L, 20L, 14L, 18L, 17L, 3L, 15L, 14L, 4L, 17L, 
16L, 11L, 12L, 10L, 7L, 0L, 15L, 3L, 12L, 17L, 6L, 4L, 16L, 4L, 
15L, 0L, 5L, 7L, 6L, 3L, 15L, 10L, 12L, 19L, 13L, 3L, 18L, 14L, 
11L, 18L, 15L, 17L, 19L, 1L, 18L, 16L, 14L, 2L, 3L, 2L, 16L, 
10L, 2L, 12L, 10L, 7L, 5L, 9L, 12L, 17L)), class = "data.frame", row.names = c(NA, 
-120L))

检查结果

# A tibble: 120 x 5
   Date       sum_criteria criteria2 Hour     Measure
   <chr>             <dbl>     <dbl> <chr>      <int>
 1 01-01-2020           17        NA 00:00:00       2
 2 01-01-2020           17        NA 01:00:00      14
 3 01-01-2020           17        NA 02:00:00       4
 4 01-01-2020           17        NA 03:00:00      19
 5 01-01-2020           17        NA 04:00:00       0
 6 01-01-2020           17        NA 05:00:00      15
 7 01-01-2020           17        NA 06:00:00      13
 8 01-01-2020           17        NA 07:00:00      17
 9 01-01-2020           17        NA 08:00:00       3
10 01-01-2020           17        NA 09:00:00      19
# ... with 110 more rows

推荐阅读