r - 检查当前 (R) 前几天是否满足条件
问题描述
我有一个按日期排列的数据库表格,每小时测量一次。如果这个度量值大于 5,则满足标准 1(我已经在 dplyr 中使用了 mutate)像这样:
日期 | 小时 | 措施 | 符合_criteria_1 |
---|---|---|---|
2020 年 1 月 1 日 | 00:00:00 | 7 | 1 |
2020 年 1 月 1 日 | 01:00:00 | 12 | 1 |
2020 年 1 月 1 日 | 02:00:00 | 3 | 0 |
………… | |||
2020 年 10 月 1 日 | 21:00:00 | 2 | 0 |
2020 年 10 月 1 日 | 22:00:00 | 15 | 1 |
2020 年 10 月 1 日 | 23:00:00 | 20 | 1 |
现在,我想知道一天中有多少次满足条件。为此,我按天对“Meets_criteria_1”分组中的所有“1”求和:
日期 | 小时 | 符合_criteria_1 | Sum_criteria_1 |
---|---|---|---|
2020 年 1 月 1 日 | 00:00:00 | 1 | 3 |
2020 年 1 月 1 日 | 01:00:00 | 1 | 3 |
2020 年 1 月 1 日 | 02:00:00 | 0 | 3 |
………… | |||
2020 年 10 月 1 日 | 21:00:00 | 0 | 11 |
2020 年 10 月 1 日 | 22:00:00 | 1 | 11 |
2020 年 10 月 1 日 | 23:00:00 | 1 | 11 |
但是,我需要第三个条件来满足标准 2,即“在前两天,“Sum_criteria_1”至少为 6”
我的问题是:我如何告诉 R 检查前两天到当前?因此,例如,如果我正在查看日期 08/01/2020,则仅当 sum_criteria_1 在 2020 年 6 月 1 日和 2020 年 7 月 1 日至少为 6 时才满足条件。
编辑:我尝试使用滞后,但我只是得到 NA:
mutate(yesterday = lag(sum_criteria_1, n=1),
day_before = lag(sum_criteria_1, n = 2))
任何帮助都将受到欢迎!谢谢
解决方案
使用这样的东西
df %>% mutate(meets_criteria_1 = ifelse(Measure > 5, 1, 0)) %>%
group_by(Date) %>%
summarise(sum_criteria = sum(meets_criteria_1)) %>%
mutate(criteria2 = ifelse(lag(sum_criteria) >= 6 & lag(sum_criteria,2), 1, 0)) %>%
right_join(df, by = "Date")
样本数据
df <- structure(list(Date = c("01-01-2020", "01-01-2020", "01-01-2020",
"01-01-2020", "01-01-2020", "01-01-2020", "01-01-2020", "01-01-2020",
"01-01-2020", "01-01-2020", "01-01-2020", "01-01-2020", "01-01-2020",
"01-01-2020", "01-01-2020", "01-01-2020", "01-01-2020", "01-01-2020",
"01-01-2020", "01-01-2020", "01-01-2020", "01-01-2020", "01-01-2020",
"01-01-2020", "02-01-2020", "02-01-2020", "02-01-2020", "02-01-2020",
"02-01-2020", "02-01-2020", "02-01-2020", "02-01-2020", "02-01-2020",
"02-01-2020", "02-01-2020", "02-01-2020", "02-01-2020", "02-01-2020",
"02-01-2020", "02-01-2020", "02-01-2020", "02-01-2020", "02-01-2020",
"02-01-2020", "02-01-2020", "02-01-2020", "02-01-2020", "02-01-2020",
"03-01-2020", "03-01-2020", "03-01-2020", "03-01-2020", "03-01-2020",
"03-01-2020", "03-01-2020", "03-01-2020", "03-01-2020", "03-01-2020",
"03-01-2020", "03-01-2020", "03-01-2020", "03-01-2020", "03-01-2020",
"03-01-2020", "03-01-2020", "03-01-2020", "03-01-2020", "03-01-2020",
"03-01-2020", "03-01-2020", "03-01-2020", "03-01-2020", "04-01-2020",
"04-01-2020", "04-01-2020", "04-01-2020", "04-01-2020", "04-01-2020",
"04-01-2020", "04-01-2020", "04-01-2020", "04-01-2020", "04-01-2020",
"04-01-2020", "04-01-2020", "04-01-2020", "04-01-2020", "04-01-2020",
"04-01-2020", "04-01-2020", "04-01-2020", "04-01-2020", "04-01-2020",
"04-01-2020", "04-01-2020", "04-01-2020", "05-01-2020", "05-01-2020",
"05-01-2020", "05-01-2020", "05-01-2020", "05-01-2020", "05-01-2020",
"05-01-2020", "05-01-2020", "05-01-2020", "05-01-2020", "05-01-2020",
"05-01-2020", "05-01-2020", "05-01-2020", "05-01-2020", "05-01-2020",
"05-01-2020", "05-01-2020", "05-01-2020", "05-01-2020", "05-01-2020",
"05-01-2020", "05-01-2020"), Hour = c("00:00:00", "01:00:00",
"02:00:00", "03:00:00", "04:00:00", "05:00:00", "06:00:00", "07:00:00",
"08:00:00", "09:00:00", "10:00:00", "11:00:00", "12:00:00", "13:00:00",
"14:00:00", "15:00:00", "16:00:00", "17:00:00", "18:00:00", "19:00:00",
"20:00:00", "21:00:00", "22:00:00", "23:00:00", "00:00:00", "01:00:00",
"02:00:00", "03:00:00", "04:00:00", "05:00:00", "06:00:00", "07:00:00",
"08:00:00", "09:00:00", "10:00:00", "11:00:00", "12:00:00", "13:00:00",
"14:00:00", "15:00:00", "16:00:00", "17:00:00", "18:00:00", "19:00:00",
"20:00:00", "21:00:00", "22:00:00", "23:00:00", "00:00:00", "01:00:00",
"02:00:00", "03:00:00", "04:00:00", "05:00:00", "06:00:00", "07:00:00",
"08:00:00", "09:00:00", "10:00:00", "11:00:00", "12:00:00", "13:00:00",
"14:00:00", "15:00:00", "16:00:00", "17:00:00", "18:00:00", "19:00:00",
"20:00:00", "21:00:00", "22:00:00", "23:00:00", "00:00:00", "01:00:00",
"02:00:00", "03:00:00", "04:00:00", "05:00:00", "06:00:00", "07:00:00",
"08:00:00", "09:00:00", "10:00:00", "11:00:00", "12:00:00", "13:00:00",
"14:00:00", "15:00:00", "16:00:00", "17:00:00", "18:00:00", "19:00:00",
"20:00:00", "21:00:00", "22:00:00", "23:00:00", "00:00:00", "01:00:00",
"02:00:00", "03:00:00", "04:00:00", "05:00:00", "06:00:00", "07:00:00",
"08:00:00", "09:00:00", "10:00:00", "11:00:00", "12:00:00", "13:00:00",
"14:00:00", "15:00:00", "16:00:00", "17:00:00", "18:00:00", "19:00:00",
"20:00:00", "21:00:00", "22:00:00", "23:00:00"), Measure = c(2L,
14L, 4L, 19L, 0L, 15L, 13L, 17L, 3L, 19L, 0L, 17L, 15L, 14L,
8L, 7L, 13L, 14L, 4L, 18L, 18L, 14L, 8L, 1L, 3L, 12L, 18L, 7L,
13L, 15L, 12L, 17L, 2L, 8L, 1L, 18L, 19L, 14L, 2L, 7L, 12L, 17L,
14L, 20L, 1L, 15L, 18L, 1L, 12L, 5L, 0L, 20L, 19L, 10L, 7L, 5L,
8L, 8L, 0L, 15L, 16L, 20L, 14L, 18L, 17L, 3L, 15L, 14L, 4L, 17L,
16L, 11L, 12L, 10L, 7L, 0L, 15L, 3L, 12L, 17L, 6L, 4L, 16L, 4L,
15L, 0L, 5L, 7L, 6L, 3L, 15L, 10L, 12L, 19L, 13L, 3L, 18L, 14L,
11L, 18L, 15L, 17L, 19L, 1L, 18L, 16L, 14L, 2L, 3L, 2L, 16L,
10L, 2L, 12L, 10L, 7L, 5L, 9L, 12L, 17L)), class = "data.frame", row.names = c(NA,
-120L))
检查结果
# A tibble: 120 x 5
Date sum_criteria criteria2 Hour Measure
<chr> <dbl> <dbl> <chr> <int>
1 01-01-2020 17 NA 00:00:00 2
2 01-01-2020 17 NA 01:00:00 14
3 01-01-2020 17 NA 02:00:00 4
4 01-01-2020 17 NA 03:00:00 19
5 01-01-2020 17 NA 04:00:00 0
6 01-01-2020 17 NA 05:00:00 15
7 01-01-2020 17 NA 06:00:00 13
8 01-01-2020 17 NA 07:00:00 17
9 01-01-2020 17 NA 08:00:00 3
10 01-01-2020 17 NA 09:00:00 19
# ... with 110 more rows
推荐阅读
- typescript - 如何使 webpack 更喜欢带有扩展名的文件?
- php - 恢复内容 file_get_content() 函数
- apache-spark - Spark java数据帧字符串无法转换为结构
- python - Python selenium for 获取值和创建命名元组的循环需要太多时间
- android - Location.requestForegroundPermissionsAsync() 未在 Android 5.1 上请求许可
- html - 以角度显示具有特定数组内容的表格
- python - 隐藏python源代码以防止人们看到它..完整而正确的方法
- c# - .NET Core 中的 Nuget 包版本问题
- node.js - Sequlize 生成带有外键的表,同时明确要求不要
- javascript - 在纯 JavaScript 中只执行一次本地函数?