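r - How to separate daily data with a conditional index (increase, decrease, growth)?
Problem description
I have daily tree-growth data that I want to split into three classes: increase, decrease, and growth. I have worked out how to define increase and decrease, but I cannot work out how to define growth.
Increase: the area rises between consecutive timestamps. Decrease: the area falls between consecutive timestamps. Growth: a subset of increase in which the current daily maximum area is greater than the previous daily maximum area.
The data looks like this: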
df = data.frame(
"Date" = c(rep("31/07/2019", each= 4),rep("1/08/2019", each=
11),rep("2/08/2019", each= 14)) ,
"DateTime" = c("31/07/2019 22:13","31/07/2019 22:33","31/07/2019 23:13",
"31/07/2019 23:43","1/08/2019 15:42","1/08/2019 15:45",
"1/08/2019 15:50","1/08/2019 15:55","1/08/2019 16:00",
"1/08/2019 16:05","1/08/2019 16:11","1/08/2019 16:37",
"1/08/2019 16:57","1/08/2019 17:02","1/08/2019 17:08",
"2/08/2019 0:53","2/08/2019 1:14","2/08/2019 3:14",
"2/08/2019 4:14","2/08/2019 9:06","2/08/2019 9:36",
"2/08/2019 10:36","2/08/2019 11:36","2/08/2019 15:39",
"2/08/2019 16:39","2/08/2019 17:39","2/08/2019 18:39",
"2/08/2019 19:39","2/08/2019 20:39"),
"Area" = c(
94236, 94276, 94416, 94456, 94434, 94287, 94285, 94215, 94104,
94007, 94007, 94047, 94087, 94127, 94167, 94247, 94287, 94327,
94367, 94497, 94467, 94437, 94407, 94487, 94521, 94607, 94667,
94727, 94787) )
This is how I defined increase and decrease:
library(dplyr)
d5 = df %>%
  mutate(Diff = Area - lag(Area)) %>%   # change from the previous reading
  group_by(Date) %>%
  mutate(class = ifelse(Diff >= 0, 'increase', 'decrease')) %>%
  select(DateTime, Date, Area, class)
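Right now, increase covers both increase and growth. I would like to replace increase with growth whenever the area on the current day exceeds the maximum area of the previous days.
For example: the maximum area on 31 July is 94456. Every area on 1 August that is greater than 94456 should be growth instead of increase. If growth is detected, the threshold separating increase from growth should be adjusted. The new threshold should be the highest area value of 1 August (94434).
All later separations of increase and growth should consider not only the previous day's maximum area (comparing the areas of 2 August with the maximum of 1 August) but all earlier maxima as well (comparing the areas of 2 August with the maxima of 31 July and 1 August), and growth should be detected only when the area is greater than every previously measured area.
If no growth is detected, the threshold separating increase from growth should stay unchanged and be evaluated again the next day.
I tried using ifelse and indexing. The problem is that I am not sure how to create a conditional index that checks the daily area data and adjusts itself when the threshold is exceeded.
This is the result I want: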
d5 = data.frame(
"Date" = c(rep("31/07/2019", each= 4),rep("1/08/2019", each=
11),rep("2/08/2019", each= 14)) ,
"DateTime" = c("31/07/2019 22:13","31/07/2019 22:33","31/07/2019 23:13",
"31/07/2019 23:43","1/08/2019 15:42","1/08/2019 15:45",
"1/08/2019 15:50","1/08/2019 15:55","1/08/2019 16:00",
"1/08/2019 16:05","1/08/2019 16:11","1/08/2019 16:37",
"1/08/2019 16:57","1/08/2019 17:02","1/08/2019 17:08",
"2/08/2019 0:53","2/08/2019 1:14","2/08/2019 3:14",
"2/08/2019 4:14","2/08/2019 9:06","2/08/2019 9:36",
"2/08/2019 10:36","2/08/2019 11:36","2/08/2019 15:39",
"2/08/2019 16:39","2/08/2019 17:39","2/08/2019 18:39",
"2/08/2019 19:39","2/08/2019 20:39"),
"Area" = c(
94236, 94276, 94416, 94456, 94434, 94287, 94285, 94215, 94104,
94007, 94007, 94047, 94087, 94127, 94167, 94247, 94287, 94327,
94367, 94497, 94467, 94437, 94407, 94487, 94521, 94607, 94667,
94727, 94787) ,
"class" = c("NA", rep("increase", each= 3), rep("decrease", each= 6),
rep("increase", each= 7), rep("growth", each= 3),
rep("decrease", each= 3), rep("increase", each= 1), rep("growth", each= 5) )
)
Solution
Maybe this is a rather convoluted approach, and it assumes I have understood you correctly:
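library(dplyr)
df %>%
  # tz = "UTC" is an assumption (the data states no time zone); it keeps
  # as.Date() from shifting readings close to midnight onto another day
  mutate(DateTime = as.POSIXct(DateTime, format = "%d/%m/%Y %H:%M", tz = "UTC"),
         Date = as.Date(DateTime)) %>%
  arrange(DateTime) %>%
  # FALSE + 1 selects "increase", TRUE + 1 selects "decrease"; first row is NA
  mutate(class = c("increase", "decrease")[(Area - lag(Area) < 0) + 1]) %>%
  group_by(Date) %>%
  mutate(prev_max = max(Area)) %>%       # maximum of each day
  ungroup() %>%
  mutate(prev_max = lag(prev_max)) %>%   # shift down by one row
  group_by(Date) %>%
  # the first row of each day now carries the previous day's maximum
  mutate(prev_max = first(prev_max),
         class = case_when(class == "increase" & Area > prev_max ~ "growth",
                           TRUE ~ class)) %>%
  select(-prev_max)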
# Date DateTime Area class
# <date> <dttm> <dbl> <chr>
# 1 2019-07-31 2019-07-31 22:13:00 94236 NA
# 2 2019-07-31 2019-07-31 22:33:00 94276 increase
# 3 2019-07-31 2019-07-31 23:13:00 94416 increase
# 4 2019-07-31 2019-07-31 23:43:00 94456 increase
# 5 2019-08-01 2019-08-01 15:42:00 94434 decrease
# 6 2019-08-01 2019-08-01 15:45:00 94287 decrease
# 7 2019-08-01 2019-08-01 15:50:00 94285 decrease
# 8 2019-08-01 2019-08-01 15:55:00 94215 decrease
# 9 2019-08-01 2019-08-01 16:00:00 94104 decrease
#10 2019-08-01 2019-08-01 16:05:00 94007 decrease
# … with 19 more rows
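This first converts DateTime to POSIXct values and Date to Date values. Then we assign the c("increase", "decrease") labels by comparing each Area with the value in the previous row. For each Date, we compare Area with the max value of the previous Date and, if it is greater, change class to "growth".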
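To make the label-assignment step concrete, here is a minimal base-R sketch of the same indexing trick on the first five Area values from the question. The names area, is_drop, and labels are illustrative only and do not appear in the answer.

```r
# First five Area readings from the question
area <- c(94236, 94276, 94416, 94456, 94434)

# TRUE where the reading is lower than the one before it;
# the first element is NA because there is no earlier reading
is_drop <- area - c(NA, head(area, -1)) < 0

# FALSE + 1 = 1 selects "increase", TRUE + 1 = 2 selects "decrease",
# and an NA index yields NA
labels <- c("increase", "decrease")[is_drop + 1]
print(labels)
```

Indexing a two-element vector this way avoids a nested ifelse and keeps the first row as NA without extra handling. Note that a difference of exactly 0 is labelled "increase", which matches the Diff >= 0 rule in the question.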
Edit
For the updated question, we need to compare Area with the maximum over all previous dates:
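df1 <- df %>%
  # tz = "UTC" is an assumption (the data states no time zone); it keeps
  # as.Date() from shifting readings close to midnight onto another day
  mutate(DateTime = as.POSIXct(DateTime, format = "%d/%m/%Y %H:%M", tz = "UTC"),
         Date = as.Date(DateTime)) %>%
  arrange(DateTime) %>%
  mutate(class = c("increase", "decrease")[(Area - lag(Area) < 0) + 1]) %>%
  group_by(Date) %>%
  mutate(prev_max = max(Area)) %>%       # maximum of each day
  ungroup() %>%
  mutate(prev_max = lag(prev_max)) %>%   # shift down by one row
  group_by(Date) %>%
  mutate(prev_max = first(prev_max)) %>% # previous day's maximum on every row
  ungroup()
df1 %>%
  # running maximum of the earlier daily maxima; the first day has none,
  # so its NA becomes 0 and the prev_max != 0 check keeps it out of "growth"
  mutate(prev_max = cummax(replace(prev_max, is.na(prev_max), 0)),
         class = case_when(class == "increase" & Area > prev_max
                           & prev_max != 0 ~ "growth",
                           TRUE ~ class))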
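The code above compares each Area only with the maxima of earlier days. If "greater than every previously measured area" is also meant to include earlier readings on the same day, one possible variant is a running maximum over all earlier rows. The sketch below is an assumption about the intended rule, not part of the original answer; toy, seen_max, and res are hypothetical names, and the first day is excluded from growth because it has no earlier day to compare with.

```r
library(dplyr)

# Same Area values as in the question; timestamps are omitted and the
# rows are assumed to be in chronological order already
toy <- data.frame(
  Date = as.Date(rep(c("2019-07-31", "2019-08-01", "2019-08-02"),
                     times = c(4, 11, 14))),
  Area = c(94236, 94276, 94416, 94456, 94434, 94287, 94285, 94215, 94104,
           94007, 94007, 94047, 94087, 94127, 94167, 94247, 94287, 94327,
           94367, 94497, 94467, 94437, 94407, 94487, 94521, 94607, 94667,
           94727, 94787)
)

res <- toy %>%
  mutate(
    class = c("increase", "decrease")[(Area - lag(Area) < 0) + 1],
    # highest Area among all earlier rows (NA for the first row)
    seen_max = lag(cummax(Area)),
    class = case_when(
      # growth: an increase after the first day that beats every earlier reading
      class == "increase" & Date > min(Date) & Area > seen_max ~ "growth",
      TRUE ~ class
    )
  ) %>%
  select(-seen_max)

print(rle(res$class))
```

With this rule the reading of 94487 on 2 August stays "increase", because 94497 was already measured earlier that day, whereas a comparison against previous days' maxima alone labels it "growth". Which behaviour is wanted depends on how the threshold is meant to update within a day.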