How to separate daily data (increase, decrease, growth) using a conditional index?

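Problem description

I have daily tree growth data that I want to split into three classes: increase, decrease and growth. I have worked out how to define increase and decrease, but I cannot work out how to define growth.

Increase: the area increases between consecutive timestamps.
Decrease: the area decreases between consecutive timestamps.
Growth: a subset of increase in which the current area is greater than the previous daily maximum area.

The data look like this: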

df = data.frame(
  "Date" = c(rep("31/07/2019", each = 4), rep("1/08/2019", each = 11),
             rep("2/08/2019", each = 14)),
  "DateTime" = c("31/07/2019 22:13","31/07/2019 22:33","31/07/2019 23:13",
                 "31/07/2019 23:43","1/08/2019 15:42","1/08/2019 15:45",
                 "1/08/2019 15:50","1/08/2019 15:55","1/08/2019 16:00",
                 "1/08/2019 16:05","1/08/2019 16:11","1/08/2019 16:37",
                 "1/08/2019 16:57","1/08/2019 17:02","1/08/2019 17:08",
                 "2/08/2019 0:53","2/08/2019 1:14","2/08/2019 3:14",
                 "2/08/2019 4:14","2/08/2019 9:06","2/08/2019 9:36",
                 "2/08/2019 10:36","2/08/2019 11:36","2/08/2019 15:39",
                 "2/08/2019 16:39","2/08/2019 17:39","2/08/2019 18:39",
                 "2/08/2019 19:39","2/08/2019 20:39"),
  "Area" = c(
    94236, 94276, 94416, 94456, 94434, 94287, 94285, 94215, 94104, 
    94007, 94007, 94047, 94087, 94127, 94167, 94247, 94287, 94327, 
    94367, 94497, 94467, 94437, 94407, 94487, 94521, 94607, 94667, 
    94727, 94787) )

This is how I defined increase and decrease:

library(dplyr)

d5 = df %>%
  mutate(Diff = Area - lag(Area)) %>%   # change relative to the previous timestamp
  group_by(Date) %>%
  mutate(class = ifelse(Diff >= 0, 'increase', 'decrease')) %>%
  select(DateTime, Date, Area, class)

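Right now, "increase" covers both increase and growth. I would like to replace increase with growth whenever the area on the current day exceeds the maximum area of the previous days.

For example: the maximum area on 31 July is 94456. Every area on 1 August that is greater than 94456 should therefore be growth rather than increase. If growth is detected, the threshold separating increase from growth should be adjusted. The new threshold should be the highest area value on 1 August (94434).

All subsequent separation of increase and growth should consider not only the previous day's maximum area (comparing the areas on 2 August with the maximum on 1 August) but all previous maxima (comparing the areas on 2 August with those on 31 July and 1 August), and growth should be detected only when the area is greater than all previously measured areas.

If no growth is detected, the threshold separating increase from growth should stay unchanged and be evaluated again the next day.

I tried using ifelse and indexing. The problem is that I am not sure how to create a conditional index that checks the daily area data and is adjusted when it is exceeded.

This is the result I want: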

d5 = data.frame(
  "Date" = c(rep("31/07/2019", each = 4), rep("1/08/2019", each = 11),
             rep("2/08/2019", each = 14)),
  "DateTime" = c("31/07/2019 22:13","31/07/2019 22:33","31/07/2019 23:13",
                 "31/07/2019 23:43","1/08/2019 15:42","1/08/2019 15:45",
                 "1/08/2019 15:50","1/08/2019 15:55","1/08/2019 16:00",
                 "1/08/2019 16:05","1/08/2019 16:11","1/08/2019 16:37",
                 "1/08/2019 16:57","1/08/2019 17:02","1/08/2019 17:08",
                 "2/08/2019 0:53","2/08/2019 1:14","2/08/2019 3:14",
                 "2/08/2019 4:14","2/08/2019 9:06","2/08/2019 9:36",
                 "2/08/2019 10:36","2/08/2019 11:36","2/08/2019 15:39",
                 "2/08/2019 16:39","2/08/2019 17:39","2/08/2019 18:39",
                 "2/08/2019 19:39","2/08/2019 20:39"),
  "Area" = c(
    94236, 94276, 94416, 94456, 94434, 94287, 94285, 94215, 94104, 
    94007, 94007, 94047, 94087, 94127, 94167, 94247, 94287, 94327, 
    94367, 94497, 94467, 94437, 94407, 94487, 94521, 94607, 94667, 
    94727, 94787) ,
  "class" = c("NA", rep("increase", each= 3), rep("decrease", each= 6),
                    rep("increase", each= 7), rep("growth", each= 3), 
                    rep("decrease", each= 3), rep("increase", each=  1), rep("growth", each= 5) )
  )  

Tags: r

Solution


This may be a rather convoluted approach, assuming I have understood you correctly:

library(dplyr)

df %>%
  mutate(DateTime = as.POSIXct(DateTime, format = "%d/%m/%Y %H:%M"), 
         Date  = as.Date(DateTime)) %>%
  arrange(DateTime) %>%
  mutate(class = c("increase", "decrease")[(Area - lag(Area) < 0) + 1]) %>%
  group_by(Date) %>%
  mutate(prev_max = max(Area)) %>%
  ungroup() %>%
  mutate(prev_max = lag(prev_max)) %>%
  group_by(Date) %>%
  mutate(prev_max = first(prev_max), 
         class = case_when(class == "increase" & Area > prev_max ~ "growth", 
                       TRUE ~ class)) %>%
  select(-prev_max)


#   Date       DateTime             Area class   
#   <date>     <dttm>              <dbl> <chr>   
# 1 2019-07-31 2019-07-31 22:13:00 94236 NA      
# 2 2019-07-31 2019-07-31 22:33:00 94276 increase
# 3 2019-07-31 2019-07-31 23:13:00 94416 increase
# 4 2019-07-31 2019-07-31 23:43:00 94456 increase
# 5 2019-08-01 2019-08-01 15:42:00 94434 decrease
# 6 2019-08-01 2019-08-01 15:45:00 94287 decrease
# 7 2019-08-01 2019-08-01 15:50:00 94285 decrease
# 8 2019-08-01 2019-08-01 15:55:00 94215 decrease
# 9 2019-08-01 2019-08-01 16:00:00 94104 decrease
#10 2019-08-01 2019-08-01 16:05:00 94007 decrease
# … with 19 more rows

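This first converts DateTime to POSIXct values and Date to an actual date. We then assign one of the values c("increase", "decrease") to each row by comparing its Area with the value in the previous row. For each Date, we compare Area with the max value of the previous Date and, if it is larger, change class to "growth".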
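
The indexing trick used to build class can be hard to read at first, so here is a minimal base R sketch of it in isolation. It uses only the first five Area values from the question; the names area, diff_prev and cls are made up for this illustration.

```r
# First five Area values from the question (illustrative subset)
area <- c(94236, 94276, 94416, 94456, 94434)

# Change relative to the previous element; the first element has no predecessor
diff_prev <- c(NA, diff(area))

# (diff_prev < 0) is FALSE/TRUE; adding 1 turns it into index 1 or 2.
# Index 1 picks "increase", index 2 picks "decrease", and an NA index gives NA.
cls <- c("increase", "decrease")[(diff_prev < 0) + 1]

print(cls)
```

A difference of exactly zero gives FALSE + 1 = 1, so a row with an unchanged area is labelled "increase", which matches the Diff >= 0 condition in the question.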


EDIT

For the updated question, we need to compare Area with the maximum over all previous dates:

df1 <- df %>%
        mutate(DateTime = as.POSIXct(DateTime, format = "%d/%m/%Y %H:%M"), 
               Date  = as.Date(DateTime)) %>%
        arrange(DateTime) %>%
        mutate(class = c("increase", "decrease")[(Area - lag(Area) < 0) + 1]) %>%
        group_by(Date) %>%
        mutate(prev_max = max(Area)) %>%
        ungroup() %>%
        mutate(prev_max = lag(prev_max)) %>%
        group_by(Date) %>%
        mutate(prev_max = first(prev_max)) %>%
        ungroup()


df1 %>%
   mutate(prev_max = cummax(replace(prev_max, is.na(prev_max), 0)), 
          class = case_when(class == "increase" & Area > prev_max 
                            & prev_max != 0 ~ "growth", 
                            TRUE ~ class)) 
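
To see what the cummax() step does, the following base R sketch applies it to one row per day instead of one row per measurement. The daily maxima are read off the sample data in the question; the data frame daily and its column names are assumptions made for this illustration only.

```r
# One row per day, with the daily maximum Area taken from the sample data
daily <- data.frame(
  Date      = as.Date(c("2019-07-31", "2019-08-01", "2019-08-02")),
  daily_max = c(94456, 94434, 94787)
)

# Maximum of the previous day; the first day has none, hence NA
daily$prev_max <- c(NA, head(daily$daily_max, -1))

# Replace the NA with 0 so cummax() can run, then carry the largest
# previous maximum forward across days
daily$threshold <- cummax(replace(daily$prev_max, is.na(daily$prev_max), 0))

print(daily)
```

The threshold for 2 August stays at 94456 (the maximum on 31 July) because the maximum on 1 August (94434) is lower. The value 0 on the first day only marks "no earlier day", which is why the pipeline above also checks prev_max != 0. Note that this threshold is fixed within a day, so it does not rise as new maxima are reached during that same day.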
