How to modify ranking criteria to specify a cutoff date

Problem description

I have this code:

library(dplyr)
library(tidyr)
top5vio <- inc_ag %>%
  mutate(Zone=ifelse(ZoneID=="Outside Zones", 1, 0)) %>%
  group_by(District, Period, Zone) %>%
  mutate(grp=factor(+(min_rank(desc(violentIncidents))<=5) + Zone,
                    labels=c("Top 5", "Rest of Zones", "Outside Zones"),
                    levels=c(1,0,2))) %>%
  group_by(District, Period, grp) %>%
  summarise(n=sum(violentIncidents)) %>%
  pivot_wider(names_from=grp, values_from=n, values_fill=list(n=0))

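Within each District and Period, this sets the grouping variable (grp) to 1 if a zone's violentIncidents ranks in the top 5, and to 0 otherwise. It then combines that with the rows where ZoneID == "Outside Zones" to create 3 groups.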
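To make the encoding concrete, here is a minimal sketch on made-up counts (the values and the names `incidents`, `zone`, `code`, and `outside_code` are illustrative, not taken from the data). Among ordinary zones, the logical test becomes 1 for the five largest counts and 0 for the rest. An "Outside Zones" row is the only member of its group, so it always ranks first and ends up as 1 + 1 = 2.

```r
library(dplyr)

# Hypothetical counts for seven ordinary zones (Zone = 0)
incidents <- c(4, 88, 44, 212, 38, 107, 165)
zone <- 0

# Unary + converts TRUE/FALSE to 1/0; adding zone shifts "Outside Zones" to 2
code <- +(min_rank(desc(incidents)) <= 5) + zone

# A lone "Outside Zones" row (Zone = 1) ranks first in its own group
outside_code <- +(min_rank(desc(1501)) <= 5) + 1

# Map the numeric codes to labels the same way the pipeline does
grp <- factor(code, levels = c(1, 0, 2),
              labels = c("Top 5", "Rest of Zones", "Outside Zones"))
as.character(grp)
```

Here the two smallest counts (4 and 38) fall into "Rest of Zones" and the other five into "Top 5".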

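But what if I want to establish the top 5 as of a given date (for example, June 20, 2019), so that the selected ZoneIDs must be the ones that were in the top 5 as of June 20, 2019? What would be the appropriate syntax?

Thank you.

The code transforms the following data: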

obs  District      ZoneID              Period                   violentIncidents
1    Northwestern  Northern: 53A       2019-02-06 - 2019-03-06  4
2    Northwestern  Northern: 53B       2019-02-06 - 2019-03-06  0
3    Northwestern  Northwestern: 61A   2019-02-06 - 2019-03-06  88
4    Northwestern  Northwestern: 61B   2019-02-06 - 2019-03-06  44
5    Northwestern  Northwestern: 61D   2019-02-06 - 2019-03-06  212
6    Northwestern  Northwestern: 62A   2019-02-06 - 2019-03-06  38
7    Northwestern  Northwestern: 62B   2019-02-06 - 2019-03-06  18
8    Northwestern  Northwestern: 62C   2019-02-06 - 2019-03-06  65
9    Northwestern  Northwestern: 62D   2019-02-06 - 2019-03-06  4
10   Northwestern  Northwestern: 63A   2019-02-06 - 2019-03-06  107
11   Northwestern  Northwestern: 63B   2019-02-06 - 2019-03-06  19
12   Northwestern  Northwestern: 63C   2019-02-06 - 2019-03-06  56
13   Northwestern  Northwestern: 63D   2019-02-06 - 2019-03-06  165
14   Northwestern  Northwestern: DATA  2019-02-06 - 2019-03-06  28
15   Northwestern  Northwestern: DATB  2019-02-06 - 2019-03-06  26
16   Northwestern  Northwestern: DATC  2019-02-06 - 2019-03-06  114
17   Northwestern  Outside Zones       2019-02-06 - 2019-03-06  1501
18   Southern      Outside Zones       2019-02-06 - 2019-03-06  2062
19   Southwestern  Outside Zones       2019-02-06 - 2019-03-06  1351

into this:

  District     Period                  `Top 5` `Rest of Zones` `Outside Zones`
  <chr>        <chr>                     <int>           <int>           <int>
1 Northwestern 2019-02-06 - 2019-03-06     686             302            1501
2 Southern     2019-02-06 - 2019-03-06       0               0            2062
3 Southwestern 2019-02-06 - 2019-03-06       0               0            1351

Tags: r

Solution

I suspect this approach, which uses lubridate, might work for you.

library(dplyr)
library(tidyr)
library(lubridate)
inc_ag %>%
  mutate(Zone=ifelse(ZoneID=="Outside Zones", 1, 0)) %>%
  # split a copy of Period into its start and end dates
  mutate(PeriodSplit = Period) %>%
  separate(col = PeriodSplit,into=c("periodStart","periodEnd"),sep = " - ") %>%
  # keep only periods that end on or before the cutoff date
  filter(ymd(periodEnd) <= ymd("2019-06-20")) %>%
  group_by(District, Period, Zone) %>%
  mutate(grp=factor(+(min_rank(desc(violentIncidents))<=5) + Zone,
                    labels=c("Top 5", "Rest of Zones", "Outside Zones"),
                    levels=c(1,0,2))) %>%
  ungroup() %>%
  # keep the group assigned to each zone and attach it to the full data
  select(District,ZoneID,grp) %>%
  right_join(inc_ag,by=c("District","ZoneID")) %>%
  group_by(District, Period, grp) %>%
  summarise(n=sum(violentIncidents)) %>%
  pivot_wider(names_from=grp, values_from=n, values_fill=list(n=0))
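One possible reading of the question is that the zone labels should be fixed as of the cutoff date and then reused for every period. The following is a minimal sketch of that reading on made-up data, not a definitive implementation: `toy`, `cutoff`, `n_top`, and `labels_at_cutoff` are hypothetical names, the data is invented, and it uses a top 2 instead of a top 5 to keep the example small. It ranks zones only within the latest period ending on or before the cutoff, so each zone gets exactly one label before the join.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Made-up data: one district, four zones plus "Outside Zones", two periods
toy <- tibble(
  District = "D1",
  ZoneID   = rep(c("A", "B", "C", "D", "Outside Zones"), times = 2),
  Period   = rep(c("2019-05-20 - 2019-06-20", "2019-06-21 - 2019-07-21"),
                 each = 5),
  violentIncidents = c(10, 30, 20, 5, 100,   # period ending on the cutoff
                       50,  1,  2, 40, 90)   # later period
)

cutoff <- ymd("2019-06-20")
n_top  <- 2   # the question uses 5; 2 keeps the toy data small

# Label zones using only the latest period that ends on or before the cutoff
labels_at_cutoff <- toy %>%
  mutate(periodEnd = ymd(sub(".* - ", "", Period)),
         Zone = ifelse(ZoneID == "Outside Zones", 1, 0)) %>%
  filter(periodEnd <= cutoff) %>%
  group_by(District) %>%
  filter(periodEnd == max(periodEnd)) %>%
  group_by(District, Zone) %>%
  mutate(grp = factor(+(min_rank(desc(violentIncidents)) <= n_top) + Zone,
                      levels = c(1, 0, 2),
                      labels = c("Top 2", "Rest of Zones", "Outside Zones"))) %>%
  ungroup() %>%
  select(District, ZoneID, grp)

# Apply the fixed labels to every period, then summarise as in the question
result <- toy %>%
  left_join(labels_at_cutoff, by = c("District", "ZoneID")) %>%
  group_by(District, Period, grp) %>%
  summarise(n = sum(violentIncidents), .groups = "drop") %>%
  pivot_wider(names_from = grp, values_from = n, values_fill = list(n = 0))

result
```

In this toy data, zones B and C lead the period ending on the cutoff, so they stay in the "Top 2" column for the later period even though A and D have more incidents there. A zone that never appears on or before the cutoff would get an NA label under this sketch and would need separate handling.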
