r - How to modify the ranking criteria to specify a cutoff date
Question
I have this code:
library(dplyr)
library(tidyr)
top5vio <- inc_ag %>%
mutate(Zone=ifelse(ZoneID=="Outside Zones", 1, 0)) %>%
group_by(District, Period, Zone) %>%
mutate(grp=factor(+(min_rank(desc(violentIncidents))<=5) + Zone,
labels=c("Top 5", "Rest of Zones", "Outside Zones"),
levels=c(1,0,2))) %>%
group_by(District, Period, grp) %>%
summarise(n=sum(violentIncidents)) %>%
pivot_wider(names_from=grp, values_from=n, values_fill=list(n=0))
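If violentIncidents ranks in the top 5 within its group, this sets the grouping variable (grp) to 1, otherwise 0. It then combines that with the rows where ZoneID == "Outside Zones" to create 3 groups.
But what if I want to establish the top 5 as of a given date (for example, June 20, 2019), so that the selected ZoneIDs must be the ones that were in the top 5 as of June 20, 2019? What would the appropriate syntax be?
Thank you.
The code transforms the following data: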
obs District ZoneID Period violentIncidents
1 Northwestern Northern: 53A 2019-02-06 - 2019-03-06 4
2 Northwestern Northern: 53B 2019-02-06 - 2019-03-06 0
3 Northwestern Northwestern: 61A 2019-02-06 - 2019-03-06 88
4 Northwestern Northwestern: 61B 2019-02-06 - 2019-03-06 44
5 Northwestern Northwestern: 61D 2019-02-06 - 2019-03-06 212
6 Northwestern Northwestern: 62A 2019-02-06 - 2019-03-06 38
7 Northwestern Northwestern: 62B 2019-02-06 - 2019-03-06 18
8 Northwestern Northwestern: 62C 2019-02-06 - 2019-03-06 65
9 Northwestern Northwestern: 62D 2019-02-06 - 2019-03-06 4
10 Northwestern Northwestern: 63A 2019-02-06 - 2019-03-06 107
11 Northwestern Northwestern: 63B 2019-02-06 - 2019-03-06 19
12 Northwestern Northwestern: 63C 2019-02-06 - 2019-03-06 56
13 Northwestern Northwestern: 63D 2019-02-06 - 2019-03-06 165
14 Northwestern Northwestern: DATA 2019-02-06 - 2019-03-06 28
15 Northwestern Northwestern: DATB 2019-02-06 - 2019-03-06 26
16 Northwestern Northwestern: DATC 2019-02-06 - 2019-03-06 114
17 Northwestern Outside Zones 2019-02-06 - 2019-03-06 1501
18 Southern Outside Zones 2019-02-06 - 2019-03-06 2062
19 Southwestern Outside Zones 2019-02-06 - 2019-03-06 1351
into this:
District Period `Top 5` `Rest of Zones` `Outside Zones`
<chr> <chr> <int> <int> <int>
1 Northwestern 2019-02-06 - 2019-03-06 686 302 1501
2 Southern 2019-02-06 - 2019-03-06 0 0 2062
3 Southwestern 2019-02-06 - 2019-03-06 0 0 1351
Answer
I suspect this approach, using lubridate, might work for you.
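library(dplyr)
library(tidyr)
library(lubridate)
inc_ag %>%
# flag the "Outside Zones" rows so they end up in their own group
mutate(Zone=ifelse(ZoneID=="Outside Zones", 1, 0)) %>%
# split a copy of Period into its start and end dates
mutate(PeriodSplit = Period) %>%
separate(col = PeriodSplit,into=c("periodStart","periodEnd"),sep = " - ") %>%
# keep only periods that ended on or before the cutoff date
filter(ymd(periodEnd) <= ymd("2019-06-20")) %>%
# code 1 = top 5, 0 = rest, 2 = Outside Zones (rank flag + Zone flag)
group_by(District, Period, Zone) %>%
mutate(grp=factor(+(min_rank(desc(violentIncidents))<=5) + Zone,
labels=c("Top 5", "Rest of Zones", "Outside Zones"),
levels=c(1,0,2))) %>%
ungroup() %>%
# keep one grp label per zone and attach it to the full data
# (assumes a single period passes the filter; otherwise the join duplicates rows)
select(District,ZoneID,grp) %>%
right_join(inc_ag,by=c("District","ZoneID")) %>%
group_by(District, Period, grp) %>%
summarise(n=sum(violentIncidents)) %>%
pivot_wider(names_from=grp, values_from=n, values_fill=list(n=0))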
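If the data holds several periods, one possible variation is to fix the top-5 membership in a single reference period and then reuse it everywhere. The sketch below is only an illustration under stated assumptions: it reads "as of 2019-06-20" as "the period whose date range contains that date", and `inc_toy`, `cutoff`, `ref` and `zone_grp` are made-up names with invented numbers, not part of the original data.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Toy data in the same shape as inc_ag (values invented for illustration)
zones <- c(LETTERS[1:7], "Outside Zones")
inc_toy <- tibble(
  District = "Northwestern",
  ZoneID = rep(zones, times = 2),
  Period = rep(c("2019-05-08 - 2019-06-05", "2019-06-06 - 2019-07-04"), each = 8),
  violentIncidents = c(70, 60, 50, 40, 30, 20, 10, 500,  # earlier period
                       10, 20, 30, 40, 50, 60, 70, 400)  # period containing the cutoff
)

cutoff <- ymd("2019-06-20")

# Step 1: parse Period into real dates and keep the period containing the cutoff
ref <- inc_toy %>%
  separate(Period, into = c("periodStart", "periodEnd"), sep = " - ", remove = FALSE) %>%
  mutate(periodStart = ymd(periodStart), periodEnd = ymd(periodEnd)) %>%
  filter(periodStart <= cutoff, cutoff <= periodEnd)

# Step 2: label each zone using only the reference period
zone_grp <- ref %>%
  mutate(Zone = as.integer(ZoneID == "Outside Zones")) %>%
  group_by(District, Zone) %>%
  mutate(grp = case_when(
    Zone == 1L ~ "Outside Zones",
    min_rank(desc(violentIncidents)) <= 5 ~ "Top 5",
    TRUE ~ "Rest of Zones"
  )) %>%
  ungroup() %>%
  select(District, ZoneID, grp)

# Step 3: apply that fixed membership to every period and summarise
result <- inc_toy %>%
  left_join(zone_grp, by = c("District", "ZoneID")) %>%
  group_by(District, Period, grp) %>%
  summarise(n = sum(violentIncidents), .groups = "drop") %>%
  pivot_wider(names_from = grp, values_from = n, values_fill = list(n = 0))

print(result)
```

Because the labels come from the reference period only, a zone that does not appear in that period would get an NA group after the left join, so it may be worth checking for that case on real data.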