Finding consecutive cancelled days with R

Problem description

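I'm thinking of building a scorecard for my employees' attendance, and I need help calculating the scores. My rule is that an employee gets -1 point for cancelling a day, but if they cancel several consecutive days, that still only counts as -1 point.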

For example:

employee   workdate  reason
employee1  7/7/19    CAOF
employee1  7/19/19   CAOF
employee1  8/30/19   PUL
employee1  10/02/19  CAOF
employee1  10/9/19   CAOF
employee1  10/10/19  CAOF


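So with this data, I'd see that employee1 cancelled 5 days (reason CAOF) in this period. However, the last two cancellations were on consecutive days, so they count as only one point. That makes his score for this period -4.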
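To make the rule concrete, here is a minimal base R sketch that scores the five CAOF dates from the example: a cancellation that is more than one day after the previous one starts a new run, and each run costs one point. It assumes the dates can simply be sorted and that only CAOF rows count; the names d, starts_run and score are made up for illustration.

```r
# CAOF dates for employee1 from the example, parsed as month/day/2-digit year
d <- as.Date(c("7/7/19", "7/19/19", "10/02/19", "10/9/19", "10/10/19"),
             format = "%m/%d/%y")
d <- sort(d)

# The first cancellation always starts a run; later ones start a new run
# only if they are more than 1 day after the previous cancellation
starts_run <- c(TRUE, diff(as.integer(d)) > 1)

# -1 point per run of consecutive cancelled days
score <- -sum(starts_run)
score
```

Here the gaps are 12, 85, 7 and 1 days, so only 10/10/19 continues a run and the score comes out as -4.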

I only have basic R knowledge, but I'm trying to learn. Can anyone help me get started?

Tags: r, days

Solution


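If you apply diff across the dates (assuming they are sorted within each employee), you can filter out rows whose gap to the previous date is below a threshold: a gap of 1 day means the row continues a run of consecutive days. For example,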

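# Parse the m/d/yy strings into Date objects
dat$workdate <- as.Date(dat$workdate, "%m/%d/%y")
# Days since the same employee's previous row (Inf for each employee's first row); assumes rows are sorted by date within employee
dat$datediff <- ave(as.integer(dat$workdate), dat$employee, FUN = function(z) c(Inf, diff(z)))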
dat
#    employee   workdate reason datediff
# 1 employee1 2019-07-07   CAOF      Inf
# 2 employee1 2019-07-19   CAOF       12
# 3 employee1 2019-08-30    PUL       42
# 4 employee1 2019-10-02   CAOF       33
# 5 employee1 2019-10-09   CAOF        7
# 6 employee1 2019-10-10   CAOF        1
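Filtering on that column could then look like the following sketch. It rebuilds the sample data from this post with the dates already converted, and keeps only rows that are more than one day after the employee's previous row. The explicit sort by employee and date is an addition here, since the approach assumes sorted input; first_of_run is just an illustrative name.

```r
# The sample data from this post, with workdate already converted to Date
dat <- data.frame(
  employee = rep("employee1", 6),
  workdate = as.Date(c("2019-07-07", "2019-07-19", "2019-08-30",
                       "2019-10-02", "2019-10-09", "2019-10-10")),
  reason   = c("CAOF", "CAOF", "PUL", "CAOF", "CAOF", "CAOF")
)

# Sort within employee so diff() compares each date with the previous one
dat <- dat[order(dat$employee, dat$workdate), ]
dat$datediff <- ave(as.integer(dat$workdate), dat$employee,
                    FUN = function(z) c(Inf, diff(z)))

# Keep rows more than 1 day after the previous row; a 1-day gap means
# the row only continues an existing run
first_of_run <- dat[dat$datediff > 1, ]
first_of_run
```

For this data only the 2019-10-10 row is dropped, leaving five rows.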

Alternatively, you can use ave on its own to produce a logical indicator:

dat$usereason <- ave(as.integer(dat$workdate), dat$employee, FUN = function(z) c(TRUE, diff(z) > 1))
dat
#    employee   workdate reason datediff usereason
# 1 employee1 2019-07-07   CAOF      Inf         1
# 2 employee1 2019-07-19   CAOF       12         1
# 3 employee1 2019-08-30    PUL       42         1
# 4 employee1 2019-10-02   CAOF       33         1
# 5 employee1 2019-10-09   CAOF        7         1
# 6 employee1 2019-10-10   CAOF        1         0

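Note that ave forces its output to have the same class as its x= argument, so (as far as I know) we can't return literal TRUE/FALSE here without re-classing the result afterwards.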
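A minimal sketch of that re-classing step, assuming the same six sample dates for a single employee: ave hands back integer 1/0 because its x= argument is integer, and as.logical turns the result into TRUE/FALSE. The variable names are illustrative.

```r
dates <- as.Date(c("2019-07-07", "2019-07-19", "2019-08-30",
                   "2019-10-02", "2019-10-09", "2019-10-10"))
employee <- rep("employee1", 6)

# ave() returns the type of its first argument (integer here),
# so the logical flags come back as 1/0 ...
flag <- ave(as.integer(dates), employee,
            FUN = function(z) c(TRUE, diff(z) > 1))
class(flag)

# ... and can be converted back to TRUE/FALSE afterwards
flag <- as.logical(flag)
flag
```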


Data:

dat <- structure(list(employee = c("employee1", "employee1", "employee1", "employee1", "employee1", "employee1"), workdate = c("7/7/19", "7/19/19", "8/30/19", "10/02/19", "10/9/19", "10/10/19"), reason = c("CAOF", "CAOF", "PUL", "CAOF", "CAOF", "CAOF")), class = "data.frame", row.names = c(NA, -6L))
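The answer stops before computing the score itself. One way to finish it, sketched below under the assumption that only reason CAOF counts as a cancellation (as in the question), is to keep only CAOF rows before taking differences, flag the first day of each run, and sum those flags per employee with tapply. Filtering first means other reasons cannot join or split runs of cancellations; whether that is what you want depends on your scoring rule. The names caof, newrun and scores are illustrative.

```r
dat <- structure(list(employee = c("employee1", "employee1", "employee1", "employee1", "employee1", "employee1"), workdate = c("7/7/19", "7/19/19", "8/30/19", "10/02/19", "10/9/19", "10/10/19"), reason = c("CAOF", "CAOF", "PUL", "CAOF", "CAOF", "CAOF")), class = "data.frame", row.names = c(NA, -6L))

dat$workdate <- as.Date(dat$workdate, "%m/%d/%y")

# Assumption: only "CAOF" counts as a cancellation
caof <- dat[dat$reason == "CAOF", ]
caof <- caof[order(caof$employee, caof$workdate), ]

# 1 for the first day of each run of consecutive cancelled days, else 0
caof$newrun <- ave(as.integer(caof$workdate), caof$employee,
                   FUN = function(z) c(TRUE, diff(z) > 1))

# -1 point per run, per employee
scores <- -tapply(caof$newrun, caof$employee, sum)
scores
```

With the data above, employee1 ends up with four runs and a score of -4.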
