首页 > 解决方案 > 按小时和天计算员工人数

问题描述

我有员工 ID、他们的打卡时间和每天的打卡时间。我想按小时计算员工人数。Excel公式也可以。

我的示例数据如下所示:

Employee ID   Day      Clockin      Clockout
1             Mon        7:00          15:00
1             Fri        7:00          15:00
2             Wed        8:00          22:00
2             Mon        10:00         18:00
2             Fri        9:00          20:00
3             Mon        7:00          8:00 

我的输出应该是这样的:

Timestamp   Mon   Tue   Wed  Thu   Fri 
7:00        2      0     0    0     1
8:00        1      0     1    0     1
9:00        1      0     1    0     2
10:00       2      0     1    0     2
11:00       2      0     1    0     2

我试图变异clock inclock out列,但没有奏效。我更喜欢 R 中的解决方案。我尝试了此处提到的解决方案计算 R 中签入和签出日期之间每月的天数,但在我的情况下似乎不起作用

输入示例

ID  Day Clockin Clockout
Employee 1  Mon 7:00    15:00
Employee 2  Mon 8:00    15:00
Employee 3  Mon 12:00   14:00
Employee 4  Mon 13:00   20:00
Employee 5  Mon 15:00   22:00
Employee 6  Mon 23:00   23:30

输出示例

Mon Count 
7:00    1
8:00    2
9:00    2
10:00   2
11:00   2
12:00   3
13:00   4
14:00   3
15:00   2
16:00   2
17:00   2
18:00   2
19:00   2
20:00   1
21:00   1
22:00   0
23:00   1

如果您注意计数,您会发现当有人在一天中打卡或打卡时它会发生变化。

标签: pythonrexceldataframedata-wrangling

解决方案


这比看起来要复杂一些,因为如果员工上夜班到第二天,我们需要更改我们在午夜之后计算他们存在的日期。

就此而言,我编写了自己的函数,它记录一天、上班和下班,并返回一系列时间(1 小时间隔)以及与每个时间间隔相关联的日期。见下文;

time.seq <- function(day.i, start.i, end.i, step.i = '1 hour'){
  
  require(lubridate)
  require(DescTools) ## to get the abbreviated weekdays

  start.i <- as.POSIXct(start.i, format="%H:%M")
  end.i <- as.POSIXct(end.i, format="%H:%M")
  
  if (start.i > end.i) { ## accounting for working on the next day (after midnight)
   end.i <- end.i + days(1) 
  }

  out.h <- seq(start.i, end.i , step.i)
  
  ## Going to the next day based on difference in time (unit = days)
  day.abb.ex <- c(day.abb, "Mon")
  
  out.d <- day.abb.ex[which(toupper(day.abb) == day.i) + 
            c(0, cumsum(as.numeric(diff(floor_date(out.h, unit = "day"))/86400)))]
  
  out <- list(DAY = out.d, HOUR = out.h)
  
  return(out)
}

然后使用该函数,类似于另一个答案中使用的逻辑,我们可以获得每小时的计数。

library(dplyr)
library(tidyr)

df %>% 
  rowwise() %>% 
  mutate(HOUR = list(time.seq(Day, Clockin, Clockout)[["HOUR"]]),
         DAY = list(time.seq(Day, Clockin, Clockout)[["DAY"]])) %>% 
  unnest(c(HOUR, DAY)) %>% 
  count(Day=DAY, Hour = format(HOUR, '%H:%M'), name = "Count") %>%
  pivot_wider(names_from = Day, values_from = Count) 

#> # A tibble: 22 x 8
#>    Hour    Fri   Mon   Sat   Sun   Thu   Tue   Wed
#>    <chr> <int> <int> <int> <int> <int> <int> <int>
#>  1 07:00     2    NA     1    NA     1     1     1
#>  2 08:00     2    NA     1    NA     1     1     1
#>  3 09:00     2    NA     1    NA     1     1     1
#>  4 10:00     4    NA     1    NA     3     3     3
#>  5 11:00     4    NA     1    NA     3     3     3
#>  6 12:00     5     1     1    NA     4     4     4
#>  7 13:00     5     1     1    NA     4     4     4
#>  8 14:00     5     1     1    NA     4     4     4
#>  9 15:00     5     1     1    NA     4     4     4
#> 10 16:00     3     1    NA    NA     3     3     3
#> # ... with 12 more rows

数据

df <- structure(list(Employee.ID = c(462L, 462L, 559L, 559L, 559L, 
559L, 560L, 560L, 560L, 560L, 560L, 715L, 715L, 715L, 715L, 715L, 
791L, 791L, 791L, 791L, 802L, 802L, 802L, 802L), Day = structure(c(2L, 
4L, 7L, 8L, 6L, 2L, 3L, 7L, 8L, 6L, 2L, 3L, 7L, 2L, 4L, 5L, 7L, 
8L, 6L, 2L, 7L, 8L, 6L, 2L), .Label = c("", "FRI", "MON", "SAT", 
"SUN", "THU", "TUE", "WED"), class = "factor"), Clockin = structure(c(5L, 
5L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 
5L, 5L, 5L, 2L, 2L, 2L, 2L), .Label = c("", "10:00", "12:00", 
"20:00", "7:00"), class = "factor"), Clockout = structure(c(2L, 
2L, 4L, 4L, 5L, 4L, 7L, 8L, 7L, 7L, 6L, 10L, 9L, 11L, 9L, 9L, 
2L, 2L, 2L, 2L, 4L, 7L, 3L, 4L), .Label = c("", "15:00", "17:30", 
"18:00", "18:15", "19:45", "20:00", "22:00", "4:00", "4:15", 
"4:45"), class = "factor")), row.names = c(NA, 24L), class = "data.frame")

推荐阅读