首页 > 解决方案 > R: Group_by function is not aggregating data correctly with sum & max/min date

问题描述

I have data with 6 Variables : EmployeeID, JobID, Name, JobLocation, Date, and HoursWorked. I want to group my data by EmployeeID and JobID (i.e. find all records with the same EmployeeID and JobID in one row), and then find the minimum and maximum date by group, and the sum of all HoursWorked between those dates. I want my data to end up with columns: EmployeeID, JobID, Name JobLocation, MinDate, MaxDate, TotalHoursWorked.

I have tried this so far but the MinDate,MaxDate, and TotalHoursWorked are showing the same date for every record.

Data$EmployeeID<- as.factor(Data$EmployeeID) 
Data$JobID<- as.factor(Data$JobID) 
Data$Date<- as.factor(Data$Date)
Data$Date<- as.Date(Data$Date,format="%m/%d/%Y")
Data$HoursWorked<-as.numeric(Data$HoursWorked)

Data<-Data[c("EmployeeID", "Name","JobID", "JobLocation", "Date", "HoursWorked")]
Data<- Data%>% 
  group_by(Data$EmployeeID,Data$JobID, Data$Name,Data$JobLocation) %>%
  summarize(TotalHoursWorked = sum(Data$HoursWorked)) %>%
  mutate(MaxDate=max(Data$Date), MinDate=min(Data$Date))

Output of sample(Data) without the "Name" column:

> sample(Data)
# A tibble: 1,000 x 5
   EmployeeID HoursWorked JobID           Date       JobLocation
   <fct>            <dbl> <fct>           <date>     <chr>         
 1 32589              4   B3031-002513-00 2016-03-14 #             
 2 32590              8   B3031-002562-00 2016-04-08 #             
 3 32591              9   B3031-002564-00 2016-04-05 #             
 4 32591              2.5 B3031-002564-00 2016-04-06 #             
 5 32591              3   B3031-002562-00 2016-04-07 #             
 6 32591              7.5 B3031-002562-00 2016-04-08 #             
 7 32605              0   B3031-002348-00 2016-01-04 #             
 8 32605              3   B3031-002419-00 2016-01-04 #             
 9 32605              0   B3031-002348-00 2016-01-05 #             
10 32605              3   B3031-002419-00 2016-01-05 #             
# ... with 990 more rows

And output after I run the group_by code:

> sample(Data)
# A tibble: 80 x 6
   MaxDate    `Data$JobID`    MinDate    `Data$\`Job Location\`` TotalHoursWorked `Data$EmployeeID`
   <date>     <fct>           <date>     <chr>                              <dbl> <fct>            
 1 2016-07-29 B3031-002513-00 2016-01-04 #                                  3288. 32589            
 2 2016-07-29 B3031-002562-00 2016-01-04 #                                  3288. 32590            
 3 2016-07-29 B3031-002562-00 2016-01-04 #                                  3288. 32591            
 4 2016-07-29 B3031-002564-00 2016-01-04 #                                  3288. 32591            
 5 2016-07-29 B3031-002348-00 2016-01-04 #                                  3288. 32605            
 6 2016-07-29 B3031-002419-00 2016-01-04 #                                  3288. 32605            
 7 2016-07-29 B3031-002445-00 2016-01-04 #                                  3288. 32605            
 8 2016-07-29 B3031-002502-00 2016-01-04 #                                  3288. 32605            
 9 2016-07-29 B3031-002504-00 2016-01-04 #                                  3288. 32605            
10 2016-07-29 B3031-002505-00 2016-01-04 #                                  3288. 32605            
# ... with 70 more rows

标签: r

解决方案


其实很简单,你在用summarisemutate什么时候应该只用summarise.

可能不需要第一个指令,我Date在阅读下面的数据后运行它来强制列。

Data$Date <- as.Date(Data$Date)

现在解决方案。

library(tidyverse)

Data %>%
  group_by(EmployeeID, JobID) %>%
  summarise(TotalHoursWorked = sum(HoursWorked),
            MaxDate = max(Date), MinDate = min(Date))

数据。

Data <- read.table(text = "
EmployeeID HoursWorked JobID           Date       JobLocation
  1 32589              4   B3031-002513-00 2016-03-14 #             
2 32590              8   B3031-002562-00 2016-04-08 #             
3 32591              9   B3031-002564-00 2016-04-05 #             
4 32591              2.5 B3031-002564-00 2016-04-06 #             
5 32591              3   B3031-002562-00 2016-04-07 #             
6 32591              7.5 B3031-002562-00 2016-04-08 #             
7 32605              0   B3031-002348-00 2016-01-04 #             
8 32605              3   B3031-002419-00 2016-01-04 #             
9 32605              0   B3031-002348-00 2016-01-05 #             
10 32605              3   B3031-002419-00 2016-01-05 #   
", header = TRUE, comment.char = "")

推荐阅读