Calculating average days in R using dplyr, filter, group_by and summarise?

Problem description

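I want to create a table that shows the average number of days by submitted_via (see consumer_compliants.csv), using date_diff, the difference between date_sent and date_received. The data should be filtered to show only date_diff values greater than 0. All of this has to be done using dplyr, %>%, filter, group_by, summarise_at and knitr::kable().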

I tried this in R:

date_received <- as.Date(mydata$date_received, "%m/%d/%Y")
date_sent <- as.Date(mydata$date_sent_to_company, "%m/%d/%Y")
date_diff <- (date_sent) - (date_received)

mydata %>%                  
 filter(date_diff > 0) %>%    
 group_by(date_received, date_sent_to_company) %>%   
 summarise(
    a = mean(date_diff))

Output:

 Email         11.973214 days
 Fax            7.057072 days
 Phone          6.290040 days
 Postal mail    9.627809 days
 Referral       6.761684 days
 Web           10.695773 days

Any suggestions?

Tags: r, dplyr, statistics, knitr, mean

Solution


This may be closer to what you want:

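library(dplyr)

mydata %>%
  # parse every column whose name starts with "date_" as a Date
  mutate_at(vars(starts_with("date_")), as.Date, format = "%m/%d/%Y") %>%
  # days between the two dates, stored as a column of the data frame
  mutate(date_diff = date_received - date_sent) %>%
  # keep only positive differences
  filter(date_diff > 0) %>%
  # one row per submission channel, holding the mean difference
  group_by(submitted_via) %>%
  summarise(a = mean(date_diff))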

Output

# A tibble: 3 x 2
  submitted_via a      
  <fct>         <drtn> 
1 phone         22 days
2 Referral      27 days
3 web            4 days
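
The question also asks for summarise_at and knitr::kable(), which the pipeline above does not use. The following is a minimal sketch of how those two pieces could be added, not a definitive solution. It assumes the toy data from this answer (columns date_received, date_sent, submitted_via) rather than the real CSV, where the question's own code suggests the column is called date_sent_to_company. The names avg_days and the kable column labels are illustrative, and stringsAsFactors = FALSE is added here only so that the grouping column is plain text.

```r
library(dplyr)
library(knitr)

# Toy data mirroring the answer's sample (an assumption, not the real file)
mydata <- read.table(
  text = "date_received date_sent submitted_via
9/30/2015 9/3/2015 Referral
9/3/2015 8/30/2015 web
9/25/2015 9/3/2015 phone
9/18/2015 9/18/2015 Referral",
  header = TRUE, stringsAsFactors = FALSE
)

avg_days <- mydata %>%
  mutate(
    date_received = as.Date(date_received, format = "%m/%d/%Y"),
    date_sent     = as.Date(date_sent, format = "%m/%d/%Y"),
    # subtracting two Dates gives a difftime in days; convert to a plain number
    date_diff     = as.numeric(date_received - date_sent)
  ) %>%
  filter(date_diff > 0) %>%                 # drop zero or negative differences
  group_by(submitted_via) %>%
  summarise_at(vars(date_diff), mean)       # mean of date_diff per group

# Render the summary as a table (illustrative column labels)
kable(avg_days, col.names = c("Submitted via", "Average days"))
```

The subtraction order follows the toy data, where date_received is the later date. With the real file the operands may need to be swapped (date_sent_to_company minus date_received), as in the question's own attempt.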

Data

mydata <- read.table(
  text =
    "date_received      date_sent   submitted_via
  9/30/2015          9/3/2015      Referral
  9/3/2015           8/30/2015     web
  9/25/2015          9/3/2015      phone
  9/18/2015          9/18/2015     Referral", header = TRUE
)
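
In dplyr 1.0.0 and later, mutate_at is superseded by across(). As a hedged sketch only, the same steps could be written as follows on this sample data. The names result and avg_days are illustrative, and stringsAsFactors = FALSE is an addition made here so the date columns are read as text.

```r
library(dplyr)

# Same sample data as above, read as character columns
mydata <- read.table(
  text = "date_received date_sent submitted_via
9/30/2015 9/3/2015 Referral
9/3/2015 8/30/2015 web
9/25/2015 9/3/2015 phone
9/18/2015 9/18/2015 Referral",
  header = TRUE, stringsAsFactors = FALSE
)

result <- mydata %>%
  # across() applies the conversion to every column starting with "date_"
  mutate(across(starts_with("date_"), ~ as.Date(.x, format = "%m/%d/%Y"))) %>%
  mutate(date_diff = date_received - date_sent) %>%
  filter(date_diff > 0) %>%
  group_by(submitted_via) %>%
  # .groups = "drop" returns an ungrouped tibble
  summarise(avg_days = mean(date_diff), .groups = "drop")

print(result)
```

Here avg_days stays a difftime column, so it prints with a "days" unit, as in the output shown in the answer.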
