首页 > 解决方案 > 如何使用 R 根据 group_by cols 和时间戳计算平均响应时间和总响应时间?

问题描述

我有一个看起来像这样的表(1 个示例 - 总共 200 万行):

tweet_id |  id         |  group   |                created_at         | tweet                                                 |  response_tweet_id
1           sprintcare    Support      Tue Oct 31 22:10:47 +0000 2017   @115712 I understand. I would like to assist you.        2
2           115712        Customer     Tue Oct 31 22:11:45 +0000 2017   @sprintcare and how do you propose we do that?           NA
3           115712        Customer     Tue Oct 31 22:08:27 +0000 2017   @sprintcare I have sent several private messages.        1
4           sprintcare    Support      Tue Oct 31 21:54:49 +0000 2017   @115712 Please send us a Private Message.                3
5           115712        Customer     Tue Oct 31 21:49:35 +0000 2017   @sprintcare I did.                                       4
6           sprintcare    Support      Tue Oct 31 21:46:24 +0000 2017   @115712 Can you please send us a private message.        5,7
8           115712        Customer     Tue Oct 31 21:45:10 +0000 2017   @sprintcare is the worst customer service                9,6,10
11          apple         Support      Tue Oct 31 22:10:35 +0000 2017  @115713 This is saddening to hear.                        NA
12          115713        Customer     Tue Oct 31 22:04:47 +0000 2017   @apple My phone is not working. Help!               11

引用第一行时,我们可以看到响应 tweet_id 1(标记为 2)时第一个支持 tweet 响应的时间。

理想情况下 - 我想计算每条回复推文需要多长时间 - 从支持到客户订购。

我想计算两个值:

  1. 发送给支持人员的第一条推文之间的响应时间(tweet_id、response_tweet_id 和 created_at)。对于第一行 - 时差为:Tue Oct 31 22:10:47 +0000 2017 - Tue Oct 31 22:08:27 +0000 2017 = 00:02:20。

  2. 给每个客户的第一条推文从支持到最后一条推文之间的总响应时间。在下面的示例中 - 它本质上是相对于支持时间发出的第一条推文,以及在下一个唯一 ID 发挥作用之前发出的最后一条推文。

根据每个组(100 多家独特的公司),所需的输出看起来像这样:

id         | group  | Avg_response_time_per_tweet (in minutes) | Total_avg_response_time (in minutes)
sprintcare   Support  ####                                       ####
apple        Support  ####                                       #### 

标签: rdatetimedplyrtidytext

解决方案


library(lubridate)
library(dplyr)
library(tidyr)
df %>% 
    separate(created_at, c("Day_name", "Month", "Day", "Hour", "Minute", "Second", "X", "Year")) %>% 
    type.convert(as.is = TRUE) %>% 
    mutate(Month = match(Month, month.abb)) %>% 
    mutate(created_at = make_datetime(Year, Month, Day, Hour, Minute, Second), .keep="unused") %>% 
    group_by(id, group) %>% 
    summarise(Avg_response_time = mean(difftime(max(created_at), min(created_at))))
  id         group    Avg_response_time
  <chr>      <chr>    <drtn>           
1 115712     Customer 26.58333 mins    
2 sprintcare Support  24.38333 mins  

推荐阅读