r - Calculating the time spent with dates and time format in r
问题描述
I am working on a time/date variable and trying to estimate the time spent for each record. I am following two steps for this analysis.
(a)
modify the variables in the desired format
(b)
calculate the time spent on each question.
Here how my dataset looks like:
id <- c(1,1,1,1,1, 2,2,2,2,2)
item.id <- c(1,2,3,4,5, 1,2,3,4,5)
submit.time <-c("2019-04-09 09:50:30.340","2019-04-09 09:52:12.440","2019-04-09 09:52:15.787","2019-04-09 09:53:21.587","2019-04-09 09:53:49.047",
"2019-04-09 09:49:45.243","2019-04-09 09:52:53.663","2019-04-09 09:53:23.293","2019-04-09 09:54:00.727","2019-04-09 09:54:52.400")
start.time <- c("04/09/2019 09:50:02.317 AM","04/09/2019 09:50:02.317 AM","04/09/2019 09:50:02.317 AM","04/09/2019 09:50:02.317 AM","04/09/2019 09:50:02.317 AM",
"04/09/2019 09:47:42.583 AM","04/09/2019 09:47:42.583 AM","04/09/2019 09:47:42.583 AM","04/09/2019 09:47:42.583 AM","04/09/2019 09:47:42.583 AM")
data <- data.frame(id, item.id,start.time, submit.time)
> data
id item.id start.time submit.time
1 1 1 04/09/2019 09:50:02.317 AM 2019-04-09 09:50:30.340
2 1 2 04/09/2019 09:50:02.317 AM 2019-04-09 09:52:12.440
3 1 3 04/09/2019 09:50:02.317 AM 2019-04-09 09:52:15.787
4 1 4 04/09/2019 09:50:02.317 AM 2019-04-09 09:53:21.587
5 1 5 04/09/2019 09:50:02.317 AM 2019-04-09 09:53:49.047
6 2 1 04/09/2019 09:47:42.583 AM 2019-04-09 09:49:45.243
7 2 2 04/09/2019 09:47:42.583 AM 2019-04-09 09:52:53.663
8 2 3 04/09/2019 09:47:42.583 AM 2019-04-09 09:53:23.293
9 2 4 04/09/2019 09:47:42.583 AM 2019-04-09 09:54:00.727
10 2 5 04/09/2019 09:47:42.583 AM 2019-04-09 09:54:52.400
id
for each student, item.id
is the id of the question, start.time
is the for login time to the exam(unique time for each student), and submit.time
is the time when a student submits the answer for each question.
(a)
editing data: this step includes removing AM|PM
and switching the order of the start.time
because I would like to use the submit.time
's format, and edited start.time
's format.
data$start.time <- gsub(" AM| PM", "", data$start.time) # exclude AM or PM
data$start.time <- gsub("/", "-", data$start.time) #replace / with -
dtparts = t(as.data.frame(strsplit(data$start.time,' '))) # split date and time
row.names(dtparts) = NULL
data$newdate <- strptime(as.character(dtparts[,1]), "%m-%d-%Y") # switch the date order
data$newdate <- as.POSIXct(data$newdate) # R was complaining about the time format-had to change here
data$start.time <- paste0(data$newdate," ",dtparts[,2]) # bring the time back
Now, the two-timing variables look the same. I transformed these dates and times to seconds.
data %>%
mutate(start.time.num = as.numeric(as.POSIXct(start.time), units="secs")) %>%
mutate(submit.time.num = as.numeric(as.POSIXct(submit.time), units="secs"))
id item.id start.time submit.time newdate start.time.num submit.time.num
1 1 1 2019-04-09 09:50:02.317 2019-04-09 09:50:30.340 2019-04-09 1554817802 1554817830
2 1 2 2019-04-09 09:50:02.317 2019-04-09 09:52:12.440 2019-04-09 1554817802 1554817932
3 1 3 2019-04-09 09:50:02.317 2019-04-09 09:52:15.787 2019-04-09 1554817802 1554817936
4 1 4 2019-04-09 09:50:02.317 2019-04-09 09:53:21.587 2019-04-09 1554817802 1554818002
5 1 5 2019-04-09 09:50:02.317 2019-04-09 09:53:49.047 2019-04-09 1554817802 1554818029
6 2 1 2019-04-09 09:47:42.583 2019-04-09 09:49:45.243 2019-04-09 1554817663 1554817785
7 2 2 2019-04-09 09:47:42.583 2019-04-09 09:52:53.663 2019-04-09 1554817663 1554817974
8 2 3 2019-04-09 09:47:42.583 2019-04-09 09:53:23.293 2019-04-09 1554817663 1554818003
9 2 4 2019-04-09 09:47:42.583 2019-04-09 09:54:00.727 2019-04-09 1554817663 1554818041
10 2 5 2019-04-09 09:47:42.583 2019-04-09 09:54:52.400 2019-04-09 1554817663 1554818092
(b)
in this step, I would like to calculate the time spent on each question here. For the first question of the first student, the time spent should be submit.time.num(1554817830) - start.time.num(1554817802)=28
. For the second question of the first student, the time spent should be submit.time.num(1554817932)
-
previous submit.time.num(1554817830)=102
. This procedure needs to be repeated for each student. When it gets to the second student, it should take the start.time
again for the first row of the second student.
So, the additional column should look like this:
> time.spent
time.spent
1 28
2 102
3 4
4 66
5 27
6 122
7 189
8 29
9 38
10 51
I apologize for posting this long, if you also have any suggestions for the first part, please let me know, more importantly, any suggestions for the part (b)
?
Thanks
解决方案
This can be done a bit faster in a single pipeline.
library(dplyr)
data %>%
mutate(
start.time = as.POSIXct(start.time, format = "%m/%d/%Y %H:%M:%OS"),
submit.time = as.POSIXct(submit.time),
time.spent = difftime(submit.time, start.time, units = "secs")
) %>%
group_by(id) %>%
mutate(
time.spent = c(time.spent[1], diff(time.spent))
) %>%
ungroup()
# # A tibble: 10 x 5
# id item.id start.time submit.time time.spent
# <dbl> <dbl> <dttm> <dttm> <drtn>
# 1 1 1 2019-04-09 09:50:02 2019-04-09 09:50:30 28.023 secs
# 2 1 2 2019-04-09 09:50:02 2019-04-09 09:52:12 102.100 secs
# 3 1 3 2019-04-09 09:50:02 2019-04-09 09:52:15 3.347 secs
# 4 1 4 2019-04-09 09:50:02 2019-04-09 09:53:21 65.800 secs
# 5 1 5 2019-04-09 09:50:02 2019-04-09 09:53:49 27.460 secs
# 6 2 1 2019-04-09 09:47:42 2019-04-09 09:49:45 122.660 secs
# 7 2 2 2019-04-09 09:47:42 2019-04-09 09:52:53 188.420 secs
# 8 2 3 2019-04-09 09:47:42 2019-04-09 09:53:23 29.630 secs
# 9 2 4 2019-04-09 09:47:42 2019-04-09 09:54:00 37.434 secs
# 10 2 5 2019-04-09 09:47:42 2019-04-09 09:54:52 51.673 secs
Using @akrun's suggestion, we can shorten the code a little:
data %>%
group_by(id) %>%
mutate(
start.time = as.POSIXct(start.time, format = "%m/%d/%Y %H:%M:%OS"),
submit.time = as.POSIXct(submit.time),
time.spent = submit.time - lag(submit.time, default = first(start.time))
) %>%
ungroup()
(and optionally remove the columns you no longer need).
推荐阅读
- powershell - 将数组内容导出到 excel 的 Powershell 脚本
- ios - UITextfield 和 UIButton 没有在 UIView 内点击
- python - 计算 wgan-gp 的梯度惩罚时出错
- ios - iOS 12 推送通知在以下版本中不起作用,在 iOS 12 中未收到推送通知
- r - 使用 R 和 read.csv.sql 创建 SQLite 数据库
- javascript - 在同一个承诺内将用户和头像网址相互链接
- android - EditText 服务类
- android - CircleCI:如何处理 Java OOM 错误
- python - matplotlib colormaps: set_bad doesn't have an impact with collections
- android - Getting run-time error 'image uri must be of the content scheme type' in Android Code