首页 > 解决方案 > 基于R中的日期时间查找唯一ID的最后一个和第一个值

问题描述

我在下面提到了数据框:

ID       Date                 Value1    Value2     Value3      Value4
XX-12    2018-02-01 15:48:15  XXC       1000       15.45       18
XX-12    2018-02-05 20:18:43  XTR       1500       15.45       12
XX-13    2018-02-03 19:14:17  XRR       1900       18.25       10
XX-13    2018-02-03 22:42:18  XTC       1600       20.25       12
XX-14    2018-02-04 23:14:45  XRY       1100       10.50       10
XX-15    2018-02-05 21:16:48  XTC       1400       20.25       14

从上面的数据框中,我想根据 Datetime 以及初始值和最终值之间的差异得出初始值 ( I_Value) 和最终值 ( )。F_Value

所需输出:

ID      I_Value1    F_Value1   I_Value2   F_Value2   Diff2   I_vaule3   F_Value3   Diff3   I_Value4    F_Value4   Diff4
XX-12   XXC         XTR        1000       1500       500     15.45      15.45      0       18          12         -6
XX-13   XRR         XTC        1900       1600      -300     18.25      20.25      2       10          12          2
XX-14   XRY         XTC        1100       1100       0       10.50      10.50      0       10          10          0
XX-15   XTC         XTC        1400       1400       0       20.25      20.25      0       14          14          0

标签: rtidyrtidyverse

解决方案


使用dplyr

library(dplyr)

df %>%
  mutate(Date = as.POSIXct(Date, "%Y-%m-%d %H:%M:%S", tz = "GMT")) %>%
  arrange(ID, Date) %>%
  group_by(ID) %>%
  summarise_at(vars(Value1:Value4), funs(I = first(.),
                                         F = last(.),
                                         Diff = ifelse(is.character(.), NA, last(.) - first(.)))) %>%
  select_if(~!all(is.na(.)))

  ID    Value1_I Value2_I Value3_I Value4_I Value1_F Value2_F Value3_F Value4_F Value2_Diff Value3_Diff Value4_Diff
  <chr> <chr>       <int>    <dbl>    <int> <chr>       <int>    <dbl>    <int>       <int>       <dbl>       <int>
1 XX-12 XXC          1000     15.4       18 XTR          1500     15.4       12         500        0             -6
2 XX-13 XRR          1900     18.2       10 XTC          1600     20.2       12        -300        2.00           2
3 XX-14 XRY          1100     10.5       10 XRY          1100     10.5       10           0        0              0
4 XX-15 XTC          1400     20.2       14 XTC          1400     20.2       14           0        0              0


样本数据:

df <- structure(list(ID = c("XX-12", "XX-12", "XX-13", "XX-13", "XX-14", 
"XX-15"), Date = c("2018-02-01 15:48:15", "2018-02-05 20:18:43", 
"2018-02-03 19:14:17", "2018-02-03 22:42:18", "2018-02-04 23:14:45", 
"2018-02-05 21:16:48"), Value1 = c("XXC", "XTR", "XRR", "XTC", 
"XRY", "XTC"), Value2 = c(1000L, 1500L, 1900L, 1600L, 1100L, 
1400L), Value3 = c(15.45, 15.45, 18.25, 20.25, 10.5, 20.25), 
    Value4 = c(18L, 12L, 10L, 12L, 10L, 14L)), .Names = c("ID", 
"Date", "Value1", "Value2", "Value3", "Value4"), class = "data.frame", row.names = c(NA, 
-6L))

推荐阅读