r - Concatenate multiple columns conditionally based on datetime in R
Problem description
I have a data frame like this
ID <- c("1D01","1D02","1D03","1D04","1D05","1D06","1D07","1D08","1D09")
A <- c("2020-05-29 00:00:13","2020-06-09 00:00:13","2020-06-06 00:00:13",
"2020-06-03 00:00:13","2020-06-03 00:00:13","2020-06-03 00:00:13",
"2020-06-03 00:00:13","2020-06-03 00:00:13",NA)
B <- c("2020-06-01 00:00:13","2020-06-08 00:00:13","2020-06-19 00:00:13",
"2020-06-21 00:00:13","2020-06-03 00:00:13","2020-06-03 00:00:13",
"2020-06-07 00:00:13","2020-06-07 00:00:13",NA)
C <- c("2020-06-03 00:00:13","2020-06-07 00:00:13","2020-06-01 00:00:13",
"2020-06-11 00:00:13","2020-06-03 00:00:13","2020-06-03 00:00:13",
"2020-06-03 00:00:13",NA,"2020-06-07 00:00:13")
D <- c("2020-06-04 00:00:13","2020-06-05 00:00:13","2020-06-08 00:00:13",
"2020-06-01 00:00:13","2020-06-04 00:00:13","2020-06-03 00:00:13",
"2020-06-01 00:00:13",NA,"2020-06-03 00:00:13")
df <- data.frame(ID,A,B,C,D)
df$A <- as.POSIXct(df$A)
df$B <- as.POSIXct(df$B)
df$C <- as.POSIXct(df$C)
df$D <- as.POSIXct(df$D)
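I am creating a path column from the names of the other columns, ordered by ascending datetime and subject to the following conditions:
- Look at the datetime order of the 4 columns (A, B, C, D) and concatenate the column names in ascending datetime order. For example: A_B_C_D if A has the earliest datetime and D has the latest.
- If 2 or more columns have the same datetime, join them without an underscore. For example: A_BC_D if B and C have the same datetime.
- If a column is NA, exclude it from the concatenation. For example: A_B_D if C is NA.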
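The three rules above can be sketched for a single row in base R. This is only an illustration under stated assumptions: make_path is a hypothetical helper name (not from the original post), and it expects a named POSIXct vector whose names are the column names.

```r
# Hypothetical helper (not part of the original post): build the path for one row.
make_path <- function(x) {
  nms  <- names(x)
  secs <- as.numeric(x)               # seconds since the epoch; NA stays NA
  keep <- !is.na(secs)                # rule 3: drop NA columns
  nms  <- nms[keep]
  secs <- secs[keep]
  if (length(secs) == 0) return(NA_character_)
  ord  <- order(secs)                 # rule 1: ascending; ties keep column order
  nms  <- nms[ord]
  secs <- secs[ord]
  # rule 2: names that share a datetime are pasted together without "_"
  tied <- tapply(nms, match(secs, unique(secs)), paste, collapse = "")
  paste(tied, collapse = "_")
}

# Row 1D07 of the example data: A and C share a datetime.
row7 <- setNames(
  as.POSIXct(c("2020-06-03 00:00:13", "2020-06-07 00:00:13",
               "2020-06-03 00:00:13", "2020-06-01 00:00:13"), tz = "UTC"),
  c("A", "B", "C", "D")
)
make_path(row7)   # "D_AC_B"

# Same shape as the A_B_D example: C is NA and is left out.
row_na <- setNames(
  as.POSIXct(c("2020-05-29 00:00:13", "2020-06-01 00:00:13",
               NA, "2020-06-04 00:00:13"), tz = "UTC"),
  c("A", "B", "C", "D")
)
make_path(row_na) # "A_B_D"
```

The time zone is set to "UTC" here only to make the sketch reproducible; the original data uses the session's default time zone.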
My desired output is
    ID                   A                   B                   C                   D    path
1 1D01 2020-05-29 00:00:13 2020-06-01 00:00:13 2020-06-03 00:00:13 2020-06-04 00:00:13 A_B_C_D
2 1D02 2020-06-09 00:00:13 2020-06-08 00:00:13 2020-06-07 00:00:13 2020-06-05 00:00:13 D_C_B_A
3 1D03 2020-06-06 00:00:13 2020-06-19 00:00:13 2020-06-01 00:00:13 2020-06-08 00:00:13 C_A_D_B
4 1D04 2020-06-03 00:00:13 2020-06-21 00:00:13 2020-06-11 00:00:13 2020-06-01 00:00:13 D_A_C_B
5 1D05 2020-06-03 00:00:13 2020-06-03 00:00:13 2020-06-03 00:00:13 2020-06-04 00:00:13   ABC_D
6 1D06 2020-06-03 00:00:13 2020-06-03 00:00:13 2020-06-03 00:00:13 2020-06-03 00:00:13    ABCD
7 1D07 2020-06-03 00:00:13 2020-06-07 00:00:13 2020-06-03 00:00:13 2020-06-01 00:00:13  D_AC_B
8 1D08 2020-06-03 00:00:13 2020-06-07 00:00:13                <NA>                <NA>     A_B
9 1D09                <NA>                <NA> 2020-06-07 00:00:13 2020-06-03 00:00:13     D_C
I am trying the following, but it obviously doesn't work
library(dplyr)
df %>%
mutate(path = case_when(
A >= B >= C >= D ~ "(A_B_C_D)",
TRUE ~ "(ABD_C)"))
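A note on one reason this attempt fails: R does not support chained comparisons, so an expression such as A >= B >= C >= D is rejected by the parser. Each pair has to be compared separately and combined with &. The direction is also reversed for an ascending path like A_B_C_D, which needs <=. The sketch below uses the values from row 1D01 purely for illustration, with the time zone fixed to "UTC" as an assumption.

```r
# Illustrative values only (row 1D01 of the example data).
A <- as.POSIXct("2020-05-29 00:00:13", tz = "UTC")
B <- as.POSIXct("2020-06-01 00:00:13", tz = "UTC")
C <- as.POSIXct("2020-06-03 00:00:13", tz = "UTC")
D <- as.POSIXct("2020-06-04 00:00:13", tz = "UTC")

# Pairwise comparisons joined with & replace the chained form.
ascending <- A <= B & B <= C & C <= D
ascending   # TRUE
```

Even with valid syntax, case_when would need one branch for every possible ordering, tie, and NA pattern of the four columns, which is why sorting the values is the more practical route.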
How can I get my desired output? Can anyone point me in the right direction?
Solution
Does this achieve what you're looking for?
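library(tidyverse)

df_out <- df %>%
  # long format: one row per ID and column name (name) with its datetime (value)
  pivot_longer(-1) %>%
  # columns with NA datetimes are excluded from the path
  filter(!is.na(value)) %>%
  # ascending datetime within each ID
  arrange(ID, value) %>%
  group_by(ID) %>%
  # append "_" unless the next datetime in the group is identical
  mutate(name2 = if_else(value == lead(value), name, paste0(name, "_")),
         # last row of each group: lead() is NA, so keep the bare name
         name2 = if_else(is.na(name2), name, name2),
         path = paste(name2, collapse = ""),
         name2 = NULL) %>%
  # back to one row per ID, with the original columns restored
  pivot_wider()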
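The key step in the pipeline above is the comparison of each datetime with the next one in its group via lead(value). A base R sketch of that neighbour comparison for one already-sorted group (row 1D07) is shown below. The vector names are illustrative, not from the original answer, and the "UTC" time zone is an assumption made for reproducibility.

```r
# One ID's non-NA values, already sorted by datetime (row 1D07).
name  <- c("D", "A", "C", "B")
value <- as.POSIXct(c("2020-06-01 00:00:13", "2020-06-03 00:00:13",
                      "2020-06-03 00:00:13", "2020-06-07 00:00:13"), tz = "UTC")

n <- length(value)
# Compare each value with its successor; the last element has none, so it
# gets NA, mirroring what lead() returns for the last row of a group.
tie_with_next <- c(value[-n] == value[-1], NA)

# No separator before a tie or after the last element, "_" otherwise.
suffix <- ifelse(is.na(tie_with_next) | tie_with_next, "", "_")
path <- paste0(name, suffix, collapse = "")
path   # "D_AC_B"
```

Because the separator is decided by looking only at the next value, a run of three or four identical datetimes collapses naturally into a block such as ABC or ABCD.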