Concatenating multiple columns in R based on date-time conditions

Problem description

I have a data frame like this:

ID <- c("1D01","1D02","1D03","1D04","1D05","1D06","1D07","1D08","1D09")
A <- c("2020-05-29 00:00:13","2020-06-09 00:00:13","2020-06-06 00:00:13",
       "2020-06-03 00:00:13","2020-06-03 00:00:13","2020-06-03 00:00:13",
       "2020-06-03 00:00:13","2020-06-03 00:00:13",NA)
B <- c("2020-06-01 00:00:13","2020-06-08 00:00:13","2020-06-19 00:00:13",
       "2020-06-21 00:00:13","2020-06-03 00:00:13","2020-06-03 00:00:13",
       "2020-06-07 00:00:13","2020-06-07 00:00:13",NA)
C <- c("2020-06-03 00:00:13","2020-06-07 00:00:13","2020-06-01 00:00:13",
       "2020-06-11 00:00:13","2020-06-03 00:00:13","2020-06-03 00:00:13",
       "2020-06-03 00:00:13",NA,"2020-06-07 00:00:13")
D <- c("2020-06-04 00:00:13","2020-06-05 00:00:13","2020-06-08 00:00:13",
       "2020-06-01 00:00:13","2020-06-04 00:00:13","2020-06-03 00:00:13",
       "2020-06-01 00:00:13",NA,"2020-06-03 00:00:13")

df <- data.frame(ID,A,B,C,D)
df$A <- as.POSIXct(df$A) 
df$B <- as.POSIXct(df$B) 
df$C <- as.POSIXct(df$C) 
df$D <- as.POSIXct(df$D)

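I want to create a column called path that concatenates the names of the other columns in ascending order of their date-times, subject to the following rules:

  1. Look at the date-time order of the four columns (A, B, C, D) and concatenate the column names in ascending date-time order. For example: A_B_C_D if A has the earliest date-time and D the latest.
  2. If two or more columns have the same date-time, join them without an underscore. For example: A_BC_D if B and C have the same date-time.
  3. If a column is NA, leave it out of the concatenation. For example: A_B_D if C is NA.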
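
The three rules above can be sketched for a single row in base R. This is only an illustration: the helper name make_path, its two arguments, and the choice to return NA when every column is missing are assumptions made here, not part of the question.

```r
# Hypothetical helper: build the path string for ONE row.
#   times  - POSIXct vector with one timestamp per column (NA allowed)
#   labels - character vector with the matching column names
make_path <- function(times, labels) {
  secs <- as.numeric(times)                 # POSIXct -> seconds since the epoch
  keep <- !is.na(secs)                      # rule 3: ignore columns that are NA
  secs <- secs[keep]
  labels <- labels[keep]
  if (length(secs) == 0) return(NA_character_)  # assumption: all-NA row gives NA
  ord <- order(secs)                        # rule 1: ascending; ties keep column order
  secs <- secs[ord]
  labels <- labels[ord]
  grp <- cumsum(c(TRUE, diff(secs) != 0))   # rule 2: equal times share a group id
  paste(tapply(labels, grp, paste, collapse = ""), collapse = "_")
}

cols <- c("A", "B", "C", "D")
ts <- function(x) as.POSIXct(x, tz = "UTC")

# Values copied from rows 1D07, 1D08 and 1D06 of the example data
p7 <- make_path(ts(c("2020-06-03 00:00:13", "2020-06-07 00:00:13",
                     "2020-06-03 00:00:13", "2020-06-01 00:00:13")), cols)  # "D_AC_B"
p8 <- make_path(ts(c("2020-06-03 00:00:13", "2020-06-07 00:00:13",
                     NA, NA)), cols)                                        # "A_B"
p6 <- make_path(ts(rep("2020-06-03 00:00:13", 4)), cols)                    # "ABCD"
```

Tied columns come out in their original column order because order() leaves ties in place, which matches ABC_D and D_AC_B in the desired output.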

The output I want is:

    ID                   A                   B                   C                   D    path
1 1D01 2020-05-29 00:00:13 2020-06-01 00:00:13 2020-06-03 00:00:13 2020-06-04 00:00:13 A_B_C_D
2 1D02 2020-06-09 00:00:13 2020-06-08 00:00:13 2020-06-07 00:00:13 2020-06-05 00:00:13 D_C_B_A
3 1D03 2020-06-06 00:00:13 2020-06-19 00:00:13 2020-06-01 00:00:13 2020-06-08 00:00:13 C_A_D_B
4 1D04 2020-06-03 00:00:13 2020-06-21 00:00:13 2020-06-11 00:00:13 2020-06-01 00:00:13 D_A_C_B
5 1D05 2020-06-03 00:00:13 2020-06-03 00:00:13 2020-06-03 00:00:13 2020-06-04 00:00:13   ABC_D
6 1D06 2020-06-03 00:00:13 2020-06-03 00:00:13 2020-06-03 00:00:13 2020-06-03 00:00:13    ABCD
7 1D07 2020-06-03 00:00:13 2020-06-07 00:00:13 2020-06-03 00:00:13 2020-06-01 00:00:13  D_AC_B
8 1D08 2020-06-03 00:00:13 2020-06-07 00:00:13                <NA>                <NA>     A_B
9 1D09                <NA>                <NA> 2020-06-07 00:00:13 2020-06-03 00:00:13     D_C

I am trying the following, but it obviously does not work:

library(dplyr)
df %>%
  mutate(path = case_when(
    A >= B >= C >= D  ~ "(A_B_C_D)",
    TRUE           ~ "(ABD_C)")) 

How can I get my desired output? Can someone point me in the right direction?

Tags: r, dataframe, dplyr, data.table, tidyverse

Solution

Does this achieve what you are looking for?

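library(tidyverse)

df_out <- df %>%
  # Long format: one row per (ID, column name, timestamp)
  pivot_longer(-1) %>%
  # Rule 3: drop missing timestamps
  filter(!is.na(value)) %>%
  # Rule 1: sort each ID's timestamps in ascending order
  arrange(ID, value) %>%
  group_by(ID) %>%
  # Rule 2: append "_" only when the next timestamp differs;
  # lead() is NA on the last row of a group, so that row keeps the bare name
  mutate(name2 = if_else(value == lead(value), name, paste0(name, "_")),
         name2 = if_else(is.na(name2), name, name2),
         path = paste(name2, collapse = ""),
         name2 = NULL) %>%
  ungroup() %>%
  # Back to one row per ID, with path as the last column
  pivot_wider() %>%
  relocate(path, .after = last_col())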
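
As a side note on why the attempt in the question fails: R does not accept chained comparisons such as A >= B >= C >= D, and the parser rejects them before anything is evaluated. Each pair has to be compared separately and the results combined with &. A minimal sketch in base R, using a few toy values rather than the full data frame (the variable names chained_fails and in_order are made up for illustration):

```r
# Toy values for illustration only (two rows, three columns)
A <- as.POSIXct(c("2020-05-29 00:00:13", "2020-06-09 00:00:13"), tz = "UTC")
B <- as.POSIXct(c("2020-06-01 00:00:13", "2020-06-08 00:00:13"), tz = "UTC")
C <- as.POSIXct(c("2020-06-03 00:00:13", "2020-06-07 00:00:13"), tz = "UTC")

# A chained comparison is a syntax error, so parse() itself fails
chained <- try(parse(text = "A <= B <= C"), silent = TRUE)
chained_fails <- inherits(chained, "try-error")

# Pairwise comparisons combined with & are valid and vectorised
in_order <- A <= B & B <= C
```

A condition written this way would work inside case_when, but covering every possible ordering, tie and NA pattern of four columns takes a great many branches, which is why sorting the timestamps per ID is the more practical route.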
