How do I convert data with different levels of information into wide format?

Problem description

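I have a dataset of patients' operations and procedures (see the sample data below), in which each row describes one procedure for a patient. There are 2 levels of information:

  1. The first is operation details: op_start_dt, priority_operation, asa_status
  2. The second is procedure details: proc_desc, proc_table

One operation can have multiple procedures. In the example below, patient A has 2 operations (defined by distinct op_start_dt). The first operation has 1 procedure (defined by distinct proc_desc), and the second operation has 2 procedures.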


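I want to convert the data to wide format with a single row per patient, where the information is laid out operation by operation and, within each operation, procedure by procedure, as in the desired output below. So proc_descxy refers to the proc_desc of the xth operation and yth procedure.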


Data:

df <- structure(list(patient = c("A", "A", "A"), department = c("GYNAECOLOGY /OBSTETRICS DEPT", 
"GYNAECOLOGY /OBSTETRICS DEPT", "GYNAECOLOGY /OBSTETRICS DEPT"
), op_start_dt = structure(c(1424853000, 1424870700, 1424870700
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), priority_operation = c("Elective", 
"Elective", "Elective"), asa_status = c(2, 3, 3), proc_desc = c("UTERUS, MALIGNANT CONDITION, EXTENDED HYSTERECTOMY WITH/WITHOUT LYMPHADENECTOMY", 
"KIDNEY AND URETER, VARIOUS LESIONS, NEPHROURETERECTOMY, LAPAROSCOPIC", 
"HEART, VARIOUS LESIONS, HEART TRANSPLANTATION"), proc_table = c("99", 
"6A", "7C")), row.names = c(NA, 3L), class = "data.frame")

Desired output:

df <- structure(list(patient = "A", department = "GYNAECOLOGY /OBSTETRICS DEPT", 
    no_op = 2, op_start_dt1 = structure(1424853000, class = c("POSIXct", 
    "POSIXt"), tzone = "UTC"), no_proc1 = 1, priority_operation1 = "Elective", 
    asa_status1 = 2, proc_desc11 = "UTERUS, MALIGNANT CONDITION, EXTENDED HYSTERECTOMY WITH/WITHOUT LYMPHADENECTOMY", 
    proc_table11 = "99", op_start_dt2 = structure(1424870700, class = c("POSIXct", 
    "POSIXt"), tzone = "UTC"), no_proc2 = 2, priority_operation2 = "Elective", 
    asa_status2 = 3, proc_desc21 = "KIDNEY AND URETER, VARIOUS LESIONS, NEPHROURETERECTOMY, LAPAROSCOPIC", 
    proc_table21 = "6A", proc_desc22 = "HEART, VARIOUS LESIONS, HEART TRANSPLANTATION", 
    proc_table22 = "7C"), row.names = 1L, class = "data.frame")

My attempt: I tried to solve this, but it got messy along the way, with a pivot_longer followed by a pivot_wider.

library(dplyr)
library(tidyr)

df %>%
  # Operation-level Information
  group_by(patient) %>%
  mutate(op_nth = dense_rank(op_start_dt),
         no_op = n_distinct(op_start_dt)) %>%

  # Procedure-level Information
  group_by(patient, op_start_dt) %>% 
  mutate(proc_nth = row_number(),
         no_proc = n_distinct(proc_desc)) %>% 
  ungroup() %>% 

  # Make pivoting easier
  mutate_all(as.character) %>% 

  # Pivot Procedure-level Information
  pivot_longer(-c(patient, department, no_op, op_nth, proc_nth)) %>%

  # Remove the indices for "Procedure" for Operation_level Information
  mutate(proc_nth = case_when(!(name %in% c("op_start_dt", "no_proc", "priority_operation", "asa_status")) ~ proc_nth)) %>% 

  # Create the column names
  unite(name, c(name, op_nth, proc_nth), sep = "", na.rm = TRUE) %>% 
  distinct() %>% 

  pivot_wider(names_from = name, values_from = value) 

Tags: r, dplyr, tidyr

Solution


Create a unique ID column within each patient, then use pivot_wider.

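library(dplyr)

df %>%
  group_by(patient) %>%
  # Number each patient's rows 1, 2, 3, ... to use as the column suffix
  mutate(row = row_number()) %>%
  # Spread every column from op_start_dt to proc_table, one set per row number
  tidyr::pivot_wider(names_from = row, values_from = op_start_dt:proc_table)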
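The answer above numbers rows within each patient, so its columns carry a single row index (for example proc_desc_1, proc_desc_2, proc_desc_3) rather than the two-index names in the desired output. One possible way to get closer to that layout is to pivot the operation-level and procedure-level columns separately and then join the results. This is only a sketch, not part of the original answer: the intermediate names indexed, op_wide, proc_wide and wide are made up for illustration, no_proc is computed here as the number of rows per operation, and the resulting column order differs from the desired output.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df <- data.frame(
  patient = c("A", "A", "A"),
  department = "GYNAECOLOGY /OBSTETRICS DEPT",
  op_start_dt = as.POSIXct(c(1424853000, 1424870700, 1424870700),
                           origin = "1970-01-01", tz = "UTC"),
  priority_operation = "Elective",
  asa_status = c(2, 3, 3),
  proc_desc = c(
    "UTERUS, MALIGNANT CONDITION, EXTENDED HYSTERECTOMY WITH/WITHOUT LYMPHADENECTOMY",
    "KIDNEY AND URETER, VARIOUS LESIONS, NEPHROURETERECTOMY, LAPAROSCOPIC",
    "HEART, VARIOUS LESIONS, HEART TRANSPLANTATION"
  ),
  proc_table = c("99", "6A", "7C"),
  stringsAsFactors = FALSE
)

# Add an operation index per patient and a procedure index per operation
indexed <- df %>%
  group_by(patient) %>%
  mutate(op_nth = dense_rank(op_start_dt),
         no_op = n_distinct(op_start_dt)) %>%
  group_by(patient, op_nth) %>%
  mutate(proc_nth = row_number(),
         no_proc = n()) %>%
  ungroup()

# Operation-level columns: one row per operation, suffixed with op_nth
op_wide <- indexed %>%
  distinct(patient, department, no_op, op_nth,
           op_start_dt, no_proc, priority_operation, asa_status) %>%
  pivot_wider(id_cols = c(patient, department, no_op),
              names_from = op_nth,
              values_from = c(op_start_dt, no_proc, priority_operation, asa_status),
              names_glue = "{.value}{op_nth}")

# Procedure-level columns: suffixed with op_nth followed by proc_nth
proc_wide <- indexed %>%
  pivot_wider(id_cols = patient,
              names_from = c(op_nth, proc_nth),
              values_from = c(proc_desc, proc_table),
              names_glue = "{.value}{op_nth}{proc_nth}")

# One row per patient with both sets of columns
wide <- left_join(op_wide, proc_wide, by = "patient")
```

Concatenating the two indices without a separator follows the naming in the question, but it becomes ambiguous once a patient has 10 or more operations or procedures (111 could mean operation 1, procedure 11 or operation 11, procedure 1). A glue pattern such as "{.value}{op_nth}_{proc_nth}" would avoid that.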
