r - How to convert data with different levels of information into wide format?
Problem description
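I have data on patients undergoing surgeries/procedures (example shown below), where each row describes one procedure a patient had. There are 2 levels of information:
- the first is the operation details, i.e. op_start_dt, priority_operation and asa_status
- the second is the procedure details, i.e. proc_desc and proc_table
One operation can have multiple procedures. In the example below, patient A had 2 operations (defined by distinct op_start_dt). In the first operation there was 1 procedure (defined by distinct proc_desc), and in the second operation there were 2 procedures.
I want to convert the data into wide format, where a patient has only one row, with the information arranged operation by operation and, within each operation, procedure by procedure, as shown below. So proc_descxy refers to proc_desc for the xth operation and yth procedure.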
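To make the xy indexing concrete, here is a minimal sketch of how the two indices could be derived with dplyr. The data frame `toy` and its shortened procedure names are illustrative stand-ins, not the question's data; `op_nth`, `proc_nth` and `suffix` are names chosen for this example.

```r
library(dplyr)

# Hypothetical toy data with the same shape as the question's df
toy <- data.frame(
  patient     = c("A", "A", "A"),
  op_start_dt = as.POSIXct(c(1424853000, 1424870700, 1424870700),
                           origin = "1970-01-01", tz = "UTC"),
  proc_desc   = c("HYSTERECTOMY", "NEPHROURETERECTOMY", "HEART TRANSPLANTATION")
)

indexed <- toy %>%
  group_by(patient) %>%
  # x: rank of the operation start time within the patient
  mutate(op_nth = dense_rank(op_start_dt)) %>%
  group_by(patient, op_nth) %>%
  # y: position of the procedure within its operation
  mutate(proc_nth = row_number()) %>%
  ungroup() %>%
  # Suffix used in names such as proc_desc21
  mutate(suffix = paste0(op_nth, proc_nth))

indexed$suffix
```

One caveat of this naming scheme: concatenating the indices without a separator becomes ambiguous once a patient has 10 or more operations or procedures (for example, "111" could mean operation 1, procedure 11 or operation 11, procedure 1).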
Data:
df <- structure(list(patient = c("A", "A", "A"), department = c("GYNAECOLOGY /OBSTETRICS DEPT",
"GYNAECOLOGY /OBSTETRICS DEPT", "GYNAECOLOGY /OBSTETRICS DEPT"
), op_start_dt = structure(c(1424853000, 1424870700, 1424870700
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), priority_operation = c("Elective",
"Elective", "Elective"), asa_status = c(2, 3, 3), proc_desc = c("UTERUS, MALIGNANT CONDITION, EXTENDED HYSTERECTOMY WITH/WITHOUT LYMPHADENECTOMY",
"KIDNEY AND URETER, VARIOUS LESIONS, NEPHROURETERECTOMY, LAPAROSCOPIC",
"HEART, VARIOUS LESIONS, HEART TRANSPLANTATION"), proc_table = c("99",
"6A", "7C")), row.names = c(NA, 3L), class = "data.frame")
Expected output:
df <- structure(list(patient = "A", department = "GYNAECOLOGY /OBSTETRICS DEPT",
no_op = 2, op_start_dt1 = structure(1424853000, class = c("POSIXct",
"POSIXt"), tzone = "UTC"), no_proc1 = 1, priority_operation1 = "Elective",
asa_status1 = 2, proc_desc11 = "UTERUS, MALIGNANT CONDITION, EXTENDED HYSTERECTOMY WITH/WITHOUT LYMPHADENECTOMY",
proc_table11 = "99", op_start_dt2 = structure(1424870700, class = c("POSIXct",
"POSIXt"), tzone = "UTC"), no_proc2 = 2, priority_operation2 = "Elective",
asa_status2 = 3, proc_desc21 = "KIDNEY AND URETER, VARIOUS LESIONS, NEPHROURETERECTOMY, LAPAROSCOPIC",
proc_table21 = "6A", proc_desc22 = "HEART, VARIOUS LESIONS, HEART TRANSPLANTATION",
proc_table22 = "7C"), row.names = 1L, class = "data.frame")
My attempt:
I tried to solve this, but it got messy along the way, using pivot_longer and then pivot_wider again.
library(dplyr)
library(tidyr)

df %>%
  # Operation-level information
  group_by(patient) %>%
  mutate(op_nth = dense_rank(op_start_dt),
         no_op = n_distinct(op_start_dt)) %>%
  # Procedure-level information
  group_by(patient, op_start_dt) %>%
  mutate(proc_nth = row_number(),
         no_proc = n_distinct(proc_desc)) %>%
  ungroup() %>%
  # Make pivoting easier (all columns must share one type for pivot_longer)
  mutate_all(as.character) %>%
  # Pivot operation-level and procedure-level information into long format
  pivot_longer(-c(patient, department, no_op, op_nth, proc_nth)) %>%
  # Remove the procedure index for operation-level information
  mutate(proc_nth = case_when(!(name %in% c("op_start_dt", "no_proc", "priority_operation", "asa_status")) ~ proc_nth)) %>%
  # Create the column names
  unite(name, c(name, op_nth, proc_nth), sep = "", na.rm = TRUE) %>%
  distinct() %>%
  pivot_wider(names_from = name, values_from = value)
Solution
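Create a row-number column within each patient, then use pivot_wider. Note that the new columns are suffixed with the row number (for example proc_desc_1, proc_desc_2, proc_desc_3), not with the operation and procedure indices used in the expected output.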
library(dplyr)

df %>%
  group_by(patient) %>%
  # Number the rows within each patient; this becomes the column suffix
  mutate(row = row_number()) %>%
  tidyr::pivot_wider(names_from = row, values_from = op_start_dt:proc_table)
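If the xy naming from the expected output is needed, one possible variation (a sketch, not the answer's method) is to pivot the operation-level and procedure-level columns separately and then join them. The data frame `toy` uses shortened, illustrative procedure names; `op_nth`, `proc_nth`, `op_wide`, `proc_wide` and `result` are names chosen for this example, and the `names_glue` argument of pivot_wider is assumed to be available (tidyr 1.0.0 or later).

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with the same columns as the question's df
toy <- data.frame(
  patient            = c("A", "A", "A"),
  department         = "GYNAECOLOGY /OBSTETRICS DEPT",
  op_start_dt        = as.POSIXct(c(1424853000, 1424870700, 1424870700),
                                  origin = "1970-01-01", tz = "UTC"),
  priority_operation = "Elective",
  asa_status         = c(2, 3, 3),
  proc_desc          = c("HYSTERECTOMY", "NEPHROURETERECTOMY", "HEART TRANSPLANTATION"),
  proc_table         = c("99", "6A", "7C")
)

# Add operation (x) and procedure (y) indices plus the two counts
indexed <- toy %>%
  group_by(patient) %>%
  mutate(op_nth = dense_rank(op_start_dt),
         no_op  = n_distinct(op_start_dt)) %>%
  group_by(patient, op_nth) %>%
  mutate(proc_nth = row_number(),
         no_proc  = n()) %>%
  ungroup()

# Operation level: one row per operation, widened by the operation index
op_wide <- indexed %>%
  distinct(patient, department, no_op, op_nth,
           op_start_dt, no_proc, priority_operation, asa_status) %>%
  pivot_wider(
    id_cols     = c(patient, department, no_op),
    names_from  = op_nth,
    values_from = c(op_start_dt, no_proc, priority_operation, asa_status),
    names_glue  = "{.value}{op_nth}"
  )

# Procedure level: widened by operation index and procedure index
proc_wide <- indexed %>%
  pivot_wider(
    id_cols     = patient,
    names_from  = c(op_nth, proc_nth),
    values_from = c(proc_desc, proc_table),
    names_glue  = "{.value}{op_nth}{proc_nth}"
  )

# One row per patient with both levels side by side
result <- left_join(op_wide, proc_wide, by = "patient")
names(result)
```

Pivoting the two levels separately keeps the original column types (op_start_dt stays POSIXct, asa_status stays numeric), which the pivot_longer route loses by converting everything to character. The column order differs from the expected output, since all operation-level columns come before the procedure-level ones; the columns can be reordered afterwards with select() or relocate() if the order matters.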