R tidyverse: reshaping data to one row per subject when several columns are affected

Problem description

I have a sample data frame like this:

sample2<-structure(list(`Full Name` = c("Smith, Jane", NA, NA, NA, 
                           NA, NA, "Doe, John", NA, NA, NA), `Age 
           (Y)` = c("24", 
                    NA, NA, NA, NA, NA, "22", NA, NA, NA), Gender = c("F", NA, NA, 
                                                                      NA, NA, NA, "M", NA, NA, NA), `Procedure Performed 
           (ICD9 Code)` = c("34.04 INSERTION OF INTERCOSTAL CATHETER FOR DRAINAGE", 
                            "86.59 CLOSURE OF SKIN AND SUBCUTANEOUS TISSUE OTHER SITES", 
                            "87.03 COMPUTERIZED AXIAL TOMOGRAPHY OF HEAD", "88.01 COMPUTERIZED AXIAL TOMOGRAPHY OF ABDOMEN", 
                            "87.41 COMPUTERIZED AXIAL TOMOGRAPHY OF THORAX", NA, "96.04 INSERTION OF ENDOTRACHEAL TUBE", 
                            "57.94 INSERTION OF INDWELLING URINARY CATHETER", "99.29 INJECTION OR INFUSION OF OTHER THERAPEUTIC OR PROPHYLACTIC SUBSTANCE", 
                            "38.02 INCISION OF OTHER VESSELS OF HEAD AND NECK"), `Interventions RH` = c("xray", 
                                                                                                        "CT Head", NA, NA, NA, NA, "CT Chest - Referring Hospital", "Chest Tube Placement", "Ct Head", 
                                                                                                        NA)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
                                                                                                        ))

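As you can see, many of the rows are mostly empty. Every mostly empty row beneath a subject's name belongs to that subject. I want to reshape my dataset to one row per subject, and I need some help.

I have looked at other answers here and asked my friends, and I keep seeing "gather then spread" answers. This particular case is harder for me, for a few specific reasons: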

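  1. The "Full Name" column that I would normally key on is empty in most rows. That is, I can't tell R to gather all of the Jane Smith rows, because her name isn't in them.
  2. I want to spread several columns. I want to create multiple columns from the "Procedure Performed" column, i.e. Procedure 1, Procedure 2, and so on. I also want to create multiple columns from the "Interventions RH" column.
  3. During analysis I will probably end up searching the data for specific words or phrases. So if it is easier to code, collapsing all of a patient's procedures into that patient's "Procedure Performed" column (in one row) is fine with me too.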
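
Point 1 can be made concrete with a small sketch. The `toy` data frame below is hypothetical and much smaller than `sample2`; it only assumes that a name appears on the first row of each subject's block. Grouping on the raw name column loses the NA rows, while a running count of non-NA names (one possible way to label blocks, not necessarily the best one) recovers them.

```r
# Toy data (hypothetical, not the real dataset): the name appears only on
# the first row of each subject's block.
toy <- data.frame(
  name      = c("Smith, Jane", NA, NA, "Doe, John", NA),
  procedure = c("p1", "p2", "p3", "p4", "p5")
)

# split() drops rows whose grouping value is NA, so each subject keeps
# only the single row that carries the name.
raw_groups <- split(toy$procedure, toy$name)

# Counting the non-NA names seen so far gives every row a block id,
# so the rows under each name stay with that subject.
toy$block <- cumsum(!is.na(toy$name))
block_groups <- split(toy$procedure, toy$block)
```

Here `raw_groups` holds one procedure per subject, whereas `block_groups` holds all of them, which is the grouping the reshape needs.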

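So for the expected output, I would accept either of two layouts (shown as images in the original post): numbered columns per subject, as in point 2, or a single collapsed text column per subject, as in point 3.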

Thanks for your help!

Tags: r, tidyr

Solution


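You can also do this with data.table for the aggregation and zoo to fill in the NA values. I have changed your column names to make the code more readable.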

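library(data.table)
library(zoo)

setDT(sample2)  # convert to a data.table by reference
# Shorter column names, assigned by position
setnames(sample2, c("Name", "Age", "Gender", "Procedure", "Interventions"))
# Carry each name down over the NA rows below it
sample2[, Name := na.locf(Name, na.rm = FALSE)]

# One row per subject: keep the first Age/Gender, collapse the non-NA text values
newSample <- sample2[, .(
  Age = first(Age),
  Gender = first(Gender),
  aggProcedure = paste(Procedure[!is.na(Procedure)], collapse = ","),
  aggInterventions = paste(Interventions[!is.na(Interventions)], collapse = ",")
), by = Name]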
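
The code above produces the collapsed layout from point 3 of the question. For the numbered-column layout from point 2, one possible sketch is to melt the filled data to long format, number the values within each subject, and cast back to wide. The `toy` table and its values are hypothetical stand-ins for the renamed `sample2`, and the `idx` column is a helper introduced only for this illustration.

```r
library(data.table)
library(zoo)

# Toy data (hypothetical values) in the same shape as the renamed sample2.
toy <- data.table(
  Name          = c("Smith, Jane", NA, NA, "Doe, John", NA),
  Procedure     = c("34.04 A", "86.59 B", NA, "96.04 C", "57.94 D"),
  Interventions = c("xray", "CT Head", NA, "Chest Tube Placement", NA)
)

# Carry each name down over the NA rows below it.
toy[, Name := na.locf(Name, na.rm = FALSE)]

# Long format: one row per (Name, source column, value), dropping NA values.
long <- melt(toy, id.vars = "Name",
             measure.vars = c("Procedure", "Interventions"),
             na.rm = TRUE)

# Number the values within each subject and source column: 1, 2, ...
long[, idx := seq_len(.N), by = .(Name, variable)]

# Wide format: one row per subject, with columns such as Procedure_1,
# Procedure_2, Interventions_1, ...
wide <- dcast(long, Name ~ variable + idx, value.var = "value")
```

Subjects with fewer entries than others get NA in the unused numbered columns. For the text searches mentioned in point 3, the collapsed `aggProcedure` column is probably easier to work with, for example via `grepl()`.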
