首页 > 解决方案 > 使用 R 根据存储在当前变量名称(由“_”分隔)中的唯一信息将数据帧转换为长格式

问题描述

我正在寻找一种方法来从现有变量名称中提取信息,创建新变量来存储提取的信息,并根据新定义的变量将数据框转换为长格式。处理心理治疗调查的答复。

使用 R 或 SQL。

现有变量名称中包含的信息:

情节 = 个人每次参与节目是一个情节
主题 = 填写调查的个人(可以是参与者、母亲、父亲等)
类型 = 当前调查的名称(注意:一些调查有额外的识别信息,以"_")
实例 = 入院或出院后的天数
描述 = 问题编号或该列特有的其他信息

目前,每条信息用“_”分隔。
这是格式:episode_subject_type_instance_description

## Have data currently in this format, but with almost 5000 variables  

tibble(case_name = c("Joe", "Mary", "Jane"),
       episode1_student_survey1_day0_Q1 = c(1, 2, 3),
       episode1_student_survey1_day0_Q2 = c("A", "B", "C"))
# A tibble: 3 x 3
  case_name episode1_student_survey1_day0_Q1 episode1_student_survey1_day0_Q2
  <chr>                                <dbl> <chr>                           
1 Joe                                      1 A                               
2 Mary                                     2 B                               
3 Jane                                     3 C                               

## Want to transform to long like this:  

tibble(case_name = c("Joe", "Joe", "Mary", "Mary", "Jane", "Jane"),
       episode = "episode1",
       subject = "student",
       type = "survey1",
       instance = "day0",
       description = c("Q1", "Q1", "Q1", "Q2", "Q2", "Q2"),
       value = c(1, 2, 3, "A", "B", "C"))
# A tibble: 6 x 7
  case_name episode  subject type    instance description value
  <chr>     <chr>    <chr>   <chr>   <chr>    <chr>       <chr>
1 Joe       episode1 student survey1 day0     Q1          1    
2 Joe       episode1 student survey1 day0     Q1          2    
3 Mary      episode1 student survey1 day0     Q1          3    
4 Mary      episode1 student survey1 day0     Q2          A    
5 Jane      episode1 student survey1 day0     Q2          B    
6 Jane      episode1 student survey1 day0     Q2          C  

我假设有某种方法可以一次提取每条信息,但不知道如何去做。

感谢您的任何帮助!

标签: sqlrdplyr

解决方案


R中,将列类型转换为“ charactercase_name”以外的类型,然后使用pivot_longerseparate列重塑为“long”格式name

library(dplyr)
library(tidyr)
nm1 <- c("episode", "subject", "type", "instance", "description")
df1 %>%
     mutate(across(-case_name, as.character)) %>%
     pivot_longer(cols = -case_name) %>% 
     separate(name, into = nm1)

-输出

# A tibble: 6 x 7
  case_name episode  subject type    instance description value
  <chr>     <chr>    <chr>   <chr>   <chr>    <chr>       <chr>
1 Joe       episode1 student survey1 day0     Q1          1    
2 Joe       episode1 student survey1 day0     Q2          A    
3 Mary      episode1 student survey1 day0     Q1          2    
4 Mary      episode1 student survey1 day0     Q2          B    
5 Jane      episode1 student survey1 day0     Q1          3    
6 Jane      episode1 student survey1 day0     Q2          C    

数据

df1 <- tibble(case_name = c("Joe", "Mary", "Jane"),
       episode1_student_survey1_day0_Q1 = c(1, 2, 3),
       episode1_student_survey1_day0_Q2 = c("A", "B", "C"))

推荐阅读