R: Converting multiple columns to long format

Problem description

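I have a wide-format, person-level data frame that I am trying to reshape ("melt") into a long-format summary table. Sample code and the desired result are below: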

library(tidyverse)
set.seed(100)

#Original wide format person level data
dat_wide <- tibble(group = rep(x = c('a','b','c'), each = 5),
                   d1 = sample(x = c(1, 0, NA_integer_), size = 15, replace = TRUE),
                   d2 = sample(x = c(1, 0, NA_integer_), size = 15, replace = TRUE),
                   d3 = sample(x = c(1, 0, NA_integer_), size = 15, replace = TRUE)) %>% 
  mutate(d1_br = case_when(
    d1 == 1 ~ .8,
    d1 == 0 ~ .2,
    TRUE ~ NA_real_
  ),
  d2_br = case_when(
    d2 == 1 ~ .6,
    d2 == 0 ~ .4,
    TRUE ~ NA_real_
  ),
  d3_br = case_when(
    d3 == 1 ~ .95,
    d3 == 0 ~ .05,
    TRUE ~ NA_real_
  ))
dat_wide

# A tibble: 15 x 7
   group    d1    d2    d3 d1_br d2_br d3_br
   <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1 a         0    NA     0   0.2  NA    0.05
 2 a         1     0     0   0.8   0.4  0.05
 3 a         1    NA     1   0.8  NA    0.95
 4 a         1    NA    NA   0.8  NA   NA   
 5 a         0    NA     1   0.2  NA    0.95
 6 b        NA     0     1  NA     0.4  0.95
 7 b        NA    NA     1  NA    NA    0.95
 8 b        NA     0     1  NA     0.4  0.95
 9 b         1     0     0   0.8   0.4  0.05
10 b         1     1    NA   0.8   0.6 NA   
11 c         0    NA     1   0.2  NA    0.95
12 c         0    NA     0   0.2  NA    0.05
13 c        NA    NA     1  NA    NA    0.95
14 c         1    NA     0   0.8  NA    0.05
15 c         0     1     0   0.2   0.6  0.05

#Long format summary table
dat_long <- tibble(group = rep(c('a','b','c'), each = 9),
                   dv = rep(c('d1','d2','d3'), each = 3, times = 3),
                   response = rep(c(1, 0, NA_integer_), times = 9)) %>% 
  mutate(br = case_when(
    dv == 'd1' & response == 1 ~ .8,
    dv == 'd1' & response == 0 ~ .2,
    dv == 'd2' & response == 1 ~ .6,
    dv == 'd2' & response == 0 ~ .4,
    dv == 'd3' & response == 1 ~ .95,
    dv == 'd3' & response == 0 ~ .05,
    TRUE ~ NA_real_
  ))

# A tibble: 27 x 4
   group dv    response    br
   <chr> <chr>    <dbl> <dbl>
 1 a     d1           1  0.8 
 2 a     d1           0  0.2 
 3 a     d1          NA NA   
 4 a     d2           1  0.6 
 5 a     d2           0  0.4 
 6 a     d2          NA NA   
 7 a     d3           1  0.95
 8 a     d3           0  0.05
 9 a     d3          NA NA   
10 b     d1           1  0.8 
# ... with 17 more rows

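I'm fairly sure this can be done with tidyr::pivot_longer, but I'm new to it and haven't figured out all of its features. Something like the following should work, but can someone help fill in the pivot_longer syntax? Do I need the names_sep or names_prefix argument? I don't fully understand those yet.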

#Possible solution
dat_long <- dat_wide %>% 
  # pivot_longer(...) %>% 
  distinct(group, dv, response, .keep_all = TRUE)

Tags: r, dplyr, tidyverse, reshape, tidyr

Solution


This may be easier if the numeric part of each column name consistently comes at the end. If you rename the columns of dat_wide:

names(dat_wide) <- sub("(\\w+)(\\d+)(\\w*)", "\\1\\3\\2", names(dat_wide))

d1_br becomes d_br1 (d1, d2, and d3 are left unchanged).
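
As a quick check of what that substitution does, here is a minimal base-R sketch that applies the same pattern to a plain character vector holding the column names from the question. The vectors old_names and new_names are illustrative names, not part of the original answer:

```r
# Column names of dat_wide, written out by hand for illustration
old_names <- c("group", "d1", "d2", "d3", "d1_br", "d2_br", "d3_br")

# Move the digit (capture group 2) behind any trailing suffix (capture group 3)
new_names <- sub("(\\w+)(\\d+)(\\w*)", "\\1\\3\\2", old_names)

print(new_names)
# "group" "d1" "d2" "d3" "d_br1" "d_br2" "d_br3"
```

Only names containing a digit are changed, so group passes through untouched. This assumes single-digit indices; with names such as d10_br the greedy \\w+ may absorb part of the number, so the pattern would likely need adjusting.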

You can then use pivot_longer to pivot both sets of columns at once:

library(tidyverse)

dat_wide %>%
  pivot_longer(cols = -group, 
               names_to = c(".value", "dv"),
               names_pattern = "(\\w+)(\\d+)") %>%
  distinct(group, dv, d, .keep_all = TRUE) %>%
  arrange(group, dv)

Output

# A tibble: 24 x 4
   group dv        d  d_br
   <chr> <chr> <dbl> <dbl>
 1 a     1         0  0.2 
 2 a     1        NA NA   
 3 a     1         1  0.8 
 4 a     2        NA NA   
 5 a     2         0  0.4 
 6 a     2         1  0.6 
 7 a     3        NA NA   
 8 a     3         0  0.05
 9 a     3         1  0.95
10 b     1         0  0.2 
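
If the goal is to match the column names in the question (dv holding d1, d2, d3, plus response and br columns), one alternative that may be worth trying is to add a suffix to the bare d1, d2, d3 columns and then split every name on the underscore with names_sep. The sketch below is not part of the original answer. It uses a small made-up tibble, toy_wide, in place of dat_wide, and assumes dplyr 1.0.0 or later for rename_with:

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for dat_wide: two people, two measures
toy_wide <- tibble(
  group = c("a", "b"),
  d1    = c(1, 0),
  d2    = c(0, NA),
  d1_br = c(0.8, 0.2),
  d2_br = c(0.4, NA)
)

toy_long <- toy_wide %>%
  # Rename d1 -> d1_response etc., so every name has the form "<dv>_<what>"
  rename_with(~ paste0(.x, "_response"), .cols = matches("^d\\d+$")) %>%
  # The part before "_" goes to dv; the part after becomes the column name
  pivot_longer(cols = -group,
               names_to = c("dv", ".value"),
               names_sep = "_") %>%
  # Keep one row per group / dv / response combination
  distinct(group, dv, response, .keep_all = TRUE) %>%
  arrange(group, dv)

toy_long
```

With this layout the result has the columns group, dv, response, and br, so no renaming is needed after the pivot. It relies on each pivoted name containing exactly one underscore.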
