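r - Pivoting multiple columns to long format in R
Question
I have a person-level data frame in wide format, and I am trying to melt it into a summary table in long format. Sample code and the data are shown below: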
library(tidyverse)

set.seed(100)

# Original wide-format, person-level data
dat_wide <- tibble(group = rep(x = c('a','b','c'), each = 5),
                   d1 = sample(x = c(1, 0, NA_integer_), size = 15, replace = TRUE),
                   d2 = sample(x = c(1, 0, NA_integer_), size = 15, replace = TRUE),
                   d3 = sample(x = c(1, 0, NA_integer_), size = 15, replace = TRUE)) %>%
  # Each response (1 or 0) maps to a fixed value; a missing response stays NA
  mutate(d1_br = case_when(
           d1 == 1 ~ .8,
           d1 == 0 ~ .2,
           TRUE ~ NA_real_
         ),
         d2_br = case_when(
           d2 == 1 ~ .6,
           d2 == 0 ~ .4,
           TRUE ~ NA_real_
         ),
         d3_br = case_when(
           d3 == 1 ~ .95,
           d3 == 0 ~ .05,
           TRUE ~ NA_real_
         ))
dat_wide
# A tibble: 15 x 7
   group    d1    d2    d3 d1_br d2_br d3_br
   <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1 a         0    NA     0   0.2    NA  0.05
 2 a         1     0     0   0.8   0.4  0.05
 3 a         1    NA     1   0.8    NA  0.95
 4 a         1    NA    NA   0.8    NA    NA
 5 a         0    NA     1   0.2    NA  0.95
 6 b        NA     0     1    NA   0.4  0.95
 7 b        NA    NA     1    NA    NA  0.95
 8 b        NA     0     1    NA   0.4  0.95
 9 b         1     0     0   0.8   0.4  0.05
10 b         1     1    NA   0.8   0.6    NA
11 c         0    NA     1   0.2    NA  0.95
12 c         0    NA     0   0.2    NA  0.05
13 c        NA    NA     1    NA    NA  0.95
14 c         1    NA     0   0.8    NA  0.05
15 c         0     1     0   0.2   0.6  0.05
# Desired long-format summary table
dat_long <- tibble(group = rep(c('a','b','c'), each = 9),
                   dv = rep(c('d1','d2','d3'), each = 3, times = 3),
                   response = rep(c(1, 0, NA_integer_), times = 9)) %>%
  mutate(br = case_when(
    dv == 'd1' & response == 1 ~ .8,
    dv == 'd1' & response == 0 ~ .2,
    dv == 'd2' & response == 1 ~ .6,
    dv == 'd2' & response == 0 ~ .4,
    dv == 'd3' & response == 1 ~ .95,
    dv == 'd3' & response == 0 ~ .05,
    TRUE ~ NA_real_
  ))
dat_long
# A tibble: 27 x 4
   group dv    response    br
   <chr> <chr>    <dbl> <dbl>
 1 a     d1           1   0.8
 2 a     d1           0   0.2
 3 a     d1          NA    NA
 4 a     d2           1   0.6
 5 a     d2           0   0.4
 6 a     d2          NA    NA
 7 a     d3           1  0.95
 8 a     d3           0  0.05
 9 a     d3          NA    NA
10 b     d1           1   0.8
# ... with 17 more rows
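I'm fairly sure this can be done with tidyr::pivot_longer, but I'm new to it and haven't figured out all of its features. Something like the following should work, but can someone help fill in the pivot_longer syntax? Do I need the names_sep or names_prefix arguments? I don't fully understand those yet.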
# Possible solution
dat_long <- dat_wide %>%
  # pivot_longer(...) %>%
  distinct(group, dv, response, .keep_all = TRUE)
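Answer
This is easier if the number consistently comes at the end of every column name. You can rename the columns of dat_wide so that it does: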
names(dat_wide) <- sub("(\\w+)(\\d+)(\\w*)", "\\1\\3\\2", names(dat_wide))
With this, d1_br becomes d_br1 (and likewise for d2_br and d3_br), while d1, d2, d3 and group keep their names.
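As a quick check of what that regular expression does, here is a minimal sketch that applies the same sub() call to a plain character vector instead of the data frame. The vector old_names is just a copy of the example's column names, introduced here for illustration.

```r
# Column names from the example data (copied by hand for illustration)
old_names <- c("group", "d1", "d2", "d3", "d1_br", "d2_br", "d3_br")

# Capture groups: (1) leading word characters, (2) the digit(s),
# (3) any trailing word characters. The replacement emits them in the
# order 1-3-2, which moves the digits to the end of the name.
new_names <- sub("(\\w+)(\\d+)(\\w*)", "\\1\\3\\2", old_names)

print(new_names)
# "group" "d1" "d2" "d3" "d_br1" "d_br2" "d_br3"
```

Names without a digit, such as group, do not match the pattern and are returned unchanged. Note that this pattern assumes single-digit suffixes as in the example; names like d10 would need a stricter pattern.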
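Then you can use pivot_longer on both sets of columns at once. The special name ".value" in names_to tells it to take the output column names (d and d_br) from the first captured part of each name, and to store the second part (the number) in dv: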
library(tidyverse)

dat_wide %>%
  pivot_longer(cols = -group,
               names_to = c(".value", "dv"),
               names_pattern = "(\\w+)(\\d+)") %>%
  distinct(group, dv, d, .keep_all = TRUE) %>%
  arrange(group, dv)
Output
# A tibble: 24 x 4
   group dv        d  d_br
   <chr> <chr> <dbl> <dbl>
 1 a     1         0   0.2
 2 a     1        NA    NA
 3 a     1         1   0.8
 4 a     2        NA    NA
 5 a     2         0   0.4
 6 a     2         1   0.6
 7 a     3        NA    NA
 8 a     3         0  0.05
 9 a     3         1  0.95
10 b     1         0   0.2
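Since the question asks about names_sep, here is a sketch of an alternative (not the approach used above) that renames in the other direction: the bare response columns get a "_response" suffix, so every name has the form "<dv>_<what>" and can be split on the underscore. The tibble toy and its values are hypothetical stand-ins for dat_wide; rename_with() assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)

# Hypothetical miniature of dat_wide with the same column layout
toy <- tibble(
  group = c("a", "a", "b"),
  d1    = c(1, 0, NA),
  d2    = c(0, 0, 1),
  d1_br = c(0.8, 0.2, NA),
  d2_br = c(0.4, 0.4, 0.6)
)

toy_long <- toy %>%
  # d1 -> d1_response, d2 -> d2_response; the *_br columns are left alone
  rename_with(~ paste0(.x, "_response"), .cols = matches("^d\\d+$")) %>%
  pivot_longer(
    cols      = -group,
    # Text before "_" goes into dv; text after "_" names the value column
    names_to  = c("dv", ".value"),
    names_sep = "_"
  ) %>%
  # Keep one row per observed group / dv / response combination
  distinct(group, dv, response, .keep_all = TRUE) %>%
  arrange(group, dv)

print(toy_long)
```

This keeps dv as "d1", "d2", ... and yields columns named response and br, matching the layout of the desired dat_long. names_prefix would not help here on its own: it only strips a matching string from the start of each name and does not split a name into several parts.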