Split columns by the delimiter ":" and output the column number

Question

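I have a huge data frame in the input format below. I am trying to split its columns on the delimiter ":" and output each resulting value together with its column number and the row value from the first column.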

input <- structure(list(V1 = structure(1:2, .Label = c("a1", "a2"), class = "factor"), 
    V2 = structure(1:2, .Label = c("aaa-1-c:bbb-1-d:ccc:a", "www-1-c"
    ), class = "factor"), V3 = structure(1:2, .Label = c("cc:nnn:ttt-cc", 
    "cdd:aaa:pp"), class = "factor"), V4 = structure(c(1L, NA
    ), .Label = "aaa-1-d", class = "factor")), class = "data.frame", row.names = c(NA, 
-2L))

I have tried, but the column numbers and the order of the values are not correct.

output <- structure(list(V1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L), .Label = c("a1", "a2 "), class = "factor"), 
    V2 = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 1L, 1L, 1L, 1L), V3 = structure(c(3L, 
    5L, 7L, 1L, 6L, 9L, 11L, 4L, 12L, 8L, 2L, 10L), .Label = c("a", 
    "aaa", "aaa-1-c", "aaa-1-d", "bbb-1-d", "cc", "ccc", "cdd", 
    "nnn", "pp", "ttt-cc", "www-1-c"), class = "factor")), class = "data.frame", row.names = c(NA, 
-12L))

Can anyone please help? Thanks!

Tags: r

Answer


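Here is one option: reshape the dataset from "wide" to "long" with pivot_longer (available from tidyr 1.0.0 onward), split V3 into separate rows at ":" with separate_rows, and then, grouped by V1, convert V2 to an integer index with match.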

library(dplyr)
library(tidyr)
input %>%
   pivot_longer(cols = -V1, names_to = "V2", values_to = "V3", 
          values_drop_na = TRUE) %>% 
   # with tidyr versions before 1.0.0, replace the pivot_longer() call with gather():
   # gather(V2, V3, -V1, na.rm = TRUE) %>%
   separate_rows(V3, sep=":") %>%
   group_by(V1) %>%
   mutate(V2 = match(V2, unique(V2))) %>%
   ungroup
# A tibble: 12 x 3
#   V1       V2 V3     
#   <fct> <int> <chr>  
# 1 a1        1 aaa-1-c
# 2 a1        1 bbb-1-d
# 3 a1        1 ccc    
# 4 a1        1 a      
# 5 a1        2 cc     
# 6 a1        2 nnn    
# 7 a1        2 ttt-cc 
# 8 a1        3 aaa-1-d
# 9 a2        1 www-1-c
#10 a2        2 cdd    
#11 a2        2 aaa    
#12 a2        2 pp     
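
To see why the column numbers restart at 1 within each V1 group, the match(V2, unique(V2)) step can be looked at in isolation. The sketch below is only an illustration: the vector v2 is a hand-written stand-in (an assumption, not taken from the pipeline itself) for the V2 values of the a2 group after reshaping and splitting, where V4 has already been dropped because it is NA.

```r
# Hypothetical stand-in for the V2 values in the a2 group:
# one value came from column V2, three came from column V3.
v2 <- c("V2", "V3", "V3", "V3")

# unique() keeps values in order of first appearance;
# match() returns the position of each element within that unique set.
idx <- match(v2, unique(v2))
print(idx)
# [1] 1 2 2 2
```

Because the index is computed per group, it counts only the columns that actually have a value for that row, so a row with a gap in the middle would be numbered consecutively rather than by original column position.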
