Transform an R data frame with 2 columns and row separators

Problem description

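I have a data frame with two columns, "id" and "detail" (df_current below). I need to group the data frame by id and spread it so that the columns become "Interface1", "Interface2", etc., where the values under each interface column are the values that immediately follow that interface name each time it appears. Essentially "!" is used as a separator, but it is not needed in the output.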

The desired output is shown below as "df_needed_from_current".

I have tried several approaches (group_by, spread, reshape, dcast, etc.) but could not get any of them to work. Any help would be greatly appreciated!

Example of the current data frame (code to create it below):

id  detail
1   !
1   Interface1
1   a
1   b
1   !
1   Interface2
1   a
1   b
2   !
2   Interface1
2   a
2   b
2   c
2   !
2   Interface2
2   a
3   !
3   Interface1
3   a
3   b
3   c
3   d
df_current <- data.frame(
        id = c("1","1","1","1","1","1","1","1","2",
               "2","2","2","2","2","2","2","3","3",
               "3","3","3","3","4","4","4","4","4",
               "4","4","4","4","4","4","4","4","4",
               "5","5","5","5","5","5","5","5","5",
               "5","5","5","5"),
        detail = c("!", "Interface1","a","b","!",
                   "Interface2","a","b","!","Interface1",
                   "a","b","c","!","Interface2","a",
                   "!", "Interface1","a","b","c","d",
                   "!", "Interface1","a","b","!",
                   "Interface2","a","b","c","!","Interface3",
                   "a","b","c","!","Interface1","a","b","!",
                   "Interface2","a","b","c","!","Interface3",
                   "a","b"))
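Because every "!" starts a new interface block, a running count of the separator rows numbers the blocks, and the row right after each "!" gives that block's interface name. A minimal base-R sketch of that idea, assuming (as in this example) that the data starts with "!" and that every "!" is immediately followed by an interface name; `block` and `interface` are illustrative column names, not part of the question:

```r
# Recreate the example data from the question (same values, written with rep())
df_current <- data.frame(
  id = rep(c("1", "2", "3", "4", "5"), times = c(8, 8, 6, 14, 13)),
  detail = c("!", "Interface1", "a", "b", "!", "Interface2", "a", "b",
             "!", "Interface1", "a", "b", "c", "!", "Interface2", "a",
             "!", "Interface1", "a", "b", "c", "d",
             "!", "Interface1", "a", "b", "!", "Interface2", "a", "b", "c",
             "!", "Interface3", "a", "b", "c",
             "!", "Interface1", "a", "b", "!", "Interface2", "a", "b", "c",
             "!", "Interface3", "a", "b"),
  stringsAsFactors = FALSE
)

# Each "!" starts a new block, so a cumulative count of "!" rows numbers the blocks
df_current$block <- cumsum(df_current$detail == "!")

# Assumption: the row right after each "!" holds the interface name of that block
sep_rows <- which(df_current$detail == "!")
df_current$interface <- df_current$detail[sep_rows[df_current$block] + 1]

head(df_current, 8)
```

With `block` and `interface` in place, the remaining work is to drop the "!" and header rows and spread the values by `interface`.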

Needed data frame (code to create it below):

id  Interface1  Interface2  Interface3
1   a           a           NA
1   b           b           NA
2   a           a           NA
2   b           NA          NA
2   c           NA          NA
3   a           NA          NA
3   b           NA          NA
3   c           NA          NA
3   d           NA          NA
df_needed_from_current <- data.frame(
        id = c("1","1","2","2","2","3","3","3","3","4","4","4","5","5","5"),
        Interface1 = c("a","b","a","b","c","a","b","c","d","a","b",NA,"a","b",NA),
        Interface2 = c("a","b","a",NA,NA,NA,NA,NA,NA,"a","b","c","a","b","c"),
        Interface3 = c(NA,NA,NA,NA,NA,NA,NA,NA,NA,"a","b","c","a","b",NA)
        )

Tags: r, dataframe, group-by, split, transformation

Solution


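We use `filter` to remove the rows where 'detail' is "!", then create a new column 'interface' that holds only the 'detail' values prefixed with 'Interface' (NA otherwise). We use `fill` from `tidyr` to replace those NA elements with the previous non-NA value within each id, `filter` out the rows where 'detail' equals 'interface' (the interface-name rows themselves), create a row sequence id with `rowid` (from `data.table`), and reshape to 'wide' format with `pivot_wider`.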

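library(dplyr)
library(tidyr)
library(data.table)  # for rowid()
library(stringr)     # for str_detect()

df_current %>%
   # drop the "!" separator rows
   filter(detail != "!") %>%
   # copy 'detail' into 'interface' only where it is an interface name, NA otherwise
   mutate(interface = case_when(str_detect(detail, '^Interface') ~ detail)) %>%
   group_by(id) %>%
   # carry the last interface name down over the value rows of the same id
   fill(interface) %>%
   ungroup %>%
   # drop the interface-name rows themselves
   filter(detail != interface) %>% 
   # row counter within each id/interface pair, so values line up across columns
   mutate(rn = rowid(id, interface)) %>% 
   pivot_wider(names_from = interface, values_from = detail) %>%
   select(-rn)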
# A tibble: 15 x 4
#   id    Interface1 Interface2 Interface3
#   <chr> <chr>      <chr>      <chr>     
# 1 1     a          a          <NA>      
# 2 1     b          b          <NA>      
# 3 2     a          a          <NA>      
# 4 2     b          <NA>       <NA>      
# 5 2     c          <NA>       <NA>      
# 6 3     a          <NA>       <NA>      
# 7 3     b          <NA>       <NA>      
# 8 3     c          <NA>       <NA>      
# 9 3     d          <NA>       <NA>      
#10 4     a          a          a         
#11 4     b          b          b         
#12 4     <NA>       c          c         
#13 5     a          a          a         
#14 5     b          b          b         
#15 5     <NA>       c          <NA>    
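For comparison, the same steps can be written in base R without the four packages. This is a sketch, not part of the original answer: it swaps `fill()` for a `cumsum()` block index, `rowid()` for `ave(..., FUN = seq_along)`, and `pivot_wider()` for `stats::reshape()`, and it assumes that every block starts with a "!" row immediately followed by its interface name. The names `blk`, `sep_rows`, `vals` and `wide` are illustrative.

```r
# Recreate the example data from the question (same values, written with rep())
df_current <- data.frame(
  id = rep(c("1", "2", "3", "4", "5"), times = c(8, 8, 6, 14, 13)),
  detail = c("!", "Interface1", "a", "b", "!", "Interface2", "a", "b",
             "!", "Interface1", "a", "b", "c", "!", "Interface2", "a",
             "!", "Interface1", "a", "b", "c", "d",
             "!", "Interface1", "a", "b", "!", "Interface2", "a", "b", "c",
             "!", "Interface3", "a", "b", "c",
             "!", "Interface1", "a", "b", "!", "Interface2", "a", "b", "c",
             "!", "Interface3", "a", "b"),
  stringsAsFactors = FALSE
)

# 1. Number the "!"-delimited blocks and label every row with its block's
#    interface (assumes the row right after each "!" is the interface name)
blk <- cumsum(df_current$detail == "!")
sep_rows <- which(df_current$detail == "!")
df_current$interface <- df_current$detail[sep_rows[blk] + 1]

# 2. Keep only the value rows: drop the "!" separators and the interface headers
vals <- df_current[df_current$detail != "!" &
                   df_current$detail != df_current$interface, ]

# 3. Row counter within each id/interface pair (base-R analogue of rowid())
vals$rn <- ave(seq_len(nrow(vals)), vals$id, vals$interface, FUN = seq_along)

# 4. Long -> wide: one column per interface, NA where an interface has fewer values
wide <- reshape(vals, idvar = c("id", "rn"), timevar = "interface",
                v.names = "detail", direction = "wide")
names(wide) <- sub("^detail\\.", "", names(wide))  # "detail.Interface1" -> "Interface1"
wide$rn <- NULL
rownames(wide) <- NULL
wide
```

`reshape()` names the spread columns `detail.Interface1` and so on, which is why the prefix is stripped with `sub()`; any id/rn/interface combination that does not occur comes out as NA.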
