Splitting columns with list values and gathering them in R

Question

I have this df:

> df
  author author_id other_authors other_authors_id
       A       123       D, E ,F   011 , 021, 003
       B       122             G              111
       C       121          H, F         101, 003

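The values in the last two columns are stored as lists. I want to reshape the data from wide to long, but I'm not sure what the best approach is. I'm trying to build a network graph from it.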

I'd like to gather them so that they look like this:

author other_authors author_id other_authors_id
A      D             123       011
A      E             123       021
A      F             123       003
B      G             122       111
C      H             121       101
C      F             121       003

Any ideas on how to do this? I've managed to get the following working, but it only works when the values are not lists:

gather(df, key="author", value="other_authors", -author)

Tags: r

Answer


We can use cSplit from splitstackshape to split multiple columns at once:

library(splitstackshape)
cSplit(df, c("other_authors", "other_authors_id"), ", ", "long",
       fixed = FALSE, type.convert = FALSE)
#    author author_id other_authors other_authors_id
#1:      A       123             D              011
#2:      A       123             E              021
#3:      A       123             F              003
#4:      B       122             G              111
#5:      C       121             H              101
#6:      C       121             F              003

Or use separate_rows from tidyr:

library(tidyverse)
df %>%
   separate_rows(other_authors, other_authors_id)
#   author author_id other_authors other_authors_id
#1      A       123             D              011
#2      A       123             E              021
#3      A       123             F              003
#4      B       122             G              111
#5      C       121             H              101
#6      C       121             F              003
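
The call above relies on the default separator of separate_rows, which splits on any run of non-alphanumeric characters. As a minimal sketch, assuming you would rather state the delimiter explicitly, the same split can be written with a `sep` pattern that matches a comma plus any surrounding whitespace. The name `long` is only illustrative.

```r
library(tidyr)

# Same example data as in the Data section of this answer
df <- data.frame(
  author = c("A", "B", "C"),
  author_id = 123:121,
  other_authors = c("D, E ,F", "G", "H, F"),
  other_authors_id = c("011 , 021, 003", "111", "101, 003"),
  stringsAsFactors = FALSE
)

# Split both columns in parallel on a comma with optional whitespace
# on either side; convert = FALSE (the default) keeps the ids as
# character, so leading zeros such as "011" are preserved
long <- separate_rows(
  df, other_authors, other_authors_id,
  sep = "[[:space:]]*,[[:space:]]*"
)
```

Both columns must yield the same number of pieces in each row, otherwise separate_rows cannot pair the values up.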

Update

If the columns 'other_authors' and 'other_authors_id' are list columns, we can use unnest:

df1 %>%
   unnest(cols = c(other_authors, other_authors_id))
#  author author_id other_authors other_authors_id
#1      A       123             D              011
#2      A       123             E              021
#3      A       123             F              003
#4      B       122             G              111
#5      C       121             H              101
#6      C       121             F              003
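
To show how the string form and the list-column form relate, here is a rough sketch, assuming the data starts out as comma-separated strings: base R's strsplit turns each string into a character vector, which gives list columns that unnest can then expand. The names `df_list` and `long` are hypothetical and used only for illustration.

```r
library(tidyr)

# Same example data as in the Data section of this answer
df <- data.frame(
  author = c("A", "B", "C"),
  author_id = 123:121,
  other_authors = c("D, E ,F", "G", "H, F"),
  other_authors_id = c("011 , 021, 003", "111", "101, 003"),
  stringsAsFactors = FALSE
)

# strsplit returns one character vector per row, so assigning the
# result back creates list columns
pattern <- "[[:space:]]*,[[:space:]]*"
df_list <- df
df_list$other_authors <- strsplit(df$other_authors, pattern)
df_list$other_authors_id <- strsplit(df$other_authors_id, pattern)

# Expand both list columns in parallel, one row per element
long <- unnest(df_list, cols = c(other_authors, other_authors_id))
```

Naming both columns in `cols` makes unnest expand them side by side instead of forming every combination of names and ids.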

Data

df <- structure(list(author = c("A", "B", "C"), author_id = 123:121, 
other_authors = c("D, E ,F", "G", "H, F"), other_authors_id = c("011 , 021, 003", 
"111", "101, 003")), class = "data.frame", row.names = c(NA, 
 -3L))

df1 <- structure(list(author = c("A", "B", "C"), author_id = 123:121, 
other_authors = list(c("D", "E", "F"), "G", c("H", "F")), 
other_authors_id = list(c("011", "021", "003"), "111", c("101", 
"003"))), row.names = c(NA, -3L), class = "data.frame")
