R: split multiple columns into a list

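Problem description

I am trying to write a function that splits the sample columns into separate data frames, each keeping the first four columns plus one sample. Here is an example: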

df:
Name    RsID    Chr Position    Sample1 Sample2 Sample3 Sample4 Sample5 Sample6 Sample7
200610-1    rs423874    MT  2755    AA  AA  AA  AA  AA  AA  AA
200610-10   rs94753345  MT  0   AA  AA  AA  AA  AA  AA  AA
200610-100  rs36757 MT  15172   GG  GG  GG  GG  GG  GG  GG
200610-102  rs1444029   MT  125 AA  AA  AA  AA  AA  AA  AA
200610-105  rs3796687   MT  236 AA  AA  TT  AA  AA  AT  AA
200610-107  rs483795    MT  482 TT  AA  AA  TT  AA  AA  AA

desired output:
Name    RsID    Chr Position    Sample1
200610-1    rs423874    MT  2755    AA
200610-10   rs94753345  MT  0   AA
200610-100  rs36757 MT  15172   GG
200610-102  rs1444029   MT  125 AA
200610-105  rs3796687   MT  236 AA
200610-107  rs483795    MT  482 TT

Name    RsID    Chr Position    Sample2
200610-1    rs423874    MT  2755    AA
200610-10   rs94753345  MT  0   AA
200610-100  rs36757 MT  15172   GG
200610-102  rs1444029   MT  125 AA
200610-105  rs3796687   MT  236 AA
200610-107  rs483795    MT  482 AA   

...

code:
sep_col <- function(df, i) {
  if (length(i) <= 1) {
    x <- cbind(df[1:4], df[i])
  } else {
    x <- list()
    for (s in 1:length(i)) x[[s]] <- list(cbind(df[1:4], df[i[s]]))
  }
  return(x)
}

If I write df[1:4] inside the function it works, but if I change it back to just df inside the function and run the call below, I get an error:

sep_col(df[1:4],6)

Error:
Error in `[.data.frame`(df, i) : undefined columns selected
Called from: `[.data.frame`(df, i)

I don't understand why this fails, since both objects are of class 'data.frame'. Any suggestions would be appreciated, thanks.

Tags: r, list, function, for-loop, dataframe

Solution


We can use Map to cbind columns 1:4 with each of columns 5 to 11 in turn, using setNames so that each sample column keeps its original name from names(df):

Map(function(x, y, z) cbind(x, setNames(list(y), z)), 
                   list(df[1:4]), df[5:11], names(df)[5:11])
#[[1]]
#        Name       RsID Chr Position Sample1
#1   200610-1   rs423874  MT     2755      AA
#2  200610-10 rs94753345  MT        0      AA
#3 200610-100    rs36757  MT    15172      GG
#4 200610-102  rs1444029  MT      125      AA
#5 200610-105  rs3796687  MT      236      AA

#[[2]]
#        Name       RsID Chr Position Sample2
#1   200610-1   rs423874  MT     2755      AA
#2  200610-10 rs94753345  MT        0      AA
#3 200610-100    rs36757  MT    15172      GG
#4 200610-102  rs1444029  MT      125      AA
#5 200610-105  rs3796687  MT      236      AA

#[[3]]
#        Name       RsID Chr Position Sample3
#1   200610-1   rs423874  MT     2755      AA
#2  200610-10 rs94753345  MT        0      AA
#3 200610-100    rs36757  MT    15172      GG
#4 200610-102  rs1444029  MT      125      AA
#5 200610-105  rs3796687  MT      236      TT

#[[4]]
#        Name       RsID Chr Position Sample4
#1   200610-1   rs423874  MT     2755      AA
#2  200610-10 rs94753345  MT        0      AA
#3 200610-100    rs36757  MT    15172      GG
#4 200610-102  rs1444029  MT      125      AA
#5 200610-105  rs3796687  MT      236      AA

#[[5]]
#        Name       RsID Chr Position Sample5
#1   200610-1   rs423874  MT     2755      AA
#2  200610-10 rs94753345  MT        0      AA
#3 200610-100    rs36757  MT    15172      GG
#4 200610-102  rs1444029  MT      125      AA
#5 200610-105  rs3796687  MT      236      AA

#[[6]]
#        Name       RsID Chr Position Sample6
#1   200610-1   rs423874  MT     2755      AA
#2  200610-10 rs94753345  MT        0      AA
#3 200610-100    rs36757  MT    15172      GG
#4 200610-102  rs1444029  MT      125      AA
#5 200610-105  rs3796687  MT      236      AT

#[[7]]
#        Name       RsID Chr Position Sample7
#1   200610-1   rs423874  MT     2755      AA
#2  200610-10 rs94753345  MT        0      AA
#3 200610-100    rs36757  MT    15172      GG
#4 200610-102  rs1444029  MT      125      AA
#5 200610-105  rs3796687  MT      236      AA
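
To try the Map call on the question's data, the sample can be rebuilt from text as in the sketch below. The read.table call and the variable name res are assumptions added for illustration and are not part of the original answer.

```r
# Rebuild the sample data from the question (whitespace-separated text).
df <- read.table(text = "Name RsID Chr Position Sample1 Sample2 Sample3 Sample4 Sample5 Sample6 Sample7
200610-1 rs423874 MT 2755 AA AA AA AA AA AA AA
200610-10 rs94753345 MT 0 AA AA AA AA AA AA AA
200610-100 rs36757 MT 15172 GG GG GG GG GG GG GG
200610-102 rs1444029 MT 125 AA AA AA AA AA AA AA
200610-105 rs3796687 MT 236 AA AA TT AA AA AT AA
200610-107 rs483795 MT 482 TT AA AA TT AA AA AA",
  header = TRUE, stringsAsFactors = FALSE)

# list(df[1:4]) has length 1, so Map recycles it against each sample column.
res <- Map(function(x, y, z) cbind(x, setNames(list(y), z)),
           list(df[1:4]), df[5:11], names(df)[5:11])

# Optional: label the list elements by sample so they can be accessed by name.
names(res) <- names(df)[5:11]
```

With the names set, a single sample's data frame can be pulled out as res$Sample3 instead of res[[3]].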

Or use lapply: loop over the names of columns 5 to 11, subset the dataset by that column, and cbind it with the first 4 columns of the dataset:

lapply(names(df)[5:11], function(x) cbind(df[1:4], df[x]))
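
As for the error in the question, it comes from the call rather than from the loop: sep_col(df[1:4], 6) passes in a data frame that has only four columns, so df[i] with i = 6 asks for a column that does not exist. Passing the full df avoids this. The loop also wraps each result in an extra list(), which nests the output one level deeper than needed. A possible revision of sep_col along the lines of the lapply approach is sketched below. The keep argument, the choice to always return a named list, and the two-row stand-in data (rows 200610-1 and 200610-105 of the sample) are assumptions for illustration.

```r
# Hypothetical revision of sep_col: returns one data frame per column in i,
# each keeping the columns given in keep (the first four by default).
sep_col <- function(df, i, keep = 1:4) {
  out <- lapply(i, function(j) cbind(df[keep], df[j]))
  names(out) <- names(df[i])   # label each element by its sample column
  out
}

# Small stand-in data set with the same layout as the question's df.
df <- data.frame(
  Name = c("200610-1", "200610-105"),
  RsID = c("rs423874", "rs3796687"),
  Chr = c("MT", "MT"),
  Position = c(2755L, 236L),
  Sample1 = c("AA", "AA"),
  Sample2 = c("AA", "AA"),
  Sample3 = c("AA", "TT"),
  stringsAsFactors = FALSE
)

# Pass the full data frame and the indices of the sample columns.
res <- sep_col(df, 5:7)

# The call pattern from the question fails: df[1:4] has no sixth column.
err <- tryCatch(sep_col(df[1:4], 6), error = function(e) conditionMessage(e))
```

Returning a list even when i has length one keeps the return type consistent, so callers do not need a special case for a single sample.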
