How to create subsets of a data frame by column using a for loop in R

Problem description

I have a data frame that looks like this:

  id age1 sex1 age2 sex2 age3 sex3 age4   sex4
1  5   20 <NA>   NA <NA>   NA <NA>   27 Female
2 25   NA <NA>   NA <NA>   NA <NA>   35 Female
3 65   NA <NA>   NA <NA>   NA <NA>   NA   <NA>

Here is the code to reproduce the data:

temp <- structure(list(id = c(5L, 25L, 65L, 25L, 65L, 5L, 5L, 85L, 285L, 
541L), age1 = c(20L, NA, NA, NA, NA, NA, NA, NA, NA, NA), sex1 = structure(c(NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_), .Label = c("missing", 
"inapplicable", "refusal", "don't know", "inconsistent", "Male", 
"Female"), class = "factor"), age2 = c(NA, NA, NA, NA, 31L, 
NA, NA, NA, NA, NA), sex2 = structure(c(NA, NA, NA, NA, 7L, 
NA, NA, NA, NA, NA), .Label = c("missing", "inapplicable", "refusal", 
"don't know", "inconsistent", "Male", "Female"), class = "factor"), 
    age3 = c(NA, NA, NA, NA, 32L, NA, NA, NA, 25L, 23L), sex3 = structure(c(NA, 
    NA, NA, NA, 7L, NA, NA, NA, 6L, 7L), .Label = c("missing", 
    "inapplicable", "refusal", "don't know", "inconsistent", 
    "Male", "Female"), class = "factor"), age4 = c(27L, 35L, 
    NA, NA, 33L, NA, 24L, NA, 26L, NA), sex4 = structure(c(7L, 
    7L, NA, NA, 7L, NA, 7L, NA, 6L, NA), .Label = c("missing", 
    "inapplicable", "refusal", "don't know", "inconsistent", 
    "Male", "Female"), class = "factor")), row.names = c(NA, 
10L), class = "data.frame")

I would like to know how to make multiple subsets of the data, each based on a group of columns.

I know I can do this with the following code:

Subset1 <- temp[, 1:3]
Subset2 <- temp[, c(1, 4:5)]
Subset3 <- temp[, c(1, 6:7)]

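But there must be a more concise way to do this. I tried a for loop, but I am new to R and could not work out how to write it, including how to keep the names of the new subsets consistent.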

Tags: r, dataframe

Solution


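We can use split.default to split the columns by the number in each column name, and then bind the first column (id) onto the front of each list element.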

new_list <- lapply(split.default(temp[-1], gsub("\\D", "", names(temp)[-1])), 
                   function(x) cbind(temp[1], x))
new_list

#$`1`
#    id age1 sex1
#1    5   20 <NA>
#2   25   NA <NA>
#3   65   NA <NA>
#4   25   NA <NA>
#5   65   NA <NA>
#6    5   NA <NA>
#7    5   NA <NA>
#8   85   NA <NA>
#9  285   NA <NA>
#10 541   NA <NA>

#$`2`
#    id age2   sex2
#1    5   NA   <NA>
#...
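
To see why the split works, it may help to look at the grouping key on its own. The sketch below uses a small made-up data frame called toy (an assumption for illustration, not the original temp) with the same column naming pattern. gsub("\\D", "", ...) removes every non-digit character, so each column name is reduced to its numeric suffix, and split.default groups the columns by that suffix.

```r
# Toy data with the same naming pattern as `temp`; the values are illustrative only
toy <- data.frame(id   = c(5L, 25L),
                  age1 = c(20L, NA), sex1 = c(NA, "Female"),
                  age2 = c(NA, 31L), sex2 = c("Male", NA))

# Drop every non-digit character, leaving only the numeric suffix of each name
key <- gsub("\\D", "", names(toy)[-1])
print(key)

# split.default() treats the data frame as a list of columns and groups them by key
groups <- split.default(toy[-1], key)
print(names(groups))
print(names(groups[["2"]]))
```

Each element of groups is itself a data frame holding the columns that share one suffix, which is why cbind(temp[1], x) in the answer can attach id to each piece.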

This returns a list of data frames. If you want each one as a separate data frame in the global environment, we can do this:

names(new_list) <- paste0('Subset', seq_along(new_list))
list2env(new_list, .GlobalEnv)
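
Since the question asked specifically about a for loop, here is one possible sketch of that approach. It is not part of the original answer. It assumes every group consists of an age column and a sex column sharing a numeric suffix, and it uses a made-up data frame called toy rather than temp. The names suffixes, cols and subsets are likewise hypothetical.

```r
# Toy data with the same naming pattern as `temp`; the values are illustrative only
toy <- data.frame(id   = c(5L, 25L),
                  age1 = c(20L, NA), sex1 = c(NA, "Female"),
                  age2 = c(NA, 31L), sex2 = c("Male", NA))

# Unique numeric suffixes, in order of first appearance
suffixes <- unique(gsub("\\D", "", names(toy)[-1]))

# Collect the subsets in a named list rather than as loose objects
subsets <- list()
for (s in suffixes) {
  # Assumption: each group is exactly "age<s>" and "sex<s>"
  cols <- c("id", paste0(c("age", "sex"), s))
  subsets[[paste0("Subset", s)]] <- toy[, cols]
}

print(names(subsets))
print(subsets$Subset2)
```

Here each subset is named after its column suffix rather than its position in the list, so Subset2 always holds age2 and sex2. Keeping the results in a list also makes it easy to apply the same operation to all of them later with lapply.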
