How to build a list of two-string pairs by iteratively combining variable names stored in a data frame, taking one name from each column

Problem description

I have a set of tasks with sub-variables, and I have put their names into a data frame:

A tibble: 8 x 3
  AB                                      PRP              DUAL
  <chr>                                   <chr>            <chr>
1 Combined_t2|t1_lag8                     mean_RT_200_all  dual_average_accuracy
2 Combined_abmag_t2_lag8_minus_lag3       mean_RT_1000_all mean_RT_dual
3 Combined_abmag_t2_1.0_minus_lag3        PRP              Dual_cost
4 Combined_abwidth                        NA               NA
5 Combined_abdepth                        NA               NA
6 Combined_lag3vslag8_residuals           NA               NA
7 Combined_lag3vslag8_stdrdized_residuals NA               NA

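I want to combine these variable names into a list of pairs, each pair taking one name from each of two different columns. So it would look like: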

"Combined_t2|t1_lag8" "mean_RT_200_all"
"Combined_t2|t1_lag8" "mean_RT_1000_all"
"Combined_t2|t1_lag8" "PRP"
...
"Combined_abwidth" "dual_average_accuracy"
"Combined_abwidth" "mean_RT_dual"
...
"PRP" "Dual_cost"
"PRP" "dual_average_accuracy"
...

I have tried the `combn` function, but it seems to work only on lists, not on data frames. I also tried some `for` loops, without success.

Tags: r

Solution


Here is a general base R solution (it extends to more columns). Call your data `dd`.
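To make the code that follows reproducible, here is a minimal sketch that rebuilds `dd` from the seven rows printed in the question. It assumes the `NA` cells are real missing values rather than the string "NA", and it uses a plain `data.frame` instead of a tibble, which makes no difference to the steps that follow.

```r
# Reconstruction of the question's data (an assumption based on the
# printed rows; the original object was a tibble).
dd <- data.frame(
  AB = c(
    "Combined_t2|t1_lag8",
    "Combined_abmag_t2_lag8_minus_lag3",
    "Combined_abmag_t2_1.0_minus_lag3",
    "Combined_abwidth",
    "Combined_abdepth",
    "Combined_lag3vslag8_residuals",
    "Combined_lag3vslag8_stdrdized_residuals"
  ),
  PRP  = c("mean_RT_200_all", "mean_RT_1000_all", "PRP", rep(NA, 4)),
  DUAL = c("dual_average_accuracy", "mean_RT_dual", "Dual_cost", rep(NA, 4)),
  stringsAsFactors = FALSE
)

# Number of usable (non-missing) names per column
n_names <- lengths(lapply(dd, na.omit))
```

With 7, 3 and 3 usable names in the three columns, the full set of cross-column pairs should have 7*3 + 7*3 + 3*3 = 51 entries.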

# Omit missing values - data isn't really rectangular
# and rows seem to have no meaning
# so a list is an appropriate structure
dd_list = lapply(dd, na.omit)

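# generate all pairs of "columns" (now list items)
# simplify = FALSE returns a list of index pairs instead of a matrix
col_pairs = combn(seq_along(dd_list), 2, simplify = FALSE)

# for each pair, use `expand.grid` to generate all combinations
# since the wanted result is a list of vectors, not a data frame
# we strip the names and convert to matrix
# (lapply always returns a list; apply could simplify the pieces into
# a single matrix when they happen to have equal sizes, which would
# break the rbind step)
result = lapply(col_pairs, function(x) {
  as.matrix(unname(do.call(expand.grid, args = dd_list[x])))
})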

# bind the matrices together - this seems like a nice result to work with
result = do.call(rbind, result)
result
 #      [,1]                                      [,2]                   
 # [1,] "Combined_t2|t1_lag8"                     "mean_RT_200_all"      
 # [2,] "Combined_abmag_t2_lag8_minus_lag3"       "mean_RT_200_all"      
 # [3,] "Combined_abmag_t2_1.0_minus_lag3"        "mean_RT_200_all"      
 # [4,] "Combined_abwidth"                        "mean_RT_200_all"      
 # [5,] "Combined_abdepth"                        "mean_RT_200_all"      
 # [6,] "Combined_lag3vslag8_residuals"           "mean_RT_200_all"      
 # [7,] "Combined_lag3vslag8_stdrdized_residuals" "mean_RT_200_all"      
 # [8,] "Combined_t2|t1_lag8"                     "mean_RT_1000_all"   
 # ...

  
# but if you really want a list we can `split` the matrix into 
# individual rows:
split(result, 1:nrow(result))
# $`1`
# [1] "Combined_t2|t1_lag8" "mean_RT_200_all"    
# 
# $`2`
# [1] "Combined_abmag_t2_lag8_minus_lag3" "mean_RT_200_all"                  
# 
# $`3`
# [1] "Combined_abmag_t2_1.0_minus_lag3" "mean_RT_200_all"                 
# ...

The above is the fancy (and scalable) approach. It is nearly equivalent to this quick-and-dirty one:

result = rbind(
  expand.grid(x = na.omit(dd$AB), y = na.omit(dd$PRP)),
  expand.grid(x = na.omit(dd$AB), y = na.omit(dd$DUAL)),
  expand.grid(x = na.omit(dd$PRP), y = na.omit(dd$DUAL))
)

split(as.matrix(unname(result)), 1:nrow(result))
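If this is needed more than once, the general approach can be wrapped in a small function. The sketch below is only an illustration: `pair_columns` and the `toy` data are hypothetical names, not part of the original answer, and the function assumes its input has at least two columns.

```r
# Hypothetical helper: every pair of names taken from two different
# columns of `df`, returned as a two-column character matrix.
pair_columns <- function(df) {
  # Drop missing values column by column; the result is a list of
  # character vectors of possibly different lengths
  cols <- lapply(df, function(col) as.character(col[!is.na(col)]))

  # All unordered pairs of column indices, kept as a list
  idx <- combn(seq_along(cols), 2, simplify = FALSE)

  pieces <- lapply(idx, function(i) {
    grid <- expand.grid(cols[[i[1]]], cols[[i[2]]], stringsAsFactors = FALSE)
    m <- as.matrix(grid)
    dimnames(m) <- NULL  # plain matrix without row or column names
    m
  })

  do.call(rbind, pieces)
}

# Small made-up example: 3, 2 and 1 usable names per column
toy <- data.frame(
  A = c("a1", "a2", "a3"),
  B = c("b1", "b2", NA),
  C = c("c1", NA, NA),
  stringsAsFactors = FALSE
)

toy_pairs <- pair_columns(toy)
nrow(toy_pairs)  # 3*2 + 3*1 + 2*1 = 11 pairs

# One list element per pair, if a list is preferred over a matrix
toy_list <- split(toy_pairs, seq_len(nrow(toy_pairs)))
```

Setting `stringsAsFactors = FALSE` in `expand.grid` keeps the names as character strings throughout, so nothing passes through factors on the way to the matrix.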
