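r - How to build a list of string pairs by iteratively combining variable names stored in a data frame, taking one name from each of two columns
Problem description
I have a set of tasks with sub-variables, and I have put their names into a data frame: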
# A tibble: 8 x 3
  AB                                      PRP              DUAL
  <chr>                                   <chr>            <chr>
1 Combined_t2|t1_lag8                     mean_RT_200_all  dual_average_accuracy
2 Combined_abmag_t2_lag8_minus_lag3       mean_RT_1000_all mean_RT_dual
3 Combined_abmag_t2_1.0_minus_lag3        PRP              Dual_cost
4 Combined_abwidth                        NA               NA
5 Combined_abdepth                        NA               NA
6 Combined_lag3vslag8_residuals           NA               NA
7 Combined_lag3vslag8_stdrdized_residuals NA               NA
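I want to combine these variable names into a list, two at a time, with each pair taking its names from two different columns. So it would look like: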
"Combined_t2|t1_lag8" "mean_RT_200_all"
"Combined_t2|t1_lag8" "mean_RT_1000_all"
"Combined_t2|t1_lag8" "PRP"
...
"Combined_abwidth" "dual_average_accuracy"
"Combined_abwidth" "mean_RT_dual"
...
"PRP" "Dual_cost"
"PRP" "dual_average_accuracy"
...
I have tried the "combn" function, but it seems to work only on lists, not on data frames. I also tried a few "for" loops, without success.
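For anyone who wants to reproduce this, the seven rows printed above can be entered by hand. This is only a sketch: it assumes a plain base data.frame rather than a tibble, borrows the name dd from the answer, and uses a hypothetical helper pad() to fill the shorter columns with NA.

```r
# Rebuild the seven rows shown in the question as a base data frame.
ab <- c("Combined_t2|t1_lag8",
        "Combined_abmag_t2_lag8_minus_lag3",
        "Combined_abmag_t2_1.0_minus_lag3",
        "Combined_abwidth",
        "Combined_abdepth",
        "Combined_lag3vslag8_residuals",
        "Combined_lag3vslag8_stdrdized_residuals")
prp  <- c("mean_RT_200_all", "mean_RT_1000_all", "PRP")
dual <- c("dual_average_accuracy", "mean_RT_dual", "Dual_cost")

# Hypothetical helper: pad a character vector with NA up to length n
pad <- function(x, n) c(x, rep(NA_character_, n - length(x)))

dd <- data.frame(
  AB   = ab,
  PRP  = pad(prp, length(ab)),
  DUAL = pad(dual, length(ab)),
  stringsAsFactors = FALSE
)

# combn() does work on the column positions: one matrix column per pair
combn(seq_along(dd), 2)
```

The last call pairs up column positions rather than the names inside the columns, which is why combn alone does not produce the wanted result.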
Solution
Here is a general base R solution (it extends to more columns). Call your data dd:
# Omit missing values - data isn't really rectangular
# and rows seem to have no meaning
# so a list is an appropriate structure
dd_list = lapply(dd, na.omit)
# generate all pairs of "columns" (now list items);
# simplify = FALSE returns a list of index pairs
col_pairs = combn(seq_along(dd_list), 2, simplify = FALSE)
# for each pair, use `expand.grid` to generate all combinations
# since the wanted result is a list of vectors, not a data frame
# we strip the names and convert to matrix
# (lapply always returns a list; apply() would simplify to a single
# matrix whenever every pair produced the same number of rows)
result = lapply(col_pairs, function(x) {
  as.matrix(unname(do.call(expand.grid, args = dd_list[x])))
})
# bind the matrices together - this seems like a nice result to work with
result = do.call(rbind, result)
result
#      [,1]                                      [,2]
# [1,] "Combined_t2|t1_lag8"                     "mean_RT_200_all"
# [2,] "Combined_abmag_t2_lag8_minus_lag3"       "mean_RT_200_all"
# [3,] "Combined_abmag_t2_1.0_minus_lag3"        "mean_RT_200_all"
# [4,] "Combined_abwidth"                        "mean_RT_200_all"
# [5,] "Combined_abdepth"                        "mean_RT_200_all"
# [6,] "Combined_lag3vslag8_residuals"           "mean_RT_200_all"
# [7,] "Combined_lag3vslag8_stdrdized_residuals" "mean_RT_200_all"
# [8,] "Combined_t2|t1_lag8"                     "mean_RT_1000_all"
# ...
# but if you really want a list we can `split` the matrix into
# individual rows:
split(result, 1:nrow(result))
# $`1`
# [1] "Combined_t2|t1_lag8" "mean_RT_200_all"
#
# $`2`
# [1] "Combined_abmag_t2_lag8_minus_lag3" "mean_RT_200_all"
#
# $`3`
# [1] "Combined_abmag_t2_1.0_minus_lag3" "mean_RT_200_all"
# ...
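The steps above could also be wrapped in a small reusable function. The following is a minimal sketch, not the answer's own code: the function name all_column_pairs and the data frame toy are hypothetical, and stringsAsFactors = FALSE is an added choice so the values stay character throughout.

```r
# Hypothetical helper: every cross-column pair of non-missing values.
all_column_pairs <- function(df) {
  # drop NAs column by column; the columns may end up with different lengths
  cols <- lapply(df, function(x) as.character(x[!is.na(x)]))
  # list of column index pairs, e.g. c(1, 2), c(1, 3), c(2, 3)
  idx <- combn(seq_along(cols), 2, simplify = FALSE)
  pieces <- lapply(idx, function(i) {
    grid <- expand.grid(cols[[i[1]]], cols[[i[2]]],
                        stringsAsFactors = FALSE)
    unname(as.matrix(grid))  # plain character matrix without dimnames
  })
  do.call(rbind, pieces)
}

# Toy stand-in data using a few names from the question
toy <- data.frame(
  AB   = c("Combined_abwidth", "Combined_abdepth", "Combined_t2|t1_lag8"),
  PRP  = c("mean_RT_200_all", "PRP", NA),
  DUAL = c("Dual_cost", NA, NA),
  stringsAsFactors = FALSE
)

pairs <- all_column_pairs(toy)
dim(pairs)  # 3*2 + 3*1 + 2*1 = 11 rows, 2 columns
```

Because lapply always returns a list, the final rbind step behaves the same whether or not the column pairs happen to produce equally sized pieces.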
The above is the fancy (and scalable) approach; it is nearly equivalent to this quick-and-dirty one:
result = rbind(
  expand.grid(x = na.omit(dd$AB), y = na.omit(dd$PRP)),
  expand.grid(x = na.omit(dd$AB), y = na.omit(dd$DUAL)),
  expand.grid(x = na.omit(dd$PRP), y = na.omit(dd$DUAL))
)
split(as.matrix(unname(result)), 1:nrow(result))
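Note that expand.grid varies its first argument fastest, so the rows come out grouped by the second name, whereas the question lists each AB name against every PRP name in turn. If that ordering matters, one possible sketch is to pass the vectors in reverse and then restore the column order; the short vectors ab and prp below are stand-ins taken from the question, and ordered_pairs is a hypothetical name.

```r
ab  <- c("Combined_t2|t1_lag8", "Combined_abwidth")
prp <- c("mean_RT_200_all", "mean_RT_1000_all", "PRP")

# y is listed first so it varies fastest; then put x back in front
grid <- expand.grid(y = prp, x = ab, stringsAsFactors = FALSE)[, c("x", "y")]
ordered_pairs <- unname(as.matrix(grid))

# first AB name paired with each PRP name in turn
ordered_pairs[1:3, ]
```

The set of pairs is unchanged; only the row order differs from the output above.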