Getting the frequency of different combinations in R with binary data

Problem description

I have a table of binary data that looks like this:

middle-circle   triangles-inside    straight-rays   split-rays  triangle-rays   grouped-rays   sep-lines    
1                       0                  0            0            1                0            1
0                       1                  0            1            0                0            0
0                       0                  0            0            0                0            0
0                       1                  0            1            0                0            0
0                       1                  0            1            0                0            0
0                       0                  0            0            0                0            0 
0                       0                  1            0            0                0            0
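
To make the example reproducible, the table above could be rebuilt in R roughly as follows. This is a minimal sketch: the name `det` is taken from the code later in the question, and storing the data as a numeric matrix (rather than a data frame) is an assumption.

```r
# Rebuild the example table as a numeric matrix, one row per observation.
# Assumption: the real object may be a data frame; a matrix is used here.
det <- matrix(
  c(1, 0, 0, 0, 1, 0, 1,
    0, 1, 0, 1, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0,
    0, 1, 0, 1, 0, 0, 0,
    0, 1, 0, 1, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0,
    0, 0, 1, 0, 0, 0, 0),
  ncol = 7, byrow = TRUE,
  dimnames = list(NULL, c("middle-circle", "triangles-inside", "straight-rays",
                          "split-rays", "triangle-rays", "grouped-rays",
                          "sep-lines"))
)

# How often each feature is present on its own.
feature_counts <- colSums(det)
feature_counts
```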

I would like to know how often the different combinations occur. I read the same question on Stack Overflow and applied the following code to my data:

library(gtools)
# get all vars present in each row
present <- lapply(seq(nrow(det)), function(i) names(which(det[i,] == 1)))
# get all pairs
all.pairs <- gtools::combinations(n = ncol(det), r = 2, colnames(det))
# count times pairs appear
count <- apply(all.pairs, 1, function(x){
  there <- lapply(x, function(y) sapply(present, `%in%`, x = y))
  sum(Reduce(`&`, there))
})

cbind(all.pairs, count)

I get the following result:

                                            count
 [1,] "grouped_rays"     "middle_circle"    "0"  
 [2,] "grouped_rays"     "separation_lines" "0"  
 [3,] "grouped_rays"     "split _rays"      "0"  
 [4,] "grouped_rays"     "straight_rays"    "0"  
 [5,] "grouped_rays"     "triangle_rays"    "0"  
 [6,] "grouped_rays"     "triangles_inside" "0"  
 [7,] "middle_circle"    "separation_lines" "0"  
 [8,] "middle_circle"    "split _rays"      "0"  
 [9,] "middle_circle"    "straight_rays"    "0"  
[10,] "middle_circle"    "triangle_rays"    "0"  
[11,] "middle_circle"    "triangles_inside" "0"  
[12,] "separation_lines" "split _rays"      "0"  
[13,] "separation_lines" "straight_rays"    "0"  
[14,] "separation_lines" "triangle_rays"    "0"  
[15,] "separation_lines" "triangles_inside" "0"  
[16,] "split _rays"      "straight_rays"    "0"  
[17,] "split _rays"      "triangle_rays"    "0"  
[18,] "split _rays"      "triangles_inside" "0"  
[19,] "straight_rays"    "triangle_rays"    "0"  
[20,] "straight_rays"    "triangles_inside" "0"  
[21,] "triangle_rays"    "triangles_inside" "0"

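My questions: is it possible to get not only the pairwise combinations but all combinations? And why does it always say "count 0"? I am trying to get a list similar to the one above, containing all possible combinations and how often they occur. It should look like this: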

                                                                   count
 [1,] "grouped_rays"     "middle_circle"     "sep-lines"            "2"  
 [2,] "grouped_rays"     "separation_lines"  "triangles inside"     "0"  
 [3,] "grouped_rays"     "split _rays"                              "1"  

And likewise for all other possible combinations, of course. This is just an example.
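
On the "count 0" part: one possible cause, assuming `det` is a data frame rather than a matrix, is that `names(which(det[i,] == 1))` returns `NULL`. Comparing a data frame row with `1` yields a one-row logical matrix, and `which()` on a matrix returns positions without names. The sketch below illustrates this with a small stand-in data set; `m`, `df`, and `det_m` are hypothetical names, and the column names are only modelled on the output above.

```r
# Small stand-in for the data (hypothetical values, names modelled on the output).
m <- matrix(c(1, 0, 1,
              0, 1, 0,
              1, 0, 1),
            ncol = 3, byrow = TRUE,
            dimnames = list(NULL, c("middle_circle", "split_rays", "triangle_rays")))
df <- as.data.frame(m)

# Data frame: df[1, ] == 1 is a 1-row logical matrix, and which() on a
# matrix returns positions without names, so names() gives NULL.
names(which(df[1, ] == 1))

# Matrix: m[1, ] is a named vector, so the column names survive.
names(which(m[1, ] == 1))

# Possible fix under that assumption: convert once, then build `present`.
det_m <- as.matrix(df)
present <- lapply(seq_len(nrow(det_m)),
                  function(i) names(which(det_m[i, ] == 1)))
```

If every element of `present` is `NULL`, no pair can ever be found in any row, which would explain a count of 0 everywhere.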

Tags: r, count, combinations

Solution


Perhaps this gives the expected result?

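# x is the binary table; for each row, list every pair of columns equal to 1
tt <- do.call(rbind, apply(x == 1, 1, function(y) {
  z <- names(y[y])  # names of the columns present in this row
  if (length(z) > 1) t(combn(z, 2))  # rows with fewer than two 1s yield NULL
}))
# sort within each pair so the order does not matter, then tabulate
table(apply(tt, 1, function(y) paste(sort(y), collapse = " ")))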
#   middle.circle sep.lines middle.circle triangle.rays 
#                          1                           1 
#    sep.lines triangle.rays split.rays triangles.inside 
#                          1                           3 
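
The code above counts pairs only. To cover combinations of every size, as the question asks, the same idea can be extended by calling `combn()` for each size from 2 up to the number of features present in a row. The following is a sketch under stated assumptions, not the answerer's method: `x` is rebuilt here as a matrix with the dotted column names seen in the output above, and the `" + "` separator and the names `present`, `subsets`, `counts`, and `patterns` are illustrative choices.

```r
# Assumption: x is the binary table, with dotted column names as in the output.
x <- matrix(
  c(1, 0, 0, 0, 1, 0, 1,
    0, 1, 0, 1, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0,
    0, 1, 0, 1, 0, 0, 0,
    0, 1, 0, 1, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0,
    0, 0, 1, 0, 0, 0, 0),
  ncol = 7, byrow = TRUE,
  dimnames = list(NULL, c("middle.circle", "triangles.inside", "straight.rays",
                          "split.rays", "triangle.rays", "grouped.rays",
                          "sep.lines"))
)
present <- x == 1

# For each row, list every subset of size >= 2 of the features present.
subsets <- unlist(lapply(seq_len(nrow(present)), function(i) {
  z <- sort(colnames(present)[present[i, ]])
  if (length(z) < 2) return(NULL)  # nothing to combine in this row
  unlist(lapply(2:length(z), function(k) {
    combn(z, k, FUN = paste, collapse = " + ")
  }))
}))
counts <- table(subsets)
counts

# Alternative reading: count each complete row pattern as it stands.
patterns <- apply(present, 1, function(y) {
  paste(colnames(present)[y], collapse = " + ")
})
table(patterns)
```

Only combinations that actually occur appear in `counts`; combinations that never occur are simply absent rather than listed with a count of 0.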
