首页 > 解决方案 > 交叉制表R中的两个二进制列列表

问题描述

我正在开发一个闪亮的应用程序,它允许用户将数据中的不同组相互交叉制表。我的挑战是,我的数据中的一些变量按单列分类(很好且互斥,可以很容易地包含在 group_by 中),而其他变量则由具有一定程度重叠的多个二进制列组成。所以在闪亮的应用程序中,我有“用户”选项,但它指的是“A用户”、“B用户”等列。

现在,将交叉表的行和列作为单列样式组既好又简单(供参考的示例)

data <- data.frame(Gender = sample(x=1:2, size=100, replace = T), Age = sample(x=1:3, size=100, replace = T),
                   A.User = sample(x=0:1, size=100, replace = T), B.User = sample(x=0:1, size=100, replace = T),
                   C.User = sample(x=0:1, size=100, replace = T), D.User = sample(x=0:1, size=100, replace = T),
                   E.User = sample(x=0:1, size=100, replace = T), F.User = sample(x=0:1, size=100, replace = T))

col_vars <- "Gender"
group_var <- "Age"

demog_vals <- data %>% group_by(!!sym(group_var), !!sym(col_vars)) %>%
          summarise(Total = n()) %>% spread(!!sym(col_vars), Total)


将行/列选项之一作为单个选项,将另一个作为列表选项也不错

#function to apply in dplyr summarise_at 
countifx <- function(y, x) {sum(as.numeric(x) == y, na.rm=T) / (length(x) - sum(is.na(x)))}

col_vars <- "Gender"
row_vars <- c("A User", "B User", "C User")

        demog_vals <- data %>% 
          group_by(!!sym(col_vars)) %>% 
          #gsub to change spaces to "." to match dataframe names
          summarise_at(gsub(" ", ".", row_vars, fixed = T), list(Total =~ countifx(1,.))) %>% 
          gather(key, Total, -1) %>% 
          extract(key, c("Users", "measure"), "([A-z0-9\\.]*)_([A-z0-9]*)") %>% 
          spread(!!sym(col_vars), Total) %>% 
          select(-measure)

        demog_vals$Users <- gsub(".", " ", demog_vals$Users, fixed = T)
        demog_vals[, -1] <- sapply(demog_vals[,-1], function(x){paste0(round(x / sum(x) * 100, 1), "%")})

但我真的很难让它适用于行和列都是这些多个二进制列的组合的情况。到目前为止我最远的是

col_vars <- c("A User", "B User", "C User")
row_vars <- c("D User", "E User", "F User")

demog_vals <- data %>% group_by_at(vars((gsub(" ", ".", row_vars, fixed = T)))) %>% 
  summarise_at(gsub(" ", ".", col_vars, fixed = T), list(Total =~ countifx(1,.))) %>% 
  gather(key, Total, c(-1:-6)) %>% 
  extract(key, c("Users", "measure"), "([A-z0-9\\.]*)_([A-z0-9]*)") %>% 
  select(-measure)

但是,这为我提供了所有不同变量对中 1 和 0 的不同级别的每种组合——我只对我分组的各个列感兴趣,而不是交互(即我有 24 行并且想要3,所以(0,0,0),(0,0,1),(0,1,1)等当我只想要(1,0,0),(0,1,0)时, (0,0,1) 但不是我从上面的代码中过滤得到的那些——如果这有任何意义的话......

我可能过于复杂了,但我已经盯着它看了很长时间了,谷歌已经没有想法了。任何帮助将不胜感激

标签: rdplyr

解决方案


通过简化管道并将其包装在 for 循环中,我设法获得了能够提供所需输出的东西

#create data
data <- data.frame(Gender = sample(x=1:2, size=100, replace = T), Age = sample(x=1:3, size=100, replace = T),
                   A.User = sample(x=0:1, size=100, replace = T), B.User = sample(x=0:1, size=100, replace = T),
                   C.User = sample(x=0:1, size=100, replace = T), D.User = sample(x=0:1, size=100, replace = T),
                   E.User = sample(x=0:1, size=100, replace = T), F.User = sample(x=0:1, size=100, replace = T))

#specify lists of columns to use
col_vars <- c("A User", "B User", "C User")
row_vars <- c("D User", "E User", "F User")

#define custom function to use in summarise step
countifx <- function(y, x) {sum(as.numeric(x) == y, na.rm=T) / (length(x) - sum(is.na(x)))}

#create empty dataframe
demog_vals <- data.frame()

#loop for each item in row_vars list
for (i in 1:length(row_vars)){
  #specify row item
  group_var <- row_vars[i]
  #group and summarise data
  out <- data %>% group_by(!!sym(gsub(" ", ".", group_var, fixed = T))) %>% 
  summarise_at(gsub(" ", ".", col_vars, fixed = T), list(Total =~ countifx(1,.))) %>% 
  filter(!!sym((gsub(" ", ".", group_var, fixed = T)) == 1) %>%
  gather(key, Total, -1) %>% 
  extract(key, c("Users", "measure"), "([A-z0-9\\.]*)_([A-z0-9]*)") %>% 
  select(-measure)
  #coerce output to data frame
  out <- as.data.frame(out)

  out[, 1] <- group_var
  colnames(out) <- c("col user", "row user", "Total")

  #if data frame empty then set to out, otherwise append
  if (i == 1){
    demog_vals <- out
  } else {
    demog_vals <- bind_rows(demog_vals, out)
  }

}
#spread the column variables out
demog_vals <- demog_vals %>% spread('Users', Total)

我不确定这是最优雅或最有效的解决方案,但实际上,我将循环的最长列表有 15 个元素,所以现在它应该可以完成这项工作,但我很想知道是否有人有更好的选择


推荐阅读