r - 交叉制表R中的两个二进制列列表
问题描述
我正在开发一个闪亮的应用程序,它允许用户将数据中的不同组相互交叉制表。我的挑战是,我的数据中的一些变量按单列分类(很好且互斥,可以很容易地包含在 group_by 中),而其他变量则由具有一定程度重叠的多个二进制列组成。所以在闪亮的应用程序中,我有“用户”选项,但它指的是“A用户”、“B用户”等列。
现在,将交叉表的行和列作为单列样式组既好又简单(供参考的示例)
data <- data.frame(Gender = sample(x=1:2, size=100, replace = T), Age = sample(x=1:3, size=100, replace = T),
A.User = sample(x=0:1, size=100, replace = T), B.User = sample(x=0:1, size=100, replace = T),
C.User = sample(x=0:1, size=100, replace = T), D.User = sample(x=0:1, size=100, replace = T),
E.User = sample(x=0:1, size=100, replace = T), F.User = sample(x=0:1, size=100, replace = T))
col_vars <- "Gender"
group_var <- "Age"
demog_vals <- data %>% group_by(!!sym(group_var), !!sym(col_vars)) %>%
summarise(Total = n()) %>% spread(!!sym(col_vars), Total)
将行/列选项之一作为单个选项,将另一个作为列表选项也不错
#function to apply in dplyr summarise_at
countifx <- function(y, x) {sum(as.numeric(x) == y, na.rm=T) / (length(x) - sum(is.na(x)))}
col_vars <- "Gender"
row_vars <- c("A User", "B User", "C User")
demog_vals <- data %>%
group_by(!!sym(col_vars)) %>%
#gsub to change spaces to "." to match dataframe names
summarise_at(gsub(" ", ".", row_vars, fixed = T), list(Total =~ countifx(1,.))) %>%
gather(key, Total, -1) %>%
extract(key, c("Users", "measure"), "([A-z0-9\\.]*)_([A-z0-9]*)") %>%
spread(!!sym(col_vars), Total) %>%
select(-measure)
demog_vals$Users <- gsub(".", " ", demog_vals$Users, fixed = T)
demog_vals[, -1] <- sapply(demog_vals[,-1], function(x){paste0(round(x / sum(x) * 100, 1), "%")})
但我真的很难让它适用于行和列都是这些多个二进制列的组合的情况。到目前为止我最远的是
col_vars <- c("A User", "B User", "C User")
row_vars <- c("D User", "E User", "F User")
demog_vals <- data %>% group_by_at(vars((gsub(" ", ".", row_vars, fixed = T)))) %>%
summarise_at(gsub(" ", ".", col_vars, fixed = T), list(Total =~ countifx(1,.))) %>%
gather(key, Total, c(-1:-6)) %>%
extract(key, c("Users", "measure"), "([A-z0-9\\.]*)_([A-z0-9]*)") %>%
select(-measure)
但是,这为我提供了所有不同变量对中 1 和 0 的不同级别的每种组合——我只对我分组的各个列感兴趣,而不是交互(即我有 24 行并且想要3,所以(0,0,0),(0,0,1),(0,1,1)等当我只想要(1,0,0),(0,1,0)时, (0,0,1) 但不是我从上面的代码中过滤得到的那些——如果这有任何意义的话......
我可能过于复杂了,但我已经盯着它看了很长时间了,谷歌已经没有想法了。任何帮助将不胜感激
解决方案
通过简化管道并将其包装在 for 循环中,我设法获得了能够提供所需输出的东西
#create data
data <- data.frame(Gender = sample(x=1:2, size=100, replace = T), Age = sample(x=1:3, size=100, replace = T),
A.User = sample(x=0:1, size=100, replace = T), B.User = sample(x=0:1, size=100, replace = T),
C.User = sample(x=0:1, size=100, replace = T), D.User = sample(x=0:1, size=100, replace = T),
E.User = sample(x=0:1, size=100, replace = T), F.User = sample(x=0:1, size=100, replace = T))
#specify lists of columns to use
col_vars <- c("A User", "B User", "C User")
row_vars <- c("D User", "E User", "F User")
#define custom function to use in summarise step
countifx <- function(y, x) {sum(as.numeric(x) == y, na.rm=T) / (length(x) - sum(is.na(x)))}
#create empty dataframe
demog_vals <- data.frame()
#loop for each item in row_vars list
for (i in 1:length(row_vars)){
#specify row item
group_var <- row_vars[i]
#group and summarise data
out <- data %>% group_by(!!sym(gsub(" ", ".", group_var, fixed = T))) %>%
summarise_at(gsub(" ", ".", col_vars, fixed = T), list(Total =~ countifx(1,.))) %>%
filter(!!sym((gsub(" ", ".", group_var, fixed = T)) == 1) %>%
gather(key, Total, -1) %>%
extract(key, c("Users", "measure"), "([A-z0-9\\.]*)_([A-z0-9]*)") %>%
select(-measure)
#coerce output to data frame
out <- as.data.frame(out)
out[, 1] <- group_var
colnames(out) <- c("col user", "row user", "Total")
#if data frame empty then set to out, otherwise append
if (i == 1){
demog_vals <- out
} else {
demog_vals <- bind_rows(demog_vals, out)
}
}
#spread the column variables out
demog_vals <- demog_vals %>% spread('Users', Total)
我不确定这是最优雅或最有效的解决方案,但实际上,我将循环的最长列表有 15 个元素,所以现在它应该可以完成这项工作,但我很想知道是否有人有更好的选择
推荐阅读
- android - 有没有一个很好的 Kotlin 示例如何将计时器值作为字符串传递给参数?
- python - 一起启动多个任务时芹菜状态的问题
- python - Python 调用 robocopy 从单个源复制到多个目标
- ionic-framework - 如何在模板中添加 VS 代码片段
- c++ - C++:使用 set 方法时程序崩溃
- react-native - RequireNativeComponent: "SRSRadialGradient" was not found in the UIManager
- android - 使用 PreferenceDataStore 时 PreferenceScreen/PreferenceFragment 的初始值
- sql - Query on two tables with overlapping dates
- python - Values in if-else
- c++ - 如何正确使用 MAXIMUM_ALLOWED?