How to run the table function across multiple variables in R and compile the results into a new dataset using a function?

Problem description

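I have a dataset with about 100 variables, and I want to build a summary table from about 30 of them. To do this, I manually ran table and other functions on those variables and rbind-ed the results. However, since I need to do this for 30+ variables, I would like to automate the process with a function.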

Here is an example dataset:


df <- data.frame(v1=c('a','b','c','c','b'),v2=c('d','d','e','e','e'),cat=c('1low','1low','2med','3high','2med'))

The goal is to create a table like the one below (without the NAs).

[Image: example of the final table]

Here is my code:

library(formattable)

# For var1 & var2, apply the table function and convert to dataframe so that the row labels are incorporated into dataset
var1.df <- as.data.frame(table(df$v1, df$cat))

# reshape to wide format (goal: view the count of each var1 level across the low, med, high categories)
var1.df <- reshape(var1.df, idvar = "Var1", timevar = "Var2", direction = "wide")

# add col names
names(var1.df) <- c("vcat","low","med","high"); var1.df

# repeat the steps above for the next variable; in the real dataset I will need to repeat this for 30 variables...
var2.df <- as.data.frame(table(df$v2, df$cat))
var2.df <- reshape(var2.df, idvar = "Var1", timevar = "Var2", direction = "wide")
names(var2.df) <- c("vcat","low","med","high")

# Create variable headings
var1.heading <- data.frame("variable 1",NA,NA,NA) # ideally, the NAs are blanks
names(var1.heading) <- c("vcat","low","med","high")

var2.heading <- data.frame("variable 2","","","")
names(var2.heading) <- c("vcat","low","med","high")

# Rbind the category headings and the table result data
table01 <- do.call("rbind", list(var1.heading, var1.df, 
                                 var2.heading, var2.df))

# Format the table for presentation
heading.list <- c("variable 1", "variable 2")
x <- formattable(table01, 
                 align =c("l","c","c","c","c"),
                 list(vcat = formatter("span", style = x ~ ifelse(x %in% heading.list, 
                                                                  style(font.weight = "bold"), NA))))

My attempts below to automate the code above are either incomplete (a) or don't run properly (b):

# (a)
lapply(df, function(x) as.data.frame(table(x, df$cat)))

# (b)
myfxn <- function(x){
  y <- as.data.frame(table(x, df$cat))
  y <- reshape(y, idvar = "x", timevar = "Var2", direction = "wide")
  names(y) <- c("vcat","low","med","high")
}
lapply(df, myfxn(x))

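Any suggestions on how to automate this process for more variables? Also, is there a way to insert category headings into the table other than manually creating and inserting single-row data frames? Note that I put NAs in var1.heading because it is the first data frame; when I tried using "" instead (as in var2.heading), the subsequent data frames would not bind because they are factor variables, not characters. Thank you very much!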
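A minimal sketch of the factor issue described above, assuming it is acceptable to store the counts as character for display: when every column is character (stringsAsFactors = FALSE, which is also the default from R 4.0.0 onward), a heading row made of "" binds with the count rows without producing NAs. The object names heading, counts, body and out are illustrative only.

```r
# Example data from the question; stringsAsFactors = FALSE keeps every
# column character (the default from R 4.0.0 onward)
df <- data.frame(v1 = c('a','b','c','c','b'),
                 v2 = c('d','d','e','e','e'),
                 cat = c('1low','1low','2med','3high','2med'),
                 stringsAsFactors = FALSE)

# One-row heading: label in vcat, empty strings in the count columns
heading <- data.frame(vcat = "variable 1", low = "", med = "", high = "",
                      stringsAsFactors = FALSE)

# Wide counts of v1 by category, one column per category
counts <- as.data.frame.matrix(table(df$v1, df$cat))
names(counts) <- c("low", "med", "high")

# Store the counts as character so every column has the same type as the heading
body <- data.frame(vcat = rownames(counts),
                   lapply(counts, as.character),
                   stringsAsFactors = FALSE)

out <- rbind(heading, body)
print(out, row.names = FALSE)
```

The price of this approach is that the count columns are no longer numeric, which is usually fine for a presentation table but not for further arithmetic.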

Tags: r, function

Solution


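I'll start with your (b) attempt, since it is very close. I think the only reason you are reshaping is that as.data.frame(table(...)) returns the counts in long format; if you remove the "table" class from the table (for example with unclass), you don't need to do that.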
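To illustrate the point about the "table" class with the question's example data: as.data.frame() on a two-way table gives one row per cell, which is why reshape() was needed, whereas unclass() leaves a plain integer matrix that is already wide. The names tab, long and wide are illustrative.

```r
df <- data.frame(v1 = c('a','b','c','c','b'),
                 v2 = c('d','d','e','e','e'),
                 cat = c('1low','1low','2med','3high','2med'))

tab <- table(df$v1, df$cat)

# Long format: columns Var1, Var2, Freq; one row per (v1, cat) cell
long <- as.data.frame(tab)
dim(long)   # 9 3  (3 levels of v1 x 3 categories)

# Wide format: unclass() drops the "table" class, leaving an integer
# matrix with the v1 levels as rows and the categories as columns
wide <- unclass(tab)
wide
```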

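I would also try to do the whole operation for one variable inside the function, i.e. adding the heading, labels, etc. That way you can test the function on a single variable to make sure it does exactly what you want, and then loop over all the variables.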

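# (b)
myfxn <- function(x, header = 'variable') {
  # cross-tabulate x against the category column (df$cat is read from the
  # global environment); unclass() drops the "table" class, leaving a wide
  # matrix, so no reshape() is needed
  y <- unclass(table(x, df$cat))
  # drop the digits used for ordering, e.g. "1low" -> "low"
  colnames(y) <- gsub('\\d', '', colnames(y))
  # turn the row labels into a real column
  y <- data.frame(vcat = rownames(y), y, stringsAsFactors = FALSE)
  # prepend a heading row: the header text followed by blanks
  rbind(c(header, rep('', ncol(y) - 1)), y)
}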

myfxn(df$v1)
#       vcat low med high
# 1 variable             
# a        a   1   0    0
# b        b   1   1    0
# c        c   0   1    1

Next, I would use Map or mapply instead of lapply so that I can pass multiple arguments to myfxn:

l <- Map(myfxn, df[-3], heading.list)

formattable(
  do.call('rbind', l), row.names = FALSE,
  align = c('l', rep('c', nlevels(factor(df$cat)))),  # factor(): cat may be character (R >= 4.0)
  list(
    vcat = formatter('span', style = x ~ ifelse(x %in% heading.list, style(font.weight = 'bold'), NA))
  )
)
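Why Map rather than lapply: lapply varies only one argument, while Map (a wrapper around mapply with SIMPLIFY = FALSE) walks several vectors in parallel, so each column gets its own heading. A small sketch using a hypothetical helper label_levels, which only counts distinct values, to make the pairing visible:

```r
df <- data.frame(v1 = c('a','b','c','c','b'),
                 v2 = c('d','d','e','e','e'),
                 cat = c('1low','1low','2med','3high','2med'),
                 stringsAsFactors = FALSE)
headers <- c("variable 1", "variable 2")

# Hypothetical helper: label a column with its number of distinct values
label_levels <- function(x, h) paste0(h, ": ", length(unique(x)), " levels")

# Map pairs column i with header i
paired <- Map(label_levels, df[c("v1", "v2")], headers)
paired

# lapply passes the WHOLE headers vector to every call instead
unpaired <- lapply(df[c("v1", "v2")], label_levels, h = headers)
unpaired

# Map is mapply with SIMPLIFY = FALSE, so both give the same list
same <- mapply(label_levels, df[c("v1", "v2")], headers, SIMPLIFY = FALSE)
identical(paired, same)
```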


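## apply to 30 variables
heading.list <- sprintf('variable %s', 1:30)
# demo only: fake 30 variables by randomly picking v1 or v2 thirty times;
# with real data, index the ~30 columns you need instead
l <- Map(myfxn, df[sample(1:2, 30, TRUE)], heading.list)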

formattable(
  do.call('rbind', l), row.names = FALSE,
  align = c('l', rep('c', nlevels(factor(df$cat)))),  # factor(): cat may be character (R >= 4.0)
  list(
    vcat = formatter('span', style = x ~ ifelse(x %in% heading.list, style(font.weight = 'bold'), NA))
  )
)
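With the real data, the sample() call would be replaced by the ~30 columns chosen by name. A sketch under that assumption, using a hypothetical variant myfxn2 that takes the category vector as an argument (passed once via MoreArgs) instead of reading df$cat from the global environment, with headings built from the column names; here v1 and v2 stand in for the real column list held in the hypothetical vector vars.

```r
df <- data.frame(v1 = c('a','b','c','c','b'),
                 v2 = c('d','d','e','e','e'),
                 cat = c('1low','1low','2med','3high','2med'),
                 stringsAsFactors = FALSE)

# Hypothetical variant of myfxn: the grouping vector is an explicit argument
myfxn2 <- function(x, header, group) {
  y <- unclass(table(x, group))                  # wide matrix of counts
  colnames(y) <- gsub('\\d', '', colnames(y))    # "1low" -> "low"
  y <- data.frame(vcat = rownames(y), y, stringsAsFactors = FALSE)
  rbind(c(header, rep('', ncol(y) - 1)), y)      # heading row on top
}

# In the real data, list the ~30 columns by name (hypothetical: v1, v2)
vars <- c("v1", "v2")
heading.list <- paste("variable", vars)

# MoreArgs passes the same group vector to every call
l <- Map(myfxn2, df[vars], heading.list, MoreArgs = list(group = df$cat))
table01 <- do.call(rbind, l)
rownames(table01) <- NULL
table01
```

The resulting table01 can then be passed to formattable in the same way as above, with heading.list used to bold the heading rows.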
