首页 > 解决方案 > 创建一个包含缺少数据的计数和百分比的表

问题描述

我正在尝试创建一个具有预设尺寸的表格,并让 R 填写计数和百分比。这是针对 R-markdown 报告的。

这是我的示例数据的代码。

#This is the most realistic data I could produce.
Maj <- rep("Major A", times=50)
set.seed(24601) 
Race <- sample(c("Asian","Black", "Am Indian","Hawiian" ,"Hispanic","White","Two or More Races","Not Reported"),
                 prob=c(.01,.1,.01,.01,.02,.80,.05,.01),size=50, replace = T)
Sex <- sample(c("Female","Male"), prob=c(.98,.02),size=50,replace=T)

Enroll_MajorA <- cbind(Maj,Sex,Race)

我需要该表来计算数据集中是否存在给定的种族和性别组合的计数和百分比。这是我需要的样子。

表格格式

我尝试单独计算表的每个值,R-markdown 给了我一个“内存错误”。我已经尝试创建一个计数和百分比表并将它们组合在一起,但它并没有提供报告所需的所有种族/性别组合。我不知道从这里去哪里。请帮忙!

标签: rdatatablecountpercentagekableextra

解决方案


您可以使用aggregate. 您可以保持矩阵不变,因为您可以使用as.data.frame,它会自动强制转换为可数因子。(NROW大写)不区分矩阵和向量。

m.agg <- do.call(data.frame, 
                 aggregate(. ~ Sex + Race, as.data.frame(Enroll_MajorA), function(x) 
                   c(count=as.integer(NROW(x)), share=NROW(x) / NROW(Enroll_MajorA))))

为了得到完整的集合,我们可以合并一个expand.grid,我们可能想要清理一下。

res <- merge(as.data.frame(m.agg), expand.grid(Sex=c("Female", "Male"), 
                                               Race=relevant.races), all=TRUE)  # `relevant.races` below
res[, 3:4][is.na(res[, 3:4])] <- 0  # transform `NA` into 0 to be nice
res[order(res[, "Race"]), ]  # order output
#       Sex              Race Maj.count Maj.share
# 1  Female             Black         2      0.04
# 10   Male             Black         0      0.00
# 2  Female           Hawiian         1      0.02
# 3  Female          Hispanic         1      0.02
# 11   Male          Hispanic         0      0.00
# 4  Female Two or More Races         2      0.04
# 12   Male Two or More Races         0      0.00
# 5  Female             White        44      0.88
# 13   Male             White         0      0.00
# 6  Female             Asian         0      0.00
# 14   Male             Asian         0      0.00
# 7  Female        Am. Indian         0      0.00
# 15   Male        Am. Indian         0      0.00
# 8  Female          Hawaiian         0      0.00
# 16   Male          Hawaiian         0      0.00
# 9  Female      Not Reported         0      0.00
# 17   Male      Not Reported         0      0.00

数据

relevant.races <- c("Asian","Black", "Am. Indian", "Hawaiian" , "Hispanic", "White", 
                    "Two or More Races", "Not Reported")

Enroll_MajorA <- structure(c("Major A", "Major A", "Major A", "Major A", "Major A", 
"Major A", "Major A", "Major A", "Major A", "Major A", "Major A", 
"Major A", "Major A", "Major A", "Major A", "Major A", "Major A", 
"Major A", "Major A", "Major A", "Major A", "Major A", "Major A", 
"Major A", "Major A", "Major A", "Major A", "Major A", "Major A", 
"Major A", "Major A", "Major A", "Major A", "Major A", "Major A", 
"Major A", "Major A", "Major A", "Major A", "Major A", "Major A", 
"Major A", "Major A", "Major A", "Major A", "Major A", "Major A", 
"Major A", "Major A", "Major A", "Female", "Female", "Female", 
"Female", "Female", "Female", "Female", "Female", "Female", "Female", 
"Female", "Female", "Female", "Female", "Female", "Female", "Female", 
"Female", "Female", "Female", "Female", "Female", "Female", "Female", 
"Female", "Female", "Female", "Female", "Female", "Female", "Female", 
"Female", "Female", "Female", "Female", "Female", "Female", "Female", 
"Female", "Female", "Female", "Female", "Female", "Female", "Female", 
"Female", "Female", "Female", "Female", "Female", "White", "White", 
"White", "Hawiian", "White", "White", "White", "White", "White", 
"White", "White", "White", "White", "Two or More Races", "White", 
"White", "White", "White", "White", "White", "White", "Hispanic", 
"White", "White", "White", "White", "White", "White", "Two or More Races", 
"White", "White", "White", "White", "White", "White", "White", 
"White", "Black", "White", "White", "Black", "White", "White", 
"White", "White", "White", "White", "White", "White", "White"
), .Dim = c(50L, 3L), .Dimnames = list(NULL, c("Maj", "Sex", 
"Race")))

推荐阅读