r - Automate statistical analyses using apply or an apply-like function
Problem description
I have a data frame that contains 5 item columns, each corresponding to a survey item, plus a grouping variable. There are 300 observations in total, and each cell holds the response a student gave to a given item. I constructed the following reproducible data frame:
set.seed(14)
Group <- rep(c(0, 1), each = 150)
mydf <- data.frame(replicate(5, sample(0:1, 300, replace = TRUE)))
mydf$Group <- Group
mydf$Group <- factor(mydf$Group, levels = c(0, 1), labels = c("Group A", "Group B"))
head(mydf); tail(mydf)
> head(mydf); tail(mydf)
  X1 X2 X3 X4 X5   Group
1  0  1  1  1  0 Group A
2  1  1  1  1  1 Group A
3  1  1  0  1  0 Group A
4  1  0  0  1  1 Group A
5  1  1  0  1  0 Group A
6  1  0  1  1  1 Group A
    X1 X2 X3 X4 X5   Group
295  0  1  1  0  1 Group B
296  0  0  1  0  0 Group B
297  1  1  0  1  0 Group B
298  1  1  0  0  1 Group B
299  0  0  1  0  0 Group B
300  1  1  1  1  1 Group B
What I would like to do is perform a chi-square test of independence between Group and each survey item, X1 to X5. So far I have been doing the following for item 1 (X1):
mydf$X1 <- factor(mydf$X1, levels = c(0, 1), labels = c("AGREE", "DISAGREE"))
MyTable <- table(mydf$Group, mydf$X1)
addmargins(MyTable)
chisq.test(MyTable, correct = FALSE)
and I would like to use the lapply function (or something similar) to automate this process so that I do not have to repeat the preceding code for each of the 5 items. This is particularly important because I have a similar data frame that contains 50 items, and I would like to use the same code to automate those analyses. Any advice on how to proceed? I am having the most trouble referring to each variable in turn (i.e. X1, X2, etc.), and I am not sure how to do so. I recently started using R, so I do not have a firm understanding of these functions and commands. Any help is greatly appreciated.
Solution
We can use lapply to loop over the columns 'X1' to 'X5', build a table of each column against the 'Group' column, and run chisq.test on it. This returns a list of test results:
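# Assumes X1..X5 are still coded 0/1; if X1 was already relabelled to
# AGREE/DISAGREE as in the question, factor(x, levels = 0:1) would give NA
out <- lapply(mydf[paste0("X", 1:5)], function(x)
  chisq.test(table(mydf$Group,
                   factor(x, levels = 0:1, labels = c("AGREE", "DISAGREE"))),
             correct = FALSE))
# Pull the p-value out of each test result
sapply(out, `[[`, "p.value")
#         X1         X2         X3         X4         X5
# 0.72875061 0.72888976 0.90732945 0.01525704 0.08243538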
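For the 50-item data set, it may help to pick the item columns by name pattern instead of hard-coding `1:5`, and to gather the statistic, degrees of freedom, and p-value into one data frame. This is a sketch under the assumption that every item column is named `X` followed by a number; `items` and `results` are illustrative names, not part of the answer above.

```r
# Example data rebuilt from the question
set.seed(14)
mydf <- data.frame(replicate(5, sample(0:1, 300, replace = TRUE)))
mydf$Group <- factor(rep(c(0, 1), each = 150), levels = c(0, 1),
                     labels = c("Group A", "Group B"))

# Assumption: item columns are named "X" + number, so this also finds X1..X50
items <- grep("^X[0-9]+$", names(mydf), value = TRUE)

out <- lapply(mydf[items], function(x)
  chisq.test(table(mydf$Group,
                   factor(x, levels = 0:1, labels = c("AGREE", "DISAGREE"))),
             correct = FALSE))

# Collect the key numbers from each htest object into one data frame
results <- data.frame(
  item      = names(out),
  statistic = vapply(out, function(t) unname(t$statistic), numeric(1)),
  df        = vapply(out, function(t) unname(t$parameter), numeric(1)),
  p.value   = vapply(out, function(t) t$p.value, numeric(1)),
  row.names = NULL
)
results
```

`vapply` is used instead of `sapply` so that each extracted value is checked to be a single number, which catches surprises early when the same code runs over many items.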