Automate statistical analyses using apply/apply-like function

Problem description

I have a dataframe that contains 5 columns, each corresponding to a survey item, and a grouping variable. There are a total of 300 observations in my dataframe and each cell entry represents the response given by a student on a given item. I constructed the following reproducible dataframe:

set.seed(14)
Group <- rep(c(0, 1), each = 150)
mydf <- data.frame(replicate(5, sample(0:1,300,rep=TRUE)))
mydf$Group <- Group
mydf$Group <- factor(mydf$Group, levels = c(0, 1), labels = c("Group A", "Group B"))
head(mydf); tail(mydf)
> head(mydf); tail(mydf)
  X1 X2 X3 X4 X5   Group
1  0  1  1  1  0 Group A
2  1  1  1  1  1 Group A
3  1  1  0  1  0 Group A
4  1  0  0  1  1 Group A
5  1  1  0  1  0 Group A
6  1  0  1  1  1 Group A
    X1 X2 X3 X4 X5   Group
295  0  1  1  0  1 Group B
296  0  0  1  0  0 Group B
297  1  1  0  1  0 Group B
298  1  1  0  0  1 Group B
299  0  0  1  0  0 Group B
300  1  1  1  1  1 Group B

What I would like to do is perform a chi-square test of independence between Group and each survey item, X1 to X5. So far I have been doing the following [for item 1 (X1)]:

mydf$X1 <- factor(mydf$X1, levels = c(0, 1), labels = c("AGREE", "DISAGREE"))
MyTable <- table(mydf$Group, mydf$X1)
addmargins(MyTable)
chisq.test(MyTable, correct = FALSE)

and I would like to use the lapply function (or something similar) to automate this process so that I do not have to repeat the preceding code for each of the 5 items. This is particularly important because I have a similar dataframe that contains 50 items, and I would like to use the same code to automate those analyses. Any advice on how to proceed? I am having the most trouble referring to each variable (i.e. X1, X2, etc.) inside the loop, and I am not sure how to do so. I recently started using R, so I do not have a firm understanding of these functions and commands. Any help is greatly appreciated.

Tags: r, statistics, apply, lapply

Solution


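We can use lapply to loop over the columns 'X1' to 'X5', build a contingency table of each column against the 'Group' column, and run chisq.test on it. The result is a list of test results, one per item. Note that the columns must still hold the original 0/1 values: if X1 has already been converted to a factor as in the question, factor(x, levels = 0:1, ...) will turn every value into NA.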

# For each item column, cross-tabulate against Group and run the test
out <- lapply(mydf[paste0("X", 1:5)], function(x)
    chisq.test(table(mydf$Group,
                     factor(x, levels = 0:1, labels = c("AGREE", "DISAGREE"))),
               correct = FALSE))

sapply(out, `[[`, "p.value")
#       X1         X2         X3         X4         X5 
#0.72875061 0.72888976 0.90732945 0.01525704 0.08243538 
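
For the 50-item dataframe mentioned in the question, the same idea could be sketched as follows. This is only an illustration under stated assumptions: it assumes every column other than 'Group' is a 0/1 item, and the names items, make_table, tabs, tests and results are hypothetical, not part of the original answer. Keeping the tables in their own list also makes it easy to call addmargins on any item, as in the question.

```r
# Rebuild the example data from the question so the sketch is self-contained.
set.seed(14)
mydf <- data.frame(replicate(5, sample(0:1, 300, replace = TRUE)))
mydf$Group <- factor(rep(c(0, 1), each = 150), levels = c(0, 1),
                     labels = c("Group A", "Group B"))

# Assumption: every column other than 'Group' is a 0/1 survey item.
items <- setdiff(names(mydf), "Group")

# Hypothetical helper: cross-tabulate one item against Group.
make_table <- function(x) {
  table(mydf$Group, factor(x, levels = 0:1, labels = c("AGREE", "DISAGREE")))
}

tabs  <- lapply(mydf[items], make_table)            # one table per item
tests <- lapply(tabs, chisq.test, correct = FALSE)  # one test per table

# One row per item: test statistic, degrees of freedom, p-value.
results <- data.frame(
  item      = items,
  statistic = unname(sapply(tests, function(res) res$statistic)),
  df        = unname(sapply(tests, function(res) res$parameter)),
  p.value   = unname(sapply(tests, function(res) res$p.value))
)
print(results)

# Contingency table with margins for a single item, as in the question.
addmargins(tabs[["X1"]])
```

Because items is derived from the column names, the same code should run unchanged on a dataframe with 50 items, provided the grouping column is still called 'Group'.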
