r - Function for running multiple t-tests to find main effects
Problem description
df <- data.frame(
  rating1 = c(1, 5, 2, 4, 5),
  rating2 = c(2, 1, 2, 4, 2),
  rating3 = c(0, 2, 1, 2, 0),
  race    = c("black", "asian", "white", "black", "white"),
  gender  = c("male", "female", "female", "male", "female")
)
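I want to run a t-test comparing a group mean (e.g. the mean for Asian respondents in rating1) with the overall mean of each rating (e.g. rating1). Below is my code for Asian respondents in rating1.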
asian_df <- df %>%
filter(race == "asian")
t.test(asian_df$rating1, mean(df$rating1))
Then, for Black respondents in rating2, I would run
black_df <- df %>%
filter(race == "black")
t.test(black_df$rating2, mean(df$rating2))
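How can I write a function that automates the t-test for every group? So far I have to change the variable names by hand to run it for each race, each gender, and each rating (rating1 to rating3). Thanks!
Solution
Running multiple t-tests increases the risk of a Type I error, and you need to adjust for multiple comparisons for your results to be valid/meaningful. You can run the t-tests by looping over the levels of a variable, e.g.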
library(tidyverse)

# Larger example data set, so that every group has more than one observation
df <- data.frame(
  rating1 = c(5, 8, 7, 8, 9, 6, 9, 7, 8, 5, 8, 5),
  rating2 = c(2, 7, 8, 4, 9, 3, 6, 1, 7, 3, 9, 1),
  rating3 = c(0, 6, 1, 2, 7, 2, 9, 1, 6, 2, 3, 1),
  race    = c("asian", "asian", "asian", "black", "asian", "black",
              "white", "black", "white", "black", "white", "black"),
  gender  = c("male", "female", "female", "male", "female", "male",
              "female", "male", "female", "male", "female", "male")
)

# Loop over each race and compare its rating1 values with a constant
# vector holding the overall rating1 mean
for (rac in unique(df$race)) {
  tmp_df <- df %>%
    filter(race == rac)
  print(rac)
  print(t.test(tmp_df$rating1,
               rep(mean(df$rating1),
                   length(tmp_df$rating1))))
}
[1] "asian"
Welch Two Sample t-test
data: tmp_df$rating1 and rep(mean(df$rating1), length(tmp_df$rating1))
t = 0.19518, df = 3, p-value = 0.8577
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.550864 2.884198
sample estimates:
mean of x mean of y
7.250000 7.083333
[1] "black"
Welch Two Sample t-test
data: tmp_df$rating1 and rep(mean(df$rating1), length(tmp_df$rating1))
t = -1.5149, df = 4, p-value = 0.2044
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.5022651 0.7355985
sample estimates:
mean of x mean of y
6.200000 7.083333
[1] "white"
Welch Two Sample t-test
data: tmp_df$rating1 and rep(mean(df$rating1), length(tmp_df$rating1))
t = 3.75, df = 2, p-value = 0.06433
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.1842176 2.6842176
sample estimates:
mean of x mean of y
8.333333 7.083333
# Same comparison, this time looping over each gender
for (gend in unique(df$gender)) {
  tmp_df <- df %>%
    filter(gender == gend)
  print(gend)
  print(t.test(tmp_df$rating1,
               rep(mean(df$rating1),
                   length(tmp_df$rating1))))
}
[1] "male"
Welch Two Sample t-test
data: tmp_df$rating1 and rep(mean(df$rating1), length(tmp_df$rating1))
t = -2.0979, df = 5, p-value = 0.09
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.4107761 0.2441094
sample estimates:
mean of x mean of y
6.000000 7.083333
[1] "female"
Welch Two Sample t-test
data: tmp_df$rating1 and rep(mean(df$rating1), length(tmp_df$rating1))
t = 3.5251, df = 5, p-value = 0.01683
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.2933469 1.8733198
sample estimates:
mean of x mean of y
8.166667 7.083333
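The two loops above can be folded into a single helper that also walks over rating1 to rating3, which is what the question asks for. The following is only a sketch under stated assumptions: `run_group_ttests` is a hypothetical name, it uses base R only, and it calls `t.test()` with `mu =` (a one-sample test against the overall mean) rather than the two-sample call used above, so its statistics and p-values will not match the printed output.

```r
# Example data copied from the answer above
df <- data.frame(
  rating1 = c(5, 8, 7, 8, 9, 6, 9, 7, 8, 5, 8, 5),
  rating2 = c(2, 7, 8, 4, 9, 3, 6, 1, 7, 3, 9, 1),
  rating3 = c(0, 6, 1, 2, 7, 2, 9, 1, 6, 2, 3, 1),
  race    = c("asian", "asian", "asian", "black", "asian", "black",
              "white", "black", "white", "black", "white", "black"),
  gender  = c("male", "female", "female", "male", "female", "male",
              "female", "male", "female", "male", "female", "male")
)

# Hypothetical helper (not from the original answer): for every grouping
# variable, every level of it, and every rating column, run a one-sample
# t-test of the group's values against the overall mean of that rating.
run_group_ttests <- function(data, group_vars, rating_vars) {
  rows <- list()
  for (g in group_vars) {
    for (lvl in unique(data[[g]])) {
      for (r in rating_vars) {
        x <- data[[r]][data[[g]] == lvl]   # ratings for this group
        overall <- mean(data[[r]])         # overall mean of this rating
        res <- t.test(x, mu = overall)     # assumption: one-sample test
        rows[[length(rows) + 1]] <- data.frame(
          variable     = g,
          level        = lvl,
          rating       = r,
          n            = length(x),
          group_mean   = mean(x),
          overall_mean = overall,
          t            = unname(res$statistic),
          p_value      = res$p.value
        )
      }
    }
  }
  do.call(rbind, rows)   # one row per test
}

results <- run_group_ttests(df,
                            group_vars  = c("race", "gender"),
                            rating_vars = c("rating1", "rating2", "rating3"))
print(results)
```

Collecting the results in one data frame keeps all p-values in a single column, which makes a later multiple-comparison adjustment straightforward. Note that `t.test()` needs at least two non-identical observations per group, so a group with a single row (such as the one Asian respondent in the question's original data) would raise an error.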
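Because you are running multiple tests (5 t-tests in this case), your chance of getting at least one false positive is 1 - (1 - 0.05)^5 = 22.62% (assuming the tests are independent), which is very high. To counter this you can apply the Bonferroni correction, which takes your desired significance threshold (here p < 0.05) and divides it by the number of tests, i.e. the new threshold for rejecting the null hypothesis is p < 0.01. Once you apply this correction, even the "best" t-test result (gender, female; p-value = 0.01683) is no longer statistically significant.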
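As a rough check of that arithmetic, the sketch below recomputes the family-wise error rate and applies the Bonferroni correction to the five p-values printed above. The variable names are illustrative only; `p.adjust()` is the base R function for this and multiplies each p-value by the number of tests (capped at 1), which is equivalent to dividing the threshold.

```r
# p-values copied from the five t-test outputs above
p_values <- c(asian = 0.8577, black = 0.2044, white = 0.06433,
              male = 0.09, female = 0.01683)

alpha <- 0.05
m <- length(p_values)            # number of tests (5)

# Chance of at least one false positive, assuming independent tests
fwer <- 1 - (1 - alpha)^m

# Bonferroni: compare raw p-values with alpha / m ...
bonf_threshold <- alpha / m
significant_raw <- p_values < bonf_threshold

# ... or, equivalently, compare adjusted p-values with alpha
p_adj <- p.adjust(p_values, method = "bonferroni")

print(round(fwer, 4))
print(significant_raw)
print(p_adj)
```

Less conservative alternatives such as `method = "holm"` are also available in `p.adjust()`; which one is appropriate depends on the analysis.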
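An alternative is to use an ANOVA to compare the means across all conditions, then use Tukey's HSD to identify which groups differ. Tukey's HSD adjusts the p-values for all pairwise comparisons within a factor (it controls the family-wise error rate), so you don't need a separate multiple-testing correction for those comparisons. Adapting this approach to your problem is probably the better route. Note that the formula below has rating1 + rating2 + rating3 on its left-hand side, so the response is the sum of the three ratings, e.g.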
anova_one_way <- aov(rating1 + rating2 + rating3 ~ race + gender, data = df)
summary(anova_one_way)
Df Sum Sq Mean Sq F value Pr(>F)
race 2 266.70 133.35 14.01 0.00243 **
gender 1 140.08 140.08 14.72 0.00497 **
Residuals 8 76.13 9.52
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
TukeyHSD(anova_one_way)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = rating1 + rating2 + rating3 ~ race + gender, data = df)
$race
diff lwr upr p adj
black-asian -7.050000 -12.963253 -1.136747 0.0224905
white-asian 4.416667 -2.315868 11.149201 0.2076254
white-black 11.466667 5.029132 17.904201 0.0023910
$gender
diff lwr upr p adj
male-female -3.416667 -7.523829 0.6904958 0.0913521
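Since the model above analyses the sum of the three ratings, one possible adaptation is to fit a separate ANOVA for each rating and run Tukey's HSD on each fit. This is a sketch rather than part of the original answer: the list name `tukey_by_rating` is made up for illustration, and `race` and `gender` are converted to factors explicitly as a precaution.

```r
# Example data copied from the answer above
df <- data.frame(
  rating1 = c(5, 8, 7, 8, 9, 6, 9, 7, 8, 5, 8, 5),
  rating2 = c(2, 7, 8, 4, 9, 3, 6, 1, 7, 3, 9, 1),
  rating3 = c(0, 6, 1, 2, 7, 2, 9, 1, 6, 2, 3, 1),
  race    = c("asian", "asian", "asian", "black", "asian", "black",
              "white", "black", "white", "black", "white", "black"),
  gender  = c("male", "female", "female", "male", "female", "male",
              "female", "male", "female", "male", "female", "male")
)

# Make the grouping variables explicit factors
df$race   <- factor(df$race)
df$gender <- factor(df$gender)

rating_vars <- c("rating1", "rating2", "rating3")

# Fit `rating ~ race + gender` for each rating and run Tukey's HSD on it
tukey_by_rating <- lapply(rating_vars, function(r) {
  fit <- aov(reformulate(c("race", "gender"), response = r), data = df)
  TukeyHSD(fit)
})
names(tukey_by_rating) <- rating_vars

# Pairwise comparisons for rating1, as an example
print(tukey_by_rating$rating1)
```

Each Tukey table controls the error rate only within its own factor and rating, so running three separate models still amounts to several families of tests; whether a further adjustment across ratings is needed is a judgment call for the analysis.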