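r - How to plot the conclusions of different tests
Problem description
I am trying to plot the results of several tests I ran on my dataset to check for associations between different groups.
This is the data frame structure: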
  Month   District Age   Gender Education Disability Religion Occupation                          JobSeekers GMI
1 2020-01 Dan      U17   Male   None      None       Jewish   Unprofessional workers              2          0
2 2020-01 Dan      U17   Male   None      None       Muslims  Sales and costumer service          1          0
3 2020-01 Dan      U17   Female None      None       Other    Undefined                           1          0
4 2020-01 Dan      18-24 Male   None      None       Jewish   Production and construction         1          0
5 2020-01 Dan      18-24 Male   None      None       Jewish   Academic degree                     1          0
6 2020-01 Dan      18-24 Male   None      None       Jewish   Practical engineers and technicians 1          0
  ACU NACU NewSeekers NewFiredSeekers
1 0   2    0          0
2 0   1    0          0
3 0   1    0          0
4 0   1    0          0
5 0   1    0          0
6 0   1    1          1
I aggregated it down to what each test needs. For example, for the t-test I ran:
library(dplyr)
dist.newseek <- Cdata %>%
  group_by(Month, District) %>%
  summarise(NewSeekers = sum(NewSeekers))
Month District NewSeekers
<chr> <chr> <int>
1 2020-01 Dan 6551
2 2020-01 Jerusalem 3589
3 2020-01 North 6154
4 2020-01 Sharon 4131
5 2020-01 South 4469
6 2020-02 Dan 5529
Then I ran the t-test:
t.test(NewSeekers ~ District,data=subset(dist.newseek,District %in% c("Dan","South")))
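These are all the tests I ran for each grouping (a t-test of new job seekers by district, a Wilcoxon test of new seekers by age, and an ANOVA of new seekers by occupation). I am looking for a graphical way to show the result of each test. If you have any ideas, please help.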
# t-test: district vs. new seekers
# aggregate NewSeekers by month and district
dist.newseek <- Cdata %>%
  group_by(Month, District) %>%
  summarise(NewSeekers = sum(NewSeekers))
# run a t-test on the aggregated table, comparing Dan and South only
t.test(NewSeekers ~ District, data = subset(dist.newseek, District %in% c("Dan", "South")))
# results
Welch Two Sample t-test
data: NewSeekers by District
t = 0.68883, df = 4.1617, p-value = 0.5274
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-119952.3 200737.3
sample estimates:
mean in group Dan mean in group South
74608.25 34215.75
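One possible way to show a two-group t-test graphically is a boxplot of the two groups with the test statistic and p-value written into the subtitle. The sketch below is only an illustration: `toy_dist` is a hypothetical stand-in for `dist.newseek` with made-up values, and the object names `tt` and `p_dist` are assumptions, not part of the original code.

```r
library(ggplot2)

# Hypothetical monthly totals standing in for dist.newseek (made-up values)
toy_dist <- data.frame(
  Month      = rep(sprintf("2020-%02d", 1:4), times = 2),
  District   = rep(c("Dan", "South"), each = 4),
  NewSeekers = c(6551, 5529, 90000, 60000, 4469, 4000, 70000, 58000)
)

# Welch two-sample t-test, same call shape as in the question
tt <- t.test(NewSeekers ~ District, data = toy_dist)

# Boxplot plus raw points; the test result goes into the subtitle
p_dist <- ggplot(toy_dist, aes(x = District, y = NewSeekers)) +
  geom_boxplot(width = 0.4, outlier.shape = NA) +
  geom_jitter(width = 0.1, height = 0) +
  labs(
    title    = "New seekers by district",
    subtitle = sprintf("Welch t-test: t = %.2f, df = %.2f, p = %.3f",
                       tt$statistic, tt$parameter, tt$p.value)
  ) +
  theme_classic()
print(p_dist)
```

With the real data, replacing `toy_dist` by the Dan/South subset of `dist.newseek` should give the same kind of figure.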
# Wilcoxon test: age vs. new seekers
# aggregate NewSeekers by month and age
age.newseek <- Cdata %>%
  group_by(Month, Age) %>%
  summarise(NewSeekers = sum(NewSeekers))
# run a Wilcoxon rank sum test comparing two age groups
wilcox.test(NewSeekers ~ Age, data = subset(age.newseek, Age %in% c("25-34", "45-54")))
# Results
Wilcoxon rank sum exact test
data: NewSeekers by Age
W = 11, p-value = 0.4857
alternative hypothesis: true location shift is not equal to 0
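The Wilcoxon result can be displayed in a similar way, for instance as a boxplot (which already marks the medians) with the W statistic and p-value annotated inside the panel. This is a sketch under assumptions: `toy_age` is hypothetical data in the shape of `age.newseek`, and `wt` and `p_age` are names chosen here for illustration.

```r
library(ggplot2)

# Hypothetical monthly totals standing in for age.newseek (made-up values, no ties)
toy_age <- data.frame(
  Age        = rep(c("25-34", "45-54"), each = 4),
  NewSeekers = c(1510, 1720, 20300, 18950, 1105, 1290, 15200, 13840)
)

# Wilcoxon rank sum test, same call shape as in the question
wt <- wilcox.test(NewSeekers ~ Age, data = toy_age)

# Boxplot plus raw points; the annotation is placed between the two groups
p_age <- ggplot(toy_age, aes(x = Age, y = NewSeekers)) +
  geom_boxplot(width = 0.4, outlier.shape = NA) +
  geom_jitter(width = 0.1, height = 0) +
  annotate("text", x = 1.5, y = Inf, vjust = 1.5,
           label = sprintf("Wilcoxon rank sum: W = %g, p = %.3f",
                           wt$statistic, wt$p.value)) +
  theme_classic()
print(p_age)
```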
ANOVA test
# Aggregate NewSeekers by month and occupation
occu.newseek <- Cdata %>%
  group_by(Month, Occupation) %>%
  summarise(NewSeekers = sum(NewSeekers))
## Convert Occupation to a factor (TukeyHSD needs a factor)
occu.newseek$Occupation <- as.factor(occu.newseek$Occupation)
## Get the occupation group means and standard deviations
group.mean.sd <- aggregate(
x = occu.newseek$NewSeekers, # Specify data column
by = list(occu.newseek$Occupation), # Specify group indicator
FUN = function(x) c('mean'=mean(x),'sd'= sd(x))
)
## Run one way ANOVA test
anova_one_way <- aov(NewSeekers~ Occupation, data = occu.newseek)
summary(anova_one_way)
## Run the Tukey Test to compare the groups
TukeyHSD(anova_one_way)
## Check the mean differences across the groups
library(ggplot2)
ggplot(occu.newseek, aes(x = Occupation, y = NewSeekers, fill = Occupation)) +
geom_boxplot() +
geom_jitter(shape = 15,
color = "steelblue",
position = position_jitter(0.21)) +
theme_classic()
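For the ANOVA follow-up, one common option is to plot the Tukey pairwise differences with their confidence intervals; an interval that crosses zero corresponds to a pair that is not significantly different at the family-wise level. The sketch below uses hypothetical data (`toy_occ`) in the shape of `occu.newseek`, and the names `fit`, `tk` and `p_tukey` are assumptions. Base R offers a quicker equivalent via `plot(TukeyHSD(fit))`.

```r
library(ggplot2)

# Hypothetical data standing in for occu.newseek (made-up values)
toy_occ <- data.frame(
  Occupation = factor(rep(c("Academic degree", "Undefined",
                            "Unprofessional workers"), each = 4)),
  NewSeekers = c(820, 910, 1040, 760,
                 300, 410, 280, 350,
                 1500, 1720, 1380, 1610)
)

# One-way ANOVA followed by Tukey's HSD, as in the question
fit <- aov(NewSeekers ~ Occupation, data = toy_occ)
tk  <- as.data.frame(TukeyHSD(fit)$Occupation)  # columns: diff, lwr, upr, "p adj"
tk$comparison <- rownames(tk)

# One point range per pairwise comparison; the dashed line marks "no difference"
p_tukey <- ggplot(tk, aes(x = comparison, y = diff, ymin = lwr, ymax = upr)) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_pointrange() +
  coord_flip() +
  labs(x = NULL, y = "Difference in mean NewSeekers (95% family-wise CI)") +
  theme_classic()
print(p_tukey)
```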
Thanks, Moshe