How to plot the conclusions of different tests

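Problem description

I am trying to plot the results of several tests I ran on my dataset, each of which examines the relationship between different groups.

This is the structure of the data frame: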

    Month District   Age Gender Education Disability Religion                          Occupation JobSeekers GMI
1 2020-01      Dan   U17   Male      None       None   Jewish              Unprofessional workers          2   0
2 2020-01      Dan   U17   Male      None       None  Muslims          Sales and costumer service          1   0
3 2020-01      Dan   U17 Female      None       None    Other                           Undefined          1   0
4 2020-01      Dan 18-24   Male      None       None   Jewish         Production and construction          1   0
5 2020-01      Dan 18-24   Male      None       None   Jewish                     Academic degree          1   0
6 2020-01      Dan 18-24   Male      None       None   Jewish Practical engineers and technicians          1   0
  ACU NACU NewSeekers NewFiredSeekers
1   0    2          0               0
2   0    1          0               0
3   0    1          0               0
4   0    1          0               0
5   0    1          0               0
6   0    1          1               1

I aggregated it for each relevant test, for example for the t-test I ran:

library(dplyr)

dist.newseek <- Cdata %>% 
  group_by(Month,District) %>% 
  summarise(NewSeekers=sum(NewSeekers))

  Month   District  NewSeekers
  <chr>   <chr>          <int>
1 2020-01 Dan             6551
2 2020-01 Jerusalem       3589
3 2020-01 North           6154
4 2020-01 Sharon          4131
5 2020-01 South           4469
6 2020-02 Dan             5529

Then I ran the t-test:

t.test(NewSeekers ~ District,data=subset(dist.newseek,District %in% c("Dan","South")))

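Here are all the tests I ran for each group: a t-test of new job seekers by district, a Wilcoxon test of new job seekers by age, and an ANOVA of new job seekers by occupation. I am looking for a graphical way to show the result of each test. If you have any ideas, please help.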

# t test for district vs new seekers

# sorting

dist.newseek <- Cdata %>% 
  group_by(Month,District) %>% 
  summarise(NewSeekers=sum(NewSeekers))

# performing a t test on the mini table we created

t.test(NewSeekers ~ District,data=subset(dist.newseek,District %in% c("Dan","South")))

# results

Welch Two Sample t-test

data:  NewSeekers by District
t = 0.68883, df = 4.1617, p-value = 0.5274
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  -119952.3  200737.3
sample estimates:
  mean in group Dan mean in group South 
74608.25            34215.75 

#wilcoxon test 

# filtering Cdata to New seekers based on month and age

age.newseek <- Cdata %>% 
  group_by(Month,Age) %>% 
  summarise(NewSeekers=sum(NewSeekers))

#performing a wilcoxon test on the subset 

wilcox.test(NewSeekers ~ Age,data=subset(age.newseek,Age %in% c("25-34","45-54")))

# Results

Wilcoxon rank sum exact test

data:  NewSeekers by Age
W = 11, p-value = 0.4857
alternative hypothesis: true location shift is not equal to 0
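
One possible way to show a two-sample result like the t-test and Wilcoxon test above is a boxplot of the two groups with the p-values written into the subtitle. This is only a minimal sketch: `demo` is a made-up data frame standing in for `dist.newseek`, and its numbers are invented for illustration, not taken from `Cdata`.

```r
library(ggplot2)

# Made-up monthly totals standing in for dist.newseek (values are invented)
set.seed(1)
demo <- data.frame(
  Month      = rep(sprintf("2020-%02d", 1:4), times = 2),
  District   = rep(c("Dan", "South"), each = 4),
  NewSeekers = c(rnorm(4, mean = 6000, sd = 500),
                 rnorm(4, mean = 4500, sd = 500))
)

# Same kind of calls as in the question, applied to the made-up data
tt <- t.test(NewSeekers ~ District, data = demo)
wt <- wilcox.test(NewSeekers ~ District, data = demo)

# Put both p-values into the subtitle so the plot carries the test results
label <- sprintf("Welch t-test p = %.3f; Wilcoxon p = %.3f",
                 tt$p.value, wt$p.value)

p <- ggplot(demo, aes(x = District, y = NewSeekers, fill = District)) +
  geom_boxplot(alpha = 0.5) +
  geom_jitter(width = 0.1) +
  labs(title = "New seekers by district", subtitle = label) +
  theme_classic()
print(p)
```

Replacing `demo` with the `subset(...)` used in the question should give the same kind of figure for the real data.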

ANOVA test

# Sorting occupation and month by new seekers

occu.newseek <- Cdata %>% 
  group_by(Month,Occupation) %>% 
  summarise(NewSeekers=sum(NewSeekers))

## Make Occupation a factor

occu.newseek$Occupation <- as.factor(occu.newseek$Occupation)

## Get the occupation group means and standard deviations

group.mean.sd <- aggregate(
  x = occu.newseek$NewSeekers, # Specify data column
  by = list(occu.newseek$Occupation), # Specify group indicator
  FUN = function(x) c('mean'=mean(x),'sd'= sd(x))
)

## Run one way ANOVA test
anova_one_way <- aov(NewSeekers~ Occupation, data = occu.newseek)
summary(anova_one_way)

## Run the Tukey Test to compare the groups 
TukeyHSD(anova_one_way)

## Check the mean differences across the groups 

library(ggplot2)
ggplot(occu.newseek, aes(x = Occupation, y = NewSeekers, fill = Occupation)) +
  geom_boxplot() +
  geom_jitter(shape = 15,
              color = "steelblue",
              position = position_jitter(0.21)) +
  theme_classic()
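
For the ANOVA, the pairwise differences returned by `TukeyHSD()` can be drawn as confidence intervals: an interval that does not cross zero marks a pair of groups whose means differ at the family-wise level used. The sketch below runs on made-up data (`demo` and its three occupation labels are invented), so treat it as an illustration rather than a result.

```r
library(ggplot2)

# Made-up data standing in for occu.newseek (values are invented)
set.seed(2)
demo <- data.frame(
  Occupation = factor(rep(c("Academic", "Sales", "Undefined"), each = 6)),
  NewSeekers = c(rnorm(6, 500, 50), rnorm(6, 650, 50), rnorm(6, 520, 50))
)

fit   <- aov(NewSeekers ~ Occupation, data = demo)
tukey <- TukeyHSD(fit)

# TukeyHSD() returns one matrix per factor with columns diff, lwr, upr, p adj
tk <- as.data.frame(tukey$Occupation)
tk$pair <- rownames(tk)
tk$significant <- tk$`p adj` < 0.05

# One horizontal interval per pair of groups, with a reference line at zero
p <- ggplot(tk, aes(y = pair, colour = significant)) +
  geom_vline(xintercept = 0, linetype = "dashed") +
  geom_segment(aes(x = lwr, xend = upr, yend = pair)) +
  geom_point(aes(x = diff)) +
  labs(x = "Difference in mean NewSeekers", y = NULL) +
  theme_classic()
print(p)

# Base R draws a similar figure directly: plot(tukey)
```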

Plot

Thanks, Moshe

Tags: r, plot, statistics, anova

Solution
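
One way to put all three tests on a single figure, offered as a sketch rather than a definitive answer, is to collect each test's p-value into one data frame and plot them against a significance threshold. Everything below is illustrative: `demo`, the group labels, and the 0.05 cut-off are assumptions, not values from the question.

```r
library(ggplot2)

# Made-up data with three groups (values are invented)
set.seed(3)
demo <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 8)),
  value = c(rnorm(8, 10, 2), rnorm(8, 13, 2), rnorm(8, 10.5, 2))
)
two <- droplevels(subset(demo, group %in% c("A", "B")))

tt  <- t.test(value ~ group, data = two)       # Welch two-sample t-test
wt  <- wilcox.test(value ~ group, data = two)  # Wilcoxon rank sum test
fit <- aov(value ~ group, data = demo)         # one-way ANOVA

# htest objects expose $p.value; the ANOVA p-value sits in the summary table
results <- data.frame(
  test    = c("Welch t-test (A vs B)", "Wilcoxon (A vs B)",
              "One-way ANOVA (A, B, C)"),
  p_value = c(tt$p.value, wt$p.value, summary(fit)[[1]][["Pr(>F)"]][1])
)

# One point per test, with a dashed line at the assumed 0.05 threshold
p <- ggplot(results, aes(x = p_value, y = test)) +
  geom_vline(xintercept = 0.05, linetype = "dashed") +
  geom_point(size = 3) +
  xlim(0, 1) +
  labs(x = "p-value", y = NULL) +
  theme_classic()
print(p)
```

A p-value summary hides effect sizes, so it works best alongside per-test plots such as boxplots or confidence intervals.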

