Unbalanced one-way ANOVA

Problem description

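I've written a script for a one-way ANOVA, but my sample sizes are unequal across the groups of my factor, and I don't know how to account for that in the code. Basically, I have one factor, "tube type" (A, B, C), and a response variable that is either 1 or 2. I don't have the same number of data points in each group because some data were lost.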

Can anyone help? Here is the code so far. It runs and gives me the test statistics:

#First, clear R's memory of everything using this code
rm(list=ls())

#Next up, load some packages I might use

library("dplyr")
library("ggpubr")
library("devtools")


#this bit of code tells R what directory to look in when I specify a file
setwd("C:/Users/danie/OneDrive/Documents/RPractice")
#so here, it knows exactly where to find the file: in the working directory
#And I'm telling R that "my_data" refers to that specific dataset
my_data = read.csv("outcomes.csv")


#this just shows me 10 random rows of data
set.seed(1234)
dplyr::sample_n(my_data, 10)

#I think I need to convert tube to a factor with three levels
#But first, check the structure of the data

str(my_data)

#actually might be okay, not sure
#let's try checking levels
levels(my_data$ï..Tube)

#Alright maybe not, let's try converting to a factor


my_data$ï..Tube <- as.factor(my_data$ï..Tube) 
#Now if I define my data columns as objects to manipulate I can see 
#what R thinks they are
Tube = my_data$ï..Tube
Test.Outcome = my_data$Test.Outcome

#And check the class too I guess

class(my_data$ï..Tube) 

#Checking head() shows whether the factor column has come out as NA, which can
#happen if the conversion code isn't right, and then the test won't work
head(my_data)
#Looks good though, R sees it as a factor with 3 levels

#Let's make a frequency table, see what that looks like

table(my_data$ï..Tube, my_data$Test.Outcome)
#This confirms again that R is handling my factors correctly

# Compute the analysis of variance
res.aov <- aov(Test.Outcome ~ ï..Tube, data = my_data)
# Summary of the analysis
summary(res.aov)

#So from my outputs, there is no significant difference across tube types
#But I still need to work out whether there's a way to make this an unbalanced
#analysis, because I have unequal sample sizes from lost data
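For a single factor, `aov()` does not need balanced groups: the between-group sum of squares weights each group mean by that group's own sample size, and with only one term in the model the Type I, II and III sums of squares coincide, so unequal group sizes mainly become an issue in designs with two or more factors. The sketch below illustrates this on a small, made-up data frame (`demo` and its values are hypothetical, not the poster's `outcomes.csv`), and also runs Welch's one-way test with `oneway.test()`, which drops the equal-variance assumption and is often suggested when group sizes differ.

```r
# Hypothetical example data (invented for illustration, not outcomes.csv):
# three tube types with deliberately unequal group sizes and a 1/2 response.
demo <- data.frame(
  Tube = factor(c(rep("A", 12), rep("B", 9), rep("C", 5))),
  Test.Outcome = c(1, 2, 1, 1, 2, 1, 2, 2, 1, 1, 2, 1,  # A: 12 rows
                   2, 2, 1, 2, 1, 2, 2, 1, 2,           # B: 9 rows
                   1, 1, 2, 1, 1)                       # C: 5 rows
)

# Per-group sample sizes: unequal, and aov() needs no extra option for this.
group_sizes <- table(demo$Tube)
print(group_sizes)

# Classic one-way ANOVA (assumes equal variances across groups).
# Under R's default na.action (na.omit), rows with a missing response are
# dropped, so lost data simply makes the affected group smaller.
fit <- aov(Test.Outcome ~ Tube, data = demo)
print(summary(fit))

# Welch's one-way test: does not assume equal variances across groups.
welch <- oneway.test(Test.Outcome ~ Tube, data = demo, var.equal = FALSE)
print(welch)
```

With the poster's own data, the equivalent calls would presumably be `aov(Test.Outcome ~ ï..Tube, data = my_data)`, as in the script, and `oneway.test(Test.Outcome ~ ï..Tube, data = my_data)`. Because `Test.Outcome` only takes the values 1 and 2, a test on the tube-by-outcome count table the script already builds, such as `chisq.test(table(my_data$ï..Tube, my_data$Test.Outcome))`, may also be worth considering, though that is a separate modelling choice.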

Tags: anova
