Error message when running a t-test in R because of a character variable

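Problem description

I have been trying to run a two-sided t-test in R but keep running into an error. Below are my process, the data set details, and my script from RStudio. I used a data set called LungCapacity, downloaded from the following website: https://www.statslectures.com/r-scripts-datasets.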

# Imported the data set into RStudio.

# Ran a summary report to see the data and class.
summary(LungCapacityData)

# Here I could see that the Smoke column is a character, so I converted it to a factor.
LungCapacityData$Smoke <- factor(LungCapacityData$Smoke)

# On checking the summary again, I see it's converted to a factor with levels yes and no.

# I want to run a t-test between lung capacity and smoking.
t.test(LungCapacityData$LungCap, LungCapacityData$Smoke, alternative = c("two.sided"), mu=0, var.equal = FALSE, conf.level = 0.95, paired = FALSE)

Running this gives me the following error.

Error in var(y) : Calling var(x) on a factor x is defunct.
  Use something like 'all(duplicated(x)[-1L])' to test for a constant vector.
In addition: Warning message:
In mean.default(y) : argument is not numeric or logical: returning NA

I tried converting the Smoke variable from yes and no to 1 and 0. The code runs, but the result is not correct. What am I doing wrong?

Tags: r, t-test

Solution

You were very close; you just need to call t.test with a formula:

LungCapacityData <- read.table(
  "https://docs.google.com/uc?id=0BxQfpNgXuWoITmVwQzJ2VF9qVlU&export=download",
  header = TRUE)

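# paired = FALSE is left out: it is the default, and newer R releases
# may reject `paired` in the formula interface.
t.test(LungCap ~ Smoke, data = LungCapacityData,
       alternative = "two.sided", mu = 0, var.equal = FALSE,
       conf.level = 0.95)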

#   Welch Two Sample t-test
#
#data:  LungCap by Smoke
#t = -3.6498, df = 117.72, p-value = 0.0003927
#alternative hypothesis: true difference in means is not equal to 0
#95 percent confidence interval:
# -1.3501778 -0.4003548
#sample estimates:
# mean in group no mean in group yes 
#         7.770188          8.645455 

With your current approach, you are trying to compare LungCapacityData$LungCap, which is a numeric vector:

LungCapacityData$LungCap[1:10]
# [1]  6.475 10.125  9.550 11.125  4.800  6.225  4.950  7.325  8.875  6.800

with LungCapacityData$Smoke, which is a factor:

LungCapacityData$Smoke[1:10]
# [1] no  yes no  no  no  no  no  no  no  no 

Instead, you want to tell t.test to compare the values of LungCapacityData$LungCap when they are grouped by LungCapacityData$Smoke. This is done with a formula.

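The formula LungCap ~ Smoke says that LungCap should depend on Smoke. When using a formula with bare column names, you also need to supply data = so that R knows which data frame the columns come from.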
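
As a small illustration of what the formula does, here is a sketch on a made-up data frame (toy, with hypothetical columns value and group; this is not the LungCapacity data). The group means that t.test reports for value ~ group should match the ones computed by hand with tapply:

```r
# Toy data, invented for illustration only (not the LungCapacity data set)
toy <- data.frame(
  value = c(5.1, 6.3, 5.8, 6.0, 7.9, 8.4, 7.2, 8.8),
  group = factor(c("no", "no", "no", "no", "yes", "yes", "yes", "yes"))
)

# The formula splits `value` by the levels of `group`
res <- t.test(value ~ group, data = toy)

# Group means computed by hand, one per factor level
group_means <- tapply(toy$value, toy$group, mean)

# Drop names and array attributes before comparing, because t.test
# labels its estimates "mean in group no" and "mean in group yes"
all.equal(unname(res$estimate), as.numeric(group_means))
```

The order of the estimates follows the order of the factor levels, which is alphabetical by default.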

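When you tried converting LungCapacityData$Smoke to numeric, you got the wrong result because all you get back are the factor level indices, which have no biological meaning.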

as.numeric(LungCapacityData$Smoke)[1:10]
# [1] 1 2 1 1 1 1 1 1 1 1

You were essentially asking whether the mean of the level indices that R assigned to the factor differs from the mean lung capacity.
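
To make the point about level indices concrete, here is a minimal sketch with a hypothetical factor called smoke (invented values, not taken from the data set). It also shows one way to build an explicit 0/1 indicator if that is what you actually wanted:

```r
# Hypothetical factor, invented for illustration
smoke <- factor(c("no", "yes", "no", "no", "yes"))

# Levels are sorted alphabetically by default
levels(smoke)

# as.numeric() returns the position of each value in levels(smoke),
# so "no" becomes 1 and "yes" becomes 2 (not 0 and 1)
codes <- as.numeric(smoke)
codes

# An explicit 0/1 indicator, built by comparing against the level name
is_smoker <- as.integer(smoke == "yes")
is_smoker
```

Even with a proper 0/1 indicator, passing it as the second argument of t.test would still compare the mean of the indicator with the mean lung capacity. The grouping variable belongs on the right-hand side of the formula.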

Another approach is to subset LungCapacityData$LungCap yourself, but that requires more typing:

t.test(LungCapacityData$LungCap[LungCapacityData$Smoke == "yes"],
       LungCapacityData$LungCap[LungCapacityData$Smoke == "no"],
       alternative = c("two.sided"), mu=0, var.equal = FALSE,
       conf.level = 0.95, paired = FALSE)
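
One thing to watch for with the subsetting approach: the formula version takes the first factor level minus the second ("no" minus "yes"), while the call above passes the "yes" group first. The sketch below, on a made-up data frame (toy, with hypothetical columns value and group), suggests that the two-sided p-value is the same either way and only the sign of the t statistic and the confidence interval flips:

```r
# Toy data, invented for illustration only (not the LungCapacity data set)
toy <- data.frame(
  value = c(5.1, 6.3, 5.8, 6.0, 7.9, 8.4, 7.2, 8.8),
  group = factor(c("no", "no", "no", "no", "yes", "yes", "yes", "yes"))
)

# Formula interface: difference is "no" minus "yes"
by_formula <- t.test(value ~ group, data = toy)

# Manual subsetting with "yes" first: difference is "yes" minus "no"
by_subset <- t.test(toy$value[toy$group == "yes"],
                    toy$value[toy$group == "no"])

# Same two-sided p-value
all.equal(by_formula$p.value, by_subset$p.value)

# t statistics have equal size and opposite sign
all.equal(unname(by_formula$statistic), -unname(by_subset$statistic))
```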
