Shapiro-Wilk test per row

Question

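I am trying to assess the normality of the values in each row of a data frame. Ideally, I would like to run one Shapiro-Wilk test per row (as many tests as there are rows in the data frame).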

The real data set is large, so I am using a small example here.

dput(example)
structure(c(103L, 122L, 40L, 107L, 124L, 108L, 89L, 102L, 40L, 
70L, 78L, 78L, 78L, 78L, 64L, 64L, 64L, 50L, 50L, 50L, 133L, 
64L, 55L, 64L, 108L, 124L, 108L, 146L, 13L, 40L, 122L, 124L, 
107L, 122L, 133L, 122L, 107L, 121L, 70L, 113L, NA, 108L, NA, 
40L, 122L, 89L, 36L, 113L, 26L, 26L, NA, 103L, NA, 55L, 153L, 
146L, 36L, NA, NA, 77L, NA, 133L, NA, 36L, 167L, 92L, 65L, NA, 
NA, 40L, NA, 107L, NA, 89L, 146L, NA, 92L, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA), .Dim = 10:9, .Dimnames = list(
    c("7", "10", "51", "62", "4", "5", "79", "16", "17", "243"
    ), c("centroid", "n_1", "n_2", "n_3", "n_4", "n_5", "n_6", 
    "n_7", "n_8")))

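As mentioned, I want to test each row for normality. I expect some rows to "pass", while for others the test cannot be computed, either because there are not enough values or because all values are identical. I am actually interested in those cases, because I am trying to show that this approach is a bad idea. I would like to write the results to a new column, with an error message in the rows where the normality test cannot be computed.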


I can compute the Shapiro-Wilk test for any single row like this:

shapiro.test(example[1,])
    Shapiro-Wilk normality test

data:  example[1, ]
W = 0.9631, p-value = 0.7984

I thought I could compute it for every row like this, but it does not work:

> apply(example, example[1:10,], shapiro.test) 
Error in d[-MARGIN] : only 0's may be mixed with negative subscripts

I hope someone can point me in the right direction. Thanks!

Tags: r, statistics, normal-distribution

Answer


You can write a function to get the result you want:

df <- structure(c(103L, 122L, 40L, 107L, 124L, 108L, 89L, 102L, 40L, 
                  70L, 78L, 78L, 78L, 78L, 64L, 64L, 64L, 50L, 50L, 50L, 133L, 
                  64L, 55L, 64L, 108L, 124L, 108L, 146L, 13L, 40L, 122L, 124L, 
                  107L, 122L, 133L, 122L, 107L, 121L, 70L, 113L, NA, 108L, NA, 
                  40L, 122L, 89L, 36L, 113L, 26L, 26L, NA, 103L, NA, 55L, 153L, 
                  146L, 36L, NA, NA, 77L, NA, 133L, NA, 36L, 167L, 92L, 65L, NA, 
                  NA, 40L, NA, 107L, NA, 89L, 146L, NA, 92L, NA, NA, NA, NA, NA, 
                  NA, NA, NA, NA, NA, NA, NA, NA), .Dim = 10:9, .Dimnames = list(
                    c("7", "10", "51", "62", "4", "5", "79", "16", "17", "243"
                    ), c("centroid", "n_1", "n_2", "n_3", "n_4", "n_5", "n_6", 
                         "n_7", "n_8")))

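f.shapiro.stat <- function(x, n_diff_numbers = 3) {
  # Count the distinct non-missing values in the row
  n_distinct <- length(unique(x[!is.na(x)]))
  # Too few distinct values: shapiro.test() would fail, so flag the row instead
  if (n_distinct < n_diff_numbers) {
    return("ERROR")
  }
  # Otherwise return the W statistic (shapiro.test() drops NAs itself)
  unname(shapiro.test(x)$statistic)
}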

res <- apply(df, 1, f.shapiro.stat, n_diff_numbers = 3)

df2 <- as.data.frame(df)
df2$shapiro <- res
> df2
    centroid n_1 n_2 n_3 n_4 n_5 n_6 n_7 n_8   shapiro
7        103  78 133 122  NA  NA  NA  NA  NA 0.9630974
10       122  78  64 124 108 103 133 107  NA 0.9225951
51        40  78  55 107  NA  NA  NA  NA  NA 0.9723459
62       107  78  64 122  40  55  36  89  NA 0.9552869
4        124  64 108 133 122 153 167 146  NA 0.9385053
5        108  64 124 122  89 146  92  NA  NA 0.9809580
79        89  64 108 107  36  36  65  92  NA 0.8915689
16       102  50 146 121 113  NA  NA  NA  NA 0.9307804
17        40  50  13  70  26  NA  NA  NA  NA 0.9911093
243       70  50  40 113  26  77  40  NA  NA 0.9238762
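
For context, the call in the question fails because the second argument of apply() is MARGIN, which expects 1 (rows) or 2 (columns) rather than a slice of the data. A minimal sketch on a toy matrix (the name m and its values are illustrative only, not taken from the question):

```r
# Toy matrix, for illustration only
m <- matrix(c(1, 2, 3,
              4, 5, 6), nrow = 2, byrow = TRUE)

# MARGIN = 1 applies the function to each row
row_sums <- apply(m, 1, sum)

# MARGIN = 2 applies the function to each column
col_sums <- apply(m, 2, sum)
```

This is why the answer calls apply(df, 1, ...) to get one test per row.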

f.shapiro.stat also checks whether your data has enough variation. For example:

> f.shapiro.stat(x = rep(1, 10))
[1] "ERROR"
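
If you would rather keep the actual error message and the p-value instead of a fixed "ERROR" string, one possible variation is to wrap shapiro.test() in tryCatch(). This is only a sketch: row_shapiro and the toy matrix are hypothetical names introduced here for illustration, not part of the code above.

```r
# Hypothetical helper: returns W and the p-value for one row, or the
# message of the error raised by shapiro.test() if the test cannot run.
row_shapiro <- function(x) {
  tryCatch({
    test <- shapiro.test(x)
    data.frame(W = unname(test$statistic),
               p_value = test$p.value,
               error = NA_character_,
               stringsAsFactors = FALSE)
  }, error = function(e) {
    data.frame(W = NA_real_,
               p_value = NA_real_,
               error = conditionMessage(e),
               stringsAsFactors = FALSE)
  })
}

# Toy matrix for illustration: one usable row, one constant row,
# and one row with too few non-missing values.
toy <- rbind(a = c(103, 78, 133, 122, NA),
             b = c(5, 5, 5, 5, 5),
             c = c(1, 2, NA, NA, NA))

# Run the helper on each row and stack the one-row data frames
result <- do.call(rbind, lapply(seq_len(nrow(toy)),
                                function(i) row_shapiro(toy[i, ])))
rownames(result) <- rownames(toy)
```

Numeric columns stay numeric this way, because the error text lives in its own column rather than being mixed in with the W statistics.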
