Boxplot equivalent in ggplot2, plus suggestions for outlier detection and normality checks

Problem description

I have the following example:

ex <- structure(list(Q1 = c(2, 6, 2, 2, 2, 6, 1, 
6, 7, 7, 6, 5, 6, 2, 5, 4, 4, 2, 7, 7, 5, 3, 5, 6, 1, 5, 4, 6, 
6, 5, 3, 3, 5, 2, 5, 1, 4, 4, 6, 2, 5, 7, 5, 2, 7, 5, 7, 3, 4, 
7, 5, 6, 2, 7, 3, 2, 4, 5, 6, 6), Q2 = c(2, 
4, 6, 5, 6, 2, 6, 3, 7, 6, 5, 5, 5, 4, 5, 3, 6, 5, 1, 6, 5, 4, 
4, 4, 2, 5, 5, 4, 6, 5, 5, 4, 2, 4, 4, 1, 5, 3, 7, 3, 5, 7, 4, 
2, 6, 4, 6, 4, 4, 6, 5, 2, 1, 5, 2, 2, 5, 3, 2, 4), Q3 = c(6, 
7, 6, 6, 6, 7, 6, 6, 7, 6, 6, 6, 5, 7, 7, 7, 7, 6, 6, 6, 6, 5, 
5, 7, 7, 6, 6, 6, 6, 3, 7, 5, 3, 7, 5, 6, 6, 4, 7, 7, 4, 7, 5, 
7, 7, 7, 7, 7, 6, 4, 7, 7, 6, 7, 3, 7, 6, 5, 7, 6), Q4 = c(1, 
6, 5, 5, 2, 2, 2, 1, 6, 5, 5, 2, 1, 2, 2, 2, 2, 2, 1, 4, 2, 2, 
3, 2, 1, 3, 2, 2, 6, 4, 7, 2, 2, 2, 3, 1, 4, 5, 5, 2, 5, 6, 3, 
2, 4, 1, 1, 3, 6, 6, 2, 1, 1, 2, 4, 2, 1, 3, 2, 4), Q5 = c(2, 
6, 6, 6, 6, 6, 6, 5, 7, 2, 3, 2, 6, 7, 4, 6, 6, 2, 1, 5, 6, 5, 
4, 6, 5, 6, 5, 5, 6, 4, 7, 2, 6, 5, 4, 1, 6, 4, 6, 5, 4, 6, 5, 
4, 5, 5, 7, 5, 6, 5, 3, 7, 5, 7, 3, 4, 5, 3, 6, 6), Q6 = c(6, 
7, 6, 7, 6, 6, 6, 5, 7, 6, 5, 4, 6, 7, 6, 7, 7, 5, 3, 5, 6, 5, 
2, 7, 5, 5, 7, 6, 6, 4, 7, 3, 5, 7, 4, 6, 5, 5, 6, 6, 5, 7, 6, 
2, 6, 7, 7, 7, 7, 6, 6, 7, 6, 7, 3, 7, 6, 3, 7, 7)), row.names = c(NA, 
-60L), class = "data.frame")

I tried

boxplot(ex)

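It gives me a very crude boxplot.

My questions are:

  1. What is the ggplot equivalent of this plot, or an improved version of it? (Equivalently: I just want the ggplot code that produces it.)

  2. Is there an R function that gives me a neat report on normality and outliers, both graphical and tabular? (Reporting NAs would be nice too, although the sample I sent contains none.)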

Tags: r, ggplot2

Solution


Question 1: ggplot boxplot

First bring your data into long format with pivot_longer(), then use geom_boxplot().

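# tidyverse provides %>%, pivot_longer() (tidyr) and ggplot() (ggplot2)
library(tidyverse)

# reshape from one column per question to one row per answer
ex_long <- ex %>% 
  pivot_longer(
    cols = everything(),
    names_to = "names",
    values_to = "values"
  )

# one box per question
ggplot(ex_long, aes(x = names, y = values)) +
  geom_boxplot()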
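
To make the reshaping step concrete, here is a minimal sketch on a small made-up data frame (`toy` is hypothetical, not the asker's data). It uses base R's `stack()` instead of `pivot_longer()`, which gives the same long layout with the columns named `values` and `ind`. The plotting part only runs if ggplot2 is installed, and the jittered points are an optional addition rather than part of the original answer.

```r
# Hypothetical toy data standing in for `ex` (three respondents, two items)
toy <- data.frame(Q1 = c(2, 6, 4), Q2 = c(5, 3, 7))

# Base-R alternative to pivot_longer(): one row per (item, answer) pair
toy_long <- stack(toy)   # columns: values (numeric), ind (factor of item names)
print(toy_long)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy_long, aes(x = ind, y = values)) +
    geom_boxplot() +
    # draw the raw answers over the boxes; height = 0 keeps the y values exact
    geom_jitter(width = 0.15, height = 0, alpha = 0.4) +
    labs(x = "Question", y = "Answer")
  print(p)
}
```

With integer-valued answers like these, overlaying the raw points can show how many responses sit on each value, which the box alone hides.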


Question 2a: visual and tabular checks for normality:

library(tidyverse)
library(ggpubr)
library(rstatix)

# Visual check for normality
# Density plot
ggdensity(ex_long$values, fill = "blue")
# QQ plot
ggqqplot(ex_long$values)

# tabular p values for shapiro test
ex_long %>%
  group_by(names) %>%
  shapiro_test(values)
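
The question also asks about NAs. As a dependency-free sketch of a per-question table, the block below combines a missing-value count with the Shapiro-Wilk statistic and p-value using only base R. The data frame `toy` and the result name `normality` are made up for illustration; with the real data you would pass `ex` instead.

```r
# Hypothetical toy data: two 7-point items, 60 respondents, two answers missing
set.seed(1)
toy <- data.frame(
  Q1 = sample(1:7, 60, replace = TRUE),
  Q2 = sample(1:7, 60, replace = TRUE)
)
toy$Q2[c(3, 10)] <- NA

# One row per item: number of NAs plus Shapiro-Wilk W and p-value
# (shapiro.test() drops missing values before testing)
normality <- data.frame(
  variable  = names(toy),
  n_missing = colSums(is.na(toy)),
  shapiro_W = sapply(toy, function(x) unname(shapiro.test(x)$statistic)),
  shapiro_p = sapply(toy, function(x) shapiro.test(x)$p.value),
  row.names = NULL
)
print(normality)
```

A small p-value suggests a departure from normality for that item; for discrete answers on a short scale this is the usual outcome.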

Question 2b: outliers:

# set outlier limits with the percentile method (pooled over all questions)
lower_bound <- quantile(ex_long$values, 0.025)
lower_bound
upper_bound <- quantile(ex_long$values, 0.975)
upper_bound

# identify all outliers
outlier_ind <- which(ex_long$values < lower_bound | ex_long$values > upper_bound)
outlier_ind

# print the flagged rows (index the long data, which holds the `values` column)
ex_long[outlier_ind, ]
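
The percentile limits above are computed on all answers pooled together, whereas a boxplot marks points per box using the 1.5 x IQR rule. As a hedged sketch of that second rule in table form, the block below applies `boxplot.stats()` from base R to each column of a made-up data frame (`toy`, `outliers_by_item` and `outlier_table` are illustrative names, not part of the original answer).

```r
# Hypothetical toy data: one item with a single low answer, one without
toy <- data.frame(
  Qa = c(6, 7, 6, 6, 7, 7, 6, 3, 7, 6),
  Qb = c(2, 4, 5, 3, 4, 5, 3, 4, 2, 5)
)

# boxplot.stats()$out returns the points beyond 1.5 * IQR from the hinges,
# i.e. the points a boxplot would draw individually
outliers_by_item <- lapply(toy, function(x) boxplot.stats(x)$out)

# Summarise as a table: how many points are flagged per item, and which values
outlier_table <- data.frame(
  variable   = names(toy),
  n_outliers = lengths(outliers_by_item),
  values     = sapply(outliers_by_item, paste, collapse = ", "),
  row.names  = NULL
)
print(outlier_table)
```

The two rules can disagree: an item whose answers cluster tightly may have points flagged by the boxplot rule even when the pooled percentile limits flag nothing.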


# test for outliers with the Grubbs test
# (it assumes roughly normal data and tests only the single most extreme value)
# install.packages("outliers")
library(outliers)
test <- grubbs.test(ex_long$values)
test

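# visualise outliers in the plot caption
# (mtext() is base graphics and cannot be added to a ggplot, so use labs())
out <- ex_long$values[outlier_ind]

ggplot(ex_long, aes(x = names, y = values)) +
  geom_boxplot() +
  labs(caption = paste("Outliers:", paste(out, collapse = ", ")))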

Conclusion: your data are not normally distributed, and the percentile check above flags no outliers!

