I am trying to fit a normal curve to my histogram using ggplot2

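Problem description

I want to fit a normal curve to my distribution. I have seen a few examples, but I keep getting errors.

Below is some of the data I am working with. I apologize for the length; for confidentiality reasons I had to change the variable names.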

chartA <- structure(list(X = c(29L, 22L, 27L, 26L, 25L, 26L, 16L, 30L, 
31L, 32L, 29L, 19L, 18L, 26L, 25L, 22L, 23L, 27L, 21L, 16L, 18L, 
25L, 21L, 23L, 22L, 25L, 29L, 23L, 20L, 25L, 25L, 21L, 30L, 27L, 
25L, 18L, 27L, 25L, 27L, 28L, 26L, 20L, 20L, 20L, 23L, 33L, 27L, 
17L, 21L, 19L, 26L, 26L, 20L, 25L, 30L, 17L, 31L, 26L, 25L, 20L, 
27L, 21L, 21L, 21L, 26L, 30L, 23L, 22L, 28L, 17L, 22L, 16L, 25L, 
19L, 14L, 19L, 29L, 27L, 21L, 31L, 24L, 20L, 14L, 23L, 21L, 26L, 
29L, 24L, 27L, 17L, 21L, 19L, 21L, 22L, 22L, 26L, 26L, 34L, 28L, 
34L, 26L, 23L, 24L, 25L, 21L, 19L, 18L, 19L, 20L, 22L, 21L, 20L, 
22L, 19L, 22L, 27L, 25L, 20L, 23L, 19L, 32L, 25L, 27L, 23L, 30L, 
31L, 31L, 23L, 25L, 21L, 26L, 17L, 24L, 16L, 29L, 20L, 31L, 28L, 
28L, 26L, 26L, 29L, 33L, 23L, 19L, 24L, 23L, 20L, 20L, 28L, 19L, 
26L, 25L, 24L, 19L, 21L, 22L, 21L, 31L, 21L, 16L, 23L, 29L, 25L, 
24L, 19L, 19L, 19L, 23L, 25L, 26L, 19L, 22L, 24L, 29L, 19L, 15L, 
22L, 17L, 23L, 27L, 23L, 16L, 23L, 28L, 21L, 30L, 19L, 24L, 23L, 
24L, 31L, 23L, 28L, 21L, 25L, 29L, 22L, 28L, 20L, 20L, 28L, 29L, 
27L, 27L, 22L, 22L, 29L, 31L, 22L, 24L, 15L, 20L, 34L, 23L, 24L, 
21L, 25L, 24L, 20L, 26L, 24L, 16L, 25L, 27L, 28L, 26L, 24L, 22L, 
21L, 27L, 25L, 24L, 26L, 16L, 29L, 18L, 26L, 23L, 26L, 27L, 16L, 
33L, 23L, 31L, 23L, 21L, 22L, 22L, 20L, 19L, 24L, 25L, 28L, 24L, 
26L, 30L, 26L, 29L, 17L, 29L, 19L, 28L, 25L, 24L, 23L, 25L, 19L, 
25L, 24L, 23L, 20L, 18L, 20L, 21L, 20L, 24L, 32L, 19L, 19L, 22L, 
21L, 22L, 22L, 20L, 25L, 17L, 28L, 25L, 22L, 19L, 24L, 15L, 26L, 
26L, 30L, 29L, 20L, 26L, 25L, 27L, 24L, 26L, 21L, 23L, 22L, 13L, 
21L, 22L, 25L, 23L, 23L, 15L, 20L, 29L, 26L, 23L, 23L, 20L, 23L, 
21L, 30L, 16L, 21L, 19L, 20L, 26L, 30L, 20L, 20L, 23L, 22L, 24L, 
19L, 21L, 24L, 19L, 26L, 32L, 20L, 19L, 24L, 20L, 29L, 21L, 20L, 
26L, 22L, 22L, 23L, 27L, 24L, 24L, 25L, 21L, 30L, 21L, 23L, 27L, 
21L, 27L, 23L, 24L, 22L, 20L, 18L, 30L, 20L, 23L, 21L, 24L, 28L, 
22L, 17L, 21L, 26L, 22L, 24L, 25L, 27L, 24L, 21L, 19L, 24L, 18L, 
29L, 21L, 23L, 19L, 16L, 21L, 24L, 19L, 24L, 26L, 27L, 22L, 17L, 
16L, 25L, 21L, 19L, 27L, 33L, 24L, 26L, 26L, 27L, 23L, 24L, 24L, 
24L, 20L, 23L, 21L, 19L, 23L, 32L, 17L, 16L, 16L, 25L, 23L, 21L, 
22L, 25L, 19L, 23L, 24L, 18L, 26L, 24L, 21L, 20L, 27L, 23L, 22L, 
28L, 20L, 21L, 20L, 22L, 19L, 27L, 22L, 21L, 24L, 18L, 24L, 21L, 
17L, 22L, 24L, 18L, 19L, 21L, 27L, 28L, 23L, 17L, 28L, 20L, 23L, 
22L, 21L, 20L, 30L, 30L, 23L, 24L, 25L, 23L, 24L, 29L, 17L, 22L, 
28L, 14L, 23L, 21L, 23L, 21L, 20L, 25L, 26L, 24L, 23L, 22L, 21L, 
26L, 30L, 19L, 22L, 22L, 19L, 19L, 26L, 24L, 22L, 20L, 22L, 27L, 
19L, 27L, 18L, 20L, 19L, 22L, 30L, 14L, 23L, 27L, 23L, 16L, 20L, 
20L, 20L, 25L, 19L, 21L, 21L, 23L, 18L, 24L, 22L, 26L, 22L, 17L, 
21L, 21L, 22L, 19L, 21L, 27L, 23L, 20L, 28L, 26L, 26L, 24L, 20L, 
30L, 27L, 21L, 25L, 20L, 25L, 25L, 24L, 19L, 25L, 25L, 19L, 22L, 
26L, 16L, 28L, 21L, 23L, 25L, 26L, 14L, 24L, 25L, 19L, 26L, 27L, 
19L, 20L, 23L, 23L, 28L, 19L, 20L, 23L, 27L, 24L, 25L, 23L, 24L, 
25L, 21L, 28L, 20L, 26L, 29L, 24L, 18L, 20L, 22L, 32L, 35L, 25L, 
21L, 24L, 13L, 17L, 21L, 28L, 25L, 19L, 22L, 27L, 28L, 26L, 19L, 
27L, 20L, 22L, 24L, 24L, 31L, 23L, 29L, 28L, 20L, 19L, 28L, 23L, 
21L, 25L, 21L, 22L, 27L, 25L, 21L, 23L, 25L, 26L, 27L, 26L, 25L, 
29L, 33L, 25L, 21L, 19L, 23L, 19L, 19L, 31L, 21L, 23L, 22L, 28L, 
27L, 21L, 22L, 19L, 25L, 26L, 24L, 15L, 21L, 32L, 27L, 27L, 25L, 
23L, 28L, 23L, 21L, 27L, 16L, 17L, 23L, 29L, 22L, 21L, 30L, 26L, 
20L, 21L, 27L, 19L, 29L, 22L, 26L, 19L, 21L, 28L, 29L, 22L, 17L, 
30L, 26L, 25L, 20L, 20L, 24L, 28L, 25L, 19L, 26L, 20L, 25L, 18L, 
17L, 26L, 27L, 28L, 22L, 18L, 23L, 29L, 26L, 27L, 33L, 20L, 23L, 
20L, 16L, 23L, 30L, 25L, 27L, 26L, 26L, 22L, 26L, 20L, 24L, 22L, 
25L, 23L, 28L, 24L, 21L, 22L, 27L, 24L, 27L, 21L, 30L, 33L, 13L, 
26L, 20L, 24L, 20L, 22L, 21L, 21L, 32L, 19L, 31L, 28L, 21L, 26L, 
19L, 23L, 22L, 23L, 22L, 21L, 24L, 16L, 25L, 20L, 27L, 21L, 24L, 
24L, 27L, 22L, 25L, 28L, 27L, 28L, 28L, 18L, 16L, 23L, 22L, 24L, 
23L, 23L, 29L, 23L, 18L, 22L, 24L, 27L, 28L, 23L, 22L, 15L, 27L, 
23L, 24L, 17L, 31L, 24L, 17L, 16L, 28L, 27L, 27L, 23L, 23L, 30L, 
21L, 24L, 16L, 25L, 16L, 23L, 27L, 20L, 23L, 19L, 25L, 18L, 22L, 
24L, 19L, 22L, 27L, 22L, 18L, 13L, 19L, 26L, 23L, 25L, 29L, 17L, 
24L, 30L, 18L, 27L, 16L, 22L, 29L, 16L, 19L, 21L, 21L, 22L, 21L, 
17L, 19L, 20L, 31L, 30L, 25L, 25L, 23L, 21L, 26L, 20L, 22L, 20L, 
21L, 25L, 22L, 21L, 24L, 13L, 24L, 24L, 23L, 24L, 23L, 19L, 27L, 
22L, 37L, 22L, 25L, 23L, 27L, 14L, 26L, 21L, 19L, 21L, 22L, 29L, 
26L, 23L, 21L, 20L, 14L, 23L, 26L, 21L, 26L, 17L, 21L, 19L, 23L, 
14L, 25L, 18L, 22L, 28L, 29L, 21L, 27L, 25L, 28L, 24L, 24L, 24L, 
30L, 22L, 24L, 21L, 24L, 16L, 25L, 18L, 20L, 19L, 25L, 17L, 20L, 
21L, 18L, 19L, 26L, 23L, 24L, 20L, 21L, 31L, 27L, 23L, 22L, 16L, 
21L, 23L, 20L, 23L, 29L, 25L, 23L, 24L, 30L, 26L, 27L, 22L, 14L, 
12L, 19L, 23L, 22L, 16L, 15L, 23L, 19L, 24L, 25L, 15L, 21L, 30L, 
13L, 27L, 21L, 17L, 25L, 29L, 22L, 22L, 21L, 31L, 22L, 29L, 30L, 
20L, 21L, 21L, 22L, 26L, 23L, 18L, 15L, 17L, 27L, 20L, 26L, 25L, 
25L, 25L, 27L, 20L, 25L, 27L, 24L, 21L, 25L, 25L, 18L, 31L, 23L, 
26L, 22L, 29L, 20L), row.names = c(NA, 
-1000L), class = c("tbl_df", "tbl", "data.frame"), spec = structure(list(
    cols = list(X = structure(list(), class = c("collector_integer", 
    "collector")), Y = structure(list(), class = c("collector_integer", 
    "collector")), Z = structure(list(), class = c("collector_integer", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector"))), class = "col_spec"))

This is my first post here, so I have stripped the code down to the bare bones:

library(ggplot2)

ggplot(data = chartA, mapping = aes(x = X)) +
  geom_histogram(bins = 20, color = "white", fill = "steelblue") +
    xlab("Values of X") +
    ylab("Frequency of X Values") +
    ggtitle("Histogram of X with Normal Curve")

Where exactly do I put the code to get the normal curve?

Tags: r, ggplot2

Solution


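Tung's answer is probably what you want, but it does not actually create a normal curve: it only smooths the histogram and does not assume that the result is normally distributed. To draw a true normal density, use stat_function() with dnorm() and the observed mean and standard deviation. The histogram has to be on the density scale (not counts) for the two to be comparable.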

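# Adapting Tung's answer, adding the normal distribution density in purple
library(ggplot2)

ggplot(data = chartA, mapping = aes(x = X)) +
    # Histogram on the density scale so it is comparable to dnorm().
    # after_stat(density) needs ggplot2 >= 3.3.0; older versions use ..density..
    geom_histogram(aes(y = after_stat(density)),
                   alpha = 0.8, bins = 20,
                   color = "white", fill = "steelblue",
                   position = "identity"
    ) +
    # Kernel density estimate: a smoothed histogram, not a normal curve
    geom_density(alpha = .2) +
    # Normal density with the observed mean and standard deviation
    stat_function(fun = function(x) {
        dnorm(x, mean = mean(chartA$X), sd = sd(chartA$X))
    }, colour = "purple") +
    scale_x_continuous(expand = c(0, 0)) +
    scale_y_continuous(expand = c(0, 0)) +
    xlab("Values of X") +
    ylab("Density") +
    ggtitle("Histogram of X with Normal Curve") +
    theme_classic(base_size = 14)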
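
The original plot labels the y-axis as a frequency. If you would rather keep counts on the y-axis, one possible approach is to rescale the normal density by the number of observations times the bin width. The sketch below is only an illustration: the simulated `chartA`, the `bin_width` of 1, and the helper name `scaled_normal` are assumptions made here, not part of the original post. It uses `binwidth` instead of `bins` so that the scaling factor is known.

```r
library(ggplot2)

# Assumption: simulated stand-in for the real chartA$X (1000 integer values)
set.seed(42)
chartA <- data.frame(X = round(rnorm(1000, mean = 23, sd = 4)))

n_obs     <- nrow(chartA)
bin_width <- 1  # assumed bin width; must match geom_histogram() below

# Normal density converted to the count scale: density * n * bin width
scaled_normal <- function(x) {
  dnorm(x, mean = mean(chartA$X), sd = sd(chartA$X)) * n_obs * bin_width
}

p <- ggplot(data = chartA, mapping = aes(x = X)) +
  geom_histogram(binwidth = bin_width, color = "white", fill = "steelblue") +
  stat_function(fun = scaled_normal, colour = "purple") +
  xlab("Values of X") +
  ylab("Frequency of X Values") +
  ggtitle("Histogram of X with Normal Curve")

p
```

With this scaling the area under the purple curve equals the total area of the histogram bars (n times the bin width), so the two can be compared directly on the count axis.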
