Change scale in geom_qq

Problem description

I'd like the x-axis of a Q-Q plot drawn with ggplot2's geom_qq to show the variable's numeric values rather than z-scores:

library("ggplot2")
coin_prob <- 0.5 # this is a fair coin
tosses_per_test <- 5000 # we want to flip a coin 5000 times
no_of_tests <- 1000


outcomes <- rbinom(n = no_of_tests,
            size = tosses_per_test, 
            prob = coin_prob)/tosses_per_test

outcomes.df <- data.frame("results"= outcomes)

ggplot(outcomes.df, aes(sample = results)) +
  geom_qq() + 
  geom_qq_line(color="red") + 
  labs(x="Theoretical Data", title = "Simulated Coin toss", subtitle = "5000 tosses repeated 1000 times", y="Sample Outcomes")

The default x-axis in ggplot2 seems to be z-scores rather than raw theoretical values. I can hack around it like this to get the "real" x-axis:

p <- ggplot(outcomes.df, aes(sample = results)) + geom_qq()
g <- ggplot_build(p)
raw_qs <- g$data[[1]]$theoretical*sd(outcomes.df$results) + mean(outcomes.df$results)

ggplot(outcomes.df, aes(sample = results)) +
  geom_qq() + 
  geom_qq_line(color="red") + 
  labs(x="Theoretical Data", title = "Simulated Coin toss", subtitle = "5000 tosses repeated 1000 times", y="Sample Outcomes") +
  scale_x_continuous(breaks=seq(-3,3,1), labels = round((seq(-3,3,1)*sd(outcomes.df$results) + mean(outcomes.df$results)),2))

But there's got to be something simpler.

Tags: r, ggplot2

Solution


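Pass the distribution's parameters through dparams so that the theoretical quantiles are on the same scale as your data. Here that means a normal distribution with the sample mean and standard deviation, instead of the default standard normal.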



library("ggplot2")
coin_prob <- 0.5 # this is a fair coin
tosses_per_test <- 5000 # we want to flip a coin 5000 times
no_of_tests <- 1000

outcomes <- rbinom(
  n = no_of_tests,
  size = tosses_per_test, 
  prob = coin_prob) / tosses_per_test

## set dparams in _qq calls 
## so that we're not comparing against standard normal distn.
ggplot(mapping = aes(sample = outcomes)) +
  geom_qq(dparams = list(mean = mean(outcomes), sd = sd(outcomes))) +
  geom_qq_line(
    dparams = list(mean = mean(outcomes), sd = sd(outcomes)),
    color = "red"
  ) +
  labs(
    x = "Theoretical Data",
    title = "Simulated Coin toss",
    subtitle = "5000 tosses repeated 1000 times",
    y = "Sample Outcomes"
  )
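
As a rough check of why this matches the manual rescaling in the question, the sketch below compares the two sets of theoretical quantiles in base R. It assumes the probability points come from stats::ppoints (which is what stat_qq appears to use); the seed and the variable names p, z, rescaled, and direct are only for illustration.

```r
set.seed(42)  # arbitrary seed, only so the illustration is reproducible
outcomes <- rbinom(n = 1000, size = 5000, prob = 0.5) / 5000

p <- ppoints(length(outcomes))  # probability points (assumed to match stat_qq)
z <- qnorm(p)                   # standard normal quantiles, i.e. the default x-axis

# Manual rescaling, as in the question's scale_x_continuous workaround
rescaled <- mean(outcomes) + sd(outcomes) * z

# Quantiles of a normal with the sample mean and sd, as requested via dparams
direct <- qnorm(p, mean = mean(outcomes), sd = sd(outcomes))

all.equal(rescaled, direct)  # expected to be TRUE, up to floating-point tolerance
```

So dparams moves the points onto the data's scale directly, rather than relabelling a z-score axis after the fact.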

You can also change the distribution entirely, for example to compare against uniform quantiles (e.g., for p-values):

pvals <- replicate(1000, cor.test(rnorm(100), rnorm(100))$p.value)

ggplot(mapping = aes(sample = pvals)) +
  geom_qq(distribution = stats::qunif) +
  geom_qq_line(
    distribution = stats::qunif,
    color = "red"
  ) +
  labs(
    x = "Uniform quantiles",
    title = "p-values under the null",
    subtitle = "1,000 null correlation tests",
    y = "Observed p-value"
  )
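
The two ideas can presumably be combined: a non-normal distribution together with dparams for its parameters. The following is a minimal sketch under that assumption, using made-up exponential data; waits, rate_hat, and the seed are hypothetical names and values introduced only for this example.

```r
library("ggplot2")

set.seed(1)                   # arbitrary seed for a reproducible illustration
waits <- rexp(500, rate = 2)  # hypothetical sample, not from the original post
rate_hat <- 1 / mean(waits)   # estimate the rate from the sample

## compare against an exponential with the estimated rate,
## so the x-axis is on the same scale as the data
p <- ggplot(mapping = aes(sample = waits)) +
  geom_qq(distribution = stats::qexp, dparams = list(rate = rate_hat)) +
  geom_qq_line(
    distribution = stats::qexp,
    dparams = list(rate = rate_hat),
    color = "red"
  ) +
  labs(
    x = "Exponential quantiles",
    y = "Observed value"
  )
p
```

The same distribution and dparams should be given to both geom_qq and geom_qq_line, otherwise the reference line is drawn against a different set of theoretical quantiles than the points.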

