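R: Proportion histogram of a categorical variable's values (instead of many histograms, one per value)

Problem description

I have a data frame with a continuous variable ('amount') and a categorical variable ('fate', which takes two values: Good or Bad):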

> loans[c('amount', 'fate')]
# A tibble: 34,655 x 2
   amount fate 
    <dbl> <chr>
 1   8000 Bad  
 2  11000 Good 
 3  20000 Good 
 4  10000 Bad  
 5  20000 Good 
 6  15250 Good 
 7   7800 Good 
 8  18000 Good 
 9   5600 Good 
10  24000 Bad  
# ... with 34,645 more rows

I can easily plot a histogram of the counts or frequencies for each of the two categories:

loans %>% gf_histogram(..count.. ~ amount) %>% gf_facet_grid(fate ~ .)

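But that is not what I want. Instead, for each bin, I want to plot the proportion of Good or Bad values relative to all observations in that bin (Good + Bad).

In other words, I want a single histogram of Bad / (Good + Bad) data points per bin, not separate histograms for Good and Bad.

Ideally, this should require as few transformations of the data frame as possible.

How can I do this?

Edit: here is a reproducible example: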

library(ggformula)
library(dplyr)

tt <- structure(list(amount = c(8000, 11000, 20000, 10000, 20000, 15250, 
7800, 18000, 5600, 24000, 3600, 10000, 16000, 35000, 17225, 19500, 
4000, 9475, 24000, 6000, 8000, 13100, 4700, 10000, 18500, 20800, 
9000, 17500, 23550, 21000, 20950, 16975, 12000, 9000, 12000, 
3850, 24175, 2250, 14425, 7200, 6150, 30000, 12000, 10000, 8000, 
2400, 10000, 35000, 27050, 8500, 9600, 5650, 5600, 28000, 8325, 
3000, 16800, 2000, 21000, 4500, 14525, 15000, 20000, 22475, 2200, 
15000, 30000, 15000, 9000, 12000, 13000, 2000, 10000, 33600, 
20000, 18000, 15000, 35000, 20000, 24000, 12500, 5000, 13475, 
12000, 18625, 1200, 13000, 12000, 12000, 17000, 22800, 17000, 
29975, 16750, 18000, 9000, 8400, 15000, 10000, 21000), fate = c("Bad", 
"Good", "Good", "Bad", "Good", "Good", "Good", "Good", "Good", 
"Bad", "Bad", "Good", "Good", "Good", "Good", "Good", "Good", 
"Good", "Good", "Bad", "Good", "Good", "Bad", "Bad", "Bad", "Good", 
"Good", "Bad", "Bad", "Good", "Bad", "Bad", "Good", "Good", "Good", 
"Good", "Bad", "Good", "Bad", "Good", "Good", "Bad", "Good", 
"Good", "Good", "Good", "Good", "Bad", "Good", "Good", "Bad", 
"Bad", "Good", "Bad", "Good", "Good", "Bad", "Good", "Good", 
"Good", "Good", "Good", "Bad", "Good", "Good", "Good", "Good", 
"Good", "Good", "Good", "Good", "Good", "Good", "Bad", "Bad", 
"Good", "Good", "Bad", "Good", "Bad", "Good", "Bad", "Good", 
"Good", "Good", "Good", "Bad", "Good", "Bad", "Good", "Good", 
"Good", "Good", "Good", "Good", "Good", "Good", "Bad", "Bad", 
"Good")), row.names = c(NA, -100L), class = c("tbl_df", "tbl", 
"data.frame"))

tt %>% gf_histogram(..density.. ~ amount) %>% gf_facet_grid(fate ~ .)

Tags: r

Solution

Are you looking for something like this?

library(tibble)
library(ggplot2)
library(scales)

tt <- structure(list(amount = c(8000, 11000, 20000, 10000, 20000, 15250, 
7800, 18000, 5600, 24000, 3600, 10000, 16000, 35000, 17225, 19500, 
4000, 9475, 24000, 6000, 8000, 13100, 4700, 10000, 18500, 20800, 
9000, 17500, 23550, 21000, 20950, 16975, 12000, 9000, 12000, 
3850, 24175, 2250, 14425, 7200, 6150, 30000, 12000, 10000, 8000, 
2400, 10000, 35000, 27050, 8500, 9600, 5650, 5600, 28000, 8325, 
3000, 16800, 2000, 21000, 4500, 14525, 15000, 20000, 22475, 2200, 
15000, 30000, 15000, 9000, 12000, 13000, 2000, 10000, 33600, 
20000, 18000, 15000, 35000, 20000, 24000, 12500, 5000, 13475, 
12000, 18625, 1200, 13000, 12000, 12000, 17000, 22800, 17000, 
29975, 16750, 18000, 9000, 8400, 15000, 10000, 21000), fate = c("Bad", 
"Good", "Good", "Bad", "Good", "Good", "Good", "Good", "Good", 
"Bad", "Bad", "Good", "Good", "Good", "Good", "Good", "Good", 
"Good", "Good", "Bad", "Good", "Good", "Bad", "Bad", "Bad", "Good", 
"Good", "Bad", "Bad", "Good", "Bad", "Bad", "Good", "Good", "Good", 
"Good", "Bad", "Good", "Bad", "Good", "Good", "Bad", "Good", 
"Good", "Good", "Good", "Good", "Bad", "Good", "Good", "Bad", 
"Bad", "Good", "Bad", "Good", "Good", "Bad", "Good", "Good", 
"Good", "Good", "Good", "Bad", "Good", "Good", "Good", "Good", 
"Good", "Good", "Good", "Good", "Good", "Good", "Bad", "Bad", 
"Good", "Good", "Bad", "Good", "Bad", "Good", "Bad", "Good", 
"Good", "Good", "Good", "Bad", "Good", "Bad", "Good", "Good", 
"Good", "Good", "Good", "Good", "Good", "Good", "Bad", "Bad", 
"Good")), row.names = c(NA, -100L), class = c("tbl_df", "tbl", 
"data.frame"))

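tt %>% 
  ggplot(aes(x = amount, fill = fate)) +
  # position = "fill" stacks the two groups and rescales every bin to height 1,
  # so each bar shows the Good and Bad shares within that bin
  geom_histogram(position = "fill", bins = 30) +
  scale_x_continuous(labels = comma) +
  labs(
    y = "Proportion",
    x = "Amount",
    fill = "Fate"
  )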

Created on 2020-11-12 by the reprex package (v0.3.0)
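
If the Bad / (Good + Bad) share per bin is also wanted as plain numbers, the same calculation can be sketched in base R. This is only an illustration: the helper name `bad_share_by_bin`, the toy data and the break points are assumptions and do not come from the original post, and the bin edges will generally differ from the ones `geom_histogram(bins = 30)` picks.

```r
# Hypothetical helper (not part of any package): share of "Bad" rows per bin,
# i.e. Bad / (Good + Bad), for user-supplied break points.
bad_share_by_bin <- function(amount, fate, breaks, bad_label = "Bad") {
  # Half-open bins [lo, hi); the last bin also includes its upper edge
  bin <- cut(amount, breaks = breaks, right = FALSE,
             include.lowest = TRUE, dig.lab = 6)
  n_total <- as.vector(table(bin))                     # Good + Bad per bin
  n_bad   <- as.vector(table(bin[fate == bad_label]))  # Bad per bin
  data.frame(
    bin      = levels(bin),
    n        = n_total,
    bad      = n_bad,
    prop_bad = n_bad / n_total   # NaN for empty bins (0 / 0)
  )
}

# Toy data, made up for illustration only
amount <- c(1000, 2000, 6000, 7000, 8000, 12000)
fate   <- c("Bad", "Good", "Bad", "Bad", "Good", "Good")

res <- bad_share_by_bin(amount, fate, breaks = c(0, 5000, 10000, 15000))
print(res)
# prop_bad is 1/2, 2/3 and 0 for the three bins
```

With the data from the question, something like `bad_share_by_bin(tt$amount, tt$fate, breaks = seq(0, 35000, by = 5000))` should give one row per 5,000-wide bin, and `barplot(res$prop_bad, names.arg = res$bin)` would draw the Bad share alone.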

