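r - R: histogram of the proportion of a categorical variable's values (rather than a separate histogram for each value)
Problem description
I have a data frame with a continuous variable ('amount') and a categorical variable ('fate', which takes two values: Good or Bad):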
> loans[c('amount', 'fate')]
# A tibble: 34,655 x 2
amount fate
<dbl> <chr>
1 8000 Bad
2 11000 Good
3 20000 Good
4 10000 Bad
5 20000 Good
6 15250 Good
7 7800 Good
8 18000 Good
9 5600 Good
10 24000 Bad
# ... with 34,645 more rows
I can easily plot histograms of the counts or frequencies for the two categories:
loans %>% gf_histogram(..count.. ~ amount) %>% gf_facet_grid(fate ~ .)
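The result is one histogram panel per value of 'fate', but that is not what I want. Instead, for each bin, I want to plot the proportion of Good or Bad values relative to all observations in that bin (Good + Bad).
In other words, I want a single histogram of Bad / (Good + Bad) per bin, rather than separate histograms for Good and Bad.
Ideally, this should require as little transformation of the data frame as possible.
How can I do this?
Edit: here is a reproducible example: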
library(ggformula)
library(dplyr)
tt <- structure(list(amount = c(8000, 11000, 20000, 10000, 20000, 15250,
7800, 18000, 5600, 24000, 3600, 10000, 16000, 35000, 17225, 19500,
4000, 9475, 24000, 6000, 8000, 13100, 4700, 10000, 18500, 20800,
9000, 17500, 23550, 21000, 20950, 16975, 12000, 9000, 12000,
3850, 24175, 2250, 14425, 7200, 6150, 30000, 12000, 10000, 8000,
2400, 10000, 35000, 27050, 8500, 9600, 5650, 5600, 28000, 8325,
3000, 16800, 2000, 21000, 4500, 14525, 15000, 20000, 22475, 2200,
15000, 30000, 15000, 9000, 12000, 13000, 2000, 10000, 33600,
20000, 18000, 15000, 35000, 20000, 24000, 12500, 5000, 13475,
12000, 18625, 1200, 13000, 12000, 12000, 17000, 22800, 17000,
29975, 16750, 18000, 9000, 8400, 15000, 10000, 21000), fate = c("Bad",
"Good", "Good", "Bad", "Good", "Good", "Good", "Good", "Good",
"Bad", "Bad", "Good", "Good", "Good", "Good", "Good", "Good",
"Good", "Good", "Bad", "Good", "Good", "Bad", "Bad", "Bad", "Good",
"Good", "Bad", "Bad", "Good", "Bad", "Bad", "Good", "Good", "Good",
"Good", "Bad", "Good", "Bad", "Good", "Good", "Bad", "Good",
"Good", "Good", "Good", "Good", "Bad", "Good", "Good", "Bad",
"Bad", "Good", "Bad", "Good", "Good", "Bad", "Good", "Good",
"Good", "Good", "Good", "Bad", "Good", "Good", "Good", "Good",
"Good", "Good", "Good", "Good", "Good", "Good", "Bad", "Bad",
"Good", "Good", "Bad", "Good", "Bad", "Good", "Bad", "Good",
"Good", "Good", "Good", "Bad", "Good", "Bad", "Good", "Good",
"Good", "Good", "Good", "Good", "Good", "Good", "Bad", "Bad",
"Good")), row.names = c(NA, -100L), class = c("tbl_df", "tbl",
"data.frame"))
tt %>% gf_histogram(..density.. ~ amount) %>% gf_facet_grid(fate ~ .)
Solution
Are you looking for something like this?
library(tibble)
library(ggplot2)
library(scales)
tt <- structure(list(amount = c(8000, 11000, 20000, 10000, 20000, 15250,
7800, 18000, 5600, 24000, 3600, 10000, 16000, 35000, 17225, 19500,
4000, 9475, 24000, 6000, 8000, 13100, 4700, 10000, 18500, 20800,
9000, 17500, 23550, 21000, 20950, 16975, 12000, 9000, 12000,
3850, 24175, 2250, 14425, 7200, 6150, 30000, 12000, 10000, 8000,
2400, 10000, 35000, 27050, 8500, 9600, 5650, 5600, 28000, 8325,
3000, 16800, 2000, 21000, 4500, 14525, 15000, 20000, 22475, 2200,
15000, 30000, 15000, 9000, 12000, 13000, 2000, 10000, 33600,
20000, 18000, 15000, 35000, 20000, 24000, 12500, 5000, 13475,
12000, 18625, 1200, 13000, 12000, 12000, 17000, 22800, 17000,
29975, 16750, 18000, 9000, 8400, 15000, 10000, 21000), fate = c("Bad",
"Good", "Good", "Bad", "Good", "Good", "Good", "Good", "Good",
"Bad", "Bad", "Good", "Good", "Good", "Good", "Good", "Good",
"Good", "Good", "Bad", "Good", "Good", "Bad", "Bad", "Bad", "Good",
"Good", "Bad", "Bad", "Good", "Bad", "Bad", "Good", "Good", "Good",
"Good", "Bad", "Good", "Bad", "Good", "Good", "Bad", "Good",
"Good", "Good", "Good", "Good", "Bad", "Good", "Good", "Bad",
"Bad", "Good", "Bad", "Good", "Good", "Bad", "Good", "Good",
"Good", "Good", "Good", "Bad", "Good", "Good", "Good", "Good",
"Good", "Good", "Good", "Good", "Good", "Good", "Bad", "Bad",
"Good", "Good", "Bad", "Good", "Bad", "Good", "Bad", "Good",
"Good", "Good", "Good", "Bad", "Good", "Bad", "Good", "Good",
"Good", "Good", "Good", "Good", "Good", "Good", "Bad", "Bad",
"Good")), row.names = c(NA, -100L), class = c("tbl_df", "tbl",
"data.frame"))
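tt %>%
  ggplot(aes(x = amount, fill = fate)) +
  # position = "fill" rescales the stacked bars in each bin to a total height of 1,
  # so each colour shows that fate's share of the bin
  geom_histogram(position = "fill", bins = 30) +
  scale_x_continuous(labels = comma) +
  labs(
    y = "Proportion",
    x = "Amount",
    fill = "Fate"
  )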
Created on 2020-11-12 by the reprex package (v0.3.0)
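If only the Bad / (Good + Bad) share per bin is needed, for example to inspect the numbers behind the plot, the same quantity can be computed directly. The following is a minimal base R sketch, not part of the original answer. The function name `bad_share_by_bin` and its `n_bins` argument are hypothetical, and it assumes equal-width bins spanning the data range, so the bin edges will not necessarily match the ones `geom_histogram` chooses.

```r
# Minimal sketch: share of "Bad" observations in each equal-width amount bin.
# `bad_share_by_bin` and `n_bins` are illustrative names, not from any package.
bad_share_by_bin <- function(amount, fate, n_bins = 30) {
  # Equal-width breaks from the smallest to the largest amount
  breaks <- seq(min(amount), max(amount), length.out = n_bins + 1)
  # Assign each amount to a bin; include.lowest keeps the minimum in the first bin
  bin <- cut(amount, breaks = breaks, include.lowest = TRUE)
  # The mean of a logical vector is the proportion of TRUE values;
  # bins with no observations come back as NA
  share <- tapply(fate == "Bad", bin, mean)
  data.frame(
    bin_start = head(breaks, -1),
    bin_end   = tail(breaks, -1),
    bad_share = as.vector(share)
  )
}

# Tiny usage example with made-up values
bad_share_by_bin(
  amount = c(1000, 2000, 3000, 4000),
  fate   = c("Bad", "Bad", "Good", "Bad"),
  n_bins = 2
)
```

On the question's data this would be called as bad_share_by_bin(tt$amount, tt$fate), giving one row per bin that can be passed to any bar-plotting function. Empty bins appear as NA rather than 0, which keeps "no loans in this range" distinct from "no bad loans in this range".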