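r - How can I divide a density histogram by a second value in ggplot2?
Problem description
I have a problem with a density histogram in ggplot2. I'm working in RStudio and trying to create a density histogram of income by occupation. My problem is that when I use this code: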
data = read.table("http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",
                  sep = ",", header = FALSE,
                  col.names = c("age", "type_employer", "fnlwgt", "education",
                                "education_num", "marital", "occupation", "relationship", "race", "sex",
                                "capital_gain", "capital_loss", "hr_per_week", "country", "income"),
                  fill = FALSE, strip.white = TRUE)
library(ggplot2)
ggplot(data = data, aes(x = income)) +
  geom_histogram(stat = 'count',
                 aes(x = income, y = stat(count)/sum(stat(count)),
                     col = occupation, fill = occupation),
                 position = 'dodge')
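As a result, I get a histogram in which each value is divided by the total count across all categories. What I want is, for example, the number of people with income >50K whose occupation is "Craft-repair" divided by the total number of people whose occupation is Craft-repair, likewise for <=50K within the same occupation, and the same for every other occupation.
My second question: once the density histogram is correct, how do I sort the bars in descending order?
Solution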
This is a situation where it makes sense to re-aggregate your data before plotting. Aggregating inside the ggplot call works fine for simple aggregations, but when you need to aggregate and then peel off a group for a second calculation, it doesn't work so well. Also, note that because your x axis is discrete, a histogram isn't the right tool here; we'll use geom_bar() instead.
First we aggregate by count, then calculate each count's share of the total, using occupation as the group.
library(dplyr)
d2 <- data %>%
  group_by(income, occupation) %>%
  summarize(count = n()) %>%              # rows per income/occupation pair
  group_by(occupation) %>%
  mutate(percent = count / sum(count))    # share within each occupation
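To check that the two-step aggregation does what the question asks, here is a minimal sketch on a tiny made-up data frame. The name `toy` and its six rows are assumptions for illustration only, not part of the adult dataset. Within each occupation the shares should sum to 1.

```r
library(dplyr)

# Hypothetical toy data standing in for the adult dataset (made-up rows)
toy <- data.frame(
  income     = c(">50K", "<=50K", "<=50K", "<=50K", ">50K", "<=50K"),
  occupation = c(rep("Craft-repair", 4), rep("Sales", 2)),
  stringsAsFactors = FALSE
)

toy_pct <- toy %>%
  group_by(income, occupation) %>%
  summarize(count = n(), .groups = "drop") %>%  # rows per income/occupation pair
  group_by(occupation) %>%
  mutate(percent = count / sum(count)) %>%      # divide by the occupation total
  ungroup()

print(toy_pct)
```

Craft-repair has one >50K row out of four, so its >50K share is 0.25, not 1/6 as it would be when dividing by the grand total.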
Then simply plot a bar chart using geom_bar with position = 'dodge', so the bars sit side by side rather than stacked.
d2 %>% ggplot(aes(income, percent, fill = occupation)) +
geom_bar(stat = 'identity', position='dodge')
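The answer above does not cover the second question (sorting the bars in descending order). One possible approach, sketched here under assumptions, is to turn occupation into a factor whose levels are ranked by the >50K share; dodged bars are drawn in factor-level order. The data frame `d2_toy` and its numbers are made up to mimic the shape of `d2`, and ranking by the >50K share is only one reading of "descending". Note that `reorder(occupation, -percent)` would probably not help here: by default it averages percent over both income groups, and the shares within an occupation sum to 1.

```r
library(dplyr)
library(ggplot2)

# Hypothetical aggregated shares in the same shape as d2 (made-up numbers)
d2_toy <- data.frame(
  income     = rep(c("<=50K", ">50K"), times = 3),
  occupation = rep(c("Craft-repair", "Exec-managerial", "Sales"), each = 2),
  percent    = c(0.75, 0.25, 0.50, 0.50, 0.70, 0.30),
  stringsAsFactors = FALSE
)

# Occupations ranked by their >50K share, largest first
ranked <- d2_toy %>%
  filter(income == ">50K") %>%
  arrange(desc(percent)) %>%
  pull(occupation)

# union() keeps occupations with no >50K rows at the end instead of making them NA
d2_toy$occupation <- factor(d2_toy$occupation,
                            levels = union(ranked, unique(d2_toy$occupation)))

# Bars within each income group now follow the ranked order
p <- ggplot(d2_toy, aes(income, percent, fill = occupation)) +
  geom_col(position = "dodge")
```

The same factor step could be applied to `d2$occupation` before the plotting call above.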