r - Calculate a value from several rows based on conditions on some columns in R
Problem description
I have a dataset like this:
year city type sex number
2008 London A F 100
2008 London B F 110
2008 London A M 101
2008 London B M 111
2009 London A F 200
2009 London B F 210
2009 London A M 201
2009 London B M 211
2008 NY A F 100
2008 NY B F 110
2008 NY A M 101
2008 NY B M 111
2009 NY A F 200
2009 NY B F 210
2009 NY A M 201
2009 NY B M 211
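I want to plot them so that, for each year, the sums for F and M form the two parts of a stacked bar, with the percentage of each part shown.
How can I do this in R?
Solution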
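We can do this with a tidyverse approach:
- Group by the 'year' and 'sex' columns
- Get the sum of 'number' in summarise
- Create the column 'perc' by dividing the summarised value by the sum of the column
- Specify x as 'year', y as the sum of 'number', fill as 'sex' and 'perc' as label in the aes of ggplot
- Use geom_col to return the bar plot
- Add the percentage labels with geom_text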
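library(dplyr)
library(ggplot2)

df1 %>%
  # one row per year/sex combination, summing over city and type
  group_by(year, sex) %>%
  summarise(number = sum(number), .groups = 'drop') %>%
  # perc is the share of the grand total (all years and both sexes)
  mutate(perc = number/sum(number), year = factor(year)) %>%
  ggplot(aes(x = year, y = number, fill = sex,
             label = scales::percent(perc))) +
  # position = 'dodge' draws F and M side by side within each year
  geom_col(position = 'dodge') +
  # place each label just above its own dodged bar
  geom_text(position = position_dodge(width = .9),
            vjust = -0.5,
            size = 3) +
  theme_bw()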
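- Output: a bar chart of 'number' by year with F and M bars side by side, each labelled with its percentage (image not included here)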
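The question asks for stacked bars, while the code above draws F and M side by side and computes each percentage against the grand total. A minimal sketch of a stacked variant follows, assuming the percentages should be shares within each year; that reading of the question is an assumption, and the names stacked and p_stacked are made up for illustration. The data frame is rebuilt compactly so the block runs on its own.

```r
library(dplyr)
library(ggplot2)

# Same 16 rows as df1 in the answer, rebuilt compactly for a standalone run
df1 <- data.frame(
  year   = rep(c(2008L, 2009L, 2008L, 2009L), each = 4),
  city   = rep(c("London", "NY"), each = 8),
  type   = rep(c("A", "B"), times = 8),
  sex    = rep(c("F", "F", "M", "M"), times = 4),
  number = rep(c(100L, 110L, 101L, 111L, 200L, 210L, 201L, 211L), times = 2)
)

stacked <- df1 %>%
  group_by(year, sex) %>%
  summarise(number = sum(number), .groups = "drop") %>%
  # assumption: percentages are computed within each year, not overall
  group_by(year) %>%
  mutate(perc = number / sum(number)) %>%
  ungroup() %>%
  mutate(year = factor(year))

p_stacked <- ggplot(stacked,
                    aes(x = year, y = number, fill = sex,
                        label = scales::percent(perc, accuracy = 0.1))) +
  # geom_col stacks by default, so F and M become two parts of one bar per year
  geom_col() +
  # centre each label inside its own segment of the stack
  geom_text(position = position_stack(vjust = 0.5), size = 3) +
  theme_bw()

print(p_stacked)
```

Dropping the group_by(year) step would give percentages of the overall total, as in the original answer, while keeping the stacked layout.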
Data
df1 <- structure(list(year = c(2008L, 2008L, 2008L, 2008L, 2009L, 2009L,
2009L, 2009L, 2008L, 2008L, 2008L, 2008L, 2009L, 2009L, 2009L,
2009L), city = c("London", "London", "London", "London", "London",
"London", "London", "London", "NY", "NY", "NY", "NY", "NY", "NY",
"NY", "NY"), type = c("A", "B", "A", "B", "A", "B", "A", "B",
"A", "B", "A", "B", "A", "B", "A", "B"), sex = c("F", "F", "M",
"M", "F", "F", "M", "M", "F", "F", "M", "M", "F", "F", "M", "M"
), number = c(100L, 110L, 101L, 111L, 200L, 210L, 201L, 211L,
100L, 110L, 101L, 111L, 200L, 210L, 201L, 211L)),
class = "data.frame", row.names = c(NA,
-16L))
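To check the numbers behind the labels, the summary step can be run by itself before any plotting. This is only a sketch: the name summary_tbl is hypothetical, and the data frame is rebuilt compactly from the values above so the block is self-contained.

```r
library(dplyr)

# Same 16 rows as df1 above, rebuilt compactly for a standalone run
df1 <- data.frame(
  year   = rep(c(2008L, 2009L, 2008L, 2009L), each = 4),
  city   = rep(c("London", "NY"), each = 8),
  type   = rep(c("A", "B"), times = 8),
  sex    = rep(c("F", "F", "M", "M"), times = 4),
  number = rep(c(100L, 110L, 101L, 111L, 200L, 210L, 201L, 211L), times = 2)
)

summary_tbl <- df1 %>%
  group_by(year, sex) %>%
  summarise(number = sum(number), .groups = "drop") %>%
  # share of the grand total, as in the answer's mutate step
  mutate(perc = number / sum(number))

print(summary_tbl)
# number per (year, sex): 2008 F = 420, 2008 M = 424, 2009 F = 820, 2009 M = 824
# the grand total is 2488, so perc is each of those values divided by 2488
```

Because the grouping is dropped before mutate, sum(number) covers all four rows, which is why the four percentages add up to 100% across both years rather than within each year.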