Calculating a value from several rows based on conditions on some columns in R

Problem description

I have a dataset like this:

year      city       type   sex  number
2008      London      A      F    100
2008      London      B      F    110
2008      London      A      M    101
2008      London      B      M    111
2009      London      A      F    200
2009      London      B      F    210
2009      London      A      M    201
2009      London      B      M    211
2008      NY          A      F    100
2008      NY          B      F    110
2008      NY          A      M    101
2008      NY          B      M    111
2009      NY          A      F    200
2009      NY          B      F    210
2009      NY          A      M    201
2009      NY          B      M    211

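I want to plot it so that, for each year, the sums for F and M form the two parts of a stacked bar, with the percentage of each part displayed.

How can I do this in R?

Tags: r, ggplot2

Solution


We can do this with a tidyverse approach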

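  1. Group by the 'year' and 'sex' columns
  2. Get the sum of 'number' in summarise
  3. Create the column 'perc' by dividing the summarised value by the sum of that column
  4. Specify x as 'year', y as the summed 'number', fill as 'sex', and 'perc' as the label in the aes of ggplot
  5. Use geom_col to return the bar plot
  6. Add the percentage labels with geom_text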

library(dplyr)
library(ggplot2)

df1 %>%
    # total number per year and sex, summed over city and type
    group_by(year, sex) %>%
    summarise(number = sum(number), .groups = 'drop') %>%
    # share of each year/sex total in the overall total
    mutate(perc = number / sum(number), year = factor(year)) %>%
    ggplot(aes(x = year, y = number, fill = sex,
               label = scales::percent(perc))) +
      geom_col(position = 'dodge') +
      # percentage labels placed above the matching dodged bars
      geom_text(position = position_dodge(width = .9),
                vjust = -0.5,
                size = 3) +
      theme_bw()

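-output

[Image: bar chart of the summed number per year, with side-by-side bars for F and M and a percentage label above each bar]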
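
The code above draws side-by-side bars and computes each percentage against the overall total. The question asks for a stacked bar per year, so here is a minimal sketch of that variant. It assumes that "percentage of each part" means the share of F and M within each year; the names `stacked` and `p_stacked` are made up for this example, and the data frame is rebuilt compactly only so the snippet runs on its own.

```r
library(dplyr)
library(ggplot2)

# Same 16 rows as df1 in the data section, rebuilt so the sketch is self-contained
df1 <- data.frame(
  year   = rep(rep(c(2008L, 2009L), each = 4), times = 2),
  city   = rep(c("London", "NY"), each = 8),
  type   = rep(c("A", "B"), times = 8),
  sex    = rep(rep(c("F", "M"), each = 2), times = 4),
  number = rep(c(100L, 110L, 101L, 111L, 200L, 210L, 201L, 211L), times = 2)
)

stacked <- df1 %>%
  group_by(year, sex) %>%
  summarise(number = sum(number), .groups = "drop") %>%
  group_by(year) %>%                        # assumption: percentages within each year
  mutate(perc = number / sum(number)) %>%
  ungroup() %>%
  mutate(year = factor(year))

p_stacked <- ggplot(stacked,
                    aes(x = year, y = number, fill = sex,
                        label = scales::percent(perc, accuracy = 0.1))) +
  geom_col(position = "stack") +
  # centre each label inside its own segment of the stack
  geom_text(position = position_stack(vjust = 0.5), size = 3) +
  theme_bw()

print(p_stacked)
```

If the percentages should instead be relative to the overall total, as in the original code, dropping the `group_by(year)` and `ungroup()` lines should give that behaviour.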

data

df1 <- structure(list(year = c(2008L, 2008L, 2008L, 2008L, 2009L, 2009L, 
2009L, 2009L, 2008L, 2008L, 2008L, 2008L, 2009L, 2009L, 2009L, 
2009L), city = c("London", "London", "London", "London", "London", 
"London", "London", "London", "NY", "NY", "NY", "NY", "NY", "NY", 
"NY", "NY"), type = c("A", "B", "A", "B", "A", "B", "A", "B", 
"A", "B", "A", "B", "A", "B", "A", "B"), sex = c("F", "F", "M", 
"M", "F", "F", "M", "M", "F", "F", "M", "M", "F", "F", "M", "M"
), number = c(100L, 110L, 101L, 111L, 200L, 210L, 201L, 211L, 
100L, 110L, 101L, 111L, 200L, 210L, 201L, 211L)), 
class = "data.frame", row.names = c(NA, 
-16L))
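
As an alternative to the `structure()` call, the table can probably be read straight from the text printed in the question. This is only a sketch using base R; `totals` is a name chosen for this example, and the `aggregate()` call is there just to cross-check the per-year, per-sex sums that the plot relies on.

```r
# Read the table as printed in the question (whitespace-separated columns)
df1 <- read.table(text = "
year city   type sex number
2008 London A    F   100
2008 London B    F   110
2008 London A    M   101
2008 London B    M   111
2009 London A    F   200
2009 London B    F   210
2009 London A    M   201
2009 London B    M   211
2008 NY     A    F   100
2008 NY     B    F   110
2008 NY     A    M   101
2008 NY     B    M   111
2009 NY     A    F   200
2009 NY     B    F   210
2009 NY     A    M   201
2009 NY     B    M   211
", header = TRUE,
   colClasses = c("integer", "character", "character", "character", "integer"))

# Base-R cross-check of the sums per year and sex
# 2008: F 420, M 424; 2009: F 820, M 824
totals <- aggregate(number ~ year + sex, data = df1, FUN = sum)
print(totals)
```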
