Calculating group-based, row-wise differences in R

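Problem description

The table is broken down by two factors in the Group column: Customer and Customer Support.

For each month, I want to calculate the ratio between the two Total values.

Reference table df (shown only through April; the full table runs through December):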

Group            | Month  | Total

Customer           Jan       170
Customer Support   Jan       141
Customer           Feb       134
Customer Support   Feb       131
Customer           Mar       162
Customer Support   Mar       136
Customer           Apr       236
Customer Support   Apr       190

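I want to create a new field that shows the response ratio of Customer Support (numerator) to Customer (denominator). So it is a row-wise calculation, but from the bottom row up.

Desired output: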

Group            | Month  | Total | Response Ratio

Customer           Jan       170    0.82
Customer Support   Jan       141    0.82
Customer           Feb       134    0.97
Customer Support   Feb       131    0.97
Customer           Mar       162    0.83
Customer Support   Mar       136    0.83
Customer           Apr       236    0.8
Customer Support   Apr       190    0.8

This would then let me calculate an overall "global average response ratio" across all the data.

Tags: r, dplyr

Solution


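We can group by Month and, assuming the first element of each group is the "Customer" row, divide each Total by the first Total in its group. The first row's ratio (always 1) is then replaced with the second row's, so both rows of a month carry the same value.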

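library(dplyr)
df1 %>%
    group_by(Month) %>%
    # divide by the first row's Total (assumed to be "Customer"), then copy
    # the "Customer Support" ratio (row 2) onto the "Customer" row (row 1)
    mutate(ResponseRatio = Total/first(Total),
           ResponseRatio = replace(ResponseRatio, 1, ResponseRatio[2])) %>%
    ungroup()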

-output

# A tibble: 8 x 4
  Group            Month Total ResponseRatio
  <chr>            <chr> <int>         <dbl>
1 Customer         Jan     170         0.829
2 Customer Support Jan     141         0.829
3 Customer         Feb     134         0.978
4 Customer Support Feb     131         0.978
5 Customer         Mar     162         0.840
6 Customer Support Mar     136         0.840
7 Customer         Apr     236         0.805
8 Customer Support Apr     190         0.805
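
The approach above depends on "Customer" being the first row within each month. If the row order cannot be guaranteed, one possible variant is to pick the numerator and denominator by their Group label instead. This is only a sketch, and it assumes each month has exactly one "Customer" row and exactly one "Customer Support" row; the name `res` is introduced here purely for illustration.

```r
library(dplyr)

# Same sample data as in the "data" section of this answer
df1 <- data.frame(
  Group = rep(c("Customer", "Customer Support"), times = 4),
  Month = rep(c("Jan", "Feb", "Mar", "Apr"), each = 2),
  Total = c(170L, 141L, 134L, 131L, 162L, 136L, 236L, 190L)
)

res <- df1 %>%
  group_by(Month) %>%
  # select numerator and denominator by label rather than by row position;
  # assumes exactly one row of each Group per Month
  mutate(ResponseRatio = Total[Group == "Customer Support"] /
                         Total[Group == "Customer"]) %>%
  ungroup()

res
```

Because the ratio is a single value per month, `mutate()` recycles it to both rows of the group, so no `replace()` step is needed.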

data

df1 <- structure(list(Group = c("Customer", "Customer Support", "Customer", 
"Customer Support", "Customer", "Customer Support", "Customer", 
"Customer Support"), Month = c("Jan", "Jan", "Feb", "Feb", "Mar", 
"Mar", "Apr", "Apr"), Total = c(170L, 141L, 134L, 131L, 162L, 
136L, 236L, 190L)), class = "data.frame", row.names = c(NA, -8L
))
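
The question also mentions an overall average response ratio. That term can be read in at least two ways, so the sketch below computes both: the plain mean of the monthly ratios, and a pooled ratio of total Customer Support over total Customer. Which one is wanted is an assumption left to the reader; the names `monthly`, `mean_of_ratios` and `pooled_ratio` are illustrative only, and the code again assumes one row per Group per Month.

```r
library(dplyr)

# Same sample data as above
df1 <- data.frame(
  Group = rep(c("Customer", "Customer Support"), times = 4),
  Month = rep(c("Jan", "Feb", "Mar", "Apr"), each = 2),
  Total = c(170L, 141L, 134L, 131L, 162L, 136L, 236L, 190L)
)

# one ratio per month (assumes one row per Group per Month)
monthly <- df1 %>%
  group_by(Month) %>%
  summarise(Ratio = Total[Group == "Customer Support"] /
                    Total[Group == "Customer"])

# reading 1: unweighted mean of the monthly ratios
mean_of_ratios <- mean(monthly$Ratio)

# reading 2: pooled ratio, i.e. months weighted by their Customer totals
pooled_ratio <- sum(df1$Total[df1$Group == "Customer Support"]) /
                sum(df1$Total[df1$Group == "Customer"])

mean_of_ratios
pooled_ratio
```

The two numbers generally differ: the mean of ratios treats every month equally, while the pooled ratio gives more weight to months with larger Customer totals.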
