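r - Calculating a difference by group and by row in R
Question
The table is broken down by two factors in the Group column: Customer and Customer Support.
I want to calculate, for each given month, a ratio between the two groups based on the Total field.
Reference table df (shown through April only; the full table runs through December):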
Group | Month | Total
Customer | Jan | 170
Customer Support | Jan | 141
Customer | Feb | 134
Customer Support | Feb | 131
Customer | Mar | 162
Customer Support | Mar | 136
Customer | Apr | 236
Customer Support | Apr | 190
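I want to create a new field showing the response ratio of Customer Support (numerator) to Customer (denominator). So it is a row-wise calculation, but from the bottom row up.
Expected output: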
Group | Month | Total | Response Ratio
Customer | Jan | 170 | 0.82
Customer Support | Jan | 141 | 0.82
Customer | Feb | 134 | 0.97
Customer Support | Feb | 131 | 0.97
Customer | Mar | 162 | 0.83
Customer Support | Mar | 136 | 0.83
Customer | Apr | 236 | 0.8
Customer Support | Apr | 190 | 0.8
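This would then let me calculate a "Global Average Response Rate" across all the data.
Answer
We can group by 'Month' and, assuming the first row in each month is the 'Customer' row and the second is 'Customer Support', divide Total by the first Total in the group. That leaves 1 on the 'Customer' row, so we then replace it with the ratio from the second row.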
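library(dplyr)
df1 %>%
  group_by(Month) %>%
  # divide each Total by the first Total of the month (the Customer row),
  # then copy the Customer Support ratio (row 2) onto the Customer row (row 1)
  mutate(ResponseRatio = Total / first(Total),
         ResponseRatio = replace(ResponseRatio, 1, ResponseRatio[2])) %>%
  ungroup()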
-output
# A tibble: 8 x 4
Group Month Total ResponseRatio
<chr> <chr> <int> <dbl>
1 Customer Jan 170 0.829
2 Customer Support Jan 141 0.829
3 Customer Feb 134 0.978
4 Customer Support Feb 131 0.978
5 Customer Mar 162 0.840
6 Customer Support Mar 136 0.840
7 Customer Apr 236 0.805
8 Customer Support Apr 190 0.805
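The code above depends on the 'Customer' row coming before the 'Customer Support' row within each month. As a hedged alternative, here is a minimal sketch that picks the numerator and denominator by their Group label instead of by position. It assumes every month has exactly one row for each of the two groups; the name `res` is only illustrative, and the sample data is rebuilt inline so the block runs on its own.

```r
library(dplyr)

# Same sample values as in the question, rebuilt so this block is self-contained
df1 <- data.frame(
  Group = rep(c("Customer", "Customer Support"), times = 4),
  Month = rep(c("Jan", "Feb", "Mar", "Apr"), each = 2),
  Total = c(170L, 141L, 134L, 131L, 162L, 136L, 236L, 190L)
)

res <- df1 %>%
  group_by(Month) %>%
  # Assumption: exactly one "Customer" and one "Customer Support" row per month,
  # so each subset has length 1 and the ratio is recycled to both rows
  mutate(ResponseRatio = Total[Group == "Customer Support"] /
                         Total[Group == "Customer"]) %>%
  ungroup()

res
```

Because the lookup is by label, sorting or shuffling the rows beforehand should not change the result, which is not true of the `first(Total)` version.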
data
df1 <- structure(list(Group = c("Customer", "Customer Support", "Customer",
"Customer Support", "Customer", "Customer Support", "Customer",
"Customer Support"), Month = c("Jan", "Jan", "Feb", "Feb", "Mar",
"Mar", "Apr", "Apr"), Total = c(170L, 141L, 134L, 131L, 162L,
136L, 236L, 190L)), class = "data.frame", row.names = c(NA, -8L
))
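The question also mentions a "Global Average Response Rate". The answer does not say how to compute it, so the following is only a sketch of two plausible readings: the unweighted mean of the monthly ratios, and a pooled ratio of the summed totals. The names `monthly`, `mean_of_ratios` and `pooled_ratio` are illustrative, and the code again assumes one row per group per month.

```r
library(dplyr)

# Same sample values as in the question, rebuilt so this block is self-contained
df1 <- data.frame(
  Group = rep(c("Customer", "Customer Support"), times = 4),
  Month = rep(c("Jan", "Feb", "Mar", "Apr"), each = 2),
  Total = c(170L, 141L, 134L, 131L, 162L, 136L, 236L, 190L)
)

# One ratio per month (rows come back ordered by Month, which does not affect the mean)
monthly <- df1 %>%
  group_by(Month) %>%
  summarise(ResponseRatio = Total[Group == "Customer Support"] /
                            Total[Group == "Customer"],
            .groups = "drop")

# Reading 1: every month counts equally
mean_of_ratios <- mean(monthly$ResponseRatio)

# Reading 2: months with more volume weigh more
pooled_ratio <- sum(df1$Total[df1$Group == "Customer Support"]) /
                sum(df1$Total[df1$Group == "Customer"])

mean_of_ratios
pooled_ratio
```

On the four sample months the two readings give roughly 0.863 and 0.852, so which one counts as the "global average" is a choice the analyst has to make.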