Calculate a minimum item value to reach a group's average threshold

Problem description

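I have a data sample whose values are grouped by ID and Item. I need to find the minimum amount by which a single item's value must be increased so that the ID's overall average meets the threshold of 0.90.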

Data:

structure(list(ID = structure(c(1L, 2L, 2L), .Label = c("A1", 
"A2"), class = "factor"), Item = structure(c(1L, 2L, 1L), .Label = c("Item1", 
"Item2"), class = "factor"), Value.1 = c(0.7894, 0.95, 0.7894
), CurrentAvg = c(0.7894, 0.8697, 0.8697)), class = "data.frame", row.names = c(NA, 
-3L))

I can get the difference for each item with the following code:

library(dplyr)
SampDF2 <- SampDF %>%
  group_by(ID, Item, CurrentAvg) %>%
  # gap between 0.90 and each row's own Value.1
  mutate(Value.1.Increase = 0.90 - Value.1)

Result:

structure(list(ID = structure(c(1L, 2L, 2L), .Label = c("A1", 
"A2"), class = "factor"), Item = structure(c(1L, 2L, 1L), .Label = c("Item1", 
"Item2"), class = "factor"), Value.1 = c(0.7894, 0.95, 0.7894
), CurrentAvg = c(0.7894, 0.8697, 0.8697), Value.1.Increase = c(0.1106, 
-0.0499999999999999, 0.1106)), class = c("grouped_df", "tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -3L), vars = c("ID", 
"Item", "CurrentAvg"), labels = structure(list(ID = structure(c(1L, 
2L, 2L), .Label = c("A1", "A2"), class = "factor"), Item = structure(c(1L, 
1L, 2L), .Label = c("Item1", "Item2"), class = "factor"), CurrentAvg = c(0.7894, 
0.8697, 0.8697)), class = "data.frame", row.names = c(NA, -3L
), vars = c("ID", "Item", "CurrentAvg"), drop = TRUE), indices = list(
0L, 2L, 1L), drop = TRUE, group_sizes = c(1L, 1L, 1L), biggest_group_size = 1L)

However, this result does not give the item-value increase needed to raise each ID's CurrentAvg to the 0.90 threshold.
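A short worked derivation of why a per-row difference is not enough, assuming only one item per ID is raised and all other items in that ID stay unchanged. For an ID with $n$ items and current mean $\bar{x}$, raising one item by $\Delta$ moves the mean to $\bar{x} + \Delta/n$, so:

$$
\bar{x} + \frac{\Delta}{n} \ge 0.90 \quad\Longrightarrow\quad \Delta_{\min} = n\,(0.90 - \bar{x})
$$

For A1 ($n = 1$): $\Delta_{\min} = 0.90 - 0.7894 = 0.1106$. For A2 ($n = 2$): $\Delta_{\min} = 2\,(0.90 - 0.8697) = 0.0606$, which takes Item1 from 0.7894 to 0.85 and gives a new mean of $(0.95 + 0.85)/2 = 0.90$. The per-row value 0.90 − Value.1 therefore only matches when the ID has a single item, as A1 does.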

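Is there a way to do this and add two new columns: a ValueIncrease column showing the increase, and a NewAvg column confirming that the new average meets the 0.90 threshold?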

If my manual calculation is correct, this would be the desired result:

structure(list(ID = structure(c(1L, 2L, 2L), .Label = c("A1", 
"A2"), class = "factor"), Item = structure(c(1L, 2L, 1L), .Label = c("Item1", 
"Item2"), class = "factor"), Value.1 = c(0.7894, 0.95, 0.8393
), CurrentAvg = c(0.7894, 0.8697, 0.8697), ValueIncrease = c(0.1106, 
0.04999, 0.04999), NewAvg = c(0.9, 0.89465, 0.89465)), class = "data.frame",  row.names = c(NA, 
-3L))
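A minimal dplyr sketch of the two requested columns. It assumes (a) every other item in an ID stays fixed, and (b) the whole increase is applied to the lowest-valued item in each ID (ties go to the first such row). The names `threshold`, `is_target` and `NewValue` are hypothetical, chosen only for illustration. The mean is recomputed from `Value.1` rather than read from `CurrentAvg`; the two agree for this sample.

```r
library(dplyr)

# Sample data from the question (same values as the dput() above)
SampDF <- data.frame(
  ID         = factor(c("A1", "A2", "A2")),
  Item       = factor(c("Item1", "Item2", "Item1")),
  Value.1    = c(0.7894, 0.95, 0.7894),
  CurrentAvg = c(0.7894, 0.8697, 0.8697)
)

threshold <- 0.90  # hypothetical name for the 0.90 target

SampDF2 <- SampDF %>%
  group_by(ID) %>%                 # group by ID only, not by ID/Item
  mutate(
    # Increase a single item needs so the ID mean reaches the threshold:
    # n * (threshold - current mean), or 0 if the ID already meets it
    ValueIncrease = pmax(n() * (threshold - mean(Value.1)), 0),
    # Assumption: the increase goes to the lowest-valued item in the ID
    is_target = row_number() == which.min(Value.1),
    NewValue  = if_else(is_target, Value.1 + ValueIncrease, Value.1),
    # Mean of the updated values within each ID
    NewAvg    = mean(NewValue)
  ) %>%
  ungroup() %>%
  select(-is_target)

SampDF2
```

With this sample, ValueIncrease comes out as 0.1106 for A1 and about 0.0606 for A2, and NewAvg is 0.90 for both IDs. Grouping by `ID` alone is the key design choice: grouping by `ID, Item, CurrentAvg` puts every row in its own group, so `n()` would be 1 and each mean would be just that row's own value.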

Tags: r, dplyr, data.table
