r - R Dataframe Average Group by last months over Users
问题描述
Suppose I have the next dataframe. How can I create a new "avg" column that is the result of averaging the last 2 dates ("date") for each group. The idea is to apply this to a dataset with hundreds of thousands of files, so performance is important. The function should contemplate a variable number of months (example 2 or 3 months) and be able to change between simple and medium average.
Thanks in advance.
table1<-data.frame(group=c(1,1,1,1,2,2,2,2),date=c(201903,201902,201901,201812,201903,201902,201901,201812),price=c(10,30,50,20,2,10,9,20))
group date price
1 1 201903 10
2 1 201902 30
3 1 201901 50
4 1 201812 20
5 2 201903 2
6 2 201902 10
7 2 201901 9
8 2 201812 20
result<-data.frame(group=c(1,1,1,1,2,2,2,2),date=c(201903,201902,201901,201812,201903,201902,201901,201812),price=c(10,30,50,20,2,10,9,20), avg = c(20, 40, 35, NA, 6, 9.5, 14.5, NA))
group date price avg
1 1 201903 10 20.0
2 1 201902 30 40.0
3 1 201901 50 35.0
4 1 201812 20 NA
5 2 201903 2 6.0
6 2 201902 10 9.5
7 2 201901 9 14.5
8 2 201812 20 NA
解决方案
首先对 data.frame 进行排序,以便每个组的日期升序
table1 <- table1[order(table1$group, table1$date), ]
创建一个带有月数参数的移动平均函数。其他可用的功能选项:计算移动平均线
mov_avg <- function(y, months = 2){as.numeric(filter(y, rep(1 / months, months), sides = 1))}
将此mov_avg
功能与经典的 do.call-lapply-split 组合一起使用
table1$avg_2months <- do.call(c, lapply(split(x=table1$price, f=table1$group), mov_avg, months=2))
table1$avg_3months <- do.call(c, lapply(split(x=table1$price, f=table1$group), mov_avg, months=3))
table1
group date price avg_2months avg_3months
4 1 201812 20 NA NA
3 1 201901 50 35.0 NA
2 1 201902 30 40.0 33.33333
1 1 201903 10 20.0 30.00000
8 2 201812 20 NA NA
7 2 201901 9 14.5 NA
6 2 201902 10 9.5 13.00000
5 2 201903 2 6.0 7.00000
推荐阅读
- c - 如何在函数中调用函数
- php - 如何在 DQL 中使用 RETURNING 更新查询
- ios - 如何在 Swift 中删除领域中的对象
- ruby-on-rails - 为什么“唯一性:真实”验证在我的测试(Rails)中不起作用?
- amazon-web-services - 错误:InvalidProfileError - 尽管有配置文件,但无法找到配置文件(默认)
- javascript - Uncaught Invariant Violation:最小化 React 错误
- python - 用 json.dump 引发 JSONDecodeError("Extra data", s, end)
- docker - 将 env var 从 docker run cmd 在 jenkinsfile 中传递给 dockerfie
- jquery - 如果找到数据则显示警报 - JQUERY
- monitoring - 在 Glowroot 中无法看到我的 Vertx 应用程序的任何 Web 事务