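r - How do I calculate a running total over all previous years in R?
Problem description
I have a set of variables in a dataset, and I simply want to calculate a running total (and a running average) of all these variables based on all previous years.
To illustrate, this is what my data looks like, including the running_total variable I would like to generate.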
country year X1 X2 X3 X4 X5 running_total
Bahamas 1990 0 0 0 0 1 NA
Bahamas 1991 0 0 1 1 0 1
Bahamas 1992 1 1 0 0 1 3
Bahamas 1993 0 0 0 0 0 6
Bahamas 1994 1 1 0 1 1 6
Bahamas 1995 0 0 1 0 0 10
Bahamas 1996 0 1 0 1 0 11
Bahamas 1997 1 0 1 0 1 13
Bahamas 1998 0 1 0 1 0 16
Bahamas 1999 1 0 1 0 1 18
Bahamas 2000 0 1 0 1 0 21
Bahamas 2001 1 0 1 0 1 23
Bahamas 2002 0 1 0 1 0 26
Bahamas 2003 1 0 0 0 1 28
Bahamas 2004 0 0 0 1 0 30
Bahamas 2005 1 1 0 0 0 31
Bahamas 2006 0 0 1 1 1 33
Bahamas 2007 1 0 0 0 0 36
Bahamas 2008 0 0 1 1 1 37
Bahamas 2009 1 1 0 0 0 40
Bahamas 2010 0 0 1 1 1 42
Bahamas 2011 1 1 0 0 0 45
Bolivia 1990 0 0 0 0 0 NA
Bolivia 1991 0 0 1 1 0 0
Bolivia 1992 0 0 0 0 0 2
Bolivia 1993 0 0 1 0 0 2
Bolivia 1994 0 0 0 0 0 3
Bolivia 1995 0 0 0 0 0 3
Bolivia 1996 0 0 0 0 0 3
Bolivia 1997 0 0 0 0 0 3
Bolivia 1998 0 0 0 0 0 3
Bolivia 1999 0 0 0 0 0 3
Bolivia 2000 0 1 0 1 0 3
Bolivia 2001 0 0 0 0 0 5
Bolivia 2002 0 0 0 0 0 5
Bolivia 2003 0 0 0 0 0 5
Bolivia 2004 0 0 0 0 0 5
Bolivia 2005 0 0 0 0 0 5
Bolivia 2006 0 0 0 0 0 5
Bolivia 2007 0 0 0 0 0 5
Bolivia 2008 0 0 0 0 1 5
Bolivia 2009 0 0 0 0 0 6
Bolivia 2010 0 0 0 0 1 6
Bolivia 2011 0 0 0 0 0 7
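The first year, 1990, is NA. The running total for 1991 is based on 1990; for 1992 it is based on 1990-1991; for 1993 on 1990-1992; for 1994 on 1990-1993; and so on until 2011. Then the same procedure starts over for the new country B.
I tried the code below, but it does not work the way I want. I clearly need to specify it better, but how?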
DF$csum <- ave(DF$X1, DF$X2,DF$X3,DF$X4,DF$X5,FUN=cumsum)
In addition, I would like to generate a running average based on the same logic.
Any help here would be greatly appreciated!
structure(list(country = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Bahamas", "Bolivia"), class = "factor"),
  year = c(1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 1997L, 1998L, 1999L, 2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L,
    1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 1997L, 1998L, 1999L, 2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L),
  X1 = c(0L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L,
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
  X2 = c(0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L,
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
  X3 = c(0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
    0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
  X4 = c(0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
    0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
  X5 = c(1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L),
  running_total = c(NA, 1L, 3L, 6L, 6L, 10L, 11L, 13L, 16L, 18L, 21L, 23L, 26L, 28L, 30L, 31L, 33L, 36L, 37L, 40L, 42L, 45L,
    NA, 0L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 7L)),
  .Names = c("country", "year", "X1", "X2", "X3", "X4", "X5", "running_total"), class = "data.frame", row.names = c(NA, -44L))
Solution
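library(data.table)
setDT(df)  # convert the data.frame to a data.table by reference
# per-row sum of the five X columns
df[, xt := X1+X2+X3+X4+X5]
# cumulative sum within each country, lagged by one row so each year only counts earlier years
df[, rt2 := shift(cumsum(xt)), by = country]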
It can actually be solved with a one-liner:
df[, rt3 := {xt=X1+X2+X3+X4+X5; shift(cumsum(xt))}, by = country]
# Or as Ryan points out:
df[, rt2 := shift(cumsum(Reduce(`+`, .SD))) , by = country , .SDcols = grep('^X.*', names(df), value = TRUE)]
All of these result in:
country year X1 X2 X3 X4 X5 running_total xt rt2
1: Bahamas 1990 0 0 0 0 1 NA 1 NA
2: Bahamas 1991 0 0 1 1 0 1 2 1
3: Bahamas 1992 1 1 0 0 1 3 3 3
4: Bahamas 1993 0 0 0 0 0 6 0 6
5: Bahamas 1994 1 1 0 1 1 6 4 6
6: Bahamas 1995 0 0 1 0 0 10 1 10
7: Bahamas 1996 0 1 0 1 0 11 2 11
8: Bahamas 1997 1 0 1 0 1 13 3 13
9: Bahamas 1998 0 1 0 1 0 16 2 16
10: Bahamas 1999 1 0 1 0 1 18 3 18
11: Bahamas 2000 0 1 0 1 0 21 2 21
12: Bahamas 2001 1 0 1 0 1 23 3 23
13: Bahamas 2002 0 1 0 1 0 26 2 26
14: Bahamas 2003 1 0 0 0 1 28 2 28
15: Bahamas 2004 0 0 0 1 0 30 1 30
16: Bahamas 2005 1 1 0 0 0 31 2 31
17: Bahamas 2006 0 0 1 1 1 33 3 33
18: Bahamas 2007 1 0 0 0 0 36 1 36
19: Bahamas 2008 0 0 1 1 1 37 3 37
20: Bahamas 2009 1 1 0 0 0 40 2 40
21: Bahamas 2010 0 0 1 1 1 42 3 42
22: Bahamas 2011 1 1 0 0 0 45 2 45
23: Bolivia 1990 0 0 0 0 0 NA 0 NA
24: Bolivia 1991 0 0 1 1 0 0 2 0
25: Bolivia 1992 0 0 0 0 0 2 0 2
26: Bolivia 1993 0 0 1 0 0 2 1 2
27: Bolivia 1994 0 0 0 0 0 3 0 3
28: Bolivia 1995 0 0 0 0 0 3 0 3
29: Bolivia 1996 0 0 0 0 0 3 0 3
30: Bolivia 1997 0 0 0 0 0 3 0 3
31: Bolivia 1998 0 0 0 0 0 3 0 3
32: Bolivia 1999 0 0 0 0 0 3 0 3
33: Bolivia 2000 0 1 0 1 0 3 2 3
34: Bolivia 2001 0 0 0 0 0 5 0 5
35: Bolivia 2002 0 0 0 0 0 5 0 5
36: Bolivia 2003 0 0 0 0 0 5 0 5
37: Bolivia 2004 0 0 0 0 0 5 0 5
38: Bolivia 2005 0 0 0 0 0 5 0 5
39: Bolivia 2006 0 0 0 0 0 5 0 5
40: Bolivia 2007 0 0 0 0 0 5 0 5
41: Bolivia 2008 0 0 0 0 1 5 1 5
42: Bolivia 2009 0 0 0 0 0 6 0 6
43: Bolivia 2010 0 0 0 0 1 6 1 6
44: Bolivia 2011 0 0 0 0 0 7 0 7
country year X1 X2 X3 X4 X5 running_total xt rt2
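The answer above covers the running total; the running average that the question also asks for could be sketched along the same lines. The version below uses base R's ave() grouped by country, which is probably closer to what the original ave() attempt was aiming for. It is only a sketch under some assumptions: the toy data frame and the names toy, lag_cumsum, n_prev and running_mean are made up for illustration, and "running average" is taken to mean the average of the yearly totals over all previous years.

```r
# Toy data in the same shape as the question (values invented for illustration)
toy <- data.frame(
  country = rep(c("A", "B"), each = 4),
  year    = rep(1990:1993, times = 2),
  X1      = c(0, 1, 0, 1, 0, 0, 1, 0),
  X2      = c(1, 1, 0, 0, 0, 1, 0, 0)
)

# Per-row total over every column whose name starts with "X"
x_cols <- grep("^X", names(toy), value = TRUE)
toy$xt <- unname(rowSums(toy[x_cols]))

# Cumulative sum lagged by one position: each year sees only earlier years,
# so the first year of every group is NA (hypothetical helper)
lag_cumsum <- function(v) c(NA, head(cumsum(v), -1))

# Group by country only, not by the X columns themselves
toy$running_total <- ave(toy$xt, toy$country, FUN = lag_cumsum)

# Number of previous years within each country (0 for the first year)
toy$n_prev <- ave(toy$xt, toy$country, FUN = function(v) seq_along(v) - 1)

# Assumed definition: mean of the yearly totals over all previous years
toy$running_mean <- toy$running_total / toy$n_prev

print(toy)
```

With data.table, the same idea would presumably be to divide the shifted cumulative sum by the number of preceding rows within each country.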