How do I calculate a moving average in R?

Problem description

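I want to calculate a moving average over overlapping generations. For example, the value for 1915 should be the mean over 1900-1930, the value for 1916 should be the mean over 1901-1931, and so on. I wrote the following function and loop: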

calc_mean = function(data_frame, yr, time_generation){
  df_MM = data_frame %>% 
  filter(yr >= year & yr < year + time_generation) %>% 
  summarize(school_mean = mean(school, na.rm = TRUE)) %>% 
  mutate(year = year + gen_interval/2)

return(df_MM)
}
time_generation = 30;

# Preallocation
df_mean = data.frame()


for(year in seq(from = 1900, to = 1960, by = 1)){

  df_MM = calc_mean(df_school, yr = year, time_generation)

  df_mean = rbind(df_mean, df_MM)
}

remove(df_MM)

However, when I cross-check against a small sample, I get wrong values. Can you see my mistake?

Here is a small sample so you can check for yourself:

set.seed(2)
df_school <- data.frame(year = 1900:1960, val = sort(runif(61)))

Tags: r, loops, moving-average

Solution


Assuming your data has no gaps,

set.seed(42)
x <- data.frame(year = 2000:2010, val = sort(runif(11)))

x$rollavg <- zoo::rollmean(x$val, k=3, fill=NA, align="center")
x$rollavg2 <- zoo::rollapply(x$val, FUN=mean, width=3, align="center", partial=TRUE)
x
#    year       val   rollavg  rollavg2
# 1  2000 0.1346666        NA 0.2104031
# 2  2001 0.2861395 0.2928493 0.2928493
# 3  2002 0.4577418 0.4209924 0.4209924
# 4  2003 0.5190959 0.5395277 0.5395277
# 5  2004 0.6417455 0.6059446 0.6059446
# 6  2005 0.6569923 0.6679342 0.6679342
# 7  2006 0.7050648 0.6995485 0.6995485
# 8  2007 0.7365883 0.7573669 0.7573669
# 9  2008 0.8304476 0.8272807 0.8272807
# 10 2009 0.9148060 0.8941097 0.8941097
# 11 2010 0.9370754        NA 0.9259407

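Here, rollavg is the standard rolling average: it returns NA wherever too few values are available to fill the window. rollavg2 (computed with partial=TRUE) averages whatever values do fall inside the window, in case you want the incomplete means at the edges.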
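To tie this back to the question: the span 1900-1930 holds 31 yearly values, so a window centered on 1915 has width 31, not 30. The following is a minimal sketch on the question's sample data. It uses base R's stats::filter instead of zoo, purely so that it runs without extra packages; zoo::rollmean with k=31, fill=NA, align="center" should give the same column. The names rollavg31 and manual_1915 are made up for illustration.

```r
# Sample data from the question
set.seed(2)
df_school <- data.frame(year = 1900:1960, val = sort(runif(61)))

# 1900-1930 inclusive is 31 years, so the centered window has width 31
k <- 31

# Centered moving average with equal weights; years without a full
# window (the first and last 15) come out as NA.
# stats::filter is spelled out because dplyr::filter masks it when dplyr is loaded.
df_school$rollavg31 <- as.numeric(
  stats::filter(df_school$val, rep(1 / k, k), sides = 2)
)

# Manual cross-check for 1915: plain mean over 1900-1930
manual_1915 <- mean(df_school$val[df_school$year >= 1900 & df_school$year <= 1930])
all.equal(df_school$rollavg31[df_school$year == 1915], manual_1915)
```

As for the function in the question, a few things in it may account for the mismatch. Inside filter(), year is the data column and yr is the argument, so yr >= year & yr < year + time_generation keeps the 30 years ending at yr rather than a window starting there. gen_interval is not defined anywhere in the snippet. And the sample data has a val column, not school.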

