首页 > 解决方案 > R 数据表创建前五年的平均值

问题描述

我有一个变量的数据,我想获得前 5 年同月同一变量的当前水平和平均值之间的差异。

library(tidyverse)
library(data.table)
library(lubridate)
MWE <- as.data.table(ggplot2::economics) %>%
  .[,c("pce","psavert","uempmed","unemploy"):=NULL]

> MWE
           date      pop
  1: 1967-07-01 198712.0
  2: 1967-08-01 198911.0
  3: 1967-09-01 199113.0
  4: 1967-10-01 199311.0
  5: 1967-11-01 199498.0
 ---                    
570: 2014-12-01 319746.2
571: 2015-01-01 319928.6
572: 2015-02-01 320074.5
573: 2015-03-01 320230.8
574: 2015-04-01 320402.3

我可以按月完成,但我无法将对当前行的引用合并到执行类似的操作year(date) < year(currentline) & year(date) >= year(currentline)-6

MWE_2 <- MWE[,MeanPastYears:=mean(pop),by=month(date)]

我想要的输出是

       date      pop      avg_5yrs
  1: 1967-07-01 198712.0     NA
  2: 1967-08-01 198911.0     NA
  3: 1967-09-01 199113.0     NA
  4: 1967-10-01 199311.0     NA
  5: 1967-11-01 199498.0     NA
 ---                    
570: 2014-12-01 319746.2   313013.8
571: 2015-01-01 319928.6   313192.1
572: 2015-02-01 320074.5   313350.7
573: 2015-03-01 320230.8   313511.2
574: 2015-04-01 320402.3   313640.3

标签: rdatedata.tabledata-manipulationlubridate

解决方案


里面的列[可以被索引为向量,所以我们首先为每一行创建一个向量,year(date) < year(date[..I]) & year(date) >= year(date[..I]) - 6当日期在区间内时为真,然后pop按月份获取平均值:

df[,
    year:=year(date)
][, 
    avg_5yrs := sapply(1:.N, function(..I) mean(pop[year < year[..I] & year >= year[..I] -6])), by=month(date)
][, year:=NULL][]
            date      pop avg_5yrs
  1: 1967-07-01 198712.0      NaN
  2: 1967-08-01 198911.0      NaN
  3: 1967-09-01 199113.0      NaN
  4: 1967-10-01 199311.0      NaN
  5: 1967-11-01 199498.0      NaN
 ---                             
570: 2014-12-01 319746.2 311845.5
571: 2015-01-01 319928.6 312028.1
572: 2015-02-01 320074.5 312192.6
573: 2015-03-01 320230.8 312357.4
574: 2015-04-01 320402.3 312498.1

推荐阅读