How to use dplyr lag() to smooth small changes in a variable

Question

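I have grouped data and a variable that I want to smooth within each group. If the absolute change from one observation to the next is small (e.g. 5 or less), I treat it as measurement error and want to carry forward (roll forward) the old value. Within each group, I use the first measurement as the default for lag(), so I am assuming that the first observation in each group is always correct (this is up for debate).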

library(dplyr)

set.seed(5)
# year (length 7) is recycled across both groups
mydata = data.frame(group = c(1,1,1,1,1,1,1,2,2,2,2,2,2,2),
                    year = seq(from = 2003, to = 2009, by = 1),
                    variable = round(runif(14, min = -5, max = 15), 0))
mydata %>%
  filter(variable > 0) %>%
  group_by(group) %>%
  mutate(smooth5 = ifelse( abs( lag(variable, n = 1, default = first(variable)) - variable ) <= 5 , variable, 5)) %>%       
  select(group, year, variable, smooth5) %>%
  arrange(group)

# A tibble: 10 x 4
# Groups:   group [2]
   group  year variable smooth5
   <dbl> <dbl>    <dbl>   <dbl>
 1     1  2004        9       9
 2     1  2005       13      13  # <- this change is |4|, thus it should use the old value 9
 3     1  2006        1       5  # <- 13 to 1 is a genuine change (|12| > 5), so it should keep 1
 4     1  2008        9       5
 5     1  2009        6       6
 6     2  2003       11      11
 7     2  2004       14      14
 8     2  2007        5       5
 9     2  2008        1       1
10     2  2009        6       6

Tags: r, dplyr, smoothing

Answer


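You're close, but your ifelse() call has two mistakes: when the change is small it returns variable instead of the previous value, and when the change is large it returns the constant 5 instead of variable. Below, for clarity, I've added a new variable, previous. If abs(previous - variable) <= 5, you want previous; otherwise you want variable.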
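The corrected condition can be checked on a single plain vector before running it per group. This sketch uses group 1's filtered values from the question's output and, as an assumption for illustration, builds the lagged vector with base R's c(x[1], head(x, -1)), which behaves like lag(x, n = 1, default = first(x)) for a vector of length 2 or more.

```r
# Group 1 values after filter(variable > 0), copied from the question's output
x <- c(9, 13, 1, 9, 6)

# Shift right by one and reuse x[1] for the first slot,
# mirroring lag(x, n = 1, default = first(x))
previous <- c(x[1], head(x, -1))

# Small change (|diff| <= 5): keep the previous value; otherwise accept the new one
smooth5 <- ifelse(abs(previous - x) <= 5, previous, x)
smooth5
#> [1] 9 9 1 9 9
```

This matches the first five rows of the grouped result.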

mydata %>%
  filter(variable > 0) %>%
  group_by(group) %>%
  mutate(previous = lag(variable, n = 1, default = first(variable)),
         smooth5 = ifelse(abs(previous - variable) <= 5, previous, variable)) %>%       
  select(group, year, variable, smooth5) %>%
  arrange(group)

This gives:

# A tibble: 10 x 4
# Groups:   group [2]
   group  year variable smooth5
   <dbl> <dbl>    <dbl>   <dbl>
 1     1  2004        9       9
 2     1  2005       13       9
 3     1  2006        1       1
 4     1  2008        9       9
 5     1  2009        6       9
 6     2  2003       11      11
 7     2  2004       14      11
 8     2  2007        5       5
 9     2  2008        1       5
10     2  2009        6       1
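One caveat: lag() compares each value with the previous raw measurement, not with the previously smoothed value. In group 2, the 2009 value of 6 is compared with the raw 1 and comes out as 1, even though the last kept value is 5. If you want a true carry-forward, where each value is compared against the last value that was kept, every result depends on the one before it, and lag() alone can't express that. Below is a sketch using base R's Reduce(accumulate = TRUE) inside mutate(). The helper carry_forward and its threshold argument are hypothetical names, rows are assumed to be sorted by year within each group, and the filtered rows are typed in from the output above so the example doesn't depend on the random numbers.

```r
library(dplyr)

# Hypothetical helper: walk along x, keeping the last accepted value
# whenever the next value is within `threshold` of it
carry_forward <- function(x, threshold = 5) {
  Reduce(function(kept, current) if (abs(current - kept) <= threshold) kept else current,
         x, accumulate = TRUE)
}

# Filtered rows from the output above, typed in directly
filtered <- data.frame(
  group    = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  year     = c(2004, 2005, 2006, 2008, 2009, 2003, 2004, 2007, 2008, 2009),
  variable = c(9, 13, 1, 9, 6, 11, 14, 5, 1, 6)
)

result <- filtered %>%
  group_by(group) %>%
  mutate(smooth5 = carry_forward(variable)) %>%
  ungroup()

result$smooth5
#> [1]  9  9  1  9  9 11 11  5  5  5
```

Group 1 is unchanged (9, 9, 1, 9, 9), while group 2 becomes 11, 11, 5, 5, 5: the final row stays at 5 instead of dropping to 1.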
