How to calculate a rolling standard deviation in dplyr

Question

I have the following data frame in R.

data <- structure(list(Date = structure(c(18682, 18683, 18684, 18687, 
18688), class = "Date"), Apple = c(125.349998, 120.989998, 121.260002, 
127.790001, 125.120003), Amazon = c(3159.530029, 3057.159912, 
3092.929932, 3146.139893, 3094.530029), Facebook = c(264.309998, 
254.690002, 257.619995, 264.910004, 259), Google = c(2083.810059, 
2015.949951, 2021.910034, 2069.659912, 2064.47998), Netflix = c(553.409973, 
546.700012, 538.849976, 550.640015, 547.820007)), row.names = c(NA, 
-5L), class = c("tbl_df", "tbl", "data.frame"))

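I am looking for a simple way to calculate a rolling standard deviation in dplyr. If my data frame were a zoo object, the solution might look something like this: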

library(tidyverse)
data %>% mutate_at(.vars = vars(2:6), .funs = ~zoo::rollapply(., width = 2, FUN = sd))

Any ideas on how to adapt the .funs argument so that it works with a tbl object?

Tags: r, dplyr, tidyverse

Answer


I like the slider package for this:

data %>%
  mutate(across(-Date, ~slider::slide_dbl(.x, sd, .before = Inf)))
  # or use `.before = 2` if you want to look back at the
  #  two prior values (3 in total)

# A tibble: 5 x 6
  Date       Apple Amazon Facebook Google Netflix
  <date>     <dbl>  <dbl>    <dbl>  <dbl>   <dbl>
1 2021-02-24 NA      NA      NA      NA     NA   
2 2021-02-25  3.08   72.4     6.80   48.0    4.74
3 2021-02-26  2.44   52.0     4.93   37.6    7.29
4 2021-03-01  3.30   47.5     5.03   34.0    6.33
5 2021-03-02  2.91   42.1     4.40   30.3    5.49
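
Note that `.before = Inf` produces an expanding (cumulative) window rather than a fixed-width one. As a minimal sketch, assuming the goal is the two-row window implied by `width = 2` in the question, `.before = 1` combined with `.complete = TRUE` keeps only full windows and returns NA otherwise. The trimmed tibble `prices` and the result name `rolled` are illustrative and not part of the original post.

```r
library(dplyr)

# Illustrative subset of the question's data (two of the five price columns)
prices <- tibble(
  Date   = as.Date(c("2021-02-24", "2021-02-25", "2021-02-26",
                     "2021-03-01", "2021-03-02")),
  Apple  = c(125.349998, 120.989998, 121.260002, 127.790001, 125.120003),
  Amazon = c(3159.530029, 3057.159912, 3092.929932, 3146.139893, 3094.530029)
)

# Window = current row plus 1 prior row; incomplete windows become NA
rolled <- prices %>%
  mutate(across(-Date, ~ slider::slide_dbl(.x, sd, .before = 1, .complete = TRUE)))

rolled
```

With a two-value window, each result is simply the absolute difference between consecutive prices divided by sqrt(2), which is a quick way to sanity-check the output.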

slider also lets you size the window by an index column instead of by row count, using slider::slide_index_dbl().
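
A hedged sketch of the index-based variant follows. It assumes the window should cover the current date and the two calendar days before it; with a Date index, a numeric `.before` is interpreted in days. The tibble `prices` and the result name `by_date` are illustrative names, not from the original post.

```r
library(dplyr)

# Illustrative subset of the question's data; note the weekend gap
# between 2021-02-26 and 2021-03-01
prices <- tibble(
  Date  = as.Date(c("2021-02-24", "2021-02-25", "2021-02-26",
                    "2021-03-01", "2021-03-02")),
  Apple = c(125.349998, 120.989998, 121.260002, 127.790001, 125.120003)
)

# The window is defined by the Date column: [Date - 2 days, Date]
by_date <- prices %>%
  mutate(across(-Date, ~ slider::slide_index_dbl(.x, Date, sd, .before = 2)))

by_date
```

Because the window follows calendar dates rather than rows, the 2021-03-01 row sees only its own value (the previous observation is three days earlier), so `sd()` returns NA there. A row-based `slide_dbl()` would have reached back across the weekend instead.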

