首页 > 解决方案 > 计算 R 中 data.table 中的滞后百分比差异

问题描述

我正在尝试计算计数变量之间的滞后差异。但是,我的数据有一个组变量。我希望分别为每个组计算滞后。

到目前为止,我有以下内容:

dput(head(mydata,20))
structure(list(startYear = structure(c(1L, 2L, 3L, 4L, 5L, 1L, 
2L, 3L, 4L, 5L, 1L, 1L, 3L, 2L, 3L, 1L, 1L, 1L, 2L, 3L), .Label = c("2014", 
"2015", "2016", "2017", "2018"), class = "factor"), groupID = c("AISAC-0000", 
"AISAC-0000", "AISAC-0000", "AISAC-0000", "AISAC-0000", "ASSAT-0000", 
"ASSAT-0000", "ASSAT-0000", "ASSAT-0000", "ASSAT-0000", "BAYER-0001", 
"BAYSC-0002", "GECER-0002", "HANIN-0000", "HANIN-0000", "HOCED-0001", 
"HOCEN-0000", "INDAL-0000", "INDAL-0000", "INDAL-0000"), N = c(82, 
124, 60, 164, 65, 142, 183, 142, 75, 185, 145, 22, 162, 92, 4, 
166, 57, 11, 199, 137)), row.names = c(NA, -20L), class = c("data.table", 
"data.frame"))
mydata <- mydata[ ,var_calc := paste0(round((N/lag(N) - 1) * 100, digits = 3) , " %")]

所需的输出是:

mydata %>%
group_by(groupID) %>%
arrange(startYear,  .by_group = TRUE) %>%
  mutate(var_calc := paste0(round((N/lag(N) - 1) * 100, digits = 3) , " %")) 

什么是.by_group = TRUE替代品data.table

我怎样才能强迫积极的价值观有一个+

标签: rdata.table

解决方案


您可以在 中执行以下操作data.table,使用by分组而shift不是滞后:

编辑:添加功能以获得更好的百分比和加号输出

library(data.table)
mydata <- structure(list(startYear = structure(c(1L, 2L, 3L, 4L, 5L, 1L, 
                                       2L, 3L, 4L, 5L, 1L, 1L, 3L, 2L, 3L, 1L, 1L, 1L, 2L, 3L), .Label = c("2014", 
                                                                                                           "2015", "2016", "2017", "2018"), class = "factor"), groupID = c("AISAC-0000", 
                                                                                                                                                                           "AISAC-0000", "AISAC-0000", "AISAC-0000", "AISAC-0000", "ASSAT-0000", 
                                                                                                                                                                           "ASSAT-0000", "ASSAT-0000", "ASSAT-0000", "ASSAT-0000", "BAYER-0001", 
                                                                                                                                                                           "BAYSC-0002", "GECER-0002", "HANIN-0000", "HANIN-0000", "HOCED-0001", 
                                                                                                                                                                           "HOCEN-0000", "INDAL-0000", "INDAL-0000", "INDAL-0000"), N = c(82, 
                                                                                                                                                                                                                                          124, 60, 164, 65, 142, 183, 142, 75, 185, 145, 22, 162, 92, 4, 
                                                                                                                                                                                                                                          166, 57, 11, 199, 137)), row.names = c(NA, -20L), class = c("data.table", 
                                                                                                                                                                                                                                                                                                      "data.frame"))

setDT(mydata)

addplus <- function(x, digits=3){
  x <- setNames(x, round(100*x, digits = digits))
  ifelse(is.na(x), x, 
         ifelse(sign(x) == 1, paste0("+", names(x), "%"), paste0(names(x), "%"))
  )
}

mydata[ , var_calc := addplus(N/shift(N) - 1), by="groupID"][]
#>     startYear    groupID   N   var_calc
#>  1:      2014 AISAC-0000  82       <NA>
#>  2:      2015 AISAC-0000 124    +51.22%
#>  3:      2016 AISAC-0000  60   -51.613%
#>  4:      2017 AISAC-0000 164  +173.333%
#>  5:      2018 AISAC-0000  65   -60.366%
#>  6:      2014 ASSAT-0000 142       <NA>
#>  7:      2015 ASSAT-0000 183   +28.873%
#>  8:      2016 ASSAT-0000 142   -22.404%
#>  9:      2017 ASSAT-0000  75   -47.183%
#> 10:      2018 ASSAT-0000 185  +146.667%
#> 11:      2014 BAYER-0001 145       <NA>
#> 12:      2014 BAYSC-0002  22       <NA>
#> 13:      2016 GECER-0002 162       <NA>
#> 14:      2015 HANIN-0000  92       <NA>
#> 15:      2016 HANIN-0000   4   -95.652%
#> 16:      2014 HOCED-0001 166       <NA>
#> 17:      2014 HOCEN-0000  57       <NA>
#> 18:      2014 INDAL-0000  11       <NA>
#> 19:      2015 INDAL-0000 199 +1709.091%
#> 20:      2016 INDAL-0000 137   -31.156%

reprex 包(v0.3.0)于 2020-05-12 创建


推荐阅读