How to convert a list of data frames into a single zoo object in R

Question

I have a list of data frames that I want to convert into a single zoo object.

Example of the list:

> example
$A.N
# A tibble: 374 x 21
   TIMESTAMP            OPEN  HIGH   LOW CLOSE daily_return intraday_return RIC  
   <dttm>              <dbl> <dbl> <dbl> <dbl>        <dbl>           <dbl> <chr>
 1 2004-04-27 00:00:00  19.6  19.9  19.3  19.4            0        -0.00997 A.N  
 2 2004-04-28 00:00:00  19.3  19.3  19.0  19.1            0        -0.0105  A.N  
 3 2004-04-29 00:00:00  19.0  19.1  18.4  18.7            0        -0.0124  A.N  
 4 2004-04-30 00:00:00  18.8  18.9  18.1  18.2            0        -0.0302  A.N  
 5 2004-05-03 00:00:00  18.2  18.6  18.1  18.4            0         0.00776 A.N  
 6 2004-05-04 00:00:00  18.5  18.5  17.5  18.0            0        -0.0262  A.N  
 7 2004-05-05 00:00:00  18.0  18.3  17.9  18.1            0         0.00337 A.N  
 8 2004-05-06 00:00:00  17.9  18.0  17.7  17.7            0        -0.00977 A.N  
 9 2004-05-07 00:00:00  17.7  18.0  17.6  17.7            0         0.00420 A.N  
10 2004-05-10 00:00:00  17.4  17.5  16.9  17.1            0        -0.0170  A.N  
# ... with 364 more rows, and 13 more variables: Acquirer Ultimate Parent (At Deal) <lgl>,
#   Acquirer Ultimate Parent Country <lgl>, Acquirer Ultimate Parent Stock Exchange <lgl>,
#   Acquirer Ultimate Parent Ticker <lgl>, Acquirer FactSet ID <chr>, Acquirer <chr>,
#   Acquirer Ownership Type <chr>, Acquirer Country <chr>, Acquirer Stock Exchange <chr>,
#   Acquirer Ticker <chr>, Announcement Date <date>, Start_Event_Study <date>,
#   End_Event_Study <date>

$ABI.BR
# A tibble: 375 x 21
   TIMESTAMP            OPEN  HIGH   LOW CLOSE daily_return intraday_return RIC   
   <dttm>              <dbl> <dbl> <dbl> <dbl>        <dbl>           <dbl> <chr> 
 1 2002-11-04 00:00:00  14.0  14.3  13.2  13.3            0       -0.0473   ABI.BR
 2 2002-11-05 00:00:00  13.4  13.4  12.9  13.2            0       -0.0158   ABI.BR
 3 2002-11-06 00:00:00  13.7  14.0  13.5  14.0            0        0.0256   ABI.BR
 4 2002-11-07 00:00:00  14.0  14.4  13.7  13.7            0       -0.0192   ABI.BR
 5 2002-11-08 00:00:00  13.9  13.9  13.3  13.4            0       -0.0311   ABI.BR
 6 2002-11-11 00:00:00  13.4  14.0  13.4  13.9            0        0.0393   ABI.BR
 7 2002-11-12 00:00:00  13.8  14.3  13.7  14.1            0        0.0181   ABI.BR
 8 2002-11-13 00:00:00  13.8  13.9  13.5  13.7            0       -0.00950  ABI.BR
 9 2002-11-14 00:00:00  13.7  13.9  13.3  13.4            0       -0.0228   ABI.BR
10 2002-11-15 00:00:00  13.6  13.7  13.4  13.6            0       -0.000459 ABI.BR
# ... with 365 more rows, and 13 more variables: Acquirer Ultimate Parent (At Deal) <lgl>,
#   Acquirer Ultimate Parent Country <lgl>, Acquirer Ultimate Parent Stock Exchange <lgl>,
#   Acquirer Ultimate Parent Ticker <lgl>, Acquirer FactSet ID <chr>, Acquirer <chr>,
#   Acquirer Ownership Type <chr>, Acquirer Country <chr>, Acquirer Stock Exchange <chr>,
#   Acquirer Ticker <chr>, Announcement Date <date>, Start_Event_Study <date>,
#   End_Event_Study <date>

So all I need to extract is TIMESTAMP and intraday_return. I could do that with a loop. For further calculations I need one large zoo object that looks like this:

head(StockPriceReturns,3) # Time series of dates and returns.
           Bajaj.Auto      BHEL Bharti.Airtel      Cipla Coal.India   Dr.Reddy
2010-07-01  0.5277396 -1.236944    0.51151007 -0.7578608         NA -0.8436534
2010-07-02 -1.7309383 -1.669938    0.09443763  0.4910359         NA -0.3687345
2010-07-05 -0.2530097 -1.282136    0.80850304  0.1335015         NA  1.7035363

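(This example comes from the eventstudies package.)

The timestamps, the number of rows and so on differ between the data frames in my list.

Any suggestions on how to do this?

Tags: r, list, dplyr, zoo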

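Answer

Assuming the input list L shown reproducibly in the Note at the end, use rbind to combine its components into one long data frame, then extract the required columns and pass the result to read.zoo.

The aggregate= argument of read.zoo supplies a function that is used to aggregate values sharing the same date/time, so that each ticker ends up with only one value per date/time. Common choices are aggregate = mean and aggregate = function(x) tail(x, 1). The main example in this answer uses the first. For the data in the Note the date/times are unique within each ticker, so the aggregate argument could be omitted, although leaving it in does no harm.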
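To make the effect of aggregate= concrete, here is a minimal sketch on made-up data. The data frame DF_dup, its tickers X and Y, and its values are hypothetical and only serve to create one duplicated timestamp within a ticker; they are not part of the question's data.

```r
library(zoo)

# Hypothetical toy data: ticker "X" has two rows for 2020-01-02
DF_dup <- data.frame(
  TIMESTAMP = as.Date(c("2020-01-01", "2020-01-02", "2020-01-02", "2020-01-01")),
  RIC = c("X", "X", "X", "Y"),
  intraday_return = c(0.01, 0.02, 0.04, -0.01)
)

# Average the values that share a date within a ticker
z_mean <- read.zoo(DF_dup, split = "RIC", aggregate = mean)

# Alternatively keep only the last value among the duplicates
z_last <- read.zoo(DF_dup, split = "RIC", aggregate = function(x) tail(x, 1))

z_mean
z_last
```

For X on 2020-01-02 the first call stores the average of 0.02 and 0.04, while the second stores the later row, 0.04. The answer's own code, applied to the list L from the Note, is: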

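library(zoo)

# Stack the list into one long data frame and keep only the needed columns
DF <- do.call("rbind", L)[c("TIMESTAMP", "RIC", "intraday_return")]

# Split the long data by ticker so that each RIC becomes one column
z <- read.zoo(DF, split = "RIC", aggregate = mean)
z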

giving:

                A.N    ABI.BR
2002-11-04       NA -0.047300
2002-11-05       NA -0.015800
2002-11-06       NA  0.025600
2002-11-07       NA -0.019200
2002-11-08       NA -0.031100
2002-11-11       NA  0.039300
2002-11-12       NA  0.018100
2002-11-13       NA -0.009500
2002-11-14       NA -0.022800
2002-11-15       NA -0.000459
2004-04-27 -0.00997        NA
2004-04-28 -0.01050        NA
2004-04-29 -0.01240        NA
2004-04-30 -0.03020        NA
2004-05-03  0.00776        NA
2004-05-04 -0.02620        NA
2004-05-05  0.00337        NA
2004-05-06 -0.00977        NA
2004-05-07  0.00420        NA
2004-05-10 -0.01700        NA
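An alternative that avoids the long data frame is to build one zoo series per list element and merge them. The following is only a sketch under stated assumptions: the small list L_demo is hypothetical stand-in data, the timestamps are assumed to be recorded in UTC, and the index is converted to Date on the assumption that a daily index (as in the eventstudies example) is wanted.

```r
library(zoo)

# Hypothetical two-element list standing in for the real list of data frames
L_demo <- list(
  A.N = data.frame(
    TIMESTAMP = as.POSIXct(c("2004-04-27", "2004-04-28"), tz = "UTC"),
    intraday_return = c(-0.00997, -0.0105)
  ),
  ABI.BR = data.frame(
    TIMESTAMP = as.POSIXct(c("2002-11-04", "2002-11-05"), tz = "UTC"),
    intraday_return = c(-0.0473, -0.0158)
  )
)

# One univariate zoo series per data frame, indexed by calendar date
series <- lapply(L_demo, function(d) {
  zoo(d$intraday_return, as.Date(d$TIMESTAMP, tz = "UTC"))
})

# merge() aligns the series on the union of all dates and fills gaps with NA;
# the list names become the column names
z_merged <- do.call("merge", series)
z_merged
```

Two caveats apply to this sketch. The tz passed to as.Date should be the time zone the timestamps were recorded in, otherwise midnight values can shift to the neighbouring day. Also, zoo() warns about duplicated index values, so this route has no counterpart to the aggregate= argument and suits data whose timestamps are unique within each ticker.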

Note

The input list is assumed to be the following, shown in reproducible form.

L <- list(A.N = structure(list(TIMESTAMP = structure(c(1083038400, 
1083124800, 1083211200, 1083297600, 1083556800, 1083643200, 1083729600, 
1083816000, 1083902400, 1084161600), class = c("POSIXct", "POSIXt"
), tzone = ""), OPEN = c(19.6, 19.3, 19, 18.8, 18.2, 18.5, 18, 
17.9, 17.7, 17.4), HIGH = c(19.9, 19.3, 19.1, 18.9, 18.6, 18.5, 
18.3, 18, 18, 17.5), LOW = c(19.3, 19, 18.4, 18.1, 18.1, 17.5, 
17.9, 17.7, 17.6, 16.9), CLOSE = c(19.4, 19.1, 18.7, 18.2, 18.4, 
18, 18.1, 17.7, 17.7, 17.1), daily_return = c(0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L), intraday_return = c(-0.00997, -0.0105, 
-0.0124, -0.0302, 0.00776, -0.0262, 0.00337, -0.00977, 0.0042, 
-0.017), RIC = c("A.N", "A.N", "A.N", "A.N", "A.N", "A.N", "A.N", 
"A.N", "A.N", "A.N")), row.names = c(NA, -10L), class = "data.frame"), 
    ABI.BR = structure(list(TIMESTAMP = structure(c(1036386000, 
    1036472400, 1036558800, 1036645200, 1036731600, 1036990800, 
    1037077200, 1037163600, 1037250000, 1037336400), class = c("POSIXct", 
    "POSIXt"), tzone = ""), OPEN = c(14, 13.4, 13.7, 14, 13.9, 
    13.4, 13.8, 13.8, 13.7, 13.6), HIGH = c(14.3, 13.4, 14, 14.4, 
    13.9, 14, 14.3, 13.9, 13.9, 13.7), LOW = c(13.2, 12.9, 13.5, 
    13.7, 13.3, 13.4, 13.7, 13.5, 13.3, 13.4), CLOSE = c(13.3, 
    13.2, 14, 13.7, 13.4, 13.9, 14.1, 13.7, 13.4, 13.6), daily_return = c(0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), intraday_return = c(-0.0473, 
    -0.0158, 0.0256, -0.0192, -0.0311, 0.0393, 0.0181, -0.0095, 
    -0.0228, -0.000459), RIC = c("ABI.BR", "ABI.BR", "ABI.BR", 
    "ABI.BR", "ABI.BR", "ABI.BR", "ABI.BR", "ABI.BR", "ABI.BR", 
    "ABI.BR")), row.names = c(NA, -10L), class = "data.frame"))
