Can a tsibble object have multiple rows with the same date and associated row values?

Problem description

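I have a data frame that I convert to a tsibble time series object so that I can plot and manipulate the data more easily (rolling time-window analysis). I receive new data every day and want to append it to the original data frame, denoted df; the new incoming data is denoted df2. I can convert each of these data.frames to a tsibble object independently, but when I first join them with rbind() and then call as_tsibble(), I get an error.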

as_tsibble(final_df, index = date, key = ticker)

Error: A valid tsibble must have distinct rows identified by key and index.
i Please use duplicates() to check the duplicated rows.

Here is reprex code that sets up the problem.

df <- data.frame(ticker = c("UST10Y", "UST2Y", "AAPL", "SPX", "BNO"),
             buy_price = c(62.00, 68.00, 37.00, 55.00, 41.00),
             sale_price = c(64.00, 71.00, 42.00, 60.00, 45.00),
             close_price = c(63.00, 70.00, 38.00, 56.00, 43.00),
             date = c(as.Date("April 29th, 2021", "April 29th, 2021", "April 29th, 2021", "April 29th, 2021", "April 29th, 2021")))

df2 <- data.frame(ticker = c("UST10Y", "UST2Y", "AAPL", "SPX", "BNO"),
                 buy_price = c(63.00, 69.00, 38.00, 53.00, 44.00),
                 sale_price = c(66.00, 77.00, 47.00, 63.00, 48.00),
                 close_price = c(65.00, 74.00, 39.00, 55.00, 45.00),
                 date = c(as.Date("April 30th, 2021", "April 30th, 2021", "April 30th, 2021", "April 30th, 2021", "April 30th, 2021")))

final_df <- rbind(df,df2)
str(final_df)
> 'data.frame': 10 obs. of  5 variables:

as_tsibble(final_df, index = date, key = ticker)

After running as_tsibble(final_df, index = date, key = ticker), the rows are also reordered alphabetically, whereas I would like to keep the original order (a separate question).

I cannot create a tsibble from final_df, even though I can create one from df and from df2 individually.

Am I missing something, or is it impossible to have a tsibble object with multiple rows for the same ticker name?

Tags: r, dataframe, time-series, rbind, tsibble

Solution


A tsibble must have a unique time point (the index) for each observation within a time series, where each time series is identified by the key.
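As a minimal sketch of that rule, tsibble's is_duplicated() and duplicates() can check a data frame before conversion. The data frame `toy` below is hypothetical and not part of the original question.

```r
library(tsibble)

# Hypothetical toy data: ticker "A" appears twice on the same date,
# so the (key, index) pair does not identify rows uniquely.
toy <- data.frame(
  ticker = c("A", "A", "B"),
  date   = as.Date(c("2021-04-29", "2021-04-29", "2021-04-29")),
  price  = c(1, 2, 3)
)

# TRUE if any (key, index) pair occurs more than once
has_dups <- is_duplicated(toy, key = ticker, index = date)
print(has_dups)

# The offending rows themselves
dup_rows <- duplicates(toy, key = ticker, index = date)
print(dup_rows)
```

If has_dups is TRUE, as_tsibble() with the same key and index would be expected to fail with the error shown in the question.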

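The dataset you built for the MRE looks as if it has this property, but the date conversion does not give the result you want. as.Date() treats only its first argument as the data (the second string is interpreted as the format), so the call returns a single date instead of five. For example, the index variable of your df is: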

as.Date("April 29th, 2021", "April 29th, 2021", "April 29th, 2021", "April 29th, 2021", "April 29th, 2021")
#> [1] "2021-05-06"
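A small sketch of the consequence: because the result has length one, data.frame() recycles it, so every row receives the same date and each ticker ends up duplicated once df and df2 are combined. The names `bad_date` and `demo` are illustrative only.

```r
# Same call as in the question, shortened to two strings:
# the second string is taken as `format`, not as data.
bad_date <- as.Date("April 29th, 2021", "April 29th, 2021")
print(length(bad_date))  # a single value, not one per string

# data.frame() recycles the length-one date across all rows
demo <- data.frame(ticker = c("UST10Y", "UST2Y", "AAPL"), date = bad_date)
print(length(unique(demo$date)))
```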

To parse "April 29th, 2021" correctly, you can use the mdy() function from the {lubridate} package:

lubridate::mdy("April 29th, 2021", "April 29th, 2021", "April 29th, 2021", "April 29th, 2021", "April 29th, 2021")
#> [1] "2021-04-29" "2021-04-29" "2021-04-29" "2021-04-29" "2021-04-29"
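If you would rather avoid the extra dependency, a base R alternative is sketched below. It assumes an English locale (so that %B matches "April") and that the ordinal suffix is the only obstacle to parsing; the names `x`, `clean`, and `dates` are illustrative.

```r
x <- rep("April 29th, 2021", 5)

# Drop the ordinal suffix (st, nd, rd, th) that follows the day number
clean <- sub("([0-9]+)(st|nd|rd|th)", "\\1", x)

# Pass all strings as one vector and name the format explicitly
dates <- as.Date(clean, format = "%B %d, %Y")
print(dates)
```

Passing the strings as a single character vector, rather than as separate arguments, is what keeps one date per row.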

With the date parsing fixed, the problem is resolved and we can create the tsibble.

library(tsibble)
library(lubridate)
df <- data.frame(ticker = c("UST10Y", "UST2Y", "AAPL", "SPX", "BNO"),
                 buy_price = c(62.00, 68.00, 37.00, 55.00, 41.00),
                 sale_price = c(64.00, 71.00, 42.00, 60.00, 45.00),
                 close_price = c(63.00, 70.00, 38.00, 56.00, 43.00),
                 date = mdy(c("April 29th, 2021", "April 29th, 2021", "April 29th, 2021", "April 29th, 2021", "April 29th, 2021")))

df2 <- data.frame(ticker = c("UST10Y", "UST2Y", "AAPL", "SPX", "BNO"),
                  buy_price = c(63.00, 69.00, 38.00, 53.00, 44.00),
                  sale_price = c(66.00, 77.00, 47.00, 63.00, 48.00),
                  close_price = c(65.00, 74.00, 39.00, 55.00, 45.00),
                  date = mdy(c("April 30th, 2021", "April 30th, 2021", "April 30th, 2021", "April 30th, 2021", "April 30th, 2021")))

final_df <- rbind(df,df2)
as_tsibble(final_df, index = date, key = ticker)
#> # A tsibble: 10 x 5 [1D]
#> # Key:       ticker [5]
#>    ticker buy_price sale_price close_price date      
#>    <chr>      <dbl>      <dbl>       <dbl> <date>    
#>  1 AAPL          37         42          38 2021-04-29
#>  2 AAPL          38         47          39 2021-04-30
#>  3 BNO           41         45          43 2021-04-29
#>  4 BNO           44         48          45 2021-04-30
#>  5 SPX           55         60          56 2021-04-29
#>  6 SPX           53         63          55 2021-04-30
#>  7 UST10Y        62         64          63 2021-04-29
#>  8 UST10Y        63         66          65 2021-04-30
#>  9 UST2Y         68         71          70 2021-04-29
#> 10 UST2Y         69         77          74 2021-04-30

Created on 2021-05-06 by the reprex package (v1.0.0)
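On the side question about row order: the output above is sorted by key and then by index, which is why the tickers appear alphabetically. One possible workaround, sketched here under the assumption that a factor key is sorted by its level order rather than alphabetically, is to turn ticker into a factor whose levels follow the order of first appearance. The reduced data frame and the name `ts_ordered` are illustrative.

```r
library(tsibble)

# A reduced version of the combined data, with dates already parsed
tickers <- c("UST10Y", "UST2Y", "AAPL", "SPX", "BNO")
final_df <- data.frame(
  ticker      = rep(tickers, times = 2),
  close_price = c(63, 70, 38, 56, 43, 65, 74, 39, 55, 45),
  date        = rep(as.Date(c("2021-04-29", "2021-04-30")), each = 5)
)

# Assumption: levels in order of first appearance make the key sort
# in the original ticker order instead of alphabetically.
final_df$ticker <- factor(final_df$ticker, levels = unique(final_df$ticker))

ts_ordered <- as_tsibble(final_df, index = date, key = ticker)
print(ts_ordered)
```

Rows are still grouped by ticker and ordered by date within each ticker; only the order of the tickers themselves is affected.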

