R: How to convert annual data to monthly data under a specific assumption?

Problem description

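I have a large dataset of annual book-to-market ratios for companies. I need to convert these to monthly data using the following logic: the BtoM of firm i (stock_id) from June of year y + 1 through May of year y + 2 equals the BtoM of year y. How can I apply this logic and capture the output for a large dataset (I have data for n stocks covering 1968-2018)? Any help is much appreciated! Reproducible code: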

library(tidyverse)

Date <- as.Date(c('1994-12-01', '1995-12-01', '1996-12-01',
                  '1994-12-01', '1995-12-01', '1996-12-01'))
stock_id <- c('80482', '80482', '80482', '80483', '80483', '80483')
BtoM <- as.numeric(c('0.0111', '0.0079', '0.0293', '0.671', '0.721', '0.219'))

Book_to_Market <- data.frame(Date, stock_id, BtoM)
Book_to_Market <- Book_to_Market %>%
  mutate(stock_id = as.integer(stock_id))

This gives:

              Date stock_id   BtoM
      1 1994-12-01    80482 0.0111
      2 1995-12-01    80482 0.0079
      3 1996-12-01    80482 0.0293
      4 1994-12-01    80483 0.6710
      5 1995-12-01    80483 0.7210
      6 1996-12-01    80483 0.2190

My desired output looks like this:

          Date_2 stock_id_2 BtoM_2
   1  1995-06-01      80482 0.0111
   2  1995-07-01      80482 0.0111
   3  1995-08-01      80482 0.0111
   4  1995-09-01      80482 0.0111
   5  1995-10-01      80482 0.0111
   6  1995-11-01      80482 0.0111
   7  1995-12-01      80482 0.0111
   8  1996-01-01      80482 0.0111
   9  1996-02-01      80482 0.0111
   10 1996-03-01      80482 0.0111
   11 1996-04-01      80482 0.0111
   12 1996-05-01      80482 0.0111
   13 1995-06-01      80483 0.6710
   14 1995-07-01      80483 0.6710
   15 1995-08-01      80483 0.6710
   16 1995-09-01      80483 0.6710
   17 1995-10-01      80483 0.6710
   18 1995-11-01      80483 0.6710
   19 1995-12-01      80483 0.6710
   20 1996-01-01      80483 0.6710
   21 1996-02-01      80483 0.6710
   22 1996-03-01      80483 0.6710
   23 1996-04-01      80483 0.6710
   24 1996-05-01      80483 0.6710

Tags: r, dataframe, date, tidyr

Solution


library(dplyr)
library(lubridate)  # for `ymd()`, `year()` and `month()`

# generate the list of monthly dates for each stock_id:
data.frame(Date = rep(seq.Date(as.Date("1995-06-01"),
                               as.Date("1996-05-01"),
                               by = "month"), times = 2),
           stock_id = rep(c(80482, 80483), each = 12)) %>%

  # link each month back to Dec 1st of the matching year:
  # June-December map to the previous year, January-May to two years back
  mutate(rounded_date = ymd(
    paste0(year(Date) - if_else(month(Date) >= 6, 1, 2), "1201"))) %>%

  # join to the source data (`Book_to_Market` from the question;
  # its `Date` column is already of class Date)
  left_join(Book_to_Market,
            by = c("rounded_date" = "Date", "stock_id"))

Result

         Date stock_id rounded_date   BtoM
1  1995-06-01    80482   1994-12-01 0.0111
2  1995-07-01    80482   1994-12-01 0.0111
3  1995-08-01    80482   1994-12-01 0.0111
4  1995-09-01    80482   1994-12-01 0.0111
5  1995-10-01    80482   1994-12-01 0.0111
6  1995-11-01    80482   1994-12-01 0.0111
7  1995-12-01    80482   1994-12-01 0.0111
8  1996-01-01    80482   1994-12-01 0.0111
9  1996-02-01    80482   1994-12-01 0.0111
10 1996-03-01    80482   1994-12-01 0.0111
11 1996-04-01    80482   1994-12-01 0.0111
12 1996-05-01    80482   1994-12-01 0.0111
13 1995-06-01    80483   1994-12-01 0.6710
14 1995-07-01    80483   1994-12-01 0.6710
15 1995-08-01    80483   1994-12-01 0.6710
16 1995-09-01    80483   1994-12-01 0.6710
17 1995-10-01    80483   1994-12-01 0.6710
18 1995-11-01    80483   1994-12-01 0.6710
19 1995-12-01    80483   1994-12-01 0.6710
20 1996-01-01    80483   1994-12-01 0.6710
21 1996-02-01    80483   1994-12-01 0.6710
22 1996-03-01    80483   1994-12-01 0.6710
23 1996-04-01    80483   1994-12-01 0.6710
24 1996-05-01    80483   1994-12-01 0.6710
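
The code above hard-codes one date range and two stock ids. For the full 1968-2018 panel, one possible generalization is to expand every annual row into its twelve monthly rows instead of building the monthly grid by hand. The following is a minimal sketch, not part of the original answer. It assumes each annual observation is dated December 1st of year y and that there is one row per stock and year; the names `monthly`, `start` and `k` are illustrative.

```r
library(dplyr)
library(tidyr)      # for `uncount()`
library(lubridate)  # for `year()`, `make_date()` and `%m+%`

# Sample annual data from the question
Book_to_Market <- data.frame(
  Date     = as.Date(c("1994-12-01", "1995-12-01", "1996-12-01",
                       "1994-12-01", "1995-12-01", "1996-12-01")),
  stock_id = c(80482L, 80482L, 80482L, 80483L, 80483L, 80483L),
  BtoM     = c(0.0111, 0.0079, 0.0293, 0.671, 0.721, 0.219)
)

monthly <- Book_to_Market %>%
  # first month in which the year-y value applies: June 1st of year y + 1
  mutate(start = make_date(year(Date) + 1, 6, 1)) %>%
  # repeat every annual row 12 times; `k` counts 1..12 within each row
  uncount(12, .id = "k") %>%
  # shift the start date by 0..11 months: June (y + 1) through May (y + 2)
  mutate(Date_2 = start %m+% months(k - 1L)) %>%
  arrange(stock_id, Date_2) %>%
  select(Date_2, stock_id, BtoM)
```

With the sample data this yields 12 monthly rows per annual observation, so later years (for example the 1995 value applied from June 1996) are covered as well. Because nothing is keyed to specific dates or ids, the same pipeline should apply unchanged to any number of stocks and years, provided the assumptions above hold.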
