首页 > 解决方案 > 如何将报价和交易数据与 data.table 匹配(滚动连接)

问题描述

我拼命地尝试重现Pandas滚动连接的经典示例,其中quotes数据与trade数据合并。

见这里https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.merge_asof.html

这是data.table格式的数据:

trades <- data.table(time = c('2016-05-25 13:30:00.023',
                              '2016-05-25 13:30:00.038',
                              '2016-05-25 13:30:00.048',
                              '2016-05-25 13:30:00.048',
                              '2016-05-25 13:30:00.048'),
                     ticker = c('MSFT','MSFT','GOOG','GOOG','AAPL'),
                     price = c(51.95,51.95,720.77,720.92,98.0),
                     quantity = c(75,155,100,100,100))
> trades
                      time ticker  price quantity
1: 2016-05-25 13:30:00.023   MSFT  51.95       75
2: 2016-05-25 13:30:00.038   MSFT  51.95      155
3: 2016-05-25 13:30:00.048   GOOG 720.77      100
4: 2016-05-25 13:30:00.048   GOOG 720.92      100
5: 2016-05-25 13:30:00.048   AAPL  98.00      100

和报价

quotes <- data.table(time = c('2016-05-25 13:30:00.023',
                              '2016-05-25 13:30:00.023',
                              '2016-05-25 13:30:00.030',
                              '2016-05-25 13:30:00.041',
                              '2016-05-25 13:30:00.048',
                              '2016-05-25 13:30:00.049',
                              '2016-05-25 13:30:00.072',
                              '2016-05-25 13:30:00.075'),
                     ticker = c('GOOG','MSFT','MSFT','MSFT','GOOG','AAPL','GOOG','MSFT'),
                     bid = c(720.50, 51.95, 51.97, 51.99, 720.5,97.99,720.5,52.01),
                     ask = c(270.93,51.96,51.98,52.00,720.93,98.01,720.88,52.03))
> quotes
                      time ticker    bid    ask
1: 2016-05-25 13:30:00.023   GOOG 720.50 270.93
2: 2016-05-25 13:30:00.023   MSFT  51.95  51.96
3: 2016-05-25 13:30:00.030   MSFT  51.97  51.98
4: 2016-05-25 13:30:00.041   MSFT  51.99  52.00
5: 2016-05-25 13:30:00.048   GOOG 720.50 720.93
6: 2016-05-25 13:30:00.049   AAPL  97.99  98.01
7: 2016-05-25 13:30:00.072   GOOG 720.50 720.88
8: 2016-05-25 13:30:00.075   MSFT  52.01  52.03

我想做的是通过以下方式将交易数据与报价数据合并

  1. 对于每笔交易,尽可能匹配最接近的先前报价
  2. 匹配的报价必须在 10 毫秒内
  3. 应该发生完全匹配。

输出(与 Pandas 教程中的相同)应该是

                      time ticker  price quantity   bid   ask
1: 2016-05-25 13:30:00.023   MSFT  51.95       75    NA    NA
2: 2016-05-25 13:30:00.038   MSFT  51.95      155 51.97 51.98
3: 2016-05-25 13:30:00.048   GOOG 720.77      100    NA    NA
4: 2016-05-25 13:30:00.048   GOOG 720.92      100    NA    NA
5: 2016-05-25 13:30:00.048   AAPL  98.00      100    NA    NA

实际上,您可以看到唯一可能的报价匹配是在 的第二笔交易2016-05-25 13:30:00.038,因为关闭(上一个)报价发生在2016-05-25 13:30:00.03010 毫秒内(而不是完全匹配)。

尽管我进行了试验,但我无法在data.table. 有任何想法吗?谢谢!

标签: rdata.table

解决方案


您还可以将此习惯用法与滚动连接结合使用,这与@sindri_baldur 建议的类似但不完全相等:

library(lubridate)
library(data.table)

quotes[, time := as.POSIXct(time, format="%Y-%m-%d %H:%M:%OS", tz = "GMT")]
trades[, time := as.POSIXct(time, format="%Y-%m-%d %H:%M:%OS", tz = "GMT")]

match_inexact <- function(q_time, t_time, bid, ask) {
  exact <- q_time == t_time # exact matches get NA
  bid[exact] <- NA_real_
  ask[exact] <- NA_real_
  list(bid, ask)
}

trades[, c("bid", "ask") := quotes[.SD,
                                   match_inexact(x.time, i.time, x.bid, x.ask),
                                   on = .(ticker, time),
                                   roll = lubridate::dmilliseconds(10L)]]

需要注意的重要一点: time是为连接指定的最后一列,因为这是data.table尝试滚动值的列。


推荐阅读