r - 如何围绕每个符号包含多个事件的变量生成一个窗口?
问题描述
我有一个问题..我的面板数据集看起来像没有变量“Window”的数据。现在我正在尝试创建变量“Window”,如下所示:
Symbol Date Close Time Event Window
AAPL 09/03/2020 66,542503 16:25:00 NA NA
AAPL 09/03/2020 71,334999 16:26:00 NA -4
AAPL 09/03/2020 68,857498 16:27:00 NA -3
AAPL 09/03/2020 62,057499 16:28:00 NA -2
AAPL 09/03/2020 69,4925 16:29:00 NA -1
AAPL 09/03/2020 60,552502 16:30:00 1 0
AAPL 09/03/2020 63,215 16:31:00 NA 1
AAPL 10/03/2020 61,6675 09:30:00 NA 2
AAPL 10/03/2020 61,195 09:31:00 NA 3
AAPL 10/03/2020 57,310001 09:32:00 NA 4
AAPL 10/03/2020 56,092499 09:33:00 NA NA
AAPL 15/03/2020 65,535603 15:45:00 NA NA
AAPL 15/03/2020 66,357545 15:46:00 NA NA
AAPL 15/03/2020 62,852345 15:47:00 NA -4
AAPL 15/03/2020 64,057325 15:48:00 NA -3
AAPL 16/03/2020 66,494545 09:30:00 NA -2
AAPL 16/03/2020 63,557967 09:31:00 1 -1
AAPL 16/03/2020 64,415454 09:32:00 NA 0
AAPL 16/03/2020 62,2357 09:33:00 NA 1
AAPL 16/03/2020 64,4576 09:34:00 NA 2
AAPL 16/03/2020 59,457579 09:35:00 NA 3
AAPL 16/03/2020 58,092470 09:36:00 NA 4
VISA 05/03/2020 186,960007 16:26:00 NA NA
VISA 05/03/2020 184,360001 16:27:00 NA -4
VISA 05/03/2020 171,130005 16:28:00 NA -3
VISA 05/03/2020 182,600006 16:29:00 NA -2
VISA 05/03/2020 172,949997 16:30:00 NA -1
VISA 06/03/2020 160,080002 09:32:00 1 0
VISA 06/03/2020 175,830002 09:33:00 NA 1
VISA 06/03/2020 152,009995 09:34:00 NA 2
VISA 06/03/2020 157,889999 09:35:00 NA 3
VISA 06/03/2020 148,479996 09:36:00 NA 4
VISA 06/03/2020 152,25 09:37:00 NA NA
VISA 06/03/2020 146,830002 09:38:00 NA NA
VISA 20/03/2020 192,203826 16:12:00 NA NA
VISA 20/03/2020 193,293752 16:13:00 NA -4
VISA 20/03/2020 192,204726 16:14:00 NA -3
VISA 20/03/2020 192,2396 16:15:00 NA -2
VISA 20/03/2020 194,185620 16:16:00 NA -1
VISA 20/03/2020 196,614289 16:17:00 1 0
VISA 20/03/2020 197,826200 16:18:00 NA 1
VISA 21/03/2020 197,49176 09:29:00 NA 2
VISA 21/03/2020 197,239230 09:30:00 NA 3
VISA 21/03/2020 198,2300 09:31:00 NA 4
VISA 21/03/2020 198,230028 09:32:00 NA NA
VISA 21/03/2020 197,247020 09:33:00 NA NA
我已经尝试过我发现的以下代码:
EventStudy <- EventStudy %>% group_by(Symbol) %>% mutate(Window = row_number() - match(1, Event), Window = ifelse(abs(Window) > 4, NA, Window)) %>% ungroup
不幸的是,它每个符号只给我一个窗口,但我的代码中每个符号有多个事件。例如,对于符号“AAPL”,我有两个事件。
我也尝试使用没有 group_by 的代码,但它也不能按预期工作。我也没有适合数据集的分组,之后我每组只有一个事件。
有没有办法为一个符号的多个事件修改代码?你能帮我创建变量“Window”吗?
提前谢谢了!
解决方案
您可以借助辅助函数来实现此目的,以获得最接近的Event = 1
值索引。
library(dplyr)
closest_index <- function(x, y) {
y <- which(y == 1)
y[sapply(x, function(i) which(abs(y - i) == min(abs(y - i)))[1])]
}
EventStudy %>%
group_by(Symbol) %>%
mutate(close_index = closest_index(row_number(), Event),
Window = row_number() - close_index,
Window = ifelse(abs(Window) > 4, NA, Window)) %>%
ungroup %>%
select(-close_index)
这返回 -
# Symbol Date Close Time Event Window
#1 AAPL 09/03/2020 66,542503 16:25:00 NA NA
#2 AAPL 09/03/2020 71,334999 16:26:00 NA -4
#3 AAPL 09/03/2020 68,857498 16:27:00 NA -3
#4 AAPL 09/03/2020 62,057499 16:28:00 NA -2
#5 AAPL 09/03/2020 69,4925 16:29:00 NA -1
#6 AAPL 09/03/2020 60,552502 16:30:00 1 0
#7 AAPL 09/03/2020 63,215 16:31:00 NA 1
#8 AAPL 10/03/2020 61,6675 09:30:00 NA 2
#9 AAPL 10/03/2020 61,195 09:31:00 NA 3
#10 AAPL 10/03/2020 57,310001 09:32:00 NA 4
#11 AAPL 10/03/2020 56,092499 09:33:00 NA NA
#12 AAPL 15/03/2020 65,535603 15:45:00 NA NA
#13 AAPL 15/03/2020 66,357545 15:46:00 NA -4
#14 AAPL 15/03/2020 62,852345 15:47:00 NA -3
#15 AAPL 15/03/2020 64,057325 15:48:00 NA -2
#16 AAPL 16/03/2020 66,494545 09:30:00 NA -1
#17 AAPL 16/03/2020 63,557967 09:31:00 1 0
#18 AAPL 16/03/2020 64,415454 09:32:00 NA 1
#19 AAPL 16/03/2020 62,2357 09:33:00 NA 2
#20 AAPL 16/03/2020 64,4576 09:34:00 NA 3
#21 AAPL 16/03/2020 59,457579 09:35:00 NA 4
#22 AAPL 16/03/2020 58,092470 09:36:00 NA NA
#23 VISA 05/03/2020 186,960007 16:26:00 NA NA
#24 VISA 05/03/2020 184,360001 16:27:00 NA -4
#25 VISA 05/03/2020 171,130005 16:28:00 NA -3
#26 VISA 05/03/2020 182,600006 16:29:00 NA -2
#27 VISA 05/03/2020 172,949997 16:30:00 NA -1
#28 VISA 06/03/2020 160,080002 09:32:00 1 0
#29 VISA 06/03/2020 175,830002 09:33:00 NA 1
#30 VISA 06/03/2020 152,009995 09:34:00 NA 2
#31 VISA 06/03/2020 157,889999 09:35:00 NA 3
#32 VISA 06/03/2020 148,479996 09:36:00 NA 4
#33 VISA 06/03/2020 152,25 09:37:00 NA NA
#34 VISA 06/03/2020 146,830002 09:38:00 NA NA
#35 VISA 20/03/2020 192,203826 16:12:00 NA NA
#36 VISA 20/03/2020 193,293752 16:13:00 NA -4
#37 VISA 20/03/2020 192,204726 16:14:00 NA -3
#38 VISA 20/03/2020 192,2396 16:15:00 NA -2
#39 VISA 20/03/2020 194,185620 16:16:00 NA -1
#40 VISA 20/03/2020 196,614289 16:17:00 1 0
#41 VISA 20/03/2020 197,826200 16:18:00 NA 1
#42 VISA 21/03/2020 197,49176 09:29:00 NA 2
#43 VISA 21/03/2020 197,239230 09:30:00 NA 3
#44 VISA 21/03/2020 198,2300 09:31:00 NA 4
#45 VISA 21/03/2020 198,230028 09:32:00 NA NA
#46 VISA 21/03/2020 197,247020 09:33:00 NA NA
推荐阅读
- security - 为什么我面临移动网络的 ssl 问题?
- python - 将 fbprophet 作为 AWS Lambda 层导入
- mysql - 使用springboot从Mysql中获取数百万条数据
- sql - 如何将带有数据和边框的sql粘贴复制到记事本?
- c# - 如何避免在循环期间冻结 GUI?
- html - 如何动态更改 reactjs 中的活动选项卡颜色
- c# - 如何将列表对象添加到字典
- c++ - c++ DLL 运行时错误“abc.dll 不是有效的 WIN32 应用程序”。请帮助解决这个问题
- nginx - Nginx NGX_HTTP_UPSTREAM_FT_* 宏/常量中的“FT”是什么意思
- magento-1.9 - 具有相同 id 的项目已存在