r - 使用结束日期和开始日期逐年展开行
问题描述
我正在使用一个包含不同高管职业的数据集。每个高管都有一个独特的标识DirectorID
,并且CompanyID
对于他/她所担任的不同职位都有一个独特的标识。我想使用开始和结束日期逐年扩展行。所以第一个观察有 startdate 2009-01-01 和 enddate 2020-11-01,我希望第一行的日期范围从 2009-01-01 到 2009-12-31,第二行 2010-01- 01 至 2010 年 12 月 31 日。等和最后一行 2020-01-01 到 2020-11-01。其次,对于几个观察,日期是未知的,用 NA 表示,如果一行缺少日期,则应该省略,但我希望这些观察出现在最终输出中。
DirectorID CompanyID DateStartRole DateEndRole
<dbl> <dbl> <date> <date>
1 2363928 3262324 2009-01-01 2020-11-01
2 2374506 2301528 2008-01-01 2009-01-01
3 777766 44584 NA NA
4 892379 32342 NA 2018-01-01
5 328171 583820 NA NA
6 599185 54210 1988-01-01 2000-01-01
7 1973398 4142 2018-04-01 2020-11-01
8 1973398 550686 NA 2010-01-01
9 1041558 829450 NA NA
10 542061 10961 NA 2009-09-28
11 1262068 922842 2011-01-01 2018-08-01
12 330769 21696 NA 2000-11-01
13 32554 15457 1997-01-01 2001-01-01
14 2305779 1378145 2007-01-01 2012-01-01
15 2305779 829105 2020-05-07 2020-11-01
16 1935158 573779 2011-07-01 2017-07-03
17 836789 829244 2006-01-01 2007-01-01
18 836789 591529 NA NA
19 490624 3443 2004-01-01 2007-01-01
20 554430 30279 1975-05-15 2011-10-28
所需的输出应如下所示
DirectorID CompanyID DateStartRole DateEndRole
<dbl> <dbl> <date> <date>
1 2363928 3262324 2009-01-01 2009-12-31
2 2363928 3262324 2010-01-01 2010-12-31
---
12 2363928 3262324 2020-01-01 2020-11-01
13 2374506 2301528 2008-01-01 2009-01-01
解决方案
tidyverse
这是使用and的一种方法lubridate
。
首先,删除缺少日期的行(它们将在最后添加回来)。然后,从开始到结束创建间隔,并将行划分为年份。然后,您可以将intersect
这些间隔与日历年。
之后,您可以right_join
使用原始数据替换缺失的行,并使用它coalesce
来替换任何缺失的日期(如果有)。
library(tidyverse)
library(lubridate)
df %>%
filter(!is.na(DateStartRole) & !is.na(DateEndRole)) %>%
mutate(date_int = interval(DateStartRole, DateEndRole),
year = map2(year(DateStartRole), year(DateEndRole), seq)) %>%
unnest(year) %>%
mutate(year_int = interval(as.Date(paste0(year, '-01-01')), as.Date(paste0(year, '-12-31'))),
year_sect = intersect(date_int, year_int),
start_new = as.Date(int_start(year_sect)),
end_new = as.Date(int_end(year_sect))) %>%
select(DirectorID, CompanyID, start_new, end_new) %>%
right_join(df) %>%
mutate(start_new = coalesce(start_new, DateStartRole),
end_new = coalesce(end_new, DateEndRole))
输出
DirectorID CompanyID start_new end_new DateStartRole DateEndRole
<int> <int> <date> <date> <date> <date>
1 2363928 3262324 2009-01-01 2009-12-31 2009-01-01 2020-11-01
2 2363928 3262324 2010-01-01 2010-12-31 2009-01-01 2020-11-01
3 2363928 3262324 2011-01-01 2011-12-31 2009-01-01 2020-11-01
4 2363928 3262324 2012-01-01 2012-12-31 2009-01-01 2020-11-01
5 2363928 3262324 2013-01-01 2013-12-31 2009-01-01 2020-11-01
6 2363928 3262324 2014-01-01 2014-12-31 2009-01-01 2020-11-01
7 2363928 3262324 2015-01-01 2015-12-31 2009-01-01 2020-11-01
8 2363928 3262324 2016-01-01 2016-12-31 2009-01-01 2020-11-01
9 2363928 3262324 2017-01-01 2017-12-31 2009-01-01 2020-11-01
10 2363928 3262324 2018-01-01 2018-12-31 2009-01-01 2020-11-01
11 2363928 3262324 2019-01-01 2019-12-31 2009-01-01 2020-11-01
12 2363928 3262324 2020-01-01 2020-11-01 2009-01-01 2020-11-01
13 2374506 2301528 2008-01-01 2008-12-31 2008-01-01 2009-01-01
14 2374506 2301528 2009-01-01 2009-01-01 2008-01-01 2009-01-01
15 599185 54210 1988-01-01 1988-12-31 1988-01-01 2000-01-01
16 599185 54210 1989-01-01 1989-12-31 1988-01-01 2000-01-01
17 599185 54210 1990-01-01 1990-12-31 1988-01-01 2000-01-01
18 599185 54210 1991-01-01 1991-12-31 1988-01-01 2000-01-01
19 599185 54210 1992-01-01 1992-12-31 1988-01-01 2000-01-01
20 599185 54210 1993-01-01 1993-12-31 1988-01-01 2000-01-01
21 599185 54210 1994-01-01 1994-12-31 1988-01-01 2000-01-01
22 599185 54210 1995-01-01 1995-12-31 1988-01-01 2000-01-01
23 599185 54210 1996-01-01 1996-12-31 1988-01-01 2000-01-01
24 599185 54210 1997-01-01 1997-12-31 1988-01-01 2000-01-01
25 599185 54210 1998-01-01 1998-12-31 1988-01-01 2000-01-01
26 599185 54210 1999-01-01 1999-12-31 1988-01-01 2000-01-01
27 599185 54210 2000-01-01 2000-01-01 1988-01-01 2000-01-01
28 1973398 4142 2018-04-01 2018-12-31 2018-04-01 2020-11-01
29 1973398 4142 2019-01-01 2019-12-31 2018-04-01 2020-11-01
30 1973398 4142 2020-01-01 2020-11-01 2018-04-01 2020-11-01
31 1262068 922842 2011-01-01 2011-12-31 2011-01-01 2018-08-01
32 1262068 922842 2012-01-01 2012-12-31 2011-01-01 2018-08-01
33 1262068 922842 2013-01-01 2013-12-31 2011-01-01 2018-08-01
34 1262068 922842 2014-01-01 2014-12-31 2011-01-01 2018-08-01
35 1262068 922842 2015-01-01 2015-12-31 2011-01-01 2018-08-01
36 1262068 922842 2016-01-01 2016-12-31 2011-01-01 2018-08-01
37 1262068 922842 2017-01-01 2017-12-31 2011-01-01 2018-08-01
38 1262068 922842 2018-01-01 2018-08-01 2011-01-01 2018-08-01
39 32554 15457 1997-01-01 1997-12-31 1997-01-01 2001-01-01
40 32554 15457 1998-01-01 1998-12-31 1997-01-01 2001-01-01
41 32554 15457 1999-01-01 1999-12-31 1997-01-01 2001-01-01
42 32554 15457 2000-01-01 2000-12-31 1997-01-01 2001-01-01
43 32554 15457 2001-01-01 2001-01-01 1997-01-01 2001-01-01
44 2305779 1378145 2007-01-01 2007-12-31 2007-01-01 2012-01-01
45 2305779 1378145 2008-01-01 2008-12-31 2007-01-01 2012-01-01
46 2305779 1378145 2009-01-01 2009-12-31 2007-01-01 2012-01-01
47 2305779 1378145 2010-01-01 2010-12-31 2007-01-01 2012-01-01
48 2305779 1378145 2011-01-01 2011-12-31 2007-01-01 2012-01-01
49 2305779 1378145 2012-01-01 2012-01-01 2007-01-01 2012-01-01
50 2305779 829105 2020-05-07 2020-11-01 2020-05-07 2020-11-01
51 1935158 573779 2011-07-01 2011-12-31 2011-07-01 2017-07-03
52 1935158 573779 2012-01-01 2012-12-31 2011-07-01 2017-07-03
53 1935158 573779 2013-01-01 2013-12-31 2011-07-01 2017-07-03
54 1935158 573779 2014-01-01 2014-12-31 2011-07-01 2017-07-03
55 1935158 573779 2015-01-01 2015-12-31 2011-07-01 2017-07-03
56 1935158 573779 2016-01-01 2016-12-31 2011-07-01 2017-07-03
57 1935158 573779 2017-01-01 2017-07-03 2011-07-01 2017-07-03
58 836789 829244 2006-01-01 2006-12-31 2006-01-01 2007-01-01
59 836789 829244 2007-01-01 2007-01-01 2006-01-01 2007-01-01
60 490624 3443 2004-01-01 2004-12-31 2004-01-01 2007-01-01
61 490624 3443 2005-01-01 2005-12-31 2004-01-01 2007-01-01
62 490624 3443 2006-01-01 2006-12-31 2004-01-01 2007-01-01
63 490624 3443 2007-01-01 2007-01-01 2004-01-01 2007-01-01
64 554430 30279 1975-05-15 1975-12-31 1975-05-15 2011-10-28
65 554430 30279 1976-01-01 1976-12-31 1975-05-15 2011-10-28
66 554430 30279 1977-01-01 1977-12-31 1975-05-15 2011-10-28
67 554430 30279 1978-01-01 1978-12-31 1975-05-15 2011-10-28
68 554430 30279 1979-01-01 1979-12-31 1975-05-15 2011-10-28
69 554430 30279 1980-01-01 1980-12-31 1975-05-15 2011-10-28
70 554430 30279 1981-01-01 1981-12-31 1975-05-15 2011-10-28
71 554430 30279 1982-01-01 1982-12-31 1975-05-15 2011-10-28
72 554430 30279 1983-01-01 1983-12-31 1975-05-15 2011-10-28
73 554430 30279 1984-01-01 1984-12-31 1975-05-15 2011-10-28
74 554430 30279 1985-01-01 1985-12-31 1975-05-15 2011-10-28
75 554430 30279 1986-01-01 1986-12-31 1975-05-15 2011-10-28
76 554430 30279 1987-01-01 1987-12-31 1975-05-15 2011-10-28
77 554430 30279 1988-01-01 1988-12-31 1975-05-15 2011-10-28
78 554430 30279 1989-01-01 1989-12-31 1975-05-15 2011-10-28
79 554430 30279 1990-01-01 1990-12-31 1975-05-15 2011-10-28
80 554430 30279 1991-01-01 1991-12-31 1975-05-15 2011-10-28
81 554430 30279 1992-01-01 1992-12-31 1975-05-15 2011-10-28
82 554430 30279 1993-01-01 1993-12-31 1975-05-15 2011-10-28
83 554430 30279 1994-01-01 1994-12-31 1975-05-15 2011-10-28
84 554430 30279 1995-01-01 1995-12-31 1975-05-15 2011-10-28
85 554430 30279 1996-01-01 1996-12-31 1975-05-15 2011-10-28
86 554430 30279 1997-01-01 1997-12-31 1975-05-15 2011-10-28
87 554430 30279 1998-01-01 1998-12-31 1975-05-15 2011-10-28
88 554430 30279 1999-01-01 1999-12-31 1975-05-15 2011-10-28
89 554430 30279 2000-01-01 2000-12-31 1975-05-15 2011-10-28
90 554430 30279 2001-01-01 2001-12-31 1975-05-15 2011-10-28
91 554430 30279 2002-01-01 2002-12-31 1975-05-15 2011-10-28
92 554430 30279 2003-01-01 2003-12-31 1975-05-15 2011-10-28
93 554430 30279 2004-01-01 2004-12-31 1975-05-15 2011-10-28
94 554430 30279 2005-01-01 2005-12-31 1975-05-15 2011-10-28
95 554430 30279 2006-01-01 2006-12-31 1975-05-15 2011-10-28
96 554430 30279 2007-01-01 2007-12-31 1975-05-15 2011-10-28
97 554430 30279 2008-01-01 2008-12-31 1975-05-15 2011-10-28
98 554430 30279 2009-01-01 2009-12-31 1975-05-15 2011-10-28
99 554430 30279 2010-01-01 2010-12-31 1975-05-15 2011-10-28
100 554430 30279 2011-01-01 2011-10-28 1975-05-15 2011-10-28
101 777766 44584 NA NA NA NA
102 892379 32342 NA 2018-01-01 NA 2018-01-01
103 328171 583820 NA NA NA NA
104 1973398 550686 NA 2010-01-01 NA 2010-01-01
105 1041558 829450 NA NA NA NA
106 542061 10961 NA 2009-09-28 NA 2009-09-28
107 330769 21696 NA 2000-11-01 NA 2000-11-01
108 836789 591529 NA NA NA NA
推荐阅读
- postgresql - 如何在 postgresql 中的 JSONB 列上添加 SQLAlchemy BTREE 索引
- python - 使用 Python OpenCV 测量已知圆的精确位置
- docker - 通过 TLS 连接到 OPENLDAP:找不到证书
- java - 在方法内更改应该在方法外的字符串
- linux - pci 驱动程序函数是否表示为 PCI 的 driver_data 对象中的函数号
- automation - 使用 QuerySerge 在 ALM 测试用例步骤级别附加屏幕截图
- reactjs - 我想在 React Native 中将 My Screen 修改为 Drawer
- security - 这些包含指向我的节点服务器的随机 URL 的 POST 请求是什么
- c# - 游戏有时无法识别触摸
- jquery - 展开并滚动到隐藏元素