首页 > 解决方案 > 使用结束日期和开始日期逐年展开行

问题描述

我正在使用一个包含不同高管职业的数据集。每个高管都有一个独特的标识DirectorID,并且CompanyID对于他/她所担任的不同职位都有一个独特的标识。我想使用开始和结束日期逐年扩展行。所以第一个观察有 startdate 2009-01-01 和 enddate 2020-11-01,我希望第一行的日期范围从 2009-01-01 到 2009-12-31,第二行 2010-01- 01 至 2010 年 12 月 31 日。等和最后一行 2020-01-01 到 2020-11-01。其次,对于几个观察,日期是未知的,用 NA 表示,如果一行缺少日期,则应该省略,但我希望这些观察出现在最终输出中。

 DirectorID CompanyID DateStartRole DateEndRole
        <dbl>     <dbl> <date>        <date>     
 1    2363928   3262324 2009-01-01    2020-11-01 
 2    2374506   2301528 2008-01-01    2009-01-01 
 3     777766     44584 NA            NA         
 4     892379     32342 NA            2018-01-01 
 5     328171    583820 NA            NA         
 6     599185     54210 1988-01-01    2000-01-01 
 7    1973398      4142 2018-04-01    2020-11-01 
 8    1973398    550686 NA            2010-01-01 
 9    1041558    829450 NA            NA         
10     542061     10961 NA            2009-09-28 
11    1262068    922842 2011-01-01    2018-08-01 
12     330769     21696 NA            2000-11-01 
13      32554     15457 1997-01-01    2001-01-01 
14    2305779   1378145 2007-01-01    2012-01-01 
15    2305779    829105 2020-05-07    2020-11-01 
16    1935158    573779 2011-07-01    2017-07-03 
17     836789    829244 2006-01-01    2007-01-01 
18     836789    591529 NA            NA         
19     490624      3443 2004-01-01    2007-01-01 
20     554430     30279 1975-05-15    2011-10-28

所需的输出应如下所示

 DirectorID CompanyID DateStartRole DateEndRole
        <dbl>     <dbl> <date>        <date>     
 1    2363928   3262324 2009-01-01    2009-12-31
 2    2363928   3262324 2010-01-01    2010-12-31
---
 12   2363928   3262324 2020-01-01    2020-11-01
 13   2374506   2301528 2008-01-01    2009-01-01

标签: r

解决方案


tidyverse这是使用and的一种方法lubridate

首先,删除缺少日期的行(它们将在最后添加回来)。然后,从开始到结束创建间隔,并将行划分为年份。然后,您可以将intersect这些间隔与日历年。

之后,您可以right_join使用原始数据替换缺失的行,并使用它coalesce来替换任何缺失的日期(如果有)。

library(tidyverse)
library(lubridate)

df %>%
  filter(!is.na(DateStartRole) & !is.na(DateEndRole)) %>%
  mutate(date_int = interval(DateStartRole, DateEndRole),
         year = map2(year(DateStartRole), year(DateEndRole), seq)) %>%
  unnest(year) %>%
  mutate(year_int = interval(as.Date(paste0(year, '-01-01')), as.Date(paste0(year, '-12-31'))),
         year_sect = intersect(date_int, year_int),
         start_new = as.Date(int_start(year_sect)),
         end_new = as.Date(int_end(year_sect))) %>%
  select(DirectorID, CompanyID, start_new, end_new) %>%
  right_join(df) %>%
  mutate(start_new = coalesce(start_new, DateStartRole),
         end_new = coalesce(end_new, DateEndRole))

输出

    DirectorID CompanyID start_new  end_new    DateStartRole DateEndRole
         <int>     <int> <date>     <date>     <date>        <date>     
  1    2363928   3262324 2009-01-01 2009-12-31 2009-01-01    2020-11-01 
  2    2363928   3262324 2010-01-01 2010-12-31 2009-01-01    2020-11-01 
  3    2363928   3262324 2011-01-01 2011-12-31 2009-01-01    2020-11-01 
  4    2363928   3262324 2012-01-01 2012-12-31 2009-01-01    2020-11-01 
  5    2363928   3262324 2013-01-01 2013-12-31 2009-01-01    2020-11-01 
  6    2363928   3262324 2014-01-01 2014-12-31 2009-01-01    2020-11-01 
  7    2363928   3262324 2015-01-01 2015-12-31 2009-01-01    2020-11-01 
  8    2363928   3262324 2016-01-01 2016-12-31 2009-01-01    2020-11-01 
  9    2363928   3262324 2017-01-01 2017-12-31 2009-01-01    2020-11-01 
 10    2363928   3262324 2018-01-01 2018-12-31 2009-01-01    2020-11-01 
 11    2363928   3262324 2019-01-01 2019-12-31 2009-01-01    2020-11-01 
 12    2363928   3262324 2020-01-01 2020-11-01 2009-01-01    2020-11-01 
 13    2374506   2301528 2008-01-01 2008-12-31 2008-01-01    2009-01-01 
 14    2374506   2301528 2009-01-01 2009-01-01 2008-01-01    2009-01-01 
 15     599185     54210 1988-01-01 1988-12-31 1988-01-01    2000-01-01 
 16     599185     54210 1989-01-01 1989-12-31 1988-01-01    2000-01-01 
 17     599185     54210 1990-01-01 1990-12-31 1988-01-01    2000-01-01 
 18     599185     54210 1991-01-01 1991-12-31 1988-01-01    2000-01-01 
 19     599185     54210 1992-01-01 1992-12-31 1988-01-01    2000-01-01 
 20     599185     54210 1993-01-01 1993-12-31 1988-01-01    2000-01-01 
 21     599185     54210 1994-01-01 1994-12-31 1988-01-01    2000-01-01 
 22     599185     54210 1995-01-01 1995-12-31 1988-01-01    2000-01-01 
 23     599185     54210 1996-01-01 1996-12-31 1988-01-01    2000-01-01 
 24     599185     54210 1997-01-01 1997-12-31 1988-01-01    2000-01-01 
 25     599185     54210 1998-01-01 1998-12-31 1988-01-01    2000-01-01 
 26     599185     54210 1999-01-01 1999-12-31 1988-01-01    2000-01-01 
 27     599185     54210 2000-01-01 2000-01-01 1988-01-01    2000-01-01 
 28    1973398      4142 2018-04-01 2018-12-31 2018-04-01    2020-11-01 
 29    1973398      4142 2019-01-01 2019-12-31 2018-04-01    2020-11-01 
 30    1973398      4142 2020-01-01 2020-11-01 2018-04-01    2020-11-01 
 31    1262068    922842 2011-01-01 2011-12-31 2011-01-01    2018-08-01 
 32    1262068    922842 2012-01-01 2012-12-31 2011-01-01    2018-08-01 
 33    1262068    922842 2013-01-01 2013-12-31 2011-01-01    2018-08-01 
 34    1262068    922842 2014-01-01 2014-12-31 2011-01-01    2018-08-01 
 35    1262068    922842 2015-01-01 2015-12-31 2011-01-01    2018-08-01 
 36    1262068    922842 2016-01-01 2016-12-31 2011-01-01    2018-08-01 
 37    1262068    922842 2017-01-01 2017-12-31 2011-01-01    2018-08-01 
 38    1262068    922842 2018-01-01 2018-08-01 2011-01-01    2018-08-01 
 39      32554     15457 1997-01-01 1997-12-31 1997-01-01    2001-01-01 
 40      32554     15457 1998-01-01 1998-12-31 1997-01-01    2001-01-01 
 41      32554     15457 1999-01-01 1999-12-31 1997-01-01    2001-01-01 
 42      32554     15457 2000-01-01 2000-12-31 1997-01-01    2001-01-01 
 43      32554     15457 2001-01-01 2001-01-01 1997-01-01    2001-01-01 
 44    2305779   1378145 2007-01-01 2007-12-31 2007-01-01    2012-01-01 
 45    2305779   1378145 2008-01-01 2008-12-31 2007-01-01    2012-01-01 
 46    2305779   1378145 2009-01-01 2009-12-31 2007-01-01    2012-01-01 
 47    2305779   1378145 2010-01-01 2010-12-31 2007-01-01    2012-01-01 
 48    2305779   1378145 2011-01-01 2011-12-31 2007-01-01    2012-01-01 
 49    2305779   1378145 2012-01-01 2012-01-01 2007-01-01    2012-01-01 
 50    2305779    829105 2020-05-07 2020-11-01 2020-05-07    2020-11-01 
 51    1935158    573779 2011-07-01 2011-12-31 2011-07-01    2017-07-03 
 52    1935158    573779 2012-01-01 2012-12-31 2011-07-01    2017-07-03 
 53    1935158    573779 2013-01-01 2013-12-31 2011-07-01    2017-07-03 
 54    1935158    573779 2014-01-01 2014-12-31 2011-07-01    2017-07-03 
 55    1935158    573779 2015-01-01 2015-12-31 2011-07-01    2017-07-03 
 56    1935158    573779 2016-01-01 2016-12-31 2011-07-01    2017-07-03 
 57    1935158    573779 2017-01-01 2017-07-03 2011-07-01    2017-07-03 
 58     836789    829244 2006-01-01 2006-12-31 2006-01-01    2007-01-01 
 59     836789    829244 2007-01-01 2007-01-01 2006-01-01    2007-01-01 
 60     490624      3443 2004-01-01 2004-12-31 2004-01-01    2007-01-01 
 61     490624      3443 2005-01-01 2005-12-31 2004-01-01    2007-01-01 
 62     490624      3443 2006-01-01 2006-12-31 2004-01-01    2007-01-01 
 63     490624      3443 2007-01-01 2007-01-01 2004-01-01    2007-01-01 
 64     554430     30279 1975-05-15 1975-12-31 1975-05-15    2011-10-28 
 65     554430     30279 1976-01-01 1976-12-31 1975-05-15    2011-10-28 
 66     554430     30279 1977-01-01 1977-12-31 1975-05-15    2011-10-28 
 67     554430     30279 1978-01-01 1978-12-31 1975-05-15    2011-10-28 
 68     554430     30279 1979-01-01 1979-12-31 1975-05-15    2011-10-28 
 69     554430     30279 1980-01-01 1980-12-31 1975-05-15    2011-10-28 
 70     554430     30279 1981-01-01 1981-12-31 1975-05-15    2011-10-28 
 71     554430     30279 1982-01-01 1982-12-31 1975-05-15    2011-10-28 
 72     554430     30279 1983-01-01 1983-12-31 1975-05-15    2011-10-28 
 73     554430     30279 1984-01-01 1984-12-31 1975-05-15    2011-10-28 
 74     554430     30279 1985-01-01 1985-12-31 1975-05-15    2011-10-28 
 75     554430     30279 1986-01-01 1986-12-31 1975-05-15    2011-10-28 
 76     554430     30279 1987-01-01 1987-12-31 1975-05-15    2011-10-28 
 77     554430     30279 1988-01-01 1988-12-31 1975-05-15    2011-10-28 
 78     554430     30279 1989-01-01 1989-12-31 1975-05-15    2011-10-28 
 79     554430     30279 1990-01-01 1990-12-31 1975-05-15    2011-10-28 
 80     554430     30279 1991-01-01 1991-12-31 1975-05-15    2011-10-28 
 81     554430     30279 1992-01-01 1992-12-31 1975-05-15    2011-10-28 
 82     554430     30279 1993-01-01 1993-12-31 1975-05-15    2011-10-28 
 83     554430     30279 1994-01-01 1994-12-31 1975-05-15    2011-10-28 
 84     554430     30279 1995-01-01 1995-12-31 1975-05-15    2011-10-28 
 85     554430     30279 1996-01-01 1996-12-31 1975-05-15    2011-10-28 
 86     554430     30279 1997-01-01 1997-12-31 1975-05-15    2011-10-28 
 87     554430     30279 1998-01-01 1998-12-31 1975-05-15    2011-10-28 
 88     554430     30279 1999-01-01 1999-12-31 1975-05-15    2011-10-28 
 89     554430     30279 2000-01-01 2000-12-31 1975-05-15    2011-10-28 
 90     554430     30279 2001-01-01 2001-12-31 1975-05-15    2011-10-28 
 91     554430     30279 2002-01-01 2002-12-31 1975-05-15    2011-10-28 
 92     554430     30279 2003-01-01 2003-12-31 1975-05-15    2011-10-28 
 93     554430     30279 2004-01-01 2004-12-31 1975-05-15    2011-10-28 
 94     554430     30279 2005-01-01 2005-12-31 1975-05-15    2011-10-28 
 95     554430     30279 2006-01-01 2006-12-31 1975-05-15    2011-10-28 
 96     554430     30279 2007-01-01 2007-12-31 1975-05-15    2011-10-28 
 97     554430     30279 2008-01-01 2008-12-31 1975-05-15    2011-10-28 
 98     554430     30279 2009-01-01 2009-12-31 1975-05-15    2011-10-28 
 99     554430     30279 2010-01-01 2010-12-31 1975-05-15    2011-10-28 
100     554430     30279 2011-01-01 2011-10-28 1975-05-15    2011-10-28 
101     777766     44584 NA         NA         NA            NA         
102     892379     32342 NA         2018-01-01 NA            2018-01-01 
103     328171    583820 NA         NA         NA            NA         
104    1973398    550686 NA         2010-01-01 NA            2010-01-01 
105    1041558    829450 NA         NA         NA            NA         
106     542061     10961 NA         2009-09-28 NA            2009-09-28 
107     330769     21696 NA         2000-11-01 NA            2000-11-01 
108     836789    591529 NA         NA         NA            NA         

推荐阅读