首页 > 解决方案 > 如何根据 r 中的 2 个日期列打开新行

问题描述

我有以下大型数据集的示例数据集-

     isin       directorid dob_Year2 ROLE_START ROLE_END gender datestartrole dateendrole
 US6819771048     340769      1970       1995     2003      M    1995-02-01  2003-01-09
 US6819771048     340769      1970       2003     2004      M    2003-01-09  2004-02-24
 US6819771048     340769      1970       2004     2007      M    2004-02-24  2007-09-07
 US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30
 US68243Q1067      86917      1951       1976     1986      M    1976-04-01  1986-01-01
 US68243Q1067     327069      1961       2016     2020      M    2016-06-30  2020-05-21

我的问题是 - 我想根据变量ROLE_STARTROLE_END. 假设数据按和分组,要创建的行数取决于MINIMUM ROLE_STARTMAXIMUM 。例如 - 对于US6819771048 和340769,最小年份是 1995,最大年份是 2007。所以我需要为 1995-2007 的每一年打开行,这些年份应该存储在变量中。请注意,在上面的示例中,1995-2007 之间没有中断,因为从ROLE_ENDisindirectoridisindirectorid ROLE_START ROLE_ENDYEARROLE_STARTROLE_END包括所有年份。如果任何年份之间有任何中断,则应排除这些中断。对于上面的示例数据集,我预期的数据集应该是这样的 -

     isin       directorid dob_Year2 ROLE_START ROLE_END gender datestartrole dateendrole YEAR
US6819771048     340769      1970       1995     2003      M    1995-02-01  2003-01-09   1995
US6819771048     340769      1970       1995     2003      M    1995-02-01  2003-01-09   1996
US6819771048     340769      1970       1995     2003      M    1995-02-01  2003-01-09   1997
US6819771048     340769      1970       1995     2003      M    1995-02-01  2003-01-09   1998
US6819771048     340769      1970       1995     2003      M    1995-02-01  2003-01-09   1999
US6819771048     340769      1970       1995     2003      M    1995-02-01  2003-01-09   2000
US6819771048     340769      1970       1995     2003      M    1995-02-01  2003-01-09   2001
US6819771048     340769      1970       1995     2003      M    1995-02-01  2003-01-09   2002
US6819771048     340769      1970       1995     2003      M    1995-02-01  2003-01-09   2003
US6819771048     340769      1970       2003     2004      M    2003-01-09  2004-02-24   2003
US6819771048     340769      1970       2003     2004      M    2003-01-09  2004-02-24   2004
US6819771048     340769      1970       2004     2007      M    2004-02-24  2007-09-07   2004
US6819771048     340769      1970       2004     2007      M    2004-02-24  2007-09-07   2005
US6819771048     340769      1970       2004     2007      M    2004-02-24  2007-09-07   2006
US6819771048     340769      1970       2004     2007      M    2004-02-24  2007-09-07   2007
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   1986
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   1987
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   1988
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   1989
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   1990
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   1991
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   1992
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   1993
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   1994
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   1995
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   1996
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   1997
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   1998
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   1999
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2000
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2001
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2002
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2003
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2004
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2005
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2006
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2007
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2008
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2009
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2010
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2011
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2012
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2013
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2014
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2015
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2016
US68243Q1067      86917      1951       1976     1986      M    1976-04-01  1986-01-01   1976
US68243Q1067      86917      1951       1976     1986      M    1976-04-01  1986-01-01   1977
US68243Q1067      86917      1951       1976     1986      M    1976-04-01  1986-01-01   1978
US68243Q1067      86917      1951       1976     1986      M    1976-04-01  1986-01-01   1979
US68243Q1067      86917      1951       1976     1986      M    1976-04-01  1986-01-01   1980
US68243Q1067      86917      1951       1976     1986      M    1976-04-01  1986-01-01   1981
US68243Q1067      86917      1951       1976     1986      M    1976-04-01  1986-01-01   1982
US68243Q1067      86917      1951       1976     1986      M    1976-04-01  1986-01-01   1983
US68243Q1067      86917      1951       1976     1986      M    1976-04-01  1986-01-01   1984
US68243Q1067      86917      1951       1976     1986      M    1976-04-01  1986-01-01   1985
US68243Q1067      86917      1951       1976     1986      M    1976-04-01  1986-01-01   1986
US68243Q1067     327069      1961       2016     2020      M    2016-06-30  2020-05-21   2016
US68243Q1067     327069      1961       2016     2020      M    2016-06-30  2020-05-21   2017
US68243Q1067     327069      1961       2016     2020      M    2016-06-30  2020-05-21   2018
US68243Q1067     327069      1961       2016     2020      M    2016-06-30  2020-05-21   2019
US68243Q1067     327069      1961       2016     2020      M    2016-06-30  2020-05-21   2020

标签: rdplyrdata.tabletidyverselubridate

解决方案


您可以在和之间创建一个序列,ROLE_STARTROLE_END在不同的行中获取数据。

library(dplyr)
df %>%
  mutate(YEAR = purrr::map2(ROLE_START, ROLE_END, seq)) %>%
  tidyr::unnest(YEAR)


#   isin         directorid dob_Year2 ROLE_START ROLE_END gender datestartrole dateendrole  YEAR
#   <chr>             <int>     <int>      <int>    <int> <chr>  <chr>         <chr>       <int>
# 1 US6819771048     340769      1970       1995     2003 M      1995-02-01    2003-01-09   1995
# 2 US6819771048     340769      1970       1995     2003 M      1995-02-01    2003-01-09   1996
# 3 US6819771048     340769      1970       1995     2003 M      1995-02-01    2003-01-09   1997
# 4 US6819771048     340769      1970       1995     2003 M      1995-02-01    2003-01-09   1998
# 5 US6819771048     340769      1970       1995     2003 M      1995-02-01    2003-01-09   1999
# 6 US6819771048     340769      1970       1995     2003 M      1995-02-01    2003-01-09   2000
# 7 US6819771048     340769      1970       1995     2003 M      1995-02-01    2003-01-09   2001
# 8 US6819771048     340769      1970       1995     2003 M      1995-02-01    2003-01-09   2002
# 9 US6819771048     340769      1970       1995     2003 M      1995-02-01    2003-01-09   2003
#10 US6819771048     340769      1970       2003     2004 M      2003-01-09    2004-02-24   2003
# … with 52 more rows

推荐阅读