首页 > 解决方案 > 抽出时间重新加入

问题描述

我有一个结构如下的表,其中包含有关俱乐部会员资格的信息

id  start       end          cancelled
1   2020-01-01  2020-12-31   2021-01-10
1   2021-02-01  2022-01-31   NA
2   2020-01-01  2020-12-31   NA
3   2020-01-01  2020-06-30   2020-07-01
3   2020-07-10  2021-01-09   2021-01-31
3   2021-02-02  2021-08-01   NA

我对之后重新加入的成员感兴趣。对于上述数据,在整个期间,id 1做了一次,id 2没有做,id 3做了两次。

F对于每次发生这种情况,他们的会员资格结束和他们重新加入之间的时间是多少?对于上述数据,这将是:

id  rejoin_date time_to_rejoin
1   2021-01-31  22 days 
3   2020-07-10  9 days
3   2021-02-02  2 days

对于 R:

structure(list(id = c(1, 1, 2, 3, 3, 3), start = c("2020-01-01", 
"2021-02-01", "2020-01-01", "2020-01-01", "2020-07-10", "2021-02-02"
), end = c("2020-12-31", "2022-01-31", "2020-12-31", "2020-06-30", 
"2021-01-09", "2021-08-01"), cancelled = c("2021-01-10", NA, 
NA, "2020-07-01", "2021-01-31", NA)), class = "data.frame", row.names = c(NA, 
-6L))

对于 SQL:

CREATE TABLE IF NOT EXISTS `dt` (
  `id` int(6) unsigned NOT NULL,
  `start` TIMESTAMP,
  `end` TIMESTAMP,
  `cancelled` TIMESTAMP NULL
) DEFAULT CHARSET=utf8;
INSERT INTO `dt` (`id`, `start`, `end`, `cancelled`) VALUES
  ('1', '2020-01-01', '2020-12-31', '2021-01-10'),
  ('2', '2021-02-01', '2022-01-31', NULL ),
  ('2', '2021-01-01', '2020-12-31', NULL ),
  ('3', '2020-01-01', '2020-06-30', '2020-07-01'),
  ('3', '2021-02-10', '2021-01-09', '2021-01-31'),
  ('3', '2021-02-02', '2021-08-01', NULL )

我会对使用 R(理想情况下使用 dplyr)或 SQL(MySQL)的答案感到满意

标签: mysqlsqlrdplyr

解决方案


您可以使用lag()然后过滤:

select t.*,
       datediff(start, prev_cancelled) as num_days_since_cancel
from (select t.*,
             lag(cancelled) over (partition by id order by start) as prev_cancelled
      from t
     ) t
where prev_cancelled is not null;

是一个 db<>fiddle。


推荐阅读