Rearranging a longitudinal dataset into a life table

Problem description

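I have a table recording where a number of people lived and what occupation they held. I want to find out whether people in certain occupations are more likely to relocate than others. The longitudinal data look like this: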

library(tidyverse)

# One row per person (id) and year, 1990 to 1995
id <- c(rep(1, 6), rep(2, 6), rep(3, 6))
year <- rep(1990:1995, 3)
occupation <- c(rep("Barrister", 6), rep("Telephone salesman", 3), rep("Baker", 3), rep("Janitor", 2), rep("Builder", 4))
residence <- c(rep("London", 2), rep("Manchester", 2), rep("Glasgow", 2), rep("London", 6), rep("Liverpool", 4), rep("Luton", 2))

df <- tibble(id, year, occupation, residence)

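I would like to rearrange the table into life-table format. I also want to create two new variables: a dummy variable indicating whether the person relocated after x years (= the event occurred) or did not relocate after x years (= the observation is right-censored), and, for people who changed occupation, a variable holding the occupation they held previously. I would like the table to look like this: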

# Desired output: one row per spell of constant occupation and residence
id2 <- c(rep(1, 3), rep(2, 2), rep(3, 3))
years <- c(2, 2, 2, 3, 3, 2, 2, 2)
occupation2 <- c(rep("Barrister", 3), rep("Telephone salesman", 1), rep("Baker", 1), rep("Janitor", 1), rep("Builder", 2))
residence2 <- c(rep("London", 1), rep("Manchester", 1), rep("Glasgow", 1), rep("London", 2), rep("Liverpool", 2), rep("Luton", 1))
relocated <- c(1, 1, 0, 0, 0, 0, 1, 0)
experience <- c(rep(NA, 3), rep(NA, 1), rep("Telephone salesman", 1), rep(NA, 1), rep("Janitor", 2))

life.table <- tibble(id2, years, occupation2, residence2, relocated, experience)

I am not at all sure how to achieve this, so any suggestions would be much appreciated!

Tags: r, tidyverse, data-wrangling

Solution


Perhaps this helps:

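library(dplyr)

# Threshold on the years remaining in a person's record (used by `relocated`)
n <- 2

df %>%
    group_by(id) %>%
    # n1: running year index, n2: years observed, n3: years remaining,
    # n4: number of distinct residences for this person
    mutate(n1 = cumsum(c(1, diff(year))), n2 = n(), n3 = n2 - n1,
           n4 = n_distinct(residence)) %>%
    # Factors keep the groups in order of appearance rather than alphabetical
    group_by(occupation = factor(occupation, levels = unique(occupation)),
             residence = factor(residence, levels = unique(residence)), .add = TRUE) %>%
    # years: rows per group; relocated: 1 if more than n years remain
    # and the person has more than one residence overall
    summarise(years = n(), relocated = +(any(n3 > n) & first(n4) > 1),
              .groups = "drop_last") %>%
    group_by(id) %>%
    # experience: the first occupation, for people with more than one occupation
    mutate(experience = if (n_distinct(occupation) > 1)
        c(NA_character_, rep(as.character(first(occupation)), n() - 1))
        else NA_character_)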
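
As a possible alternative, here is a minimal sketch that builds the spells explicitly instead of relying on the `n` threshold. It rests on two assumptions that are my reading of the desired table, not something stated in the question: a spell counts as `relocated` when the same person's next spell has a different residence (the last spell is treated as right-censored), and `experience` is the most recent occupation that differs from the current one. The names `spell`, `spells` and `life_table` are illustrative only.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
id <- c(rep(1, 6), rep(2, 6), rep(3, 6))
year <- rep(1990:1995, 3)
occupation <- c(rep("Barrister", 6), rep("Telephone salesman", 3), rep("Baker", 3),
                rep("Janitor", 2), rep("Builder", 4))
residence <- c(rep("London", 2), rep("Manchester", 2), rep("Glasgow", 2),
               rep("London", 6), rep("Liverpool", 4), rep("Luton", 2))
df <- tibble(id, year, occupation, residence)

# Step 1: collapse person-years into spells of constant occupation and residence
spells <- df %>%
  arrange(id, year) %>%
  group_by(id) %>%
  # A new spell starts whenever occupation or residence differs from the previous year
  mutate(spell = cumsum(
    occupation != lag(occupation, default = first(occupation)) |
      residence != lag(residence, default = first(residence))
  )) %>%
  group_by(id, spell, occupation, residence) %>%
  summarise(years = n(), .groups = "drop")

# Step 2: derive the event indicator and the previous occupation per person
life_table <- spells %>%
  group_by(id) %>%
  mutate(
    # Assumption: event = the next spell is in a different residence;
    # the last spell compares with itself and is therefore 0 (censored)
    relocated = as.integer(lead(residence, default = last(residence)) != residence),
    # Previous occupation, set only on the spell where the occupation changes
    experience = if_else(occupation != lag(occupation), lag(occupation), NA_character_)
  ) %>%
  # Carry the previous occupation forward while the occupation stays the same
  fill(experience, .direction = "down") %>%
  ungroup() %>%
  select(id, years, occupation, residence, relocated, experience)

life_table
```

Because spells are defined by consecutive years, a person who moves away and later returns to the same city gets two separate rows. On the sample data, a job change without a move (Janitor to Builder in Liverpool) is recorded as `relocated = 0` under this definition.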
