How to find the exact year of change based on multiple values in R

Question

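I have a data frame containing multiple subjects (companies), years, personal names, and gender (Female, Male). I want to get the year in which the personal name changed, if it changed at all. In addition, if a change happened in a given year, I want to create two binary variables, "FemaletoMale" and "MaletoFemale", indicating that the change was from a woman to a man or from a man to a woman.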


So, if I have a table like

companyid year   personalname gender 
 1         1990  Alison       Female
 1         1991  Alison       Female
 1         1992  Kate         Female
 1         1993  Kate         Female
 2         1990  George       Male
 2         1991  Kate         Female
 2         1992  Kate         Female
 3         1990  Michael      Male
 3         1991  Dwight       Male
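
For anyone who wants to reproduce this, the table above can be built as a data frame roughly as follows. This is a minimal sketch: the column types (integer ids and years, character names and gender) are an assumption, since the table does not state them.

```r
# Example data from the table above.
# Column types are assumed: integer ids/years, character names/gender.
df <- data.frame(
  companyid    = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
  year         = c(1990L, 1991L, 1992L, 1993L, 1990L, 1991L, 1992L, 1990L, 1991L),
  personalname = c("Alison", "Alison", "Kate", "Kate",
                   "George", "Kate", "Kate",
                   "Michael", "Dwight"),
  gender       = c("Female", "Female", "Female", "Female",
                   "Male", "Female", "Female",
                   "Male", "Male"),
  stringsAsFactors = FALSE
)
```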

I know that this question can help me count the number of changes: "How to tell if a value changed over dimension(s) in R"

df <- df %>%
  group_by(companyid) %>%
  summarise(ChangeYear = sprintf("%s to %s", min(year), max(year)),
            change.count = length(unique(personalname)) - 1)

This gives me the number of changes. What I would like to see is:

companyid  change.count  changeyear  FemaletoMale MaletoFemale
 1               1             1992         0            0          
 2               1             1991         0            1
 3               1             1991         0            0

Tags: r, dataframe

Answer


Using dplyr, you can do:

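library(dplyr)

df %>%
  group_by(companyid) %>%
  summarise(
    # Distinct names minus one (equals the number of changes if no name recurs)
    change.count = n_distinct(personalname) - 1,
    # Year in which the name differs from the previous row; this assumes rows
    # are sorted by year and each company has exactly one name change
    changeyear = year[personalname != lag(personalname, default = first(personalname))],
    # Count Female -> Male and Male -> Female transitions between consecutive rows
    FemaletoMale = sum(gender == 'Male' & lag(gender) == 'Female', na.rm = TRUE),
    MaletoFemale = sum(gender == 'Female' & lag(gender) == 'Male', na.rm = TRUE))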


#  companyid change.count changeyear FemaletoMale MaletoFemale
#      <int>        <dbl>      <int>        <int>        <int>
#1         1            1       1992            0            0
#2         2            1       1991            0            1
#3         3            1       1991            0            0
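
The summarise() call above relies on every company having exactly one name change; a company with no change or with several changes would yield zero or several values for changeyear. One possible way to relax that, sketched here under assumptions, is to flag each row where the name changes and keep one output row per change. Companies 4 and 5 below are made up for illustration and do not appear in the question, and the name `changes` is a hypothetical variable name.

```r
library(dplyr)

# Rows for companies 1-3 come from the question; companies 4 and 5 are
# hypothetical (4 never changes, 5 changes twice).
df <- data.frame(
  companyid    = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 5L),
  year         = c(1990L, 1991L, 1992L, 1993L, 1990L, 1991L, 1992L,
                   1990L, 1991L, 1990L, 1991L, 1990L, 1991L, 1992L),
  personalname = c("Alison", "Alison", "Kate", "Kate", "George", "Kate", "Kate",
                   "Michael", "Dwight", "Ann", "Ann", "Bob", "Sue", "Tim"),
  gender       = c("Female", "Female", "Female", "Female", "Male", "Female", "Female",
                   "Male", "Male", "Female", "Female", "Male", "Female", "Male"),
  stringsAsFactors = FALSE
)

changes <- df %>%
  arrange(companyid, year) %>%          # make sure lag() sees the previous year
  group_by(companyid) %>%
  mutate(
    prev_gender  = lag(gender, default = first(gender)),
    # TRUE on rows where the name differs from the previous row in the group
    name_changed = personalname != lag(personalname, default = first(personalname)),
    FemaletoMale = as.integer(name_changed & prev_gender == "Female" & gender == "Male"),
    MaletoFemale = as.integer(name_changed & prev_gender == "Male" & gender == "Female")
  ) %>%
  filter(name_changed) %>%              # one row per name change
  ungroup() %>%
  select(companyid, changeyear = year, FemaletoMale, MaletoFemale)

print(changes)
```

With this layout a company without any change simply does not appear in the result, and a company with several changes gets one row per change, so the two indicators always describe a specific change year.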
