How to select only the individuals that appear in every year of a data frame in R

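Problem description

I am trying to subset the individuals that were present throughout the entire study period, which starts in 2014 and ends in 2019. The output should therefore be a list of the names that appear in every year of the data frame.

I tried the following code: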

    big_data <- dplyr::bind_rows(df1, df2, df3, df4, df5, df6) # I've bound 6 different dataframes (each with data from one of the years) by row. These dfs have a different number of rows and columns. Some columns repeat in different years, while others don't.

    Date <- as.POSIXlt.Date(big_data$Date) 

    Year <- separate(big_data, Date, into = c('Month', 'Day', 'Year') %>% select(Year)) # I've extracted the Year from the Date variable (DD/MM/YYYY)

    Year <- big_data$Year # I've added it to the big_data

    Interval <- Year %between% c("2014", "2019") # I've created a timeperiod with the start and end years of the study

    big_data [, all.names(FocalID %in% Interval)] # I've tried to get the names of the individuals (in variable FocalID) that are present in the interval (but probably doesn't mean in every year)

Obviously this code does not work. Can you help me? Thanks!

Tags: r, select, time

Solution


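If your data frame has one row per observation, with an id column and a year column, for example:

    # Example data: each row is one observation of an individual (id) in a year
    big_data <- data.frame(
      id = c(1,1,1,1,1,1,1,2,2,2,2,3,3,3,3,3,3),
      year = c(2014:2019, 2014:2019, 2014:2018)
    )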

       id year
    1   1 2014
    2   1 2015
    3   1 2016
    4   1 2017
    5   1 2018
    6   1 2019
    7   1 2014
    8   2 2015
    9   2 2016
    10  2 2017
    11  2 2018
    12  3 2019
    13  3 2014
    14  3 2015
    15  3 2016
    16  3 2017
    17  3 2018

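You can use the dplyr package from the tidyverse to group_by each individual's id, then check that the rows for that id contain every year from 2014 to 2019 in the year column. The filter keeps all rows for a given id only if every year is represented.

    library(dplyr)

    # Keep all rows of an id only if that id has every year from 2014 to 2019
    big_data %>%
      group_by(id) %>%
      filter(all(2014:2019 %in% year))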
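
To see what the condition inside filter() evaluates to for each group, the same check can be reproduced in base R with tapply(). This is only an illustrative sketch on the example data; the name study_years is introduced here and is not part of the original answer.

```r
# Example data copied from the answer
big_data <- data.frame(
  id = c(1,1,1,1,1,1,1,2,2,2,2,3,3,3,3,3,3),
  year = c(2014:2019, 2014:2019, 2014:2018)
)

study_years <- 2014:2019  # assumed study period (name chosen for illustration)

# For each id, TRUE only if every study year appears among that id's years
has_all_years <- tapply(
  big_data$year, big_data$id,
  function(y) all(study_years %in% y)
)
has_all_years
```

The result is TRUE for ids 1 and 3 and FALSE for id 2, which is why the grouped filter drops every row belonging to id 2.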

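A base R option is as follows:

    # Intersect the ids seen in each year, then keep the rows for those ids
    big_data[big_data$id %in% Reduce(intersect, split(big_data$id, big_data$year)), ]

In this example, ids 1 and 3 include all of the years from 2014 to 2019.

Output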
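
The one-liner can be unpacked into steps, which also makes one caveat visible: split() only produces groups for the years that actually occur in the data, so rows outside the study period, or a study year with no rows at all, would change the result. The sketch below is one possible way to guard against that, assuming a study period of 2014 to 2019; the names study_years, in_study, ids_by_year and ids_every_year are hypothetical.

```r
# Example data copied from the answer
big_data <- data.frame(
  id = c(1,1,1,1,1,1,1,2,2,2,2,3,3,3,3,3,3),
  year = c(2014:2019, 2014:2019, 2014:2018)
)

study_years <- 2014:2019  # assumed study period

# Drop any rows outside the study period (there are none in this example)
in_study <- big_data[big_data$year %in% study_years, ]

# One vector of ids per study year; a study year with no rows yields an
# empty vector, so the intersection then correctly comes out empty
ids_by_year <- split(in_study$id, factor(in_study$year, levels = study_years))

# Successively intersect the yearly id vectors
ids_every_year <- Reduce(intersect, ids_by_year)

# Keep only the rows that belong to those ids
in_study[in_study$id %in% ids_every_year, ]
```

Here ids_every_year holds just the ids, which may be closer to the "list of names" asked for in the question than the full filtered data frame.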

       id year
    1   1 2014
    2   1 2015
    3   1 2016
    4   1 2017
    5   1 2018
    6   1 2019
    7   1 2014
    12  3 2019
    13  3 2014
    14  3 2015
    15  3 2016
    16  3 2017
    17  3 2018
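
The question's data has a FocalID column and a Date column rather than id and year. A minimal sketch of how the same idea might be applied there follows, assuming Date is stored as text in DD/MM/YYYY form as the question's comment suggests. The toy rows below are made up purely for illustration and are not the asker's data; present_all and years_by_id are hypothetical names.

```r
study_years <- 2014:2019  # study period stated in the question

# Made-up stand-in for the asker's big_data (illustration only)
big_data <- data.frame(
  FocalID = c(rep("A", 6), rep("B", 4), rep("C", 6)),
  Date = c(
    paste0("15/06/", 2014:2019),          # A: seen in every year
    paste0("01/03/", 2015:2018),          # B: missing 2014 and 2019
    paste0("20/11/", c(2019, 2014:2018))  # C: every year, out of order
  ),
  stringsAsFactors = FALSE
)

# Parse the DD/MM/YYYY text as a date and extract the year as an integer
big_data$Year <- as.integer(
  format(as.Date(big_data$Date, format = "%d/%m/%Y"), "%Y")
)

# Years observed for each individual
years_by_id <- split(big_data$Year, big_data$FocalID)

# Names of the individuals whose years cover the whole study period
covers_all <- vapply(
  years_by_id,
  function(y) all(study_years %in% y),
  logical(1)
)
present_all <- names(years_by_id)[covers_all]
present_all
```

With these toy rows, present_all contains "A" and "C". If the real Date column uses a different layout, the format string passed to as.Date() would need to change accordingly.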
