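r - How to select only the individuals that appear in every year of a data frame in R
Problem description
I'm trying to subset the individuals that are present throughout the whole study period, which starts in 2014 and ends in 2019. The output would therefore be a list of the names that are present in every year of the data frame.
I tried the following code: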
big_data <- dplyr::bind_rows(df1, df2, df3, df4, df5, df6) # I've bound 6 different dataframes (each with data from one of the years) by row. These dfs have a different number of rows and columns. Some columns repeat in different years, while others don't.
Date <- as.POSIXlt.Date(big_data$Date)
Year <- separate(big_data, Date, into = c('Month', 'Day', 'Year') %>% select(Year)) # I've extracted the Year from the Date variable (DD/MM/YYYY)
Year <- big_data$Year # I've added it to the big_data
Interval <- Year %between% c("2014", "2019") # I've created a timeperiod with the start and end years of the study
big_data [, all.names(FocalID %in% Interval)] # I've tried to get the names of the individuals (in variable FocalID) that are present in the interval (but probably doesn't mean in every year)
Obviously this code doesn't work. Can you help me? Thanks!
Solution
If your data frame has rows with an id and a year, for example:
big_data <- data.frame(
id = c(1,1,1,1,1,1,1,2,2,2,2,3,3,3,3,3,3),
year = c(2014:2019, 2014:2019, 2014:2018)
)
id year
1 1 2014
2 1 2015
3 1 2016
4 1 2017
5 1 2018
6 1 2019
7 1 2014
8 2 2015
9 2 2016
10 2 2017
11 2 2018
12 3 2019
13 3 2014
14 3 2015
15 3 2016
16 3 2017
17 3 2018
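You can use the dplyr package from the tidyverse to group_by each individual subject id, and then check that the rows for that id contain every year from 2014 to 2019 in year. filter then keeps all rows for a given id only if all of those years are represented.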
library(dplyr)
big_data %>%
  group_by(id) %>%
  filter(all(2014:2019 %in% year)) # keep all rows of an id only if every year 2014-2019 is present
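The question asks for a list of names rather than the matching rows. A minimal sketch, reusing the example big_data above, reduces the filtered rows to one value per id with distinct() and pull(); the name ids_every_year is illustrative only.

```r
library(dplyr)

# Example data from above: ids 1 and 3 appear in every year 2014-2019
big_data <- data.frame(
  id = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3),
  year = c(2014:2019, 2014:2019, 2014:2018)
)

ids_every_year <- big_data %>%
  group_by(id) %>%
  filter(all(2014:2019 %in% year)) %>% # keep ids observed in every study year
  ungroup() %>%
  distinct(id) %>%                      # one row per qualifying id
  pull(id)                              # extract the ids as a plain vector

ids_every_year
```

For the example data this should give the ids 1 and 3.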
A base R option is:
big_data[big_data$id %in% Reduce(intersect, split(big_data$id, big_data$year)), ] # keep rows whose id occurs in every year found in the data
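To see what the base R one-liner does, the sketch below splits it into steps: split() builds one vector of ids per year, and Reduce(intersect, ...) keeps the ids common to all of those vectors. Note that this intersects over whichever years actually occur in the data, not over 2014:2019 as such, so an extra year such as 2013 would also be required. The tapply() variant is an alternative (not part of the original answer) that checks the study years explicitly; common_ids, in_all_years and explicit_ids are illustrative names.

```r
big_data <- data.frame(
  id = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3),
  year = c(2014:2019, 2014:2019, 2014:2018)
)

# One vector of ids per year, e.g. ids_by_year[["2014"]] holds the ids seen in 2014
ids_by_year <- split(big_data$id, big_data$year)

# Ids common to every year that appears in the data
common_ids <- Reduce(intersect, ids_by_year)

# Alternative: for each id, check the study years 2014-2019 explicitly
in_all_years <- tapply(big_data$year, big_data$id, function(y) all(2014:2019 %in% y))
explicit_ids <- as.numeric(names(in_all_years)[in_all_years])

common_ids
explicit_ids
```

Both should return the ids 1 and 3 for this example.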
In this example, ids 1 and 3 include all years from 2014 to 2019.
Output
id year
1 1 2014
2 1 2015
3 1 2016
4 1 2017
5 1 2018
6 1 2019
7 1 2014
12 3 2019
13 3 2014
14 3 2015
15 3 2016
16 3 2017
17 3 2018
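Applied to the setup in the question, the year first has to be extracted from Date. A minimal sketch, assuming Date is stored as text in DD/MM/YYYY format (as the comment in the question says) and that FocalID holds the individuals' names; the toy values below are hypothetical. Parsing with as.Date() and format = "%d/%m/%Y" avoids having to split the string with separate().

```r
library(dplyr)

# Hypothetical toy data: column names follow the question, values are made up.
# "A" is observed in every year 2014-2019, "B" only in 2014 and 2015.
big_data <- data.frame(
  FocalID = c("A", "A", "A", "A", "A", "A", "B", "B"),
  Date = c("01/05/2014", "12/06/2015", "03/07/2016",
           "20/08/2017", "05/09/2018", "30/10/2019",
           "01/05/2014", "12/06/2015"),
  stringsAsFactors = FALSE
)

focal_ids <- big_data %>%
  # parse the DD/MM/YYYY strings and keep the four-digit year as an integer
  mutate(Year = as.integer(format(as.Date(Date, format = "%d/%m/%Y"), "%Y"))) %>%
  group_by(FocalID) %>%
  # keep an individual only if every study year 2014-2019 is present
  filter(all(2014:2019 %in% Year)) %>%
  ungroup() %>%
  distinct(FocalID) %>%
  pull(FocalID)

focal_ids
```

With this toy data only "A" is returned.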