How do I sort by ID, detect specific differences within the same ID, and then show everything in one data frame?

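Problem description

I asked a similar question before: "R: How to sort by ID, and then detect differences within the same ID?"

This time, however, I want to show the specific sample IDs within the same ID. My data frame is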

dataframe <- data.frame(ID=c("ID1","ID2","ID3","ID4", "ID2", "ID2", "ID3","ID4", "ID5","ID1"), 
                    sample_ID=c(1:10),sample_date=c("1991-05-23", "1991-05-24","1991-05-24", "1991-05-26","1991-05-27","1991-05-28","1991-05-30","1991-05-31", "1991-06-03", "1991-06-03"), 
                    sex =c(1,2,1,2,2,2,1,2,1,1), and_so_on1 =c(1), and_so_on2 =c(0))

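From here, I want to sort by ID and detect whether the same ID has sample_date values that are very close together (for example, within 1 day this time). Then I want to show the result like this: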

outcome <- data.frame(ID=c("ID2","ID2"), sample_ID=c(5,6),sample_date=c("1991-05-27","1991-05-28"),sex=c(2),and_so_on1 =c(1), and_so_on2 =c(0))

Tags: r

Solution


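You can do this by comparing, within each ID, the current date with the previous date and the next date with the current date. When either difference is exactly 1 day, you keep the record.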
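To make the comparison concrete, here is a minimal sketch on a toy vector of dates for a single ID. The values and the names `d`, `gap_prev`, `gap_next` and `keep` are made up for illustration and are not part of the original answer.

```r
library(dplyr)

# Toy dates for one ID, already sorted (made-up values for illustration)
d <- as.Date(c("1991-05-24", "1991-05-27", "1991-05-28"))

# Gap in days to the previous sample; NA for the first element
gap_prev <- as.numeric(d - lag(d))
# Gap in days to the next sample; NA for the last element
gap_next <- as.numeric(lead(d) - d)

# A date qualifies if either neighbouring gap is exactly 1 day
keep <- (gap_prev == 1) | (gap_next == 1)

# which() drops the NA produced at the edge, much as filter() drops NA rows
d[which(keep)]
```

Only the second and third dates qualify here, because they are one day apart; the first date is three days away from its only neighbour.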

dataframe <- data.frame(ID=c("ID1","ID2","ID3","ID4", "ID2", "ID2", "ID3","ID4", "ID5","ID1"), 
                        sample_ID=c(1:10),
                        sample_date=c("1991-05-23", "1991-05-24","1991-05-24", "1991-05-26","1991-05-27","1991-05-28","1991-05-30","1991-05-31", "1991-06-03", "1991-06-03"), 
                        sex =c(1,2,1,2,2,2,1,2,1,1), 
                        and_so_on1 =c(1), 
                        and_so_on2 =c(0))

library(dplyr)

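dataframe %>%
  # Convert the text dates to Date so they can be subtracted
  mutate(sample_date = as.Date(sample_date)) %>%
  # Sort so that lag()/lead() see neighbouring dates within each ID
  arrange(ID, sample_date) %>%
  group_by(ID) %>% 
  # Keep rows exactly 1 day after the previous sample or 1 day before the next one
  filter((sample_date - lag(sample_date)) == 1 |
         (lead(sample_date) - sample_date) == 1)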

# A tibble: 2 x 6
# Groups:   ID [1]
  ID    sample_ID sample_date   sex and_so_on1 and_so_on2
  <fct>     <int> <date>      <dbl>      <dbl>      <dbl>
1 ID2           5 1991-05-27      2          1          0
2 ID2           6 1991-05-28      2          1          0
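The filter above keeps only gaps of exactly 1 day. If "within 1 day" should also cover two samples taken on the same day, one possible variant compares against a threshold instead. This is only a sketch: `max_gap_days`, `gap_prev`, `gap_next` and `close_samples` are hypothetical names introduced here, not part of the original answer.

```r
library(dplyr)

dataframe <- data.frame(
  ID = c("ID1","ID2","ID3","ID4","ID2","ID2","ID3","ID4","ID5","ID1"),
  sample_ID = 1:10,
  sample_date = c("1991-05-23","1991-05-24","1991-05-24","1991-05-26","1991-05-27",
                  "1991-05-28","1991-05-30","1991-05-31","1991-06-03","1991-06-03"),
  sex = c(1,2,1,2,2,2,1,2,1,1),
  and_so_on1 = 1,
  and_so_on2 = 0
)

# Hypothetical threshold: largest gap (in days) that still counts as "close"
max_gap_days <- 1

close_samples <- dataframe %>%
  mutate(sample_date = as.Date(sample_date)) %>%
  arrange(ID, sample_date) %>%
  group_by(ID) %>%
  mutate(
    gap_prev = as.numeric(sample_date - lag(sample_date)),   # days since previous sample
    gap_next = as.numeric(lead(sample_date) - sample_date)   # days until next sample
  ) %>%
  # coalesce() turns the NA at the first/last row of each group into FALSE
  filter(coalesce(gap_prev <= max_gap_days, FALSE) |
         coalesce(gap_next <= max_gap_days, FALSE)) %>%
  ungroup() %>%
  select(-gap_prev, -gap_next)

close_samples
```

With this example data the result is the same two ID2 rows (sample_ID 5 and 6), since no ID has two samples on the same day. Raising `max_gap_days` widens the window without changing the rest of the pipeline.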
