How to extract a binary response for unevenly matched rows between two R data frames?

Problem description

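From these two data frames, df1 and df2, I want to assign "yes" or "no" to df1 based on the following condition. If any of Date, Date1, Date2, ..., Date6 in df1 matches at least one value in the Date column of df2, it must be yes, else no. I can easily handle the condition with ifelse, but the problem here is that the two data frames have different numbers of rows, as the sample data and failing attempt below show. Row-by-row matching is not needed in this case. What I need is: if any date in df1 matches any date in df2 (at least one match), the result is yes, otherwise no.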

df1<-structure(list(Date = structure(3634, class = "Date"), Date1 = structure(3633, class = "Date"), 
    Date2 = structure(3632, class = "Date"), Date3 = structure(3631, class = "Date"), 
    Date4 = structure(3630, class = "Date"), Date5 = structure(3629, class = "Date"), 
    Date6 = structure(3628, class = "Date")), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame"))

df2<-structure(list(yr = c(1979, 1979), day = c(351, 347), Date = structure(c(3637, 
3633), class = "Date")), row.names = c(NA, -2L), class = c("tbl_df", 
"tbl", "data.frame"))

df1$y_n <- if_else(df2$Date %in% df1$Date |
                   df2$Date %in% df1$Date1 |
                   df2$Date %in% df1$Date3 |
                   df2$Date %in% df1$Date4 |
                   df2$Date %in% df1$Date5 |
                   df2$Date %in% df1$Date6, "yes", "no")

Tags: r, dataframe, tidyverse

Solution


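Using base R, we can use sapply/lapply to check the dates. This assumes df1 will have more than one row of data, so the example first duplicates the single row: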

df1 <- rbind(df1,  df1)
df1$y_n <- c("no", "yes")[(rowSums(sapply(df1, `%in%`, df2$Date)) > 0) + 1]

# Date       Date1      Date2      Date3      Date4      Date5      Date6      y_n  
# <date>     <date>     <date>     <date>     <date>     <date>     <date>    <chr>
#1 1979-12-14 1979-12-13 1979-12-12 1979-12-11 1979-12-10 1979-12-09 1979-12-08 yes  
#2 1979-12-14 1979-12-13 1979-12-12 1979-12-11 1979-12-10 1979-12-09 1979-12-08 yes  
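
The indexing idiom in the sapply version may be easier to follow when unpacked on toy data. The sketch below is only an illustration: the matrix `m` is a made-up stand-in for the logical matrix that sapply returns (one row per df1 row, one column per date column), not the real data.

```r
# Toy stand-in for the logical matrix returned by sapply():
# rows are df1 rows, columns are date columns (hypothetical values).
m <- matrix(c(FALSE, TRUE,
              FALSE, FALSE,
              FALSE, FALSE), nrow = 3, byrow = TRUE)

# TRUE counts as 1, so a row sum above 0 means at least one match in that row.
any_hit <- rowSums(m) > 0

# Adding 1 turns FALSE/TRUE into the positions 1/2,
# which then pick "no"/"yes" from the label vector.
labels <- c("no", "yes")[any_hit + 1]
labels
```

Note that rowSums needs a matrix, which is why the answer makes sure df1 has more than one row before calling sapply.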

Or with lapply:

df1$y_n <- c("no", "yes")[(Reduce(`|`, lapply(df1, `%in%`, df2$Date))) + 1]
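
As a variation on the lapply approach, here is a minimal sketch that wraps the check in a helper and looks only at columns of class Date, so a previously added y_n column is ignored. The helper name `flag_any_match` is hypothetical and not part of the original answer, and the sample data is rebuilt here as plain data frames rather than tibbles.

```r
# Sample data from the question, rebuilt as plain data frames.
df1 <- data.frame(
  Date  = as.Date("1979-12-14"),
  Date1 = as.Date("1979-12-13"),
  Date2 = as.Date("1979-12-12"),
  Date3 = as.Date("1979-12-11"),
  Date4 = as.Date("1979-12-10"),
  Date5 = as.Date("1979-12-09"),
  Date6 = as.Date("1979-12-08")
)
df2 <- data.frame(
  yr = c(1979, 1979), day = c(351, 347),
  Date = as.Date(c("1979-12-17", "1979-12-13"))
)

# Hypothetical helper: label each row of `x` "yes" if any of its
# Date columns holds a value found in `targets`, otherwise "no".
flag_any_match <- function(x, targets) {
  # Keep only Date columns so non-date columns (e.g. y_n) are skipped.
  is_date <- vapply(x, inherits, logical(1), what = "Date")
  # One logical vector per Date column, each of length nrow(x).
  hits <- lapply(x[is_date], `%in%`, targets)
  # Element-wise OR across columns; this also works for a single row.
  any_hit <- Reduce(`|`, hits)
  ifelse(any_hit, "yes", "no")
}

df1$y_n <- flag_any_match(df1, df2$Date)
df1$y_n
```

Because Reduce combines the per-column vectors element by element, this version does not depend on rowSums and so does not require df1 to have more than one row.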
