Check if columns of one data frame are present in another data frame with non-zero element in R

Problem description

I want to check whether the columns named in one data frame are present in another data frame, and whether those columns hold nonzero values in the second data frame. For example,

I have a data frame df1 as follows:

indx1   indx2
aa 1    ac
ac      tg 0

I have another data frame df as follows:

col1    aa 1    ab 2    ac    bd 5    tg 0
A       1       0       0     1       4
B       0       0       1     1       0
C       1       1       0     1       1
D       0       0       0     5       5
E       0       0       1     0       9

I want to check whether any row of df satisfies both conditions at once: the column named by df1[i,1] is > 0 and the column named by df1[i,2] is > 0, where i goes from 1 to nrow(df1). For example:

When i = 1, I want to check whether any row of df satisfies the condition `aa 1` > 0 & ac > 0. Since no row satisfies it, the code should return 0. When i = 2, the condition is ac > 0 & `tg 0` > 0. Here one row of df (the 5th row) satisfies the condition, so the code should return 1. The output should be saved to a new column of df1, as follows:

indx1   indx2   count_occ
aa 1    ac      0
ac      tg 0    1

I have tried as follows:

for(i in 1:nrow(df1)){
  d1 = subset(df, as.name(df1[i,1]) > 0 & as.name(df1[i,2]) > 0)
  if(nrow(d1) >= 1){
    df1[i,3] = 1
  }else{
    df1[i,3] = 0
  }
}

But d1 = subset(df, as.name(df1[i,1]) > 0 & as.name(df1[i,2]) > 0) is not giving me the correct output. Any help would be highly appreciated. TIA.

Tags: r

Solution


We can use Map. Note that the code below swaps the names used in the question: here 'df' holds the index pairs and 'df1' holds the data (see the data section at the end).

  1. Loop over the 'indx1' and 'indx2' columns of 'df' in parallel with Map
  2. Extract the corresponding columns of 'df1' with df1[[x]] and df1[[y]]
  3. Build the combined logical expression with > and &
  4. Check whether any row of 'df1' yields TRUE (any)
  5. Coerce the logical to binary with unary + (or use as.integer)
  6. Convert the list output to a vector with unlist and assign it to create the 'count_occ' column in 'df'

df$count_occ <- unlist(Map(function(x, y) 
      +(any(df1[[x]] > 0 & df1[[y]] > 0, na.rm = TRUE)), df$indx1, df$indx2))

-output

df
  indx1 indx2 count_occ
1  aa 1    ac         0
2    ac  tg 0         1
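
For readers who prefer to keep the explicit loop from the question, the same check can be sketched with `[[` indexing, which looks a column up by the string stored in the index frame. This is only a sketch: it follows the naming of the answer ('df' is the index frame, 'df1' is the data) and rebuilds both frames with data.frame() so the block runs on its own.

```r
# Index pairs (named df, as in the answer)
df <- data.frame(indx1 = c("aa 1", "ac"),
                 indx2 = c("ac", "tg 0"),
                 stringsAsFactors = FALSE)

# Data to search; check.names = FALSE keeps column names that contain spaces
df1 <- data.frame(col1   = c("A", "B", "C", "D", "E"),
                  `aa 1` = c(1L, 0L, 1L, 0L, 0L),
                  `ab 2` = c(0L, 0L, 1L, 0L, 0L),
                  ac     = c(0L, 1L, 0L, 0L, 1L),
                  `bd 5` = c(1L, 1L, 1L, 5L, 0L),
                  `tg 0` = c(4L, 0L, 1L, 5L, 9L),
                  check.names = FALSE, stringsAsFactors = FALSE)

df$count_occ <- NA_integer_
for (i in seq_len(nrow(df))) {
  x <- df1[[df$indx1[i]]]  # column named by indx1 in row i
  y <- df1[[df$indx2[i]]]  # column named by indx2 in row i
  # 1 if at least one row has both values above zero, otherwise 0
  df$count_occ[i] <- as.integer(any(x > 0 & y > 0, na.rm = TRUE))
}

df$count_occ
```

The loop and the Map version perform the same per-pair test; the loop just makes each lookup visible.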

data

df <- structure(list(indx1 = c("aa 1", "ac"), indx2 = c("ac", "tg 0"
)), class = "data.frame", row.names = c(NA, -2L))

df1 <- structure(list(col1 = c("A", "B", "C", "D", "E"), `aa 1` = c(1L, 
0L, 1L, 0L, 0L), `ab 2` = c(0L, 0L, 1L, 0L, 0L), ac = c(0L, 1L, 
0L, 0L, 1L), `bd 5` = c(1L, 1L, 1L, 5L, 0L), `tg 0` = c(4L, 0L, 
1L, 5L, 9L)), class = "data.frame", row.names = c(NA, -5L))
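
One caveat worth sketching: if a name in 'indx1' or 'indx2' is not a column of 'df1', then df1[[x]] is NULL, the comparison has length zero, and any() returns FALSE, so the pair silently scores 0. The variant below returns NA for such pairs instead. The helper name pair_present and the extra "zz" row are hypothetical, added only to illustrate the missing-column case; the data is a trimmed copy of the frames above.

```r
# Index pairs; the third row ("zz") is a made-up name that is not in df1
df <- data.frame(indx1 = c("aa 1", "ac", "zz"),
                 indx2 = c("ac", "tg 0", "ac"),
                 stringsAsFactors = FALSE)

# Only the columns needed for this illustration
df1 <- data.frame(`aa 1` = c(1L, 0L, 1L, 0L, 0L),
                  ac     = c(0L, 1L, 0L, 0L, 1L),
                  `tg 0` = c(4L, 0L, 1L, 5L, 9L),
                  check.names = FALSE)

# Hypothetical helper: NA when either column is absent, else the 0/1 flag
pair_present <- function(x, y, data) {
  if (!all(c(x, y) %in% names(data))) return(NA_integer_)
  as.integer(any(data[[x]] > 0 & data[[y]] > 0, na.rm = TRUE))
}

res <- unname(mapply(pair_present, df$indx1, df$indx2,
                     MoreArgs = list(data = df1)))
res
```

Whether a missing column should count as 0 or NA depends on the use case; the one-liner in the answer treats it as 0.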
