r - Check if columns of one data frame are present in another data frame with non-zero element in R
Problem description
I want to check if columns of one data frame are present in another data frame and the values of those columns in the second data frame should be non zero. For example,
I have a data frame df1 as follows:
indx1   indx2
aa 1    ac
ac      tg 0
I have another data frame df as follows:
col1   aa 1   ab 2   ac   bd 5   tg 0
A      1      0      0    1      4
B      0      0      1    1      0
C      1      1      0    1      1
D      0      0      0    5      5
E      0      0      1    0      9
I want to check whether any row of df satisfies the criterion that the columns named by df1[i,1] and df1[i,2] are both > 0, where i goes from 1 to nrow(df1). For example, when i = 1, I want to check whether any row of df satisfies the condition `aa 1` > 0 & ac > 0. Since no row satisfies the condition, the code should return 0. When i = 2, the condition is ac > 0 & `tg 0` > 0. Here one row of df (the 5th row) satisfies the condition, so the code should return 1. The output should be saved to a new column of df1, as follows:
indx1   indx2   count_occ
aa 1    ac      0
ac      tg 0    1
I have tried as follows:
for(i in 1:nrow(df1)){
  d1 = subset(df, as.name(df1[i,1]) > 0 & as.name(df1[i,2]) > 0)
  if(nrow(d1) >= 1){
    df1[i,3] = 1
  }else{
    df1[i,3] = 0
  }
}
But the line d1 = subset(df, as.name(df1[i,1]) > 0 & as.name(df1[i,2]) > 0) is not giving me the correct output. Any help would be highly appreciated. TIA.
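Solution
We can use Map. Note that the object names here are swapped relative to the question: in this answer 'df' holds the 'indx1'/'indx2' pairs and 'df1' holds the values (see the data at the end).
- Loop over the 'indx1' and 'indx2' columns of 'df' with Map
- Extract the corresponding columns of 'df1': df1[[x]], df1[[y]]
- Build the combined logical expression with > and &
- Check whether there is any TRUE value across the rows of 'df1'
- Coerce the result to binary with +( or as.integer
- Convert the list output to a vector with unlist and assign it to create the 'count_occ' column in 'df'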
df$count_occ <- unlist(Map(function(x, y)
+(any(df1[[x]] > 0 & df1[[y]] > 0, na.rm = TRUE)), df$indx1, df$indx2))
-output
df
  indx1 indx2 count_occ
1  aa 1    ac         0
2    ac  tg 0         1
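As for the loop attempted in the question: as.name() only builds a symbol, and subset() does not evaluate that symbol as a column, so the comparison never sees the column values. A minimal sketch of a loop that looks the columns up by name with [[ ]] instead is given below. It uses this answer's naming ('df' for the pairs, 'df1' for the values) and rebuilds the data with data.frame() purely so the block runs on its own; that construction is an assumption of the sketch, not part of the original answer.

```r
# Pairs of column names to test (this answer's 'df').
df <- data.frame(indx1 = c("aa 1", "ac"), indx2 = c("ac", "tg 0"),
                 stringsAsFactors = FALSE)

# Values (this answer's 'df1'); check.names = FALSE keeps names such as "aa 1".
df1 <- data.frame(
  col1   = c("A", "B", "C", "D", "E"),
  `aa 1` = c(1L, 0L, 1L, 0L, 0L),
  `ab 2` = c(0L, 0L, 1L, 0L, 0L),
  ac     = c(0L, 1L, 0L, 0L, 1L),
  `bd 5` = c(1L, 1L, 1L, 5L, 0L),
  `tg 0` = c(4L, 0L, 1L, 5L, 9L),
  check.names = FALSE, stringsAsFactors = FALSE
)

df$count_occ <- 0L
for (i in seq_len(nrow(df))) {
  x <- df1[[df$indx1[i]]]  # column of df1 named by indx1 in row i
  y <- df1[[df$indx2[i]]]  # column of df1 named by indx2 in row i
  # 1 if at least one row has both values above zero, otherwise 0
  df$count_occ[i] <- as.integer(any(x > 0 & y > 0, na.rm = TRUE))
}
df
```

seq_len(nrow(df)) is used rather than 1:nrow(df) so that the loop body is skipped when 'df' has no rows.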
data
df <- structure(list(indx1 = c("aa 1", "ac"), indx2 = c("ac", "tg 0"
)), class = "data.frame", row.names = c(NA, -2L))
df1 <- structure(list(col1 = c("A", "B", "C", "D", "E"), `aa 1` = c(1L,
0L, 1L, 0L, 0L), `ab 2` = c(0L, 0L, 1L, 0L, 0L), ac = c(0L, 1L,
0L, 0L, 1L), `bd 5` = c(1L, 1L, 1L, 5L, 0L), `tg 0` = c(4L, 0L,
1L, 5L, 9L)), class = "data.frame", row.names = c(NA, -5L))
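The Map call above assumes every name in 'indx1' and 'indx2' exists as a column of 'df1'. If that may not hold, one option is to test for the names first. The sketch below is only an illustration: the helper name count_pair is hypothetical, and treating a pair with a missing column as 0 is an assumption rather than something the question specifies. The data is rebuilt with data.frame() so the block is self-contained.

```r
df <- data.frame(indx1 = c("aa 1", "ac"), indx2 = c("ac", "tg 0"),
                 stringsAsFactors = FALSE)
df1 <- data.frame(
  col1   = c("A", "B", "C", "D", "E"),
  `aa 1` = c(1L, 0L, 1L, 0L, 0L),
  `ab 2` = c(0L, 0L, 1L, 0L, 0L),
  ac     = c(0L, 1L, 0L, 0L, 1L),
  `bd 5` = c(1L, 1L, 1L, 5L, 0L),
  `tg 0` = c(4L, 0L, 1L, 5L, 9L),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Hypothetical helper: 1 if some row of `data` has both named columns > 0.
count_pair <- function(x, y, data) {
  # Assumption: a pair that names a column absent from `data` counts as 0.
  if (!all(c(x, y) %in% names(data))) return(0L)
  as.integer(any(data[[x]] > 0 & data[[y]] > 0, na.rm = TRUE))
}

df$count_occ <- mapply(count_pair, df$indx1, df$indx2,
                       MoreArgs = list(data = df1), USE.NAMES = FALSE)
df
```

With the data from this answer the result matches the output shown above; a pair such as count_pair("zz", "ac", df1), where "zz" is not a column of 'df1', returns 0 under the stated assumption.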