R: count times per column a condition is met and row names appear in a list

Problem description

I have a dataframe with count information (df1)

rownames  sample1  sample2  sample3
m1        0        5        1
m2        1        7        5
m3        6        2        0
m4        3        1        0

and a second with sample information (df2)

rownames  batch  totalcount
sample1   a      10
sample2   b      15
sample3   a      6

I also have two lists with information about the m values (these could easily be turned into another data frame if necessary, but I would rather not add to the count information, as it is quite large). No patterns (such as even and odd) exist; I am just using a very simplistic example.

x <- c("m1", "m3") and y <- c("m2", "m4")

What I would like to do is add another two columns to the sample information. Each new column is the number of m rows per sample that have a value of 5 or above and whose name appears in list x or y, respectively:

rownames  batch  totalcount  x  y
sample1   a      10          1  0
sample2   b      15          1  1
sample3   a      6           0  1

My current strategy is to make a list of values for both x and y and then append them to df2. Here are my attempts so far:

numX <- colSums(df1[sum(rownames(df1)>10 %in% x),]) and numX <- colSums(df1[sum(rownames(df1)>10 %in% x),]) both return a list of 0s

numX <- colSums(df1[rownames(df1)>10 %in% x,]) returns a list of the sum of count values meeting the conditions for each column

numX <- length(df1[rownames(df1)>10 %in% novel,]) returns the number of times the condition is met (in this example 2L)

I am not really sure how to approach this so I have just been throwing around attempts. I've tried looking for answers but maybe I am just struggling to find the proper wording.
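
A note on why the attempts above behave as described: in R, `%in%` binds more tightly than `>`, so `rownames(df1) > 10 %in% x` is read as `rownames(df1) > (10 %in% x)`. The right-hand side is a single FALSE, and the row names are then compared against it as strings, so the expression never tests the count values at all. The sketch below only inspects the parse tree and does not need df1; `rn` is a hypothetical stand-in for `rownames(df1)`.

```r
x <- c("m1", "m3")

# Inspect how R parses the condition from the attempts (nothing is evaluated here)
e <- quote(rownames(df1) > 10 %in% x)
e[[1]]   # the outermost call is `>`
e[[3]]   # its right-hand side is the whole call `10 %in% x`

# `10 %in% x` therefore runs first and gives one FALSE, since "10" is not in x
10 %in% x

# Selecting rows by name is a separate step from testing their values
rn <- c("m1", "m2", "m3", "m4")   # hypothetical stand-in for rownames(df1)
rn %in% x                          # marks the rows named in x
```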

Tags: r, dataframe, subset

Solution


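We may do this with rowwise from dplyr. For each row of df2, the matching sample column is pulled from df1, restricted to the rows named in x (or y), and flagged 1 when the sum of those counts is at least 5.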

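library(dplyr)

# x and y are the vectors of m names from the question
df2 %>% 
   rowwise() %>%
    # `rownames` here is df2's column holding the sample name, so
    # df1[[rownames]] is that sample's column of counts
    mutate(x = +(sum(df1[[rownames]][df1$rownames %in% x]) >= 5), 
           y = +(sum(df1[[rownames]][df1$rownames %in% y]) >= 5)) %>%
    ungroup()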

-output

# A tibble: 3 × 5
  rownames batch totalcount     x     y
  <chr>    <chr>      <int> <int> <int>
1 sample1  a             10     1     0
2 sample2  b             15     1     1
3 sample3  a              6     0     1

Or based on the data, a base R option would be

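# Label each row of df1 as group 'y' if its name is in y, otherwise 'x',
# drop the name column, then sum the counts per group and sample
out <- aggregate(. ~ grp, FUN = sum, 
     transform(df1,  grp = c('x', 'y')[1 + (rownames %in% y)] )[-1])
# Transpose to samples-by-groups and flag sums of at least 5 as 1
df2[out$grp] <- +(t(out[-1]) >= 5)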

-output

> df2
  rownames batch totalcount x y
1  sample1     a         10 1 0
2  sample2     b         15 1 1
3  sample3     a          6 0 1
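
Both options above flag whether the summed counts of a group reach 5, which happens to reproduce the expected table for this example. If the goal is instead a true count of rows whose individual value is 5 or above, as the question describes, a minimal base R sketch could look like the following. The helper name `count_hits` and its `threshold` argument are hypothetical and not part of the original answer; the data is the same as in the data section.

```r
# Same example data as in the "data" section
df1 <- data.frame(rownames = c("m1", "m2", "m3", "m4"),
                  sample1 = c(0L, 1L, 6L, 3L),
                  sample2 = c(5L, 7L, 2L, 1L),
                  sample3 = c(1L, 5L, 0L, 0L),
                  stringsAsFactors = FALSE)
df2 <- data.frame(rownames = c("sample1", "sample2", "sample3"),
                  batch = c("a", "b", "a"),
                  totalcount = c(10L, 15L, 6L),
                  stringsAsFactors = FALSE)
x <- c("m1", "m3")
y <- c("m2", "m4")

# Hypothetical helper: for each sample column, count the rows named in `ids`
# whose individual value is >= threshold
count_hits <- function(counts, ids, threshold = 5) {
  keep <- counts$rownames %in% ids
  hits <- counts[keep, -1, drop = FALSE] >= threshold  # logical matrix
  colSums(hits)                                        # named by sample
}

cx <- count_hits(df1, x)
cy <- count_hits(df1, y)

# Look results up by sample name rather than relying on column order
df2$x <- as.integer(cx[df2$rownames])
df2$y <- as.integer(cy[df2$rownames])
df2
```

Because the comparison is applied to each value before summing, a group with several rows at or above the threshold would give a count greater than 1, whereas the sum-based flag can only be 0 or 1.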

data

df1 <- structure(list(rownames = c("m1", "m2", "m3", "m4"), sample1 = c(0L, 
1L, 6L, 3L), sample2 = c(5L, 7L, 2L, 1L), sample3 = c(1L, 5L, 
0L, 0L)), class = "data.frame", row.names = c(NA, -4L))

df2 <- structure(list(rownames = c("sample1", "sample2", "sample3"), 
    batch = c("a", "b", "a"), totalcount = c(10L, 15L, 6L)), 
class = "data.frame", row.names = c(NA, 
-3L))
