Count total and positive samples by group

Question

I have a data frame like this:

df <- data.frame(concentration=c(0,0,0,0,2,2,2,2,4,4,6,6,6),
             result=c(0,0,0,0,0,0,1,0,1,0,1,1,1))

For each concentration level, I want to count the total number of results and the number of positive samples. I then want to create a new data frame with the concentration level, the number of positives, and the total number of results:

conc pos_c total_c
0    0     4
2    1     4
4    1     2
6    3     3

This is what I've come up with so far using plyr:

library(plyr)
c <- count(df, "concentration")            # rows per concentration
r <- count(df, "concentration", "result")  # result used as weights, so freq = sum(result)
names(c)[which(names(c) == "freq")] <- "total_c"
names(r)[which(names(r) == "freq")] <- "pos_c"
cbind(c,r)

  concentration total_c concentration pos_c
1             0       4             0     0
2             2       4             2     1
3             4       2             4     1
4             6       3             6     3

This repeats the concentration column. There is probably a better or easier way to do this that I'm missing, maybe with another library. R is relatively new to me and I'm not sure how to do this. Thanks.

Tags: r

Answer


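We need a group-by sum. Using the tidyverse, we group by concentration (group_by) and then summarise to get two columns: 1) the sum of the logical expression (result > 0), which counts the positives, and 2) the number of rows (n()), which gives the total.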

library(dplyr)
df %>% 
  group_by(conc = concentration) %>% 
  summarise(pos_c = sum(result > 0), # in the example just sum(result)
            total_c = n())
# A tibble: 4 x 3
#   conc pos_c total_c
#  <dbl> <int>   <int>
#1     0     0       4
#2     2     1       4
#3     4     1       2
#4     6     3       3
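
To see why summing a logical expression counts the positives, here is a minimal sketch on a single group. The vector below is just the four results at concentration 2 copied from the question's data; nothing else is assumed.

```r
result <- c(0, 0, 1, 0)   # the four results at concentration 2

result > 0                # FALSE FALSE  TRUE FALSE
sum(result > 0)           # 1: TRUE counts as 1, FALSE as 0
length(result)            # 4: the value n() reports for this group
```

Because result is already coded 0/1 here, sum(result) gives the same count; the comparison only matters if positives could be stored as values other than 1.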

In base R, use table together with addmargins:

addmargins(table(df), 2)[,-1]
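
The one-liner above returns a table rather than a data frame. As a rough sketch of one way to get the requested layout, the same table can be unpacked into a data frame. The names tab and out are only illustrative, and the code assumes result is coded 0/1 so that a column named "1" exists.

```r
df <- data.frame(concentration = c(0,0,0,0,2,2,2,2,4,4,6,6,6),
                 result = c(0,0,0,0,0,0,1,0,1,0,1,1,1))

# Cross-tabulate concentration (rows) by result (columns "0" and "1"),
# then append a "Sum" column holding the row totals.
tab <- addmargins(table(df), 2)

# Unpack the table into a plain data frame with the requested column names.
# Assumes result is coded 0/1, so the positives sit in the column named "1".
out <- data.frame(conc    = as.numeric(rownames(tab)),
                  pos_c   = as.vector(tab[, "1"]),
                  total_c = as.vector(tab[, "Sum"]))
out
```

Selecting columns by name instead of dropping the first one with [,-1] makes it explicit which column holds the positives.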
