r - count total and positive samples by group
Problem description
I have a dataframe like this:
df <- data.frame(concentration=c(0,0,0,0,2,2,2,2,4,4,6,6,6),
result=c(0,0,0,0,0,0,1,0,1,0,1,1,1))
I want to count the total number of results for each concentration level. I want to count the number of positive samples for each concentration level. And I want to create a new dataframe with concentration level, total results, and number positives.
conc pos_c total_c
0 0 4
2 1 4
4 1 2
6 3 3
This is what I've come up with so far using plyr:
c <- count(df, "concentration")
r <- count(df, "concentration","result")
names(c)[which(names(c) == "freq")] <- "total_c"
names(r)[which(names(r) == "freq")] <- "pos_c"
cbind(c,r)
concentration total_c concentration pos_c
1 0 4 0 0
2 2 4 2 1
3 4 2 4 1
4 6 3 6 3
This repeats the concentration column. I think there is probably a much better or easier way to do this that I'm missing, maybe with another library. I'm relatively new to R and not sure how to do this. Thanks.
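Solution
We need a group-by sum. Using the tidyverse, we group by concentration (group_by) and then summarise to get two columns: 1) the sum of the logical expression (result > 0), and 2) the number of rows (n()).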
library(dplyr)
df %>%
group_by(conc = concentration) %>%
summarise(pos_c = sum(result > 0), # in the example just sum(result)
total_c = n())
# A tibble: 4 x 3
# conc pos_c total_c
# <dbl> <int> <int>
#1 0 0 4
#2 2 1 4
#3 4 1 2
#4 6 3 3
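The pos_c column relies on R treating TRUE as 1 and FALSE as 0 when a logical vector is summed. A minimal sketch of that step on its own, using a short made-up vector (the name `x` is only for illustration):

```r
# Hypothetical example vector, not taken from the question's data
x <- c(0, 0, 1, 0)

is_pos <- x > 0       # logical vector: FALSE FALSE TRUE FALSE
n_pos  <- sum(is_pos) # TRUE counts as 1, so this is the number of positives
n_pos
```

Because the results in the question are already coded as 0/1, sum(result) gives the same count there; result > 0 is the safer choice if positive values other than 1 could appear.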
Or in base R, use table together with addmargins:
addmargins(table(df), 2)[,-1]
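The addmargins call returns a table whose columns are labelled 1 and Sum rather than a data frame with the requested names. If a data frame with conc, pos_c and total_c is wanted without extra packages, one possible base R sketch uses tapply; the intermediate names `pos`, `total` and `out` are assumptions made for this example:

```r
df <- data.frame(concentration = c(0,0,0,0,2,2,2,2,4,4,6,6,6),
                 result = c(0,0,0,0,0,0,1,0,1,0,1,1,1))

# Positives per concentration: sum of the logical result > 0 within each group
pos <- tapply(df$result > 0, df$concentration, sum)

# Total rows per concentration
total <- tapply(df$result, df$concentration, length)

# Assemble a plain data frame; group labels come back as character names
out <- data.frame(conc    = as.numeric(names(pos)),
                  pos_c   = as.vector(pos),
                  total_c = as.vector(total))
out
```

Both tapply calls group by the same column, so the two results come back in the same group order and can be combined column by column.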