Counts and percentages by multiple groups

Question

Suppose I have this df:

> df <- data.frame(letter = sample(letters[1:4], 15, replace=TRUE),
+                  time = c("one", "one", "one", "two", "two", "one", "two", "two", "two", "one","one","one","two","one","two"),
+                  stringsAsFactors = FALSE)
> df
   letter time
1       d  one
2       a  one
3       a  one
4       b  two
5       c  two
6       a  one
7       d  two
8       a  two
9       b  two
10      b  one
11      d  one
12      b  one
13      c  two
14      a  one
15      a  two

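I would like to group the rows by Value and create two columns, time_one and time_two, holding the respective counts, plus a percentage column for each. This is my starting point: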

> x <- df %>%
+ mutate(Value = letter,
+       n = n()) %>%
+ group_by(Value) %>%
+ summarise(Quantity = length(Value),
+          Percentage = first(length(Value)/n))
> x
  Value Quantity  Percentage
1 a            6           0.4  
2 b            4           0.267
3 c            2           0.133
4 d            3           0.2  

As you can see above, I have the count for each Value, but I need to split each Value's Quantity by the values in the time column, one and two. So I would end up with something like this:

  Value time_one  Percentage   time_two    Percentage
1 a            5  0.5          1           0.2     
2 b            2  0.2          2           0.4    
3 c            1  0.1          1           0.2       
4 d            2  0.2          1           0.2

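PS: I have already checked the answers about two-by-two tables and about creating a frequency table. They come close to what I am looking for, but I still do not fully understand how %>%, group_by, mutate and summarise combine, so adapting those solutions to get both the counts and the percentages I need has been a steep learning curve.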

Tags: r, group-by, dplyr

Answer


I'm afraid I don't use modern tidy R, but if that is acceptable for your needs, here is a base R solution.

df <- data.frame(letter = sample(letters[1:4], 15, replace=TRUE),
                 time = c("one", "one", "one", "two", "two", "one", "two", "two", "two", "one","one","one","two","one","two"),
                 stringsAsFactors = FALSE)
# make sure letter is a factor with all four levels; otherwise the tables can differ in length and the cbind below doesn't work
df$letter = factor(df$letter, levels=letters[1:4])

# get the counts 
x = sapply(split(df$letter, df$time), table)

# get the percentages and cbind together 
x2 = cbind(x, apply(x, 2, function(x) x/sum(x)))

colnames(x2) = c("time_one", "time_two", "percent_one", "percent_two")

# print the result; the values change between runs because sample() is not seeded
x2


  time_one time_two percent_one percent_two
a        0        1         0.0   0.1428571
b        4        4         0.5   0.5714286
c        0        1         0.0   0.1428571
d        4        1         0.5   0.1428571
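
Since the question asks specifically about the %>%, group_by, mutate, summarise style, here is a minimal dplyr sketch of the same table. It is not part of the answer above. It assumes dplyr and tidyr (1.0.0 or later, for pivot_wider) are installed. The data are the 15 rows printed in the question, typed out by hand so the result is reproducible, and the intermediate column name pct is only illustrative.

```r
library(dplyr)
library(tidyr)

# The 15 rows printed in the question, typed out so the result is reproducible
df <- data.frame(
  letter = c("d", "a", "a", "b", "c", "a", "d", "a", "b", "b", "d", "b", "c", "a", "a"),
  time   = c("one", "one", "one", "two", "two", "one", "two", "two", "two",
             "one", "one", "one", "two", "one", "two"),
  stringsAsFactors = FALSE
)

res <- df %>%
  count(time, Value = letter) %>%      # one row per time/Value pair, count in n
  group_by(time) %>%
  mutate(pct = n / sum(n)) %>%         # share of each Value within its time group
  ungroup() %>%
  pivot_wider(
    names_from  = time,
    values_from = c(n, pct),           # yields n_one, n_two, pct_one, pct_two
    values_fill = list(n = 0L, pct = 0)  # pairs that never occur get 0
  ) %>%
  rename(time_one = n_one, time_two = n_two,
         percent_one = pct_one, percent_two = pct_two) %>%
  arrange(Value)

res
```

The percentages are computed before reshaping, so each one is relative to the total of its own time group, matching the base R apply(x, 2, ...) step. With these data, c never occurs at time one, so its count and percentage there are filled with 0 rather than NA.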
