Summarise() does not seem to work properly

Problem description

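This problem sounds so unreasonable to me that I am sure I am missing something obvious, but I cannot find it. I have a tibble whose first 200 rows are at the end of this question.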

The code I tried is this:

record %>%
  group_by(samples, rep, bests) %>%
  summarise(prop = round(n()/samples, 2))

However, this does not give the expected output. This is what it does:

> record %>%
+   group_by(samples, rep, bests) %>%
+   summarise(prop = round(n()/samples, 2))# %>%
`summarise()` regrouping output by 'samples', 'rep', 'bests' (override with `.groups` argument)
# A tibble: 200 x 4
# Groups:   samples, rep, bests [41]
   samples   rep bests   prop
     <dbl> <dbl> <chr>  <dbl>
 1      10     1 Change   0.3
 2      10     1 Change   0.3
 3      10     1 Change   0.3
 4      10     1 Stay     0.6
 5      10     1 Stay     0.6
 6      10     1 Stay     0.6
 7      10     1 Stay     0.6
 8      10     1 Stay     0.6
 9      10     1 Stay     0.6
10      10     2 Change   0.5
# … with 190 more rows

What it should do:

> record %>%
+   group_by(samples, rep, bests) %>%
+   summarise(prop = round(n()/samples, 2))# %>%
`summarise()` regrouping output by 'samples', 'rep', 'bests' (override with `.groups` argument)
# A tibble: 4 x 4
# Groups:   samples, rep, bests [41]
   samples   rep bests   prop
     <dbl> <dbl> <chr>  <dbl>
 1      10     1 Change   0.3
 2      10     1 Stay     0.6
 3      10     2 Change   0.5
 4      10     2 Stay     0.5

What am I doing wrong? Is summarise() not summarising?

My data:

record <- structure(list(samples = c(10, 10, 10, 10, 10, 10, 10, 10, 10, 
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10), 
    rep = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 
    2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 
    4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 
    6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 
    8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 10, 10, 10, 
    10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 11, 11, 
    11, 11, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 13, 13, 13, 
    13, 13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 14, 
    14, 14, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 
    16, 16, 16, 16, 16, 16, 16, 17, 17, 17, 17, 17, 17, 17, 17, 
    17, 17, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 19, 19, 19, 
    19, 19, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20, 20, 20, 20, 
    20, 20, 21), bests = c("Change", "Stay", "Stay", "Stay", 
    "Change", "Stay", "Change", "Stay", "Stay", "Change", "Change", 
    "Stay", "Stay", "Change", "Change", "Stay", "Stay", "Stay", 
    "Change", "Change", "Stay", "Stay", "Change", "Stay", "Change", 
    "Change", "Change", "Change", "Change", "Change", "Change", 
    "Change", "Change", "Stay", "Change", "Change", "Change", 
    "Change", "Change", "Stay", "Stay", "Change", "Stay", "Stay", 
    "Change", "Change", "Change", "Change", "Change", "Stay", 
    "Change", "Stay", "Change", "Change", "Change", "Change", 
    "Stay", "Change", "Stay", "Stay", "Change", "Change", "Stay", 
    "Change", "Stay", "Change", "Stay", "Change", "Change", "Stay", 
    "Stay", "Change", "Change", "Stay", "Change", "Change", "Stay", 
    "Change", "Change", "Stay", "Change", "Change", "Stay", "Change", 
    "Change", "Change", "Change", "Change", "Change", "Change", 
    "Change", "Change", "Change", "Change", "Change", "Change", 
    "Change", "Stay", "Stay", "Change", "Stay", "Change", "Change", 
    "Change", "Stay", "Stay", "Change", "Stay", "Change", "Change", 
    "Change", "Change", "Change", "Change", "Stay", "Change", 
    "Stay", "Change", "Change", "Stay", "Change", "Change", "Change", 
    "Change", "Change", "Change", "Stay", "Change", "Change", 
    "Stay", "Change", "Stay", "Stay", "Change", "Stay", "Stay", 
    "Stay", "Change", "Change", "Stay", "Change", "Stay", "Stay", 
    "Stay", "Change", "Change", "Change", "Change", "Change", 
    "Stay", "Change", "Change", "Change", "Stay", "Change", "Change", 
    "Stay", "Change", "Stay", "Change", "Stay", "Change", "Stay", 
    "Change", "Change", "Change", "Change", "Change", "Change", 
    "Stay", "Stay", "Change", "Change", "Stay", "Stay", "Change", 
    "Change", "Stay", "Stay", "Change", "Change", "Stay", "Change", 
    "Stay", "Change", "Stay", "Stay", "Change", "Change", "Change", 
    "Change", "Change", "Stay", "Stay", "Change", "Stay", "Change", 
    "Stay", "Stay", "Change")), row.names = c(NA, -200L), class = c("tbl_df", 
"tbl", "data.frame"))

Tags: r, dplyr

Solution


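Starting with dplyr version >= 1.0, summarise is no longer restricted to returning a single row per group: if the expression evaluates to more than one value within a group, it returns one row for each value. Here, the OP's code divides by 'samples', which within each group is the full column (one value per row in that group), so the result has as many rows as the input, and that is the issue. We can summarise 'samples' as the first element of 'samples' (without using 'samples' as a grouping variable):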

library(dplyr)

record %>%
    group_by(rep, bests) %>%
    # take the first value of samples so the division yields one number per group
    summarise(samples = first(samples),
              prop = round(n() / samples, 2), .groups = 'drop')

Output:

# A tibble: 41 x 4
#     rep bests  samples  prop
#   <dbl> <chr>    <dbl> <dbl>
# 1     1 Change      10   0.3
# 2     1 Stay        10   0.6
# 3     2 Change      10   0.5
# 4     2 Stay        10   0.5
# 5     3 Change      10   0.7
# 6     3 Stay        10   0.3
# 7     4 Change      10   0.9
# 8     4 Stay        10   0.1
# 9     5 Change      10   0.6
#10     5 Stay        10   0.4
# … with 31 more rows

Alternatively, first use count to collapse the data to one row per unique combination, then create the proportion with mutate:

record %>% 
    count(samples, rep, bests) %>%
    mutate(prop = round(n/samples, 2))
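
To make the cause concrete, here is a minimal sketch on a small made-up tibble (the name toy and its values are assumptions for illustration, not the OP's data), assuming dplyr >= 1.0. It measures the length of n()/samples inside each group, then shows that taking first(samples) reduces the expression to a single value per group, even when 'samples' is kept as a grouping variable.

```r
library(dplyr)

# Hypothetical toy data: 5 rows, 3 distinct (samples, rep, bests) groups
toy <- tibble(
  samples = c(10, 10, 10, 10, 10),
  rep     = c(1, 1, 1, 2, 2),
  bests   = c("Change", "Stay", "Stay", "Change", "Change")
)

# Inside a group, `samples` is a vector with one element per row of that
# group, so n() / samples has the same length as the group.
lens <- toy %>%
  group_by(samples, rep, bests) %>%
  summarise(len = length(n() / samples), .groups = "drop")
# lens$len is 1, 2, 2 (the group sizes)

# Reducing `samples` to a single value gives one row per group.
fixed <- toy %>%
  group_by(samples, rep, bests) %>%
  summarise(prop = round(n() / first(samples), 2), .groups = "drop")
# fixed has 3 rows, with prop 0.1, 0.2, 0.2
```

Whether to drop 'samples' from group_by (as in the answer above) or keep it and call first(samples) is mostly a matter of taste; both rely on 'samples' being constant within each group.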
