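r - summarise() does not seem to work properly
Problem description
This problem seems so implausible to me that I am sure I am missing something obvious, but I cannot find it. I have a tibble, the first 200 rows of which are at the end of this question.
The code I tried is this: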
record %>%
  group_by(samples, rep, bests) %>%
  summarise(prop = round(n()/samples, 2))
However, this does not give the expected output. This is what it does:
> record %>%
+ group_by(samples, rep, bests) %>%
+ summarise(prop = round(n()/samples, 2))# %>%
`summarise()` regrouping output by 'samples', 'rep', 'bests' (override with `.groups` argument)
# A tibble: 200 x 4
# Groups: samples, rep, bests [41]
samples rep bests prop
<dbl> <dbl> <chr> <dbl>
1 10 1 Change 0.3
2 10 1 Change 0.3
3 10 1 Change 0.3
4 10 1 Stay 0.6
5 10 1 Stay 0.6
6 10 1 Stay 0.6
7 10 1 Stay 0.6
8 10 1 Stay 0.6
9 10 1 Stay 0.6
10 10 2 Change 0.5
# … with 190 more rows
What it should do:
> record %>%
+ group_by(samples, rep, bests) %>%
+ summarise(prop = round(n()/samples, 2))# %>%
`summarise()` regrouping output by 'samples', 'rep', 'bests' (override with `.groups` argument)
# A tibble: 4 x 4
# Groups: samples, rep, bests [41]
samples rep bests prop
<dbl> <dbl> <chr> <dbl>
1 10 1 Change 0.3
2 10 1 Stay 0.6
3 10 2 Change 0.5
4 10 2 Stay 0.5
What am I doing wrong? Is summarise() not summarising?
My data:
record <- structure(list(samples = c(10, 10, 10, 10, 10, 10, 10, 10, 10,
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10),
rep = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8,
8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 10, 10, 10,
10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 11, 11,
11, 11, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 13, 13, 13,
13, 13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 14,
14, 14, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16,
16, 16, 16, 16, 16, 16, 16, 17, 17, 17, 17, 17, 17, 17, 17,
17, 17, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 19, 19, 19,
19, 19, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20, 20, 20, 20,
20, 20, 21), bests = c("Change", "Stay", "Stay", "Stay",
"Change", "Stay", "Change", "Stay", "Stay", "Change", "Change",
"Stay", "Stay", "Change", "Change", "Stay", "Stay", "Stay",
"Change", "Change", "Stay", "Stay", "Change", "Stay", "Change",
"Change", "Change", "Change", "Change", "Change", "Change",
"Change", "Change", "Stay", "Change", "Change", "Change",
"Change", "Change", "Stay", "Stay", "Change", "Stay", "Stay",
"Change", "Change", "Change", "Change", "Change", "Stay",
"Change", "Stay", "Change", "Change", "Change", "Change",
"Stay", "Change", "Stay", "Stay", "Change", "Change", "Stay",
"Change", "Stay", "Change", "Stay", "Change", "Change", "Stay",
"Stay", "Change", "Change", "Stay", "Change", "Change", "Stay",
"Change", "Change", "Stay", "Change", "Change", "Stay", "Change",
"Change", "Change", "Change", "Change", "Change", "Change",
"Change", "Change", "Change", "Change", "Change", "Change",
"Change", "Stay", "Stay", "Change", "Stay", "Change", "Change",
"Change", "Stay", "Stay", "Change", "Stay", "Change", "Change",
"Change", "Change", "Change", "Change", "Stay", "Change",
"Stay", "Change", "Change", "Stay", "Change", "Change", "Change",
"Change", "Change", "Change", "Stay", "Change", "Change",
"Stay", "Change", "Stay", "Stay", "Change", "Stay", "Stay",
"Stay", "Change", "Change", "Stay", "Change", "Stay", "Stay",
"Stay", "Change", "Change", "Change", "Change", "Change",
"Stay", "Change", "Change", "Change", "Stay", "Change", "Change",
"Stay", "Change", "Stay", "Change", "Stay", "Change", "Stay",
"Change", "Change", "Change", "Change", "Change", "Change",
"Stay", "Stay", "Change", "Change", "Stay", "Stay", "Change",
"Change", "Stay", "Stay", "Change", "Change", "Stay", "Change",
"Stay", "Change", "Stay", "Stay", "Change", "Change", "Change",
"Change", "Change", "Stay", "Stay", "Change", "Stay", "Change",
"Stay", "Stay", "Change")), row.names = c(NA, -200L), class = c("tbl_df",
"tbl", "data.frame"))
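Solution
Since dplyr version >= 1.0, summarise is no longer restricted to returning a single row per group when a group has more than one row. In the OP's code, n() is divided by 'samples', which inside each group is the full column (one value per row of the group), so the result has one row per input row; that is the problem. We can instead summarise 'samples' as the first element of 'samples' (without using 'samples' as a grouping variable):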
library(dplyr)
record %>%
  group_by(rep, bests) %>%
  # reduce 'samples' to a single value per group, then divide by it
  summarise(samples = first(samples),
            prop = round(n()/samples, 2), .groups = 'drop')
Output:
# A tibble: 41 x 4
# rep bests samples prop
# <dbl> <chr> <dbl> <dbl>
# 1 1 Change 10 0.3
# 2 1 Stay 10 0.6
# 3 2 Change 10 0.5
# 4 2 Stay 10 0.5
# 5 3 Change 10 0.7
# 6 3 Stay 10 0.3
# 7 4 Change 10 0.9
# 8 4 Stay 10 0.1
# 9 5 Change 10 0.6
#10 5 Stay 10 0.4
# … with 31 more rows
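To see why the call in the question returned one row per input row, here is a minimal sketch on a small made-up tibble (`toy` is a hypothetical stand-in, not the OP's `record`). It checks the length of `n() / samples` inside each group, and then shows a variant that keeps 'samples' as a grouping variable but divides by `first(samples)`; this variant is an assumption about what would also work, not part of the answer above.

```r
library(dplyr)

# Hypothetical toy data: one replicate with 2 "Change" and 3 "Stay" rows.
toy <- tibble(
  samples = c(10, 10, 10, 10, 10),
  rep     = c(1, 1, 1, 1, 1),
  bests   = c("Change", "Stay", "Stay", "Change", "Stay")
)

# Inside a group, 'samples' is a vector with one element per row,
# so n() / samples has as many elements as the group has rows.
lens <- toy %>%
  group_by(samples, rep, bests) %>%
  summarise(group_rows  = n(),
            expr_length = length(n() / samples),
            .groups = "drop")

# Dividing by a single value yields exactly one row per group.
fixed <- toy %>%
  group_by(samples, rep, bests) %>%
  summarise(prop = round(n() / first(samples), 2), .groups = "drop")

print(lens)
print(fixed)
```

In `lens`, `expr_length` matches `group_rows` for every group, which is why the original `summarise()` produced 200 rows instead of 41.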
Another option is to count first, which effectively summarises to the unique rows, and then create the proportion with mutate:
record %>%
  count(samples, rep, bests) %>%
  mutate(prop = round(n/samples, 2))
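As a rough illustration of why the count route avoids the problem, the sketch below compares it with an explicit group_by + summarise(n = n()) on a small made-up tibble (`toy` is hypothetical, not the OP's data). After counting, each group is already a single row, so dividing by 'samples' is a plain element-wise operation.

```r
library(dplyr)

# Hypothetical toy data: two replicates of 3 rows each.
toy <- tibble(
  samples = c(10, 10, 10, 10, 10, 10),
  rep     = c(1, 1, 1, 2, 2, 2),
  bests   = c("Change", "Stay", "Stay", "Change", "Change", "Change")
)

# count() collapses to one row per (samples, rep, bests) with a column 'n'.
via_count <- toy %>%
  count(samples, rep, bests) %>%
  mutate(prop = round(n / samples, 2))

# The same result written out with group_by() and summarise().
via_summarise <- toy %>%
  group_by(samples, rep, bests) %>%
  summarise(n = n(), .groups = "drop") %>%
  mutate(prop = round(n / samples, 2))

print(via_count)
```

Both pipelines return three rows here, one per distinct combination of 'samples', 'rep' and 'bests'.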