r - Sum a variable in a grouped data frame only once for each unique combination of two other variables with dplyr
Problem description
I have a long table with repeated combinations of area and cluster.
counts <- tibble::tribble(
~age, ~area, ~cluster, ~norm.to.area,
"gw_25", "cingulate", "cluster_1", 0.03,
"gw_20", "cingulate", "cluster_1", 0.03,
"gw_18", "hippocampus", "cluster_1", 0.02,
"gw_25", "insula", "cluster_1", 0.01,
"gw_20", "motor", "cluster_1", 0.01,
"gw_22", "motor", "cluster_1", 0.01,
"gw_25", "motor", "cluster_1", 0.01,
"gw_14", "motor", "cluster_1", 0.01,
"gw_18", "motor", "cluster_1", 0.01,
"gw_19", "motor", "cluster_1", 0.01,
"gw_17", "motor", "cluster_1", 0.01,
"gw_20", "occipital", "cluster_1", 0.01,
"gw_17", "occipital", "cluster_1", 0.01,
"gw_18", "occipital", "cluster_1", 0.01,
"gw_19", "occipital", "cluster_1", 0.01,
"gw_22", "occipital", "cluster_1", 0.01,
"gw_14", "occipital", "cluster_1", 0.01,
"gw_22", "parietal", "cluster_1", 0,
"gw_25", "parietal", "cluster_1", 0,
"gw_17", "parietal", "cluster_1", 0,
"gw_19", "parietal", "cluster_1", 0,
"gw_20", "parietal", "cluster_1", 0,
"gw_20", "PFC", "cluster_1", 0.01,
"gw_22", "PFC", "cluster_1", 0.01,
"gw_25", "PFC", "cluster_1", 0.01
)
I want to create a new variable, sum.norm.to.area, that is the sum of norm.to.area for each cluster, using the norm.to.area value only ONCE for each area / subcluster.merge combination.
I tried group_by on cluster, but that adds the value multiple times whenever a given combination appears more than once.
counts %>% group_by(cluster) %>% mutate(sum.norm.to.area = sum(norm.to.area))
Thanks for any suggestions.
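To make the overcounting concrete, here is a minimal sketch on a hypothetical three-row subset of the data (the name toy and the variables naive_total and wanted_total are illustrative, not from the question). It assumes dplyr and tibble are installed.

```r
library(dplyr)

# Hypothetical miniature of the question's data: two areas in one cluster,
# with the cingulate value repeated for two ages.
toy <- tibble::tribble(
  ~age,    ~area,         ~cluster,    ~norm.to.area,
  "gw_25", "cingulate",   "cluster_1", 0.03,
  "gw_20", "cingulate",   "cluster_1", 0.03,
  "gw_18", "hippocampus", "cluster_1", 0.02
)

# The attempt from the question: every row contributes to the sum,
# so the repeated cingulate value is added twice.
naive <- toy %>%
  group_by(cluster) %>%
  mutate(sum.norm.to.area = sum(norm.to.area)) %>%
  ungroup()

naive_total  <- unique(naive$sum.norm.to.area)  # 0.03 + 0.03 + 0.02
wanted_total <- 0.03 + 0.02                     # one value per area
```

Here naive_total is 0.08 while the intended per-cluster total is 0.05, which is the gap the question is asking how to close.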
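Update 1:
I tried summarize as suggested below, but the same thing happened (except, of course, that the result is not added as a new column):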
> counts %>% group_by(subcluster.merge, area) %>% summarize(sum(norm.to.area))
tibble::tribble(
~cluster, ~area, ~sum.norm.to.area,
"cluster_1", "PFC", 0.06,
"cluster_1", "somatosensory", 0.05,
"cluster_1", "motor", 0.07,
"cluster_1", "parietal", 0,
"cluster_1", "temporal", 0.03,
"cluster_1", "occipital", 0.06,
"cluster_1", "hippocampus", 0.02,
"cluster_1", "insula", 0.01,
"cluster_1", "cingulate", 0.06,
"cluster_10-34", "PFC", 0.42,
"cluster_10-34", "somatosensory", 0.35,
"cluster_10-34", "motor", 0.48,
"cluster_10-34", "parietal", 0.36,
"cluster_10-34", "temporal", 0.28,
"cluster_10-34", "occipital", 0.4,
"cluster_10-34", "hippocampus", 0.12,
"cluster_10-34", "insula", 0,
"cluster_10-34", "cingulate", 0,
"cluster_11", "PFC", 0.18,
"cluster_11", "somatosensory", 0.15,
"cluster_11", "motor", 0.14,
"cluster_11", "parietal", 0.12,
"cluster_11", "temporal", 0.04,
"cluster_11", "occipital", 0.18,
"cluster_11", "hippocampus", 0.02
)
Update 2
This gives the output I want, but the way I get it is too complicated. I would like a simpler approach that uses mutate, without needing a join.
> tmp <- counts %>% distinct(area, cluster, .keep_all = TRUE) %>%
add_count(cluster, wt = norm.to.area, name = "sum.norm.to.area")
counts %>% left_join(tmp, by = c("cluster", "area"))
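One detail of the join version is worth spelling out: because distinct() is called with .keep_all = TRUE, tmp still carries age and norm.to.area, so joining it back by cluster and area duplicates those columns with .x and .y suffixes. A tidier join-based variant, sketched here on hypothetical toy data (toy, totals, and joined are illustrative names), builds a one-row-per-cluster lookup first and joins on cluster only.

```r
library(dplyr)

# Hypothetical miniature of the question's data.
toy <- tibble::tribble(
  ~age,    ~area,         ~cluster,    ~norm.to.area,
  "gw_25", "cingulate",   "cluster_1", 0.03,
  "gw_20", "cingulate",   "cluster_1", 0.03,
  "gw_18", "hippocampus", "cluster_1", 0.02
)

# One row per cluster: keep each (cluster, area, value) once, then sum.
totals <- toy %>%
  distinct(cluster, area, norm.to.area) %>%
  group_by(cluster) %>%
  summarise(sum.norm.to.area = sum(norm.to.area))

# Join on cluster only, so no non-key columns overlap and no suffixes appear.
joined <- toy %>% left_join(totals, by = "cluster")
```

This still uses a join, so it does not meet the stated goal of a mutate-only solution, but it shows what the join is actually computing.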
Desired output:
sum.norm.to.area is the result of adding norm.to.area (only once) for every unique combination of area and cluster:
tibble::tribble(
~age, ~area, ~cluster, ~norm.to.area, ~sum.norm.to.area,
"gw_25", "cingulate", "cluster_1", 0.03, 0.11,
"gw_20", "cingulate", "cluster_1", 0.03, 0.11,
"gw_18", "hippocampus", "cluster_1", 0.02, 0.11,
"gw_25", "insula", "cluster_1", 0.01, 0.11,
"gw_20", "motor", "cluster_1", 0.01, 0.11,
"gw_22", "motor", "cluster_1", 0.01, 0.11,
"gw_25", "motor", "cluster_1", 0.01, 0.11,
"gw_14", "motor", "cluster_1", 0.01, 0.11,
"gw_18", "motor", "cluster_1", 0.01, 0.11,
"gw_19", "motor", "cluster_1", 0.01, 0.11,
"gw_17", "motor", "cluster_1", 0.01, 0.11,
"gw_20", "occipital", "cluster_1", 0.01, 0.11,
"gw_17", "occipital", "cluster_1", 0.01, 0.11,
"gw_18", "occipital", "cluster_1", 0.01, 0.11,
"gw_19", "occipital", "cluster_1", 0.01, 0.11,
"gw_22", "occipital", "cluster_1", 0.01, 0.11,
"gw_14", "occipital", "cluster_1", 0.01, 0.11,
"gw_22", "parietal", "cluster_1", 0, 0.11,
"gw_25", "parietal", "cluster_1", 0, 0.11,
"gw_17", "parietal", "cluster_1", 0, 0.11,
"gw_19", "parietal", "cluster_1", 0, 0.11,
"gw_20", "parietal", "cluster_1", 0, 0.11,
"gw_20", "PFC", "cluster_1", 0.01, 0.11,
"gw_22", "PFC", "cluster_1", 0.01, 0.11,
"gw_25", "PFC", "cluster_1", 0.01, 0.11,
"gw_18", "PFC", "cluster_1", 0.01, 0.11,
"gw_19", "PFC", "cluster_1", 0.01, 0.11,
"gw_17", "PFC", "cluster_1", 0.01, 0.11,
"gw_22", "somatosensory", "cluster_1", 0.01, 0.11,
"gw_20", "somatosensory", "cluster_1", 0.01, 0.11,
"gw_25", "somatosensory", "cluster_1", 0.01, 0.11,
"gw_18", "somatosensory", "cluster_1", 0.01, 0.11,
"gw_19", "somatosensory", "cluster_1", 0.01, 0.11,
"gw_25", "temporal", "cluster_1", 0.01, 0.11,
"gw_19", "temporal", "cluster_1", 0.01, 0.11,
"gw_20", "temporal", "cluster_1", 0.01, 0.11
)
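Solution
Using dplyr, we can group_by cluster and sum norm.to.area only once per area: !duplicated(area) keeps the first row of each area within the cluster. On the 25-row sample above this gives 0.09 rather than the 0.11 shown in the desired output, because the sample does not include the somatosensory and temporal rows (0.01 each).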
library(dplyr)
counts %>%
group_by(cluster) %>%
mutate(sum.norm = sum(norm.to.area[!duplicated(area)]))
# age area cluster norm.to.area sum.norm
# <chr> <chr> <chr> <dbl> <dbl>
# 1 gw_25 cingulate cluster_1 0.03 0.09
# 2 gw_20 cingulate cluster_1 0.03 0.09
# 3 gw_18 hippocampus cluster_1 0.02 0.09
# 4 gw_25 insula cluster_1 0.01 0.09
# 5 gw_20 motor cluster_1 0.01 0.09
# 6 gw_22 motor cluster_1 0.01 0.09
# 7 gw_25 motor cluster_1 0.01 0.09
# 8 gw_14 motor cluster_1 0.01 0.09
# 9 gw_18 motor cluster_1 0.01 0.09
#10 gw_19 motor cluster_1 0.01 0.09
# … with 15 more rows
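The approach above takes the first norm.to.area it sees for each area, so it relies on the value being constant within every cluster and area combination. The sketch below, on hypothetical two-cluster toy data (toy, values_per_combo, and result are illustrative names), adds a guard for that assumption and shows that the sum is computed separately per cluster.

```r
library(dplyr)

# Hypothetical data with two clusters; values repeat within (cluster, area).
toy <- tibble::tribble(
  ~age,    ~area,         ~cluster,    ~norm.to.area,
  "gw_25", "cingulate",   "cluster_1", 0.03,
  "gw_20", "cingulate",   "cluster_1", 0.03,
  "gw_18", "hippocampus", "cluster_1", 0.02,
  "gw_25", "cingulate",   "cluster_2", 0.10,
  "gw_20", "motor",       "cluster_2", 0.04,
  "gw_22", "motor",       "cluster_2", 0.04
)

# Guard: each (cluster, area) pair should have exactly one distinct value,
# otherwise picking the first row per area would silently drop information.
values_per_combo <- toy %>%
  distinct(cluster, area, norm.to.area) %>%
  count(cluster, area)
stopifnot(all(values_per_combo$n == 1))

# Same technique as the answer: within each cluster, sum the first row per area.
result <- toy %>%
  group_by(cluster) %>%
  mutate(sum.norm = sum(norm.to.area[!duplicated(area)])) %>%
  ungroup()
```

In this toy data every cluster_1 row gets 0.05 (0.03 + 0.02) and every cluster_2 row gets 0.14 (0.10 + 0.04). If the guard fails on real data, the notion of "the" value for an area is ambiguous and the aggregation rule needs to be decided first.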