Grouping by several variables and dividing by the group sum

Problem description

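I want to group my dataset by several variables and take the sum of a numeric variable. I then want to divide each individual value by that sum to get a proportion, and add it as a new column with mutate.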

For example, suppose I have a dataset like this:

year         disastertype area(km^2)     country
2001           earthquake   1907.098 Afghanistan
2001           earthquake   3635.378 Afghanistan
2001           earthquake   5889.177 Afghanistan
2001 extreme temperature    8042.396 Afghanistan
2001 extreme temperature   11263.485 Afghanistan
2001 extreme temperature   11802.311 Afghanistan

I can get the sum of the area by disaster type and country using:

test_two <- test_one %>% group_by(disastertype, country, `area(km^2)`, year) %>% count %>% aggregate(. ~ disastertype + country + year, data = ., sum)

But when I try to divide by this sum using the following:

data_test$`area(km^2)` %>%  map_dbl(~ .x/data_test2$`area(km^2)`)

I get this error:

Error: Result 1 must be a single double, not a double vector of length 2

Expected result:

    year         disastertype area(km^2)     country  proportion
1   2001           earthquake   1907.098 Afghanistan  0.1668261   
10  2001           earthquake   3635.378 Afghanistan  0.3180099
65  2001           earthquake   5889.177 Afghanistan  0.5151642
109 2001 extreme temperature    8042.396 Afghanistan  0.2585299
135 2001 extreme temperature   11263.485 Afghanistan  0.3620746
146 2001 extreme temperature   11802.311 Afghanistan  0.3793956

Reproducible code:

structure(list(year = c(2001, 2001, 2001, 2001, 2001, 2001), 
    disastertype = c("earthquake", "earthquake", "earthquake", 
    "extreme temperature ", "extreme temperature ", "extreme temperature "
    ), `area(km^2)` = c(1907.09808242381, 3635.37825411105, 5889.17746880181, 
    8042.39623016696, 11263.4848508564, 11802.3111500339), country = c("Afghanistan", 
    "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", 
    "Afghanistan")), row.names = c(1L, 10L, 65L, 109L, 135L, 
146L), class = "data.frame")

Tags: r, purrr

Solution


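You should not group by `area(km^2)`. Grouping by the area puts every distinct value in its own group, so there is nothing left to sum. Group only by the key variables and divide each value by its group's sum inside mutate():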

df %>%
  group_by(year, country, disastertype) %>%
  mutate(proportion = `area(km^2)` / sum(`area(km^2)`)) %>%
  ungroup()
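
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes dplyr is installed, rebuilds the question's sample data with the areas rounded to three decimals (and without the trailing space in "extreme temperature "), and adds a `group_total` column and a `check` table that are not in the original answer, purely to make the result easy to verify.

```r
library(dplyr)

# Sample data rebuilt from the question (areas rounded; an assumption for brevity).
# check.names = FALSE keeps the non-syntactic column name `area(km^2)` intact.
df <- data.frame(
  year = rep(2001, 6),
  disastertype = rep(c("earthquake", "extreme temperature"), each = 3),
  `area(km^2)` = c(1907.098, 3635.378, 5889.177,
                   8042.396, 11263.485, 11802.311),
  country = "Afghanistan",
  check.names = FALSE
)

result <- df %>%
  group_by(year, country, disastertype) %>%   # group by the keys only
  mutate(
    group_total = sum(`area(km^2)`),          # illustrative helper column
    proportion  = `area(km^2)` / group_total  # each row's share of its group
  ) %>%
  ungroup()

# Sanity check: the proportions within each group should add up to 1.
check <- result %>%
  group_by(year, country, disastertype) %>%
  summarise(total = sum(proportion), .groups = "drop")

print(result)
print(check)
```

Because mutate() keeps every row, the group sum is recycled across the rows of each group, which is why no join back to a separate summary table (and no map_dbl call) is needed.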
