Combining/merging rows with dplyr

Question

I have data that looks like this:

Month    Location    Money
1        Miami       12
1        Cal         15
2        Miami       5
2        Cal         3
...
12       Miami       6
12       Cal         8

I want to transform it so that it looks like this:

Month     Location      Money
Spring    Miami         sum(from month = 1, 2, 3)
Spring    Cal           sum(from month = 1, 2, 3)
Summer    ...
Summer    ...
Fall      ...
Fall      ...
Winter    ...
Winter    ...

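I'm not sure how to phrase this question (merging rows? aggregating rows?), and googling it only turns up dplyr::group_by and summarizing rows based on a single value of a column. I want to combine/summarize the data based on values spanning several rows (here, several months). Is there a simple way to do this? Any help would be greatly appreciated, thanks!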

Tags: r, dplyr, rows

Solution


It sounds like you want to:

  1. assign a season to each record,
  2. group_by season (and location),
  3. summarize.

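If that's what you're after, you can either create a new season column or compute the season directly inside group_by. You can also create a separate table that maps months to seasons and left_join it to your data.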

library(dplyr)
## simulate data (random values, so your numbers will differ from the output below)
df = tibble(
  month = rep(1:12, each = 4),
  location = rep(c("Cal", "Miami"), times = 24),
  money = as.integer(runif(48, 10, 100))
)

head(df)
# # A tibble: 6 x 3
# month location money
# <int> <chr>    <int>
# 1     1 Cal         69
# 2     1 Miami       84
# 3     1 Cal         38
# 4     1 Miami       44
# 5     2 Cal         33
# 6     2 Miami       64

## Number the seasons 1-4 by putting consecutive months into groups of 3
## (months 1-3 -> 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4)
df %>%
  mutate(season = (month - 1) %/% 3 + 1) %>%
  group_by(season, location) %>%
  summarize(Monthly_Total = sum(money))
# # A tibble: 8 x 3
# # Groups:   season [4]
# season location Monthly_Total
# <dbl> <chr>            <int>
# 1      1 Cal                360
# 2      1 Miami              265
# 3      2 Cal                392
# 4      2 Miami              380
# 5      3 Cal                348
# 6      3 Miami              278
# 7      4 Cal                358
# 8      4 Miami              411
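To see how the integer division groups the months, it can help to evaluate the expression on 1:12 by itself. The sketch below also shows one way (an illustration, not part of the original answer) to turn the season number into a name by indexing a label vector, assuming the question's order (Spring = months 1-3, Summer = 4-6, Fall = 7-9, Winter = 10-12).

```r
# Integer division puts consecutive months into groups of three
month <- 1:12
season_num <- (month - 1) %/% 3 + 1
print(season_num)
# [1] 1 1 1 2 2 2 3 3 3 4 4 4

# Indexing a label vector by the season number gives a season name
# (assumed mapping: Spring = 1-3, Summer = 4-6, Fall = 7-9, Winter = 10-12)
season_labels <- c("Spring", "Summer", "Fall", "Winter")
season_name <- season_labels[season_num]
print(season_name)
```

Inside the pipeline, the same idea would be written as mutate(season = season_labels[(month - 1) %/% 3 + 1]).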

With the same data, you can skip creating the column and compute the season inside group_by:


df %>%
  group_by(season = (month - 1) %/% 3 + 1, location) %>%
  summarize(Monthly_Total = sum(money))
## results identical to above.
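If you want named seasons without a separate lookup table, case_when() can build the label directly inside group_by(). This is a sketch, assuming the question's mapping (Spring = months 1-3, ..., Winter = 10-12); the data are simulated as in the answer, with set.seed() added only so the example is reproducible.

```r
library(dplyr)

# Simulated data in the same shape as the answer's df
set.seed(1)
df <- tibble(
  month = rep(1:12, each = 4),
  location = rep(c("Cal", "Miami"), times = 24),
  money = as.integer(runif(48, 10, 100))
)

# Label each month with a season name inside group_by(), then sum per group
season_totals <- df %>%
  group_by(
    season = case_when(
      month <= 3 ~ "Spring",
      month <= 6 ~ "Summer",
      month <= 9 ~ "Fall",
      TRUE       ~ "Winter"
    ),
    location
  ) %>%
  summarize(total = sum(money), .groups = "drop")

season_totals
```

Because season is now a character column, group_by() sorts the groups alphabetically (Fall, Spring, Summer, Winter) rather than in calendar order.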

It may make more sense to create a separate season lookup table:

## months 1-3 = Spring, 4-6 = Summer, 7-9 = Fall, 10-12 = Winter
seasons = tibble(
  month = 1:12,
  season = rep(c("Spring", "Summer", "Fall", "Winter"), each = 3)
)

df %>%
  left_join(seasons, by = "month") %>%
  group_by(season, location) %>%
  summarize(Monthly_Total = sum(money))
## same totals as above, but seasons are labelled by name (and sorted alphabetically)

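The advantage of the latter is that it is more transparent: the month-to-season mapping is spelled out explicitly in a table instead of being hidden in arithmetic.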
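To check that the lookup-table version and the arithmetic version really produce the same totals, you can run both on the same data and compare the results. A sketch, assuming the question's season order (Spring = months 1-3, ..., Winter = 10-12) and simulated data with set.seed() added for reproducibility.

```r
library(dplyr)

set.seed(42)
df <- tibble(
  month = rep(1:12, each = 4),
  location = rep(c("Cal", "Miami"), times = 24),
  money = as.integer(runif(48, 10, 100))
)

# Lookup table: months 1-3 = Spring, 4-6 = Summer, 7-9 = Fall, 10-12 = Winter
seasons <- tibble(
  month = 1:12,
  season = rep(c("Spring", "Summer", "Fall", "Winter"), each = 3)
)

# Approach 1: arithmetic season number translated to a name
by_arithmetic <- df %>%
  group_by(season = c("Spring", "Summer", "Fall", "Winter")[(month - 1) %/% 3 + 1],
           location) %>%
  summarize(total = sum(money), .groups = "drop")

# Approach 2: join the lookup table, then group
by_lookup <- df %>%
  left_join(seasons, by = "month") %>%
  group_by(season, location) %>%
  summarize(total = sum(money), .groups = "drop")

# Put rows in the same order and compare column by column
a <- arrange(by_arithmetic, season, location)
b <- arrange(by_lookup, season, location)
identical(a$season, b$season) && identical(a$location, b$location) &&
  identical(a$total, b$total)
# [1] TRUE
```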

