r - Melting a data frame and computing by group with dplyr in R
Question
My sample data:
`
structure(list(state = c("AP", "AP"), district = c("krishna",
"guntur"), rate = c(170104.5156, 1343.78134), growth_in_2016 = c(0.3844595,
0.3678), growth_in_2017 = c(0.444595, 0.8445), growth_in_2018 = c(0.323699,
0.36213), growth_in_2019 = c(0.5777, 0.35256), growth_in_2020 = c(0.2669097,
0.9097)), class = c("data.table", "data.frame"), row.names = c(NA, -2L))
`
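I am trying to group by state and district and then compute the monthly growth rate for each year.
The monthly formula is: (1 + rates * growth_in_year)^(1/12) - 1. Please correct me if this is wrong.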
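`
state  district  date        rates
AP     krishna   2016-12-31  x
AP     krishna   2017-01-31  y
AP     krishna   2017-02-28  z
AP     krishna   2017-03-31  a
AP     krishna   2017-04-30  b
AP     krishna   2017-05-31  c
AP     krishna   2017-06-30  d
`
The same applies to the other districts. The rate for each district has to increase over each year. I want the dates as full dates rather than just years.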
Solution
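We can first `gather` the data into long format, then `group_by` `state`, `district` and year (the `key` column, one value per `growth_in_*` column), compute the new monthly `rate`, extract the year from the column name, and create a `list` of dates holding a month-end date for each month of the year. Finally, we take the cumulative sum of `rate` to get the incremental value for each month.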
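library(dplyr)
library(tidyr)

df %>%
  # Long format: one row per state, district and growth_in_<year> column
  gather(key, value, -(1:3)) %>%
  group_by(state, district, key) %>%
  mutate(rate = (1 + rate * value)^(1/12) - 1,
         # Pull the four-digit year out of the column name
         year = sub(".*(\\d{4})", "\\1", key),
         # First of each month minus one day = last day of the preceding month
         dates = list(seq(as.Date(paste0(year, "-01-01")),
                          as.Date(paste0(year, "-12-01")), by = "month") - 1)) %>%
  # Name the list column explicitly; a bare unnest() is deprecated in tidyr >= 1.0
  unnest(dates) %>%
  mutate(rate = cumsum(rate)) %>%
  select(-year)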
#    state district  rate key            value dates
#    <chr> <chr>    <dbl> <chr>          <dbl> <date>
#  1 AP    krishna   1.52 growth_in_2016 0.384 2015-12-31
#  2 AP    krishna   3.04 growth_in_2016 0.384 2016-01-31
#  3 AP    krishna   4.56 growth_in_2016 0.384 2016-02-29
#  4 AP    krishna   6.08 growth_in_2016 0.384 2016-03-31
#  5 AP    krishna   7.60 growth_in_2016 0.384 2016-04-30
#  6 AP    krishna   9.12 growth_in_2016 0.384 2016-05-31
#  7 AP    krishna  10.6  growth_in_2016 0.384 2016-06-30
#  8 AP    krishna  12.2  growth_in_2016 0.384 2016-07-31
#  9 AP    krishna  13.7  growth_in_2016 0.384 2016-08-31
# 10 AP    krishna  15.2  growth_in_2016 0.384 2016-09-30
# … with 110 more rows
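To see where the first rows of that output come from, here is a minimal base R sketch that reproduces one group by hand (district "krishna", column `growth_in_2016`). The variable names `growth`, `monthly_rate`, `month_starts`, `month_ends` and `cumulative` are illustrative assumptions, not part of the answer.

```r
# Values for district "krishna" and column growth_in_2016, copied from the sample data
rate   <- 170104.5156
growth <- 0.3844595
year   <- "2016"

# Monthly rate exactly as defined in the question: (1 + rate * growth)^(1/12) - 1
monthly_rate <- (1 + rate * growth)^(1/12) - 1

# First day of every month of the year ...
month_starts <- seq(as.Date(paste0(year, "-01-01")),
                    as.Date(paste0(year, "-12-01")), by = "month")
# ... shifted back one day gives the last day of the preceding month
month_ends <- month_starts - 1

# The monthly rate is constant within a group, so the cumulative sum
# grows by one increment of monthly_rate per month
cumulative <- cumsum(rep(monthly_rate, length(month_ends)))

result <- data.frame(dates = month_ends, rate = cumulative)
print(head(result, 3))
```

Note that the dates for 2016 run from 2015-12-31 to 2016-11-30, which matches the layout requested in the question (the 2017 series starts on 2016-12-31). If the last day of the same month were wanted instead, the sequence would presumably have to start one month later.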
Data
df <- structure(list(state = c("AP", "AP"), district = c("krishna",
"guntur"), rate = c(170104.5156, 1343.78134), growth_in_2016 = c(0.3844595,
0.3678), growth_in_2017 = c(0.444595, 0.8445), growth_in_2018 = c(0.323699,
0.36213), growth_in_2019 = c(0.5777, 0.35256), growth_in_2020 = c(0.2669097,
0.9097)), class = c("data.table", "data.frame"), row.names = c(NA, -2L))
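In current tidyr releases `gather()` is superseded by `pivot_longer()`. The following is a sketch of the same pipeline written against that API; it is not the answer's original code. The trimmed two-year copy of `df` and the names `monthly`, `growth` and `monthly_rate` are assumptions made for illustration.

```r
library(dplyr)
library(tidyr)

# Trimmed copy of the sample data (two growth columns only), as a plain data.frame
df <- data.frame(
  state = c("AP", "AP"),
  district = c("krishna", "guntur"),
  rate = c(170104.5156, 1343.78134),
  growth_in_2016 = c(0.3844595, 0.3678),
  growth_in_2017 = c(0.444595, 0.8445)
)

monthly <- df %>%
  # Long format; names_prefix strips "growth_in_" so only the year remains
  pivot_longer(starts_with("growth_in_"),
               names_to = "year",
               names_prefix = "growth_in_",
               values_to = "growth") %>%
  mutate(
    # Monthly rate as defined in the question
    monthly_rate = (1 + rate * growth)^(1/12) - 1,
    # One vector of 12 dates per row: first of each month minus one day
    dates = lapply(year, function(y) {
      seq(as.Date(paste0(y, "-01-01")),
          as.Date(paste0(y, "-12-01")), by = "month") - 1
    })
  ) %>%
  unnest(dates) %>%
  group_by(state, district, year) %>%
  # Running total of the monthly rate within each district and year
  mutate(rate = cumsum(monthly_rate)) %>%
  ungroup() %>%
  select(state, district, dates, rate)

print(head(monthly, 3))
```

Dropping the helper columns at the end leaves the four-column layout (`state`, `district`, date, rate) sketched in the question.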