r - R dplyr: converting cut-and-paste code into a function call
Problem description
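My code is nearly done. I need 3-year running totals for various groupings, and I need this wrapped in a function.
I can calculate the numerator and the denominator.
I need to join the data frame of numerator results to the data frame of denominator results, and I don't know how to do that.
So the only remaining step is getting a syntactically correct by clause.
My partial solution is below.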
library(purrr)
library(dplyr)
library(tidyr)
library(slider) # for grouping consecutive years
# The sample data
set.seed(2021)
EVENT_YEAR = 2010:2015
RE = c('white', 'black', 'Asian')
City = c('Oakland', 'San Francisco', 'San Jose')
Note = 1:3
# data frame for numerator
demoDF_N = expand.grid(EVENT_YEAR = EVENT_YEAR, RE = RE, City = City, Note = Note)
demoDF_N$Numerator = sample(3:10, 162, replace = TRUE)
demoDF_N$EVENT_YEAR = as.factor(demoDF_N$EVENT_YEAR)
demoDF_N$RE = as.factor(demoDF_N$RE)
demoDF_N$City = as.factor(demoDF_N$City)
demoDF_N$Note = as.factor(demoDF_N$Note)
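# data frame for denominator
demoDF_D = expand.grid(EVENT_YEAR = EVENT_YEAR, RE = RE, City = City, Note = Note)
demoDF_D$Denominator = sample(90:120, 162, replace = TRUE)
# convert after demoDF_D exists; key types should match demoDF_N for the join
demoDF_D$EVENT_YEAR = as.factor(demoDF_D$EVENT_YEAR)
demoDF_D$Note = as.factor(demoDF_D$Note)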
func1 = function(df1, df2, groups){
  result_3_N = df1 %>%
    group_by(Note, across({{groups}})) %>%
    arrange(Note, across({{groups}}), EVENT_YEAR) %>%
    mutate(Numerator_UPDATED = slide_dbl(Numerator, sum, .before = 1, .after = 1,
                                         .complete = TRUE)) %>%
    select(Note, {{groups}}, EVENT_YEAR, Numerator_UPDATED) %>%
    ungroup()
  result_3_N = result_3_N %>% rename(Numerator = Numerator_UPDATED) # simple rename
  result_3_N = result_3_N %>% filter(!is.na(Numerator)) # filter out rows not based on 3 full years
  # Get EVENT_YEAR to display range of years, e.g., 2008-2010
  result_3_N$EVENT_YEAR =
    paste(as.integer(as.character(result_3_N$EVENT_YEAR)) - 1, '-',
          as.integer(as.character(result_3_N$EVENT_YEAR)) + 1, sep = '')
  ###
  result_3_D = df2 %>%
    group_by(Note, across({{groups}})) %>%
    arrange(Note, across({{groups}}), EVENT_YEAR) %>%
    mutate(Denominator_UPDATED = slide_dbl(Denominator, sum, .before = 1, .after = 1,
                                           .complete = TRUE)) %>%
    select(Note, {{groups}}, EVENT_YEAR, Denominator_UPDATED) %>%
    ungroup()
  result_3_D = result_3_D %>% rename(Denominator = Denominator_UPDATED) # simple rename
  result_3_D = result_3_D %>% filter(!is.na(Denominator)) # filter out rows not based on 3 full years
  # Get EVENT_YEAR to display range of years, e.g., 2008-2010
  result_3_D$EVENT_YEAR =
    paste(as.integer(as.character(result_3_D$EVENT_YEAR)) - 1, '-',
          as.integer(as.character(result_3_D$EVENT_YEAR)) + 1, sep = '')
  # Trying to join the results; INCORRECT: by = c(groups))
  # result3 = result_3_N %>% right_join(result_3_D, by = c(groups)) %>%
  #   replace_na(list(Denominator = 0)) %>% replace_na(list(Numerator = 0))
  result3
}
A few example calls:
res1 = func1(demoDF_N, demoDF_D, EVENT_YEAR)
res1
res2 = func1(demoDF_N, demoDF_D, c(EVENT_YEAR, RE))
res2
Solution
If you want a 3-year running average, you need to use the mean function inside slider::slide_dbl().
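As a small illustration of the difference between a running total and a running average, the sketch below applies both sum and mean over a centered 3-value window. The vector x is a hypothetical stand-in for one group's yearly Numerator values; it is not taken from the question's function.

```r
library(slider)

# Hypothetical yearly values for a single group, already sorted by year
x <- c(9, 8, 8, 4, 9, 6)

# Centered window: one value before, the current value, one value after.
# .complete = TRUE yields NA wherever the 3-value window is not full.
running_sum  <- slide_dbl(x, sum,  .before = 1, .after = 1, .complete = TRUE)
running_mean <- slide_dbl(x, mean, .before = 1, .after = 1, .complete = TRUE)

print(running_sum)
print(running_mean)
```

The first and last positions come back as NA because no full 3-year window exists there, which is why the question filters those rows out.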
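If you only want the running average over the year, you have to group by all the other variables. Note that your data set should be complete, i.e., every year should be present (although its value can be NA). Otherwise, the running average may be wrong.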
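To see why completeness matters, here is a minimal sketch using tidyr::complete() to insert a missing year before sliding. The data frame toy and its columns grp, year, and value are hypothetical and exist only for this illustration.

```r
library(dplyr)
library(tidyr)
library(slider)

# Hypothetical toy data: the year 2012 is missing
toy <- tibble(
  grp   = "a",
  year  = c(2010L, 2011L, 2013L, 2014L, 2015L, 2016L),
  value = c(1, 2, 4, 5, 6, 7)
)

# Without completing, the window around 2011 silently spans 2010, 2011, 2013
naive <- toy %>%
  group_by(grp) %>%
  arrange(year, .by_group = TRUE) %>%
  mutate(mean3 = slide_dbl(value, mean, .before = 1, .after = 1, .complete = TRUE)) %>%
  ungroup()

# After completing, 2012 exists with value = NA, so any window that
# touches it returns NA instead of a misleading average
filled <- toy %>%
  complete(grp, year = 2010:2016) %>%
  group_by(grp) %>%
  arrange(year, .by_group = TRUE) %>%
  mutate(mean3 = slide_dbl(value, mean, .before = 1, .after = 1, .complete = TRUE)) %>%
  ungroup()

print(naive)
print(filled)
```

Whether an NA or an average over the available years is the right result for an incomplete window depends on the analysis; the sketch only makes the gap visible.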
Here is the code for the running average:
# df1 is the numerator data frame (demoDF_N in the question)
df1 %>%
  group_by(RE, City, Note) %>%
  arrange(EVENT_YEAR) %>%
  mutate(Numerator_UPDATED = zoo::rollmean(Numerator, 3, fill = NA),
         Numerator_UPDATED2 = slider::slide_dbl(Numerator, mean, .before = 1, .after = 1, .complete = TRUE)) %>%
  arrange(Note, City, RE, EVENT_YEAR)
# # A tibble: 162 x 7
# # Groups: RE, City, Note [27]
#    EVENT_YEAR RE    City     Note Numerator Numerator_UPDATED Numerator_UPDATED2
#         <int> <fct> <fct>   <int>     <int>             <dbl>              <dbl>
#  1       2010 white Oakland     1         9                NA                 NA
#  2       2011 white Oakland     1         8              8.33               8.33
#  3       2012 white Oakland     1         8              6.67               6.67
#  4       2013 white Oakland     1         4                 7                  7
#  5       2014 white Oakland     1         9              6.33               6.33
#  6       2015 white Oakland     1         6                NA                 NA
#  7       2010 black Oakland     1         6                NA                 NA
#  8       2011 black Oakland     1         8                 8                  8
#  9       2012 black Oakland     1        10              8.67               8.67
# 10       2013 black Oakland     1         8              7.67               7.67
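The code above does not cover the join that the question asks about. One possible sketch, assuming both intermediate results carry exactly the same key columns plus a single value column (as the select() calls in func1 suggest), is to derive the by vector from the column names instead of from the groups argument. The two small tibbles below are hypothetical stand-ins for result_3_N and result_3_D.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-ins for the two intermediate results in func1
result_3_N <- tibble(
  Note       = c(1L, 1L),
  EVENT_YEAR = c("2010-2012", "2011-2013"),
  Numerator  = c(25, 20)
)
result_3_D <- tibble(
  Note        = c(1L, 1L, 1L),
  EVENT_YEAR  = c("2010-2012", "2011-2013", "2012-2014"),
  Denominator = c(300, 310, 305)
)

# Assumption: every column except the value column is a join key
key_cols <- setdiff(names(result_3_N), "Numerator")

result3 <- result_3_N %>%
  right_join(result_3_D, by = key_cols) %>%
  replace_na(list(Numerator = 0, Denominator = 0))

print(result3)
```

Because by receives a plain character vector, this avoids having to turn the unquoted groups argument into column names. It relies on the key columns having matching types in both data frames.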