R dplyr: converting cut-and-paste code into a function call

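Question

My code is almost done. I need 3-year running totals, computed for various groupings, and I need this wrapped in a function.

I can calculate the numerator and the denominator.

I need to join the data frame of numerator results to the data frame of denominator results, and I don't know how to do that.

So the only remaining step is to get a syntactically correct by clause.

My partial solution is below.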

library(purrr)
library(dplyr)
library(tidyr)
library(slider)  # for grouping consecutive years


# The sample data
set.seed(2021)

EVENT_YEAR = 2010:2015
RE = c('white', 'black', 'Asian')
City = c('Oakland', 'San Francisco', 'San Jose')
Note = 1:3

# data frame for numerator
demoDF_N = expand.grid(EVENT_YEAR = EVENT_YEAR, RE = RE, City = City, Note = Note)
demoDF_N$Numerator = sample(3:10, 162, replace = TRUE)
demoDF_N$EVENT_YEAR = as.factor(demoDF_N$EVENT_YEAR)
demoDF_N$RE = as.factor(demoDF_N$RE)
demoDF_N$City = as.factor(demoDF_N$City)
demoDF_N$Note = as.factor(demoDF_N$Note)
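
# data frame for denominator
demoDF_D = expand.grid(EVENT_YEAR = EVENT_YEAR, RE = RE, City = City, Note = Note)
demoDF_D$Denominator = sample(90:120, 162, replace = TRUE)
# convert only after demoDF_D exists; key columns should have the same type in both data frames
demoDF_D$EVENT_YEAR = as.factor(demoDF_D$EVENT_YEAR)
demoDF_D$Note = as.factor(demoDF_D$Note)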


func1 = function(df1, df2, groups){ 
  result_3_N = df1 %>%    
    group_by(Note, across({{groups}})) %>%               
    arrange(Note, across({{groups}}), EVENT_YEAR) %>%
    mutate(Numerator_UPDATED = slide_dbl(Numerator, sum, .before = 1, .after = 1, 
                                         .complete = TRUE)) %>%
    select(Note, {{groups}}, EVENT_YEAR, Numerator_UPDATED) %>% 
    ungroup()
  result_3_N = result_3_N %>% rename(Numerator = Numerator_UPDATED) # simple rename
  
  result_3_N = result_3_N %>% filter(!is.na(Numerator))  # filter out rows not based on 3 full years
  
  # Get EVENT_YEAR to display range of years, e.g., 2008-2010
  result_3_N$EVENT_YEAR = 
    paste(as.integer(as.character(result_3_N$EVENT_YEAR)) - 1, '-', 
          as.integer(as.character(result_3_N$EVENT_YEAR)) + 1, sep = '')
  
  ###
  result_3_D = df2 %>%    
    group_by(Note, across({{groups}})) %>%               
    arrange(Note, across({{groups}}), EVENT_YEAR) %>%
    mutate(Denominator_UPDATED = slide_dbl(Denominator, sum, .before = 1, .after = 1, 
                                         .complete = TRUE)) %>%
    select(Note, {{groups}}, EVENT_YEAR, Denominator_UPDATED) %>% 
    ungroup()
  result_3_D = result_3_D %>% rename(Denominator = Denominator_UPDATED) # simple rename
  
  result_3_D = result_3_D %>% filter(!is.na(Denominator))  # filter out rows not based on 3 full years
  
  # Get EVENT_YEAR to display range of years, e.g., 2008-2010
  result_3_D$EVENT_YEAR = 
    paste(as.integer(as.character(result_3_D$EVENT_YEAR)) - 1, '-', 
          as.integer(as.character(result_3_D$EVENT_YEAR)) + 1, sep = '')
  
  # Attempted join of the two results. This is the step that fails:
  # by = c(groups) is not a valid by clause.
  # result3 = result_3_N %>% right_join(result_3_D, by = c(groups)) %>%
  #   replace_na(list(Denominator = 0)) %>% replace_na(list(Numerator = 0))
  
  # result3 is undefined until the join above works
  result3
}

A couple of calls:

res1 = func1(demoDF_N, demoDF_D, EVENT_YEAR)
res1

res2 = func1(demoDF_N, demoDF_D, c(EVENT_YEAR, RE))
res2    

     

Tags: r, dplyr, functional-programming

Answer


If you want a 3-year running average rather than a running total, pass mean as the function inside slider::slide_dbl().
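As a small illustration of that difference, the sketch below runs the same centered 3-value window once with sum and once with mean. The vector x is made up for the example and stands in for one group's yearly values.

```r
library(slider)

# Hypothetical yearly values for a single group, oldest year first
x <- c(9, 8, 8, 4, 9, 6)

# Centered window: previous value, current value, next value.
# .complete = TRUE returns NA where the window does not hold all three values.
running_sum  <- slide_dbl(x, sum,  .before = 1, .after = 1, .complete = TRUE)
running_mean <- slide_dbl(x, mean, .before = 1, .after = 1, .complete = TRUE)

running_sum   # NA 25 20 21 19 NA
running_mean  # NA, then 25/3, 20/3, 7, 19/3, then NA
```

The first and last positions are NA because no full 3-year window exists there.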

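If you only want the average to run over the years, you have to group by every other variable. Note that your data set should be complete, i.e. every year should be present in every group (its value can be NA). Otherwise the window will span non-adjacent years and the running average may be wrong.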
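One way to enforce that completeness is tidyr::complete() together with tidyr::full_seq(). The following is only a sketch on made-up data: the gappy data frame and the avg3 column are hypothetical names, and 2012 is missing on purpose.

```r
library(dplyr)
library(tidyr)
library(slider)

# Hypothetical single-group series with the year 2012 missing
gappy <- tibble(
  EVENT_YEAR = c(2010, 2011, 2013, 2014, 2015, 2016),
  Numerator  = c(9, 8, 4, 9, 6, 3)
)

filled <- gappy %>%
  # add a row (with Numerator = NA) for every year missing from the range
  complete(EVENT_YEAR = full_seq(EVENT_YEAR, 1)) %>%
  arrange(EVENT_YEAR) %>%
  mutate(avg3 = slide_dbl(Numerator, mean, .before = 1, .after = 1,
                          .complete = TRUE))

filled
```

Without the complete() step, the 2011 row would average the values for 2010, 2011 and 2013. With it, every window that touches 2012 comes out as NA, and only 2014 and 2015 get a value.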

Here is the code for the grouped running average:

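# df1 is the numerator data frame (e.g. demoDF_N from the question);
# the zoo and slider packages must be installed
df1 %>% 
  group_by(RE, City, Note) %>%   # one group per series, so the window runs over years only
  arrange(EVENT_YEAR) %>% 
  mutate(Numerator_UPDATED = zoo::rollmean(Numerator, 3, fill = NA),   # centered 3-year mean via zoo
         Numerator_UPDATED2 = slider::slide_dbl(Numerator, mean, .before = 1, .after = 1, .complete = TRUE)) %>%   # same result via slider
  arrange(Note, City, RE, EVENT_YEAR)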

# # A tibble: 162 x 7
# # Groups:   RE, City, Note [27] 
#   EVENT_YEAR RE    City     Note Numerator Numerator_UPDATED Numerator_UPDATED2
#        <int> <fct> <fct>   <int>     <int>             <dbl>              <dbl>
# 1       2010 white Oakland     1         9             NA                 NA   
# 2       2011 white Oakland     1         8              8.33               8.33
# 3       2012 white Oakland     1         8              6.67               6.67
# 4       2013 white Oakland     1         4              7                  7   
# 5       2014 white Oakland     1         9              6.33               6.33
# 6       2015 white Oakland     1         6             NA                 NA   
# 7       2010 black Oakland     1         6             NA                 NA   
# 8       2011 black Oakland     1         8              8                  8   
# 9       2012 black Oakland     1        10              8.67               8.67
# 10      2013 black Oakland     1         8              7.67               7.67
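
As for the join the question asks about, one possible approach is to build the by clause from the column names the two summarised data frames share. The sketch below is an assumption-laden illustration, not the asker's func1: the helpers roll3 and func2 and the toy data are hypothetical, the totals are summed per group and year before the window is applied, and groups is expected to hold the non-year columns only.

```r
library(dplyr)
library(slider)

# Toy data (hypothetical): 4 years x 2 RE x 2 City, constant values
num <- expand.grid(EVENT_YEAR = 2010:2013, RE = c("a", "b"), City = c("x", "y"))
num$Numerator <- 1
den <- num
den$Numerator <- NULL
den$Denominator <- 10

# Hypothetical helper: 3-year centered running total of the column named
# by the string `value`, within the columns selected by `groups`
roll3 <- function(df, value, groups) {
  out <- df %>%
    group_by(across({{ groups }}), EVENT_YEAR) %>%
    summarise(total = sum(.data[[value]]), .groups = "drop") %>%
    group_by(across({{ groups }})) %>%
    arrange(EVENT_YEAR, .by_group = TRUE) %>%
    mutate(total = slide_dbl(total, sum, .before = 1, .after = 1,
                             .complete = TRUE)) %>%
    ungroup() %>%
    filter(!is.na(total))          # keep only full 3-year windows
  names(out)[names(out) == "total"] <- value
  out
}

# Hypothetical wrapper: the by clause is the set of shared column names,
# i.e. the grouping columns plus EVENT_YEAR
func2 <- function(df1, df2, groups) {
  n <- roll3(df1, "Numerator", {{ groups }})
  d <- roll3(df2, "Denominator", {{ groups }})
  keys <- intersect(names(n), names(d))
  right_join(n, d, by = keys)
}

res_re      <- func2(num, den, RE)
res_re_city <- func2(num, den, c(RE, City))
```

Because by is computed from the data frames themselves, the same function body works for one grouping column or several. Leaving by out altogether would also join on the shared columns, but dplyr then prints a message naming them.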
