In R, to apply the same process to many subsets, do we group or loop?

Question

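I've been tasked with some minor maintenance on an R script: upgrading it from processing a single customer/product/pack at a time to processing several. I have never written a line of R. The syntax is readable to me, but quite unfamiliar.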

The R script looks like this:

  
raw<-dbFetch(dbSubmitQuery("
SELECT *
FROM t
WHERE company = 1 and product = 1 and pack = 1
"))
                                                                                                                                           
#Data with just req dates and del qtys
df = subset(raw, select = c(DelDate, DelQty) )

#sum del qty by date
df<- aggregate(df["DelQty"], by=df["DelDate"], sum)

#add missing dates
df<- df %>%
  mutate(DelDate = as.Date(DelDate)) %>%
  complete(DelDate = seq.Date(min(DelDate), max(DelDate), by="day"))

df[is.na(df)] = 0

df$date2<- floor_date( df$DelDate - days(1), "week") + days(1)

df<- ddply(df, .(date2), summarize, DelQty=sum(DelQty))

df$week = format(df$date2, format="%Y-%U")

tsweek_forecast<- ts(df$DelQty, frequency = 52)

h <- 2

fit_Arima_future <- Arima(tsweek_forecast, order=c(1,0,1), seasonal=c(0,1,0))

ARIMA_forecast_future <- forecast(fit_Arima_future, h=h)$mean

# and then some output back to db here..

It produces a single forecast for one customer/product/pack.

Suppose I change the query at the top of the script so that it looks like this:

SELECT *
FROM t
WHERE company in (1,2,3,4,5,6,7,8,9,10) and product in (1,2,3,4) and pack In(1,2)

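How would I then modify the R code so that it produces a dataset of "company, product, pack, forecast" (80 items in the dataset: 10 companies x 4 products x 2 packs)?

Would I do this as a repeated set of operations in a loop (first determining the distinct customer/product/pack tuples, then looping over each one), or is it done with some grouping facility?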

Tags: r

Answer


You can wrap the per-group processing in a function and use group_by to apply that function to each group.

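library(plyr)       # ddply, .(); load before dplyr so dplyr's verbs are not masked
library(dplyr)      # %>%, mutate, group_by, summarise
library(tidyr)      # complete
library(lubridate)  # floor_date, days
library(forecast)   # Arima, forecast

apply_fun <- function(raw) {
  # keep only the delivery date and delivered quantity
  df <- subset(raw, select = c(DelDate, DelQty))
  # sum del qty by date
  df <- aggregate(df["DelQty"], by = df["DelDate"], sum)
  # add missing dates
  df <- df %>%
    mutate(DelDate = as.Date(DelDate)) %>%
    complete(DelDate = seq.Date(min(DelDate), max(DelDate), by = "day"))
  df[is.na(df)] <- 0
  # assign each date to the Monday that starts its week, then sum per week
  df$date2 <- floor_date(df$DelDate - days(1), "week") + days(1)
  df <- ddply(df, .(date2), plyr::summarize, DelQty = sum(DelQty))
  df$week <- format(df$date2, format = "%Y-%U")
  tsweek_forecast <- ts(df$DelQty, frequency = 52)
  h <- 2  # forecast horizon in weeks
  fit_Arima_future <- Arima(tsweek_forecast, order = c(1, 0, 1), seasonal = c(0, 1, 0))
  ARIMA_forecast_future <- forecast(fit_Arima_future, h = h)$mean
  # return the h point forecasts (a ts object of length h)
  ARIMA_forecast_future
}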

data <-dbFetch(dbSubmitQuery("
SELECT *
FROM t
WHERE company in (1,2,3,4,5,6,7,8,9,10) and product in (1,2,3,4) and pack in (1,2)
"))

# group by the columns used in the query's WHERE clause
result <- data %>%
            group_by(company, product, pack) %>%
            summarise(predict = apply_fun(cur_data()))
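
To see the shape of the result without a database or the forecast package, here is a minimal sketch on made-up data. `toy_forecast` and the `toy` data frame are hypothetical stand-ins (not part of the original script): the function just repeats the group mean `h` times where `apply_fun` would return ARIMA forecasts. It uses `group_modify()`, which expects the function to return a data frame per group, so each forecast step becomes its own row next to the company/product/pack keys.

```r
library(dplyr)

# Hypothetical stand-in for apply_fun: returns h "forecasts" per group
# (here simply the mean of DelQty) as a small data frame.
toy_forecast <- function(d, h = 2) {
  data.frame(step = seq_len(h), forecast = rep(mean(d$DelQty), h))
}

# Made-up data: 2 companies x 1 product x 2 packs, three deliveries each.
toy <- expand.grid(company = 1:2, product = 1, pack = 1:2, day = 1:3)
toy$DelQty <- toy$company * 10 + toy$pack + toy$day

# .x is the current group's rows without the grouping columns.
result <- toy %>%
  group_by(company, product, pack) %>%
  group_modify(~ toy_forecast(.x)) %>%
  ungroup()

print(result)
```

With `h = 2` there are two rows per group, so the real data would give 160 rows for 80 groups rather than 80. Note also that in dplyr 1.1.0 and later, `cur_data()` is deprecated in favor of `pick()`, and returning more than one row per group from `summarise()` is deprecated in favor of `reframe()`.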

The same thing can also be written in base R with by:

by(data, data[c('company', 'product', 'pack')], apply_fun)
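
For comparison, the "loop" alternative from the question can be sketched in base R by splitting the data into one data frame per combination and processing each piece in turn. As before, `toy_forecast` and `toy` are hypothetical stand-ins so the example runs on its own; in the real script the function would be `apply_fun` and the data would come from the query.

```r
# Hypothetical stand-in for apply_fun: returns h numbers per group.
toy_forecast <- function(d, h = 2) rep(mean(d$DelQty), h)

# Made-up data: 2 companies x 1 product x 2 packs, three deliveries each.
toy <- expand.grid(company = 1:2, product = 1, pack = 1:2, day = 1:3)
toy$DelQty <- toy$company * 10 + toy$pack + toy$day

# One data frame per company/product/pack combination;
# drop = TRUE skips combinations that have no rows.
groups <- split(toy, toy[c("company", "product", "pack")], drop = TRUE)

# Process each piece and attach its keys to the forecasts.
rows <- lapply(groups, function(d) {
  fc <- toy_forecast(d)
  data.frame(company = d$company[1], product = d$product[1], pack = d$pack[1],
             step = seq_along(fc), forecast = as.numeric(fc))
})

# Stack the per-group results into a single data frame.
result_loop <- do.call(rbind, rows)
rownames(result_loop) <- NULL
print(result_loop)
```

Unlike `by`, whose return value is a list-like "by" object, this yields an ordinary data frame that can be written back to the database. The grouped and the looped versions do the same work per subset, so the choice is mostly about readability.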
