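r - In R, to apply the same process to many subsets, do we group or loop?
Problem description
I have been tasked with some minor maintenance on an R script: upgrading it from processing a single customer/product/pack at a time to processing many. I have never written a line of R; the syntax is readable to me, but fairly unfamiliar.
The R script looks like this: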
raw<-dbFetch(dbSubmitQuery("
SELECT *
FROM t
WHERE company = 1 and product = 1 and pack = 1
"))
#Data with just req dates and del qtys
df = subset(raw, select = c(DelDate, DelQty) )
#sum del qty by date
df<- aggregate(df["DelQty"], by=df["DelDate"], sum)
#add missing dates
df<- df %>%
mutate(DelDate = as.Date(DelDate)) %>%
complete(DelDate = seq.Date(min(DelDate), max(DelDate), by="day"))
df[is.na(df)] = 0
df$date2<- floor_date( df$DelDate - days(1), "week") + days(1)
df<- ddply(df, .(date2), summarize, DelQty=sum(DelQty))
df$week = format(df$date2, format="%Y-%U")
tsweek_forecast<- ts(df$DelQty, frequency = 52)
h <- 2
fit_Arima_future <- Arima(tsweek_forecast, order=c(1,0,1), seasonal=c(0,1,0))
ARIMA_forecast_future <- forecast(fit_Arima_future, h=h)$mean
# and then some output back to db here..
It produces a single forecast for one customer/product/pack.
If I were to change the query at the top of the script so that it looks like:
SELECT *
FROM t
WHERE company in (1,2,3,4,5,6,7,8,9,10) and product in (1,2,3,4) and pack In(1,2)
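how would I then modify the R so that it produces a dataset of "company, product, pack, forecast" (80 items in the dataset: 10 companies x 4 products x 2 packs)?
Would I do this as a repeated set of operations in a loop (first determining the distinct customer/product/pack tuples, then looping over each), or is it done with some grouping facility?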
Solution
You can wrap the per-group steps in a function and use group_by to apply it to each group.
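library(dplyr)      # %>%, mutate, group_by, summarise
library(tidyr)      # complete
library(lubridate)  # floor_date, days
library(forecast)   # Arima, forecast
# plyr is called as plyr::ddply below instead of being attached,
# so that it does not mask dplyr::summarise.

apply_fun <- function(raw) {
  # Data with just delivery dates and delivered quantities
  df <- subset(raw, select = c(DelDate, DelQty))
  # Sum delivered quantity by date
  df <- aggregate(df["DelQty"], by = df["DelDate"], sum)
  # Add missing dates, with a quantity of 0
  df <- df %>%
    mutate(DelDate = as.Date(DelDate)) %>%
    complete(DelDate = seq.Date(min(DelDate), max(DelDate), by = "day"))
  df[is.na(df)] <- 0
  # Roll daily quantities up to weeks starting on Monday
  df$date2 <- floor_date(df$DelDate - days(1), "week") + days(1)
  df <- plyr::ddply(df, "date2", plyr::summarize, DelQty = sum(DelQty))
  df$week <- format(df$date2, format = "%Y-%U")
  tsweek_forecast <- ts(df$DelQty, frequency = 52)
  h <- 2
  fit_Arima_future <- Arima(tsweek_forecast, order = c(1, 0, 1), seasonal = c(0, 1, 0))
  ARIMA_forecast_future <- forecast(fit_Arima_future, h = h)$mean
  # Return the h forecast values for this group
  ARIMA_forecast_future
}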
data <-dbFetch(dbSubmitQuery("
SELECT *
FROM t
WHERE company in (1,2,3,4,5,6,7,8,9,10) and product in (1,2,3,4) and pack in (1,2)
"))
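# Group by the columns used in the WHERE clause above.
# Note: in dplyr >= 1.1.0, cur_data() and a summarise() that returns more than
# one row per group are deprecated; reframe() with pick(everything()) is the
# suggested replacement there.
result <- data %>%
  group_by(company, product, pack) %>%
  summarise(predict = apply_fun(cur_data()))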
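To see the shape of the grouped result without a database or a fitted model, here is a minimal sketch on made-up data. The data frame toy, the function toy_forecast (a placeholder that repeats the group mean instead of fitting ARIMA) and the step column are assumptions for illustration only; the grouping columns follow the WHERE clause of the query. It uses group_modify, which calls a function once per group and expects a data frame back.

```r
library(dplyr)

# Made-up stand-in for the query result: 2 companies x 2 products x 1 pack,
# three deliveries each.
toy <- expand.grid(company = 1:2, product = 1:2, pack = 1, day = 1:3)
toy$DelDate <- as.Date("2021-01-01") + toy$day
toy$DelQty  <- toy$company * 10 + toy$product + toy$day

# Hypothetical placeholder for apply_fun(): returns h numbers per group
# (here simply the group's mean quantity, repeated h times).
toy_forecast <- function(d, h = 2) rep(mean(d$DelQty), h)

# One call per company/product/pack group; .x is that group's rows.
result <- toy %>%
  group_by(company, product, pack) %>%
  group_modify(~ data.frame(step = seq_len(2), forecast = toy_forecast(.x))) %>%
  ungroup()

print(result)
```

Each group contributes h = 2 rows, so the 4 groups here give 8 rows with the columns company, product, pack, step and forecast. Swapping toy_forecast for the real function should give the "company, product, pack, forecast" dataset asked for, assuming every group has enough history for the model to fit.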
This can also be written with by:
by(data, data[c('company', 'product', 'pack')], apply_fun)
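For comparison, the explicit "loop over each tuple" variant from the question can be sketched in base R with split and lapply. As before, toy and toy_forecast are made-up placeholders, not part of the original script; the point is only that the grouping keys are kept next to each group's output and the pieces are bound into one data frame.

```r
# Made-up stand-in for the query result: 2 companies x 2 products x 2 packs.
toy <- expand.grid(company = 1:2, product = 1:2, pack = 1:2, day = 1:3)
toy$DelQty <- toy$company + toy$product + toy$pack + toy$day

# Hypothetical placeholder for apply_fun(): returns h numbers per group.
toy_forecast <- function(d, h = 2) rep(sum(d$DelQty), h)

# One list element per company/product/pack combination;
# drop = TRUE skips combinations that have no rows.
groups <- split(toy, toy[c("company", "product", "pack")], drop = TRUE)

# Loop over the groups, keeping the keys alongside each forecast.
pieces <- lapply(groups, function(d) {
  data.frame(company  = d$company[1],
             product  = d$product[1],
             pack     = d$pack[1],
             step     = 1:2,
             forecast = toy_forecast(d))
})

# Bind the per-group pieces into a single data frame.
result <- do.call(rbind, pieces)
rownames(result) <- NULL
print(result)
```

Both forms do the same split-apply-combine work; the grouped version is shorter, while the explicit loop makes it easier to add per-group error handling (for example, wrapping the model call in tryCatch for groups with too little data).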