How to distribute a series of volumes across a distribution using R

Problem description

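I am trying to forecast the number of events (library book returns). I have a data frame of the potential expected return volume for each date (built from a table) and a density function of past return behaviour. My plan was to use the convolve function, but I am stuck. Any thoughts on the best way forward?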

Calculate the borrow length

data$borrow_length <- data$due_date - data$return_date

Build the PDF

borrow_pdf <- density(data$borrow_length)
plot(borrow_pdf)

Produce the volumes

return_volume <- as.data.frame(table(data$due_date))

output <- convolve(borrow_pdf, return_volume$Freq, type = "open")

I would like to end up with a table of predicted return dates that accounts for all early and late returns.

Tags: r, distribution, modeling, forecasting

Solution


Agreed with @shea: a reproducible example would help.

Here is one, as far as I understand the problem:

set.seed(1)
N = 200

# Historical data with known return date
old_data = data.frame( due_date = as.Date("2019-04-01") + floor(runif(N, 0, 30)) )
old_data$return_date = old_data$due_date + round(rnorm(N, 0, 5))

# Currently borrowed books
current_data = data.frame( due_date = as.Date("2019-05-10") + floor(runif(N, 0, 30)) )

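If I understand correctly, you want to estimate return_date for current_data. Here is a solution that computes the convolution by hand: it is not efficient, but it is easy to follow.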

# For semantics, I renamed your borrow_length into borrow_delay
old_data$borrow_delay = old_data$return_date - old_data$due_date

# Compute its distribution (no smoothing)
distr_delay = as.data.frame(prop.table(table(delay = old_data$borrow_delay)), responseName="p_delay")
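# delay is a factor here: go through as.character to get the day values, not the level codes
distr_delay$delay = as.integer(as.character(distr_delay$delay))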

# Counts by due date
tab_volume = as.data.frame(table(due_date = current_data$due_date))
tab_volume$due_date = as.Date(as.character(tab_volume$due_date))

# Explicit convolution
distr_return = merge(tab_volume, distr_delay)
distr_return$return_date = with(distr_return, due_date + delay)
distr_return$expected_n_returns = with(distr_return, Freq*p_delay)
distr_return = with(distr_return, tapply(expected_n_returns, return_date, sum))
# Reformat
distr_return = data.frame(
  return_date = as.Date(names(distr_return)),
  expected_n_returns = c(distr_return)
)

# Sanity check: sum of expectations is 200 (the number of books borrowed)
sum(distr_return$expected_n_returns)

with(distr_return, plot(return_date, expected_n_returns))
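
Since the question asked about convolve specifically, here is a rough sketch of the same calculation done with it. It assumes both inputs are put on contiguous daily grids first (zero counts for days with nothing due, zero probability for delays never observed), so that vector positions map back to dates. The names delay_grid, p_delay, due_grid, n_due and forecast are mine, for illustration only. Note that convolve needs rev() on its second argument to produce the usual convolution.

```r
set.seed(1)
N <- 200

# Same simulated data as above, kept as plain vectors
old_due   <- as.Date("2019-04-01") + floor(runif(N, 0, 30))
old_delay <- round(rnorm(N, 0, 5))   # return_date - due_date, in days
cur_due   <- as.Date("2019-05-10") + floor(runif(N, 0, 30))

# Delay probabilities on a contiguous integer grid (zeros for unseen delays)
delay_grid <- seq(min(old_delay), max(old_delay))
p_delay    <- as.numeric(table(factor(old_delay, levels = delay_grid))) / N

# Due-date counts on a contiguous daily grid (zeros for days with nothing due)
due_grid <- seq(min(cur_due), max(cur_due), by = "day")
n_due    <- as.numeric(table(factor(as.character(cur_due),
                                    levels = as.character(due_grid))))

# Usual convolution: reverse the second argument and use type = "open"
expected <- convolve(n_due, rev(p_delay), type = "open")

# Position k corresponds to (first due date + smallest delay) + (k - 1) days
forecast <- data.frame(
  return_date        = min(due_grid) + min(delay_grid) + seq_along(expected) - 1,
  expected_n_returns = pmax(expected, 0)   # clip tiny negative FFT round-off
)

# Same sanity check as before: total expected returns should be about 200
sum(forecast$expected_n_returns)
```

Unlike the merge version, this table also contains the dates with an expected count of zero, because every day in the range gets a row.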
