r - Summarise groups using the runner package
Problem description
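I have a table of house prices and sale dates. I want to use the runner package to calculate the rolling median price over a 365-day window. I only want one median price per date.
My problem is that when I try the code below, I get more than one median price for any date that appears more than once. That is not what I expected. I thought that using group_by/summarise would give one result per day.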
library(runner)
library(tidyverse)
library(lubridate)

startDate <- as_date("2018-01-01")
endDate <- as_date("2020-01-01")

# Create data
soldData <- tibble(
  price = round(rnorm(100, mean = 500000, sd = 100000), -3),
  date = sample(seq.Date(startDate, endDate, by = "days"), 100, replace = TRUE))

# Fill in the missing dates between startDate and endDate
soldData <- bind_rows(
  soldData,
  anti_join(tibble(date = seq.Date(startDate, endDate, by = "day")), soldData, by = "date")) %>%
  arrange(date)

# Find the duplicated dates
duplicatedDates <- soldData[duplicated(soldData$date), ]$date

# I thought using group_by/summarise would return one medianPrice per date
results <- soldData %>%
  group_by(date) %>%
  summarise(medianPrice = runner(
    price,
    k = "365 days",
    idx = date,
    f = function(x) {median(x, na.rm = TRUE)}))

# These are the problem rows.
duplicatedResults <- results %>%
  filter(date %in% duplicatedDates)
Any idea where I went wrong?
Solution
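From dplyr 1.0.0, summarise can return multiple rows per group. runner returns one value per input row, so a date that appears n times yields n rows. Note also that because runner is called inside each date group, it only ever sees that day's prices, so the 365-day window never reaches other dates.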
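To see why duplicated dates produce several rows, here is a minimal sketch on a made-up three-row table. The values and the names toy and per_row are illustrative, not from the question, and cumsum stands in for any function that, like runner, returns one value per input row. Newer dplyr releases (1.1.0 and later) warn about multi-row summarise results and point to reframe() instead.

```r
library(dplyr)

# Hypothetical toy data: 2018-01-01 appears twice
toy <- tibble(
  date  = as.Date(c("2018-01-01", "2018-01-01", "2018-01-02")),
  price = c(100, 200, 300)
)

# cumsum() returns one value per row of the group, as runner() does by default,
# so the duplicated date contributes two rows to the result
per_row <- toy %>%
  group_by(date) %>%
  summarise(value = cumsum(price), .groups = "drop")

nrow(per_row)  # 3 rows, not 2
```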
First you need to deal with the duplicate dates already in your data. What do you want to do with dates that occur more than once? One way would be to take their median or mean.
library(dplyr)
library(runner)

# Collapse each date to a single price (the median of that day's sales)
df <- soldData %>%
  group_by(date) %>%
  summarise(price = median(price, na.rm = TRUE))
So now df has only one value for each date. You can now apply the runner function.
df %>%
  mutate(medianPrice = runner(price,
                              k = "365 days",
                              idx = date,
                              f = function(x) {median(x, na.rm = TRUE)}))
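One thing to keep in mind: this approach gives a rolling median of daily medians, which can differ from the median of all individual sales in the window. If the pooled median is what you want, a base R sketch might look like the following. rolling_median_by_date is a hypothetical helper, the toy data is made up, and the window is assumed to cover the 365 days ending on and including each date, which is how I read k = "365 days".

```r
# Hypothetical helper: one pooled median per distinct date.
# Assumes the window is (d - window_days, d], i.e. it includes the date itself.
rolling_median_by_date <- function(date, price, window_days = 365) {
  dates <- sort(unique(date))
  med <- vapply(seq_along(dates), function(i) {
    d <- dates[i]
    in_window <- date > d - window_days & date <= d
    median(price[in_window], na.rm = TRUE)
  }, numeric(1))
  data.frame(date = dates, medianPrice = med)
}

# Made-up sales: two on the same day, then two later ones
toy_dates  <- as.Date(c("2018-01-01", "2018-01-01", "2018-06-01", "2019-03-01"))
toy_prices <- c(100, 300, 500, 700)

res <- rolling_median_by_date(toy_dates, toy_prices)
res  # one row per distinct date; medianPrice is 200, 300, 600
```

On this toy data the pooled median for 2018-06-01 is 300, whereas the median of the daily medians (200 and 500) would be 350, so it is worth deciding which of the two you actually mean.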
There is also zoo::rollmedianr, which computes a right-aligned rolling median over a fixed number of observations rather than over a span of dates.