R - Calculating running totals across different time intervals

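Problem description

I have a data frame that tracks some loan balances. Each time a payment ("Amount") is made against the balance, the new balance for that property's loan is shown in the "Balance" column.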

df = data.frame(Date = c("2015-03-01", "2015-05-01", "2016-07-02", "2017-11-24", "2017-12-15"),
            Property = c("1 Main St", "1 Main St", "1 Main St", "5 Main St", "1 Main St"),
            Amount = c(50000, -10000, -5000, 75000, -4000),
            Balance = c(50000, 40000, 35000, 75000, 31000)
            )

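As you can see, the dates are quite spread out, and most months have no transactions at all. I would like to build a data frame that contains the balance for each property at the start of every month, whether or not there was a transaction that month. Something like this: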

Month = c("March 2015", "April 2015", "May 2015", "June 2015"),
Property = c("1 Main St", "1 Main St", "1 Main St", "1 Main St"),
Balance = c(50000, 50000, 40000, 40000)

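It also needs to pick up the last transaction of the month when a property has more than one transaction in a given month. Any ideas on how to approach this?

Tags: r

Solution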


First, make sure your Date field is of class Date. This is the call I used to build your data:

df = data.frame(Date = as.Date(c("2015-03-01", "2015-05-01", "2016-07-02", "2017-11-24", "2017-12-15"), "%Y-%m-%d"),
            Property = c("1 Main St", "1 Main St", "1 Main St", "5 Main St", "1 Main St"),
            Amount = c(50000, -10000, -5000, 75000, -4000),
            Balance = c(50000, 40000, 35000, 75000, 31000),
            stringsAsFactors = FALSE
            )

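Note that I also added the stringsAsFactors = FALSE argument to the data.frame call, so that Property stays a character column instead of becoming a factor.

Then I used the following code, which may (?) answer your question: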
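If the data frame already exists with Date stored as text, the column can be converted in place rather than rebuilding the whole frame. A minimal sketch, assuming a hypothetical data frame named loans whose dates are in year-month-day format:

```r
# Hypothetical data frame whose Date column was read in as character
loans <- data.frame(Date = c("2015-03-01", "2015-05-01"),
                    Balance = c(50000, 40000),
                    stringsAsFactors = FALSE)

# Convert the text column to class Date; the format string must match the text
loans$Date <- as.Date(loans$Date, format = "%Y-%m-%d")

# Check the result before doing any date arithmetic
class(loans$Date)
```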

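library(tidyr)
library(dplyr)
library(lubridate)

# Sort by date and keep the result, so first() and last() give the date range
df <- arrange(df, Date)

from <- first(df$Date)
to <- last(df$Date)

new_df <- df %>%
        # add a row for every calendar day between the first and last transaction
        complete(Date = seq.Date(from, to, "day")) %>%
        # carry the most recent Property, Amount and Balance forward into the new rows
        fill(Property:Balance) %>%
        group_by(year = year(Date), month = month(Date, label = TRUE), Property) %>%
        # keep the last balance recorded in each month
        summarise(Balance = last(Balance))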
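Because the code above fills down across all rows, days that follow a transaction on one property are attributed to that property until the next transaction of any property appears. The following is a sketch of one possible per-property variant, not the original answer's method: it first takes the last balance in each month, then completes a monthly sequence within each property and carries the balance forward. The names monthly and Month are introduced here for illustration, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)
library(lubridate)

df <- data.frame(
  Date = as.Date(c("2015-03-01", "2015-05-01", "2016-07-02", "2017-11-24", "2017-12-15")),
  Property = c("1 Main St", "1 Main St", "1 Main St", "5 Main St", "1 Main St"),
  Amount = c(50000, -10000, -5000, 75000, -4000),
  Balance = c(50000, 40000, 35000, 75000, 31000),
  stringsAsFactors = FALSE
)

monthly <- df %>%
  arrange(Property, Date) %>%
  # label each transaction with the first day of its month
  mutate(Month = floor_date(Date, "month")) %>%
  group_by(Property, Month) %>%
  # keep the balance after the last transaction in each month
  summarise(Balance = last(Balance), .groups = "drop") %>%
  group_by(Property) %>%
  # add the months with no transactions, separately for each property
  complete(Month = seq(min(Month), max(Month), by = "month")) %>%
  # carry the previous month's balance into the months that were added
  fill(Balance) %>%
  ungroup()
```

For "1 Main St", March through June 2015 come out as 50000, 50000, 40000, 40000, which matches the example in the question. Each property only covers the span between its own first and last transaction, so "5 Main St" has a single row here; extending every property to a common end date would need an explicit end value in the seq() call. Something like format(Month, "%B %Y") can turn the Month column into labels such as "March 2015", though the month names depend on the locale.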
