Matching dates across different data frames in R

Problem description

I have two data frames. The first one looks like this:

user_id   date  
1         2016-12-30
2         2016-08-01

The second data frame looks like this:

user_id    date         total     type 
1          2016-12-19   100       1
1          2016-11-02   200       2
1          2016-10-18   50        1
1          2016-07-15   100       3
1          2016-01-21   200       1
1          2016-01-18   152       2
2          2016-08-01   30        4
2          2016-01-29   133       2

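I am trying to match user_id and date across the two data frames, then keep only the rows from the past 3 months, and then sum total.

I tried grouping them, but I still can't work out how to compare the dates between the two data frames.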

library(dplyr)
df %>%
  group_by(user_id, type) %>%
  summarise(total = sum(total))

library(lubridate)
df %>%
  select(user_id, date, total, type) %>%
  filter(date >= today() - months(3))

The result I am looking for is:

user_id total type
1       150   1  
1       200   2  
1       0     3
1       0     4
2       0     1
2       0     2
2       0     3
2       30    4    

Tags: r, datetime, dplyr

Solution


df1 = read.table(text = "
user_id   date  
1         2016-12-30
2         2016-08-01
", header=T, stringsAsFactors=F)

df2 = read.table(text = "
user_id    date         total     type 
1          2016-12-19   100       1
1          2016-11-02   200       2
1          2016-10-18   50        1
1          2016-07-15   100       3
1          2016-01-21   200       1
1          2016-01-18   152       2
2          2016-08-01   30        4
2          2016-01-29   133       2
", header=T, stringsAsFactors=F)

library(tidyverse)
library(lubridate)

# update to date columns (if needed; you probably have date columns already)
df1$date = ymd(df1$date)
df2$date = ymd(df2$date)

df1 %>%
  left_join(df2, by="user_id") %>%           # join datasets
  filter(date.y >= date.x - months(3)) %>%   # keep df2 dates within last 3 months from date in df1
  group_by(user_id, type) %>%                # for each user and type
  summarise(total= sum(total)) %>%           # get totals
  ungroup() %>%                              # forget the grouping
  complete(user_id, type=unique(df2$type))   # add missing combinations

This returns:

# # A tibble: 8 x 3
#   user_id  type total
#     <int> <int> <int>
# 1       1     1   150
# 2       1     2   200
# 3       1     3    NA
# 4       1     4    NA
# 5       2     1    NA
# 6       2     2    NA
# 7       2     3    NA
# 8       2     4    30
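
The output above has NA where the question asked for 0. One possible way to close that gap is the `fill` argument of `complete()`. The sketch below is a variant, not part of the original answer, and makes a few assumptions of its own: the join suffixes `.ref` and `.txn` are illustrative names, an upper bound on the date is added so rows later than the reference date are excluded, and `%m-%` is used instead of `-` because plain month subtraction in lubridate can yield NA when the resulting day does not exist (for example, subtracting 3 months from May 31). Passing `user_id = df1$user_id` to `complete()` is also an assumption, meant to keep users whose rows were all filtered out.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Same sample data as in the question, built directly as data frames
df1 <- data.frame(
  user_id = c(1L, 2L),
  date = ymd(c("2016-12-30", "2016-08-01"))
)

df2 <- data.frame(
  user_id = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L),
  date = ymd(c("2016-12-19", "2016-11-02", "2016-10-18", "2016-07-15",
               "2016-01-21", "2016-01-18", "2016-08-01", "2016-01-29")),
  total = c(100L, 200L, 50L, 100L, 200L, 152L, 30L, 133L),
  type = c(1L, 2L, 1L, 3L, 1L, 2L, 4L, 2L)
)

result <- df1 %>%
  # both tables have a `date` column, so name the two copies explicitly
  left_join(df2, by = "user_id", suffix = c(".ref", ".txn")) %>%
  # keep rows inside the 3-month window ending at the reference date
  filter(date.txn >= date.ref %m-% months(3),
         date.txn <= date.ref) %>%
  group_by(user_id, type) %>%
  summarise(total = sum(total), .groups = "drop") %>%
  # add every user/type combination and fill the missing totals with 0
  complete(user_id = df1$user_id,
           type = sort(unique(df2$type)),
           fill = list(total = 0L))

print(result)
```

With the sample data this should give one row per user and type, with 0 in place of NA for the combinations that have no rows in the window.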
