How to update one data.frame based on information from another data.frame

Problem description

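I have two tables: Display and Review. The Review table contains information about product reviews from an online store. Each row gives the date of a review, together with the cumulative number of reviews and the product's average rating as of that date.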

page_id<-c("1072659", "1072659" , "1072659","1072650","1072660","1072660")  
review_id<-c("1761023","1761028","1762361","1918387","1761427","1863914")
date<-as.Date(c("2013-07-11","2013-08-12","2014-07-15","2014-09-10","2013-07-27","2014-08-12"),format = "%Y-%m-%d")
cumulative_No_reviews<-c(1,2,3,1,1,2)
average_rating<-c(5,3.5,4,3,5,5)
Review<-data.frame(page_id,review_id,date,cumulative_No_reviews,average_rating)
page_id        review_id          date    cumulative_No_reviews   average_rating
1072659          1761023        2013-07-11      1                       5
1072659          1761028        2013-08-12      2                       3.5
1072659          1762361        2014-07-15      3                       4
1072650          1918387        2014-09-10      1                       3
1072660          1761427        2013-07-27      1                       5
1072660          1863914        2014-08-12      2                       5

The Display table records customer visits to product pages.

page_id<-c("1072659","1072659","1072659","1072650","1072650","1072660","1072660","1072660")
date<-as.Date(c("2013-07-10","2013-08-03","2015-02-11","2014-08-10","2014-09-09","2013-08-12","2014-09-12","2015-08-12"),format = "%Y-%m-%d")
Display<-data.frame(page_id,date)
page_id         date        
1072659     2013-07-10      
1072659     2013-08-03      
1072659     2015-02-11      
1072650     2014-08-10  
1072650     2014-09-09      
1072660     2013-08-12      
1072660     2014-09-12      
1072660     2015-08-12      

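I want to add two columns to the Display table (call the result Display2) that reflect the most recent review information at the time of each product page visit, as shown below: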

page_id<-c("1072659","1072659","1072659","1072650","1072650","1072660","1072660","1072660")
date<-as.Date(c("2013-07-10","2013-08-03","2015-02-11","2014-08-10","2014-09-09","2013-08-12","2014-09-12","2015-08-12"),format = "%Y-%m-%d")
cumulative_No_reviews<-c(0,1,3,0,0,1,2,2)
average_rating<-c(NA,5,4,NA,NA,5,5,5)
Display2<-data.frame(page_id,date,cumulative_No_reviews,average_rating)
 page_id            date        cumulative_No_reviews   average_rating
 1072659        2013-07-10                 0                NA
 1072659        2013-08-03                 1                5
 1072659        2015-02-11                 3                4
 1072650        2014-08-10                 0                NA
 1072650        2014-09-09                 0                NA
 1072660        2013-08-12                 1                5
 1072660        2014-09-12                 2                5
 1072660        2015-08-12                 2                5

I would appreciate your help.

Tags: r

Solution


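You can do this with a data.table join. Join the Review table to the Display table where the page_id values match and the Review date is earlier than the Display date. For some Display rows, several Review rows will match these criteria, so we keep only the last one with mult = 'last'. Since Review is ordered by date within each page_id, the last match is the one with the most recent date.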

library(data.table) # 1.12.6 for nafill (used below)
setDT(Display)
setDT(Review)

Display2 <- Review[Display, on = .(page_id, date < date), mult = 'last']
Display2
#    page_id review_id       date cumulative_No_reviews average_rating
# 1: 1072659      <NA> 2013-07-10                    NA             NA
# 2: 1072659   1761023 2013-08-03                     1              5
# 3: 1072659   1762361 2015-02-11                     3              4
# 4: 1072650      <NA> 2014-08-10                    NA             NA
# 5: 1072650      <NA> 2014-09-09                    NA             NA
# 6: 1072660   1761427 2013-08-12                     1              5
# 7: 1072660   1863914 2014-09-12                     2              5
# 8: 1072660   1863914 2015-08-12                     2              5

This output now almost matches what you showed in the question. We just need to remove the review_id column and replace the NAs in the cumulative_No_reviews column with 0s.

Display2[, review_id := NULL]
Display2[, cumulative_No_reviews := nafill(cumulative_No_reviews, fill = 0)][]
#    page_id       date cumulative_No_reviews average_rating
# 1: 1072659 2013-07-10                     0             NA
# 2: 1072659 2013-08-03                     1              5
# 3: 1072659 2015-02-11                     3              4
# 4: 1072650 2014-08-10                     0             NA
# 5: 1072650 2014-09-09                     0             NA
# 6: 1072660 2013-08-12                     1              5
# 7: 1072660 2014-09-12                     2              5
# 8: 1072660 2015-08-12                     2              5
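
For readers who cannot use data.table, the same lookup can be sketched in base R. This is not the method from the answer above, only an illustrative alternative under the same rule (same page_id, review date strictly earlier than the visit date, keep the most recent match). The names cum, avg, prior, latest and Display2_base are made up for this sketch, and the two tables are rebuilt as plain data.frames so the block runs on its own.

```r
# Rebuild the question's tables as plain data.frames (character ids, Date dates)
Review <- data.frame(
  page_id = c("1072659", "1072659", "1072659", "1072650", "1072660", "1072660"),
  date = as.Date(c("2013-07-11", "2013-08-12", "2014-07-15",
                   "2014-09-10", "2013-07-27", "2014-08-12")),
  cumulative_No_reviews = c(1, 2, 3, 1, 1, 2),
  average_rating = c(5, 3.5, 4, 3, 5, 5),
  stringsAsFactors = FALSE
)
Display <- data.frame(
  page_id = c("1072659", "1072659", "1072659", "1072650",
              "1072650", "1072660", "1072660", "1072660"),
  date = as.Date(c("2013-07-10", "2013-08-03", "2015-02-11", "2014-08-10",
                   "2014-09-09", "2013-08-12", "2014-09-12", "2015-08-12")),
  stringsAsFactors = FALSE
)

n <- nrow(Display)
cum <- numeric(n)          # starts at 0: no reviews yet at the time of the visit
avg <- rep(NA_real_, n)    # starts at NA: no rating yet at the time of the visit

for (i in seq_len(n)) {
  # Reviews of the same page posted strictly before this visit
  prior <- Review[Review$page_id == Display$page_id[i] &
                    Review$date < Display$date[i], ]
  if (nrow(prior) > 0) {
    # Keep the most recent of those reviews
    latest <- prior[which.max(prior$date), ]
    cum[i] <- latest$cumulative_No_reviews
    avg[i] <- latest$average_rating
  }
}

Display2_base <- data.frame(Display,
                            cumulative_No_reviews = cum,
                            average_rating = avg)
Display2_base
```

Like the join condition date < date above, this sketch uses a strict comparison, so a review posted on the same day as a visit is not counted for that visit. If same-day reviews should count, the comparison would need to be <= instead. The row-by-row loop is also likely to be much slower than the data.table join on large tables.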
