How to group customer orders by date

Problem description

I have a data frame containing customer order data, as shown below:

cust    order_number     order_date      total   product_id
 1      1235846868       2020-01-27       20.0   Product A
 1      1235846869       2020-01-27       14.0   Product B
 2      1245485221       2020-05-16       11.1   Product B, Product C, Product D
 3      1285784226       2020-07-10       24.0   Product D
 4      5412151256       2020-03-27       12.0   Product A
 4      5412151290       2020-04-13       23.0   Product C, Product B
 5      5481581554       2020-02-18       12.0   Product D

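As you can see above, some customers (for example, customer "1") placed more than one order on the same day (probably because they forgot to put something in their basket the first time). I want to combine each customer's same-day orders into a single row, while keeping all the other columns in my dataset (e.g. product_id, order_number, etc.). The output table should look like this: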

cust    order_number                order_date      total   product_id
 1      1235846868, 1235846869      2020-01-27       34.0   Product A, Product B
 2      1245485221                  2020-05-16       11.1   Product B, Product C, Product D
 3      1285784226                  2020-07-10       24.0   Product D
 4      5412151256                  2020-03-27       12.0   Product A
 4      5412151290                  2020-04-13       23.0   Product C, Product B
 5      5481581554                  2020-02-18       12.0   Product D

Thanks!

Tags: r, group-by, aggregate

Solution


One way to do this with dplyr is to sum the total column and collapse the order_number and product_id columns into comma-separated strings.

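library(dplyr)

df %>%
  # one group per customer per day
  group_by(cust, order_date) %>%
  summarise(total = sum(total, na.rm = TRUE),  # add up the order totals
            # collapse the remaining columns into comma-separated strings
            across(c(order_number, product_id), toString),
            .groups = "drop")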

#  cust order_date total order_number           product_id                
#  <int> <chr>      <dbl> <chr>                  <chr>                     
#1     1 2020-01-27  34   1235846868, 1235846869 ProductA, ProductB        
#2     2 2020-05-16  11.1 1245485221             ProductB,ProductC,ProductD
#3     3 2020-07-10  24   1285784226             ProductD                  
#4     4 2020-03-27  12   5412151256             ProductA                  
#5     4 2020-04-13  23   5412151290             ProductC,ProductB         
#6     5 2020-02-18  12   5481581554             ProductD       
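
For anyone who wants to run the answer end to end, here is a minimal reproducible sketch. It rebuilds the sample data from the question by hand, so the column types are assumptions: order_number is stored as character because it is an identifier and some values exceed R's 32-bit integer range, and order_date is kept as a plain string. across() requires dplyr 1.0.0 or later.

```r
library(dplyr)

# Sample data copied from the question. Column types are assumptions:
# order_number is character (an identifier, not a quantity).
df <- data.frame(
  cust = c(1L, 1L, 2L, 3L, 4L, 4L, 5L),
  order_number = c("1235846868", "1235846869", "1245485221", "1285784226",
                   "5412151256", "5412151290", "5481581554"),
  order_date = c("2020-01-27", "2020-01-27", "2020-05-16", "2020-07-10",
                 "2020-03-27", "2020-04-13", "2020-02-18"),
  total = c(20.0, 14.0, 11.1, 24.0, 12.0, 23.0, 12.0),
  product_id = c("Product A", "Product B", "Product B, Product C, Product D",
                 "Product D", "Product A", "Product C, Product B", "Product D"),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(cust, order_date) %>%
  summarise(
    total = sum(total, na.rm = TRUE),               # sum totals per customer and day
    across(c(order_number, product_id), toString),  # join values with ", "
    .groups = "drop"                                # return an ungrouped tibble
  )

print(result)
```

With this data the two orders placed by customer 1 on 2020-01-27 collapse into one row with a total of 34, and every other row is left unchanged because its group contains a single order.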
