R: Assign a product type to each user based on grouped count results

Problem description

The dataset is an example from an online marketplace (eBay, Amazon), built from online purchase information.

user_id, product_code, bought_date, time_spent, store_id, product_type, refurbished, unqiue_visit_id
001, e.12, 20120102, 104, 101, computer, yes, 1010
002, e.24, 20120201, 100, 101, infant-dress, no, 2001
003, s.32, 20130302, 230, 101, shoes, no, 2121
004, y.23, 20130404, 212, 103, computer, yes, 2422
005, s.43, 20130803, 104, 101, laptop, yes, 2342
001, a.12, 20120202, 104, 101, computer, yes, 1011
002, b.24, 20120201, 100, 101, infant-dress, no, 2001
003, c.32, 20130302, 230, 101, shoes, no, 2122
004, e.23, 20130404, 212, 103, computer, yes, 2424
005, f.43, 20130803, 104, 101, laptop, yes, 2340
001, g.12, 20120102, 104, 101, computer, yes, 1013
002, h.24, 20120201, 100, 101, infant-dress, no, 2031
003, l.32, 20130302, 230, 101, shoes, no, 2000
004, m.23, 20130404, 212, 103, computer, yes, 1422
005, d.43, 20130803, 104, 101, laptop, yes, 1142
001, d.12, 20120102, 104, 101, desk, yes, 1110
002, f.24, 20120201, 100, 101, glass, no, 1111
003, n.32, 20130302, 230, 101, liquid, no, 2021
004, t.23, 20130404, 212, 103, liquid, yes, 22
005, u.43, 20130803, 104, 101, dress, yes, 2942
001, d.12, 20120102, 104, 101, desk, yes, 1910
002, f.24, 20120201, 100, 101, glass, no, 2901
003, n.32, 20130302, 230, 101, liquid, no, 2921
004, t.23, 20130404, 212, 103, liquid, yes, 2922
005, u.43, 20130803, 104, 101, dress, yes, 2942
001, kk.12, 20120103, 105, 101, desk, yes, 410
003, n.32, 20130303, 230, 101, liquid, no, 2621

unique_visit_id (spelled unqiue_visit_id in the data) is created from user_id, product_code, store_id, product_type, and bought_date.

The goal is first to get the visit count by grouping on user_id and product_type:

library(dplyr)

# Count distinct visits per user and product type, highest counts first
test.visits <- test %>%
  group_by(user_id, product_type) %>%
  summarize(visit_count = n_distinct(unqiue_visit_id)) %>%
  arrange(desc(visit_count), user_id)


   user_id product_type    visit_count
     <int> <fct>           <int>
 1       1 " computer"         3
 2       1 " desk"             3
 3       2 " infant-dress"     3
 4       3 " liquid"           3
 5       3 " shoes"            3
 6       4 " computer"         3
 7       5 " laptop"           3
 8       2 " glass"            2
 9       4 " liquid"           2
10       5 " dress"            2

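Now I want to assign a product type to each user based on the highest visit count. If the visit counts are equal, the tie should be broken by recency (bought_date), then by refurbished, then by the last value of store_id.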

example:
     1       1 " computer"         3
     2       1 " desk"             3

The basic condition is a tie-breaker: assign to each user the product_type with the highest visit count within that user's group.

Tags: r, dplyr

Solution
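One possible approach, offered as a sketch rather than a definitive answer: summarize each (user_id, product_type) group into the visit count plus one value per tie-break criterion, sort by those values, and keep the first row per user. Several details are assumptions, because the question does not pin them down: recency is taken as the latest bought_date in the group, a group containing any refurbished item ranks ahead of one without, and "last value of store_id" is taken as the last store_id in row order, with the higher value winning. The names last_bought, any_refurbished, and last_store_id are introduced here for illustration only. The data below is a subset of the sample rows (users 001 and 003), read with strip.white = TRUE so that product_type does not carry the leading space visible in the output above.

```r
library(dplyr)

# Subset of the sample data from the question (users 001 and 003 only).
# strip.white = TRUE drops the spaces that follow each comma.
test <- read.csv(text = "user_id, product_code, bought_date, time_spent, store_id, product_type, refurbished, unqiue_visit_id
001, e.12, 20120102, 104, 101, computer, yes, 1010
001, a.12, 20120202, 104, 101, computer, yes, 1011
001, g.12, 20120102, 104, 101, computer, yes, 1013
001, d.12, 20120102, 104, 101, desk, yes, 1110
001, d.12, 20120102, 104, 101, desk, yes, 1910
001, kk.12, 20120103, 105, 101, desk, yes, 410
003, s.32, 20130302, 230, 101, shoes, no, 2121
003, c.32, 20130302, 230, 101, shoes, no, 2122
003, l.32, 20130302, 230, 101, shoes, no, 2000
003, n.32, 20130302, 230, 101, liquid, no, 2021
003, n.32, 20130302, 230, 101, liquid, no, 2921
003, n.32, 20130303, 230, 101, liquid, no, 2621
", strip.white = TRUE)

assigned <- test %>%
  group_by(user_id, product_type) %>%
  summarize(
    visit_count     = n_distinct(unqiue_visit_id),
    last_bought     = max(bought_date),          # recency (assumed: latest date in group)
    any_refurbished = any(refurbished == "yes"), # assumed: refurbished ranks first
    last_store_id   = last(store_id),            # assumed: last value in row order
    .groups = "drop"
  ) %>%
  # Sort so that the preferred product type comes first for each user:
  # highest visit count, then the tie-breakers in the order given in the question.
  arrange(user_id, desc(visit_count), desc(last_bought),
          desc(any_refurbished), desc(last_store_id)) %>%
  group_by(user_id) %>%
  slice_head(n = 1) %>%   # keep only the top-ranked product type per user
  ungroup() %>%
  select(user_id, product_type, visit_count)

print(assigned)
```

With this subset, both users have two product types tied at three visits. User 1 is assigned computer and user 3 is assigned liquid, in each case because that product type has the more recent bought_date. If the tie-break directions should differ (for example, non-refurbished items preferred), only the corresponding desc() call in arrange() needs to change.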

