Combinations in Pandas Python (more than 2 unique)

Problem description

I have a data frame in which each row records one activity by a user:

 UserID     Purchased
  A          Laptop
  A          Food
  A          Car
  B          Laptop
  B          Food
  C          Food
  D          Car

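Now I want to find all unique combinations of purchased products, along with the number of unique users for each combination. My dataset has about 8 different products, so doing this by hand is very time-consuming. I would like the final result to look something like: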

Number of products   Products    Unique count of Users
       1              Food                1
       1              Car                 1
       2            Laptop,Food           1
       3            Car,Laptop,Food       1

Tags: pandas

Solution


import pandas as pd

# updated sample data (user C now also buys a Laptop)
d = {'UserID': {0: 'A', 1: 'A', 2: 'A', 3: 'B', 4: 'B', 5: 'C', 6: 'D', 7: 'C'},
 'Purchased': {0: 'Laptop',
  1: 'Food',
  2: 'Car',
  3: 'Laptop',
  4: 'Food',
  5: 'Food',
  6: 'Car',
  7: 'Laptop'}}

df = pd.DataFrame(d)


# groupby user id and combine the purchases to a tuple
new_df = df.groupby('UserID').agg(tuple)
# list comprehension to sort your grouped purchases
new_df['Purchased'] = [tuple(sorted(x)) for x in new_df['Purchased']]
# groupby purchases and get the count, which is the number of users for each combination
final_df = new_df.reset_index().groupby('Purchased').agg('count').reset_index()
# get the length of each Purchased tuple, which is the number of products in that combination
final_df['num_of_prod'] = final_df['Purchased'].map(len)
# rename the columns
final_df = final_df.rename(columns={'UserID': 'user_count'})

             Purchased  user_count  num_of_prod
0               (Car,)           1            1
1  (Car, Food, Laptop)           1            3
2       (Food, Laptop)           2            2
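
The result above keeps each combination as a tuple. If the layout asked for in the question is wanted (product count, comma-joined product names, user count), one possible sketch follows. It is not part of the original answer, and it makes two assumptions: product names contain no commas, and a product bought twice by the same user should count only once (hence the `set`). The names `baskets` and `summary` are illustrative.

```python
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({
    'UserID':    ['A', 'A', 'A', 'B', 'B', 'C', 'D', 'C'],
    'Purchased': ['Laptop', 'Food', 'Car', 'Laptop', 'Food', 'Food', 'Car', 'Laptop'],
})

# One sorted, de-duplicated, comma-joined string of products per user.
# Assumption: product names themselves contain no commas.
baskets = df.groupby('UserID')['Purchased'].agg(lambda s: ','.join(sorted(set(s))))

# Count how many users share each combination.
summary = (
    baskets.value_counts()
    .rename_axis('Products')
    .reset_index(name='Unique count of Users')
)

# Number of products in each combination (commas + 1).
summary['Number of products'] = summary['Products'].str.count(',') + 1

# Reorder the columns to match the layout in the question and sort for readability.
summary = (
    summary[['Number of products', 'Products', 'Unique count of Users']]
    .sort_values(['Number of products', 'Products'])
    .reset_index(drop=True)
)
print(summary)
```

Sorting the products before joining is what makes `Laptop,Food` and `Food,Laptop` count as the same combination, the same idea as the `tuple(sorted(x))` step in the answer.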
