How to create N groups based on conditions in columns?

Problem description

I need to create groups using two columns. For example, I took shop_id and week. Here is the df:

   shop_id  week
0        1     1
1        1     2
2        1     3
3        2     1
4        2     2
5        3     2
6        1     5

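Imagine that each group is a promotion that ran in a given shop over consecutive weeks. My attempt was to sort, shift by 1 to get last_week, build boolean flags, and then iterate over them, incrementing a counter each time the condition is not met: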

import pandas as pd

test_df = pd.DataFrame({'shop_id':[1,1,1,2,2,3,1], 'week':[1,2,3,1,2,2,5]})

def createGroups(df, shop_id, week, group):
    '''Create groups of rows that share the same shop_id and have consecutive weeks
    '''
 
    periods = []
    period = 0
    
    # sort so that rows are in chronological order within each shop
    df = df.sort_values(by=[shop_id, week], ignore_index=True)
    last_week = df[week].shift(+1) == df[week] - 1
    last_shop = df[shop_id].shift(+1) == df[shop_id]

    # iterate over the booleans and start a new group (increment by 1)
    # if the shop is different or the previous row is not exactly one week earlier
    for p, s in zip(last_week, last_shop):
        if p and s:
            periods.append(period)
        else:
            period += 1
            periods.append(period)
    
    df[group] = periods
    
    return df
createGroups(test_df, 'shop_id', 'week', 'promo')

I get the grouping I need:

   shop_id  week  promo
0        1     1      1
1        1     2      1
2        1     3      1
3        1     5      2
4        2     1      3
5        2     2      3
6        3     2      4

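However, the function seems like overkill. Any ideas on how to get the same result with native pandas functions, without the for loop? I saw .ngroups() in the docs but don't know how to apply it to my case. Even better would be to vectorize it somehow, but I don't know how to achieve that :(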

Tags: python, pandas

Solution


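First we identify the promotions (runs of consecutive weeks within a shop), then use groupby().ngroup() to enumerate them (here df is the question's test_df):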

# sort by week as well, so that diff() compares consecutive weeks within each shop
df = df.sort_values(['shop_id', 'week'])

s = df['week'].diff().ne(1).groupby(df['shop_id']).cumsum()

df['promo'] = df.groupby(['shop_id',s]).ngroup() + 1
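To see why this works, it may help to look at the intermediate values. The sketch below rebuilds the question's test_df (called df here, as in the answer) and keeps each step in its own variable. The names is_break, run and steps are hypothetical and used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'shop_id': [1, 1, 1, 2, 2, 3, 1],
                   'week':    [1, 2, 3, 1, 2, 2, 5]})
df = df.sort_values(['shop_id', 'week'])

# True where the week is not exactly one more than the previous row's week
is_break = df['week'].diff().ne(1)

# running count of breaks, restarted for each shop
run = is_break.groupby(df['shop_id']).cumsum()

# number each distinct (shop_id, run) pair, starting from 1
promo = df.groupby(['shop_id', run]).ngroup() + 1

# side-by-side view of every step
steps = df.assign(is_break=is_break, run=run, promo=promo)
print(steps)
```

diff() is computed across shop boundaries here, but that is harmless because the grouping key also includes shop_id. Note also that ngroups is an attribute holding the number of groups, while ngroup() is the method that labels each row with its group number.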

Update: this one is based on your solution:

df = df.sort_values(['shop_id','week'])

s = df[['shop_id', 'week']]
df['promo'] = (s['shop_id'].ne(s['shop_id'].shift()) |
               s['week'].diff().ne(1) ).cumsum()

Output:

   shop_id  week  promo
0        1     1      1
1        1     2      1
2        1     3      1
6        1     5      2
3        2     1      3
4        2     2      3
5        3     2      4
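The output above keeps the original row labels (0, 1, 2, 6, ...), whereas the question's function passes ignore_index=True and returns a fresh 0..n-1 index. A minimal sketch of wrapping the second approach in a function with the same signature as createGroups might look like this. The name create_groups_vectorized and the local names new_shop and week_gap are assumptions made for this example.

```python
import pandas as pd

def create_groups_vectorized(df, shop_id, week, group):
    """Label runs of consecutive weeks within each shop (hypothetical helper)."""
    # sort_values returns a new frame, so the caller's df is not modified
    out = df.sort_values([shop_id, week], ignore_index=True)
    # a new group starts when the shop changes ...
    new_shop = out[shop_id].ne(out[shop_id].shift())
    # ... or when the week is not exactly one more than the previous row's week
    week_gap = out[week].diff().ne(1)
    out[group] = (new_shop | week_gap).cumsum()
    return out

test_df = pd.DataFrame({'shop_id': [1, 1, 1, 2, 2, 3, 1],
                        'week':    [1, 2, 3, 1, 2, 2, 5]})
result = create_groups_vectorized(test_df, 'shop_id', 'week', 'promo')
print(result)
```

On the sample data this should give the same promo column as the loop-based createGroups, without iterating over rows in Python.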
