How to create periods in a DataFrame using pandas?

Question

I'm trying to create periods using cumsum() and cumcount(). How can I create periods for each book?

A period is a run of consecutive weeks. For example, if (week) - (previous week) > 1, a new period starts.
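The rule above can be sketched on a single book's weeks as follows. This is a minimal illustration, assuming the weeks are sorted ascending with no duplicates; the `weeks` sample is hypothetical and taken from the first rows of the data below.

```python
import pandas as pd

# Hypothetical sample: weeks of one book, sorted ascending, no duplicates
weeks = pd.Series([199, 200, 201, 202, 203, 204, 205, 206, 208, 209, 211])

# A gap of more than one week marks the start of a new period.
# The first diff is NaN and NaN > 1 is False, so the count starts at 0; +1 makes it 1-based.
period = weeks.diff().gt(1).cumsum() + 1
print(period.tolist())
```

With sorted, unique weeks, `diff().gt(1)` and `diff().ne(1)` flag the same rows except the first one, which only shifts the numbering by a constant.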

I tried:

import pandas as pd

df = pd.DataFrame({'book_id': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1,1, 1, 1, 1, 1, 1, 1, 1, 1,1,1,1, 2, 2, 2],
                   'week_id': [199, 200, 201, 202, 203, 204, 205, 206, 208, 209,211, 212, 213,214,216,217,218,219,220,222,223,224,225,226,228]})

df['discount_weeks']=df.groupby(['book_id', df['week_id'].diff().ne(1).cumsum()]).cumcount()+1

df['discount_periods'] = df.groupby(['book_id','discount_weeks']).cumcount()+1

df['total'] = df['discount_weeks'].diff().ne(1).cumsum()
print(df)


    book_id  week_id  discount_weeks  discount_periods  total
0         1      199               1                 1      1
1         1      200               2                 1      1
2         1      201               3                 1      1
3         1      202               4                 1      1
4         1      203               5                 1      1
5         1      204               6                 1      1
6         1      205               7                 1      1
7         1      206               8                 1      1
8         1      208               1                 2      2
9         1      209               2                 2      2
10        1      211               1                 3      3
11        1      212               2                 3      3
12        1      213               3                 2      3
13        1      214               4                 2      3
14        1      216               1                 4      4
15        1      217               2                 4      4
16        1      218               3                 3      4
17        1      219               4                 3      4
18        1      220               5                 2      4
19        1      222               1                 5      5
20        1      223               2                 5      5
21        1      224               3                 4      5
22        2      225               1                 1      6
23        2      226               2                 1      6
24        2      228               1                 2      7

But 'discount_periods' is wrong in some places (rows 12, 13, 16, 17, 18, ...).
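A likely reason for the wrong rows: grouping by ['book_id', 'discount_weeks'] numbers how many times each position-within-a-run value has already occurred, not which period the row belongs to. The reduced sample below (hypothetical, one book with two runs) runs the same two steps and shows the effect: the fourth week of the second run gets 1 because no earlier run had a fourth week.

```python
import pandas as pd

# Hypothetical reduced sample: one book, runs 199-201 and 203-206
df = pd.DataFrame({'book_id': [1] * 7,
                   'week_id': [199, 200, 201, 203, 204, 205, 206]})

# Same two steps as in the question
df['discount_weeks'] = df.groupby(
    ['book_id', df['week_id'].diff().ne(1).cumsum()]).cumcount() + 1
df['discount_periods'] = df.groupby(['book_id', 'discount_weeks']).cumcount() + 1

# discount_weeks:   1, 2, 3, 1, 2, 3, 4
# discount_periods: 1, 1, 1, 2, 2, 2, 1  (last row should be 2)
print(df)
```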

The final result should be:


    book_id  week_id  discount_weeks  discount_periods  total
0         1      199               1                 1      1
1         1      200               2                 1      1
2         1      201               3                 1      1
3         1      202               4                 1      1
4         1      203               5                 1      1
5         1      204               6                 1      1
6         1      205               7                 1      1
7         1      206               8                 1      1
8         1      208               1                 2      2
9         1      209               2                 2      2
10        1      211               1                 3      3
11        1      212               2                 3      3
12        1      213               3                 3      3
13        1      214               4                 3      3
14        1      216               1                 4      4
15        1      217               2                 4      4
16        1      218               3                 4      4
17        1      219               4                 4      4
18        1      220               5                 4      4
19        1      222               1                 5      5
20        1      223               2                 5      5
21        1      224               3                 5      5
22        2      225               1                 1      6
23        2      226               2                 1      6
24        2      228               1                 2      7

Maybe some other approach would work better here than cumsum() and cumcount()?

Tags: python, pandas

Answer


Introduce a new column, start_period, which is 1 on the first row of each period and 0 otherwise:

df['start_period'] = (df['discount_weeks'].diff().ne(1)).astype(int)
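To see what start_period holds, the sketch below applies the same expression to the discount_weeks values copied from the question's output, and compares it with the `== 1` variant mentioned in a comment in the full code. The standalone Series names are illustrative only.

```python
import pandas as pd

# discount_weeks values copied from the question's output (25 rows)
discount_weeks = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 1, 2, 3, 4,
                            1, 2, 3, 4, 5, 1, 2, 3, 1, 2, 1])

# 1 where discount_weeks does not continue the previous row by exactly +1
# (the first row's diff is NaN, and NaN != 1, so it is flagged as a start)
start_period = discount_weeks.diff().ne(1).astype(int)

# Alternative: valid only if every period begins with discount_weeks == 1
start_if_one = (discount_weeks == 1).astype(int)

print(start_period.tolist())
print(start_period.equals(start_if_one))
```

Both variants flag the same seven rows here, because discount_weeks is built with cumcount() + 1 and therefore always restarts at 1.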

Then, for each book_id, you can define discount_periods as the cumulative sum of start_period:

df['discount_periods'] = df.groupby(['book_id'])['start_period'].cumsum()
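The difference between the grouped and the plain cumulative sum can be checked in isolation. In this sketch the book_id and start_period values are copied from the data above as standalone Series: grouping by book restarts the count for book 2, while the ungrouped cumsum keeps counting and reproduces the question's total column.

```python
import pandas as pd

# Values copied from the example data: 22 rows for book 1, 3 rows for book 2
book_id = pd.Series([1] * 22 + [2] * 3)
start_period = pd.Series([1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0,
                          1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1])

# Grouped cumsum: period number restarts at 1 for every book
discount_periods = start_period.groupby(book_id).cumsum()

# Ungrouped cumsum: a global period id across all books (the 'total' column)
total = start_period.cumsum()

print(discount_periods.tolist())
print(total.tolist())
```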

import pandas as pd

df = pd.DataFrame({
    'book_id': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1,1, 1, 1, 1, 1, 1, 1, 1, 1,1,1,1, 2, 2, 2],
    'week_id': [199, 200, 201, 202, 203, 204, 205, 206, 208, 209,211, 212, 213,214,216,
                217,218,219,220,222,223,224,225,226,228]})

df['discount_weeks']=df.groupby(['book_id', df['week_id'].diff().ne(1).cumsum()]).cumcount()+1
df['total'] = df['discount_weeks'].diff().ne(1).cumsum()

# df['start_period'] = (df['discount_weeks'] == 1).astype(int)  # also works, if every period starts with discount_weeks equal to 1
df['start_period'] = (df['discount_weeks'].diff().ne(1)).astype(int)
df['discount_periods'] = df.groupby(['book_id'])['start_period'].cumsum()

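# show the same columns as the expected output (start_period is a helper column)
print(df[['book_id', 'week_id', 'discount_weeks', 'discount_periods', 'total']])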

yields

    book_id  week_id  discount_weeks  discount_periods  total  
0         1      199               1                 1      1  
1         1      200               2                 1      1  
2         1      201               3                 1      1  
3         1      202               4                 1      1  
4         1      203               5                 1      1  
5         1      204               6                 1      1  
6         1      205               7                 1      1  
7         1      206               8                 1      1  
8         1      208               1                 2      2  
9         1      209               2                 2      2  
10        1      211               1                 3      3  
11        1      212               2                 3      3  
12        1      213               3                 3      3  
13        1      214               4                 3      3  
14        1      216               1                 4      4  
15        1      217               2                 4      4  
16        1      218               3                 4      4  
17        1      219               4                 4      4  
18        1      220               5                 4      4  
19        1      222               1                 5      5  
20        1      223               2                 5      5  
21        1      224               3                 5      5  
22        2      225               1                 1      6  
23        2      226               2                 1      6  
24        2      228               1                 2      7  
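Since the question also asks whether another approach would be simpler, here is a sketch that derives the period starts directly from week_id within each book, so discount_weeks is no longer needed as an intermediate step. It assumes rows are sorted by book_id and week_id; the variable name new_period is illustrative, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'book_id': [1] * 22 + [2] * 3,
    'week_id': [199, 200, 201, 202, 203, 204, 205, 206, 208, 209, 211, 212, 213,
                214, 216, 217, 218, 219, 220, 222, 223, 224, 225, 226, 228]})

# 1 where a row starts a new period: the first row of each book (diff is NaN)
# or a gap other than exactly one week from the previous row of the same book
new_period = df.groupby('book_id')['week_id'].diff().ne(1).astype(int)

# Period number within each book, and a global period id across all books
df['discount_periods'] = new_period.groupby(df['book_id']).cumsum()
df['total'] = new_period.cumsum()

# 1-based position of each week inside its period
df['discount_weeks'] = df.groupby('total').cumcount() + 1

print(df[['book_id', 'week_id', 'discount_weeks', 'discount_periods', 'total']])
```

Computing the diff per book means a week that happens to follow the previous book's last week (224 then 225 here) still starts a new period, without relying on book_id being part of a later groupby.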
