Cumulative sum of Amount for consecutive dates only

Problem description

I am trying to apply a cumulative sum to the Amount column only while the dates in the Date column are consecutive.

Current input:

import pandas as pd

df = pd.DataFrame({'Country': {0: 'USA',1: 'Canada', 2: 'China',3: 'Egypt',4: 'Poland',5: 'UK',6: 'Jordan'},
                   'Date': {0: '2021-01-01',1: '2021-01-02',2: '2021-01-03',3: '2021-01-04',4: '2021-01-06',5: '2021-01-07',6: '2021-01-08'},
                   'Amount': {0: 10, 1: 15, 2: 10, 3: 20, 4: 25, 5: 30, 6: 10}})

    Country  Date        Amount
0   USA      2021-01-01  10
1   Canada   2021-01-02  15
2   China    2021-01-03  10
3   Egypt    2021-01-04  20
4   Poland   2021-01-06  25
5   UK       2021-01-07  30
6   Jordan   2021-01-08  10

Expected output:

At row 4 the cumulative sum resets, because 2021-01-05 is missing from the Date column.

    Country  Date        Amount Cumilative
0   USA      2021-01-01  10     10
1   Canada   2021-01-02  15     25
2   China    2021-01-03  10     35
3   Egypt    2021-01-04  20     55
4   Poland   2021-01-06  25     25
5   UK       2021-01-07  30     55
6   Jordan   2021-01-08  10     65

What I tried, which is incorrect:

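I am not sure how to add a check to my script that tests whether the Date column is consecutive, so that the cumulative sum of the Amount column resets at each gap.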

df['Date'] = pd.to_datetime(df['Date'])
df['Cumilative'] = df['Amount'].cumsum()


    Country Date        Amount  Cumilative
0   USA     2021-01-01  10      10
1   Canada  2021-01-02  15      25
2   China   2021-01-03  10      35
3   Egypt   2021-01-04  20      55
4   Poland  2021-01-06  25      80
5   UK      2021-01-07  30      110
6   Jordan  2021-01-08  10      120

If anyone could help me here, it would be much appreciated.

Tags: python, pandas, dataframe, pandas-groupby, cumsum

Solution


Try using groupby with diff and cumsum:

df['Cumilative'] = df.groupby(df['Date'].diff().dt.days.ne(1).cumsum())['Amount'].cumsum()

Now:

print(df)

Output:

  Country       Date  Amount  Cumilative
0     USA 2021-01-01      10          10
1  Canada 2021-01-02      15          25
2   China 2021-01-03      10          35
3   Egypt 2021-01-04      20          55
4  Poland 2021-01-06      25          25
5      UK 2021-01-07      30          55
6  Jordan 2021-01-08      10          65

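This splits the rows into runs of consecutive dates and applies cumsum to the Amount column within each run. A row starts a new run whenever its date is not exactly one day after the previous row's date.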
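To make the grouping key easier to follow, here is a sketch that breaks the one-liner into its intermediate steps. The names gap_days, new_run and run_id are illustrative only and do not appear in the answer, and the sketch assumes the rows are already sorted by Date with one row per date.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04',
             '2021-01-06', '2021-01-07', '2021-01-08'],
    'Amount': [10, 15, 10, 20, 25, 30, 10],
})
df['Date'] = pd.to_datetime(df['Date'])

# Days elapsed since the previous row; the first row has no predecessor (NaN).
gap_days = df['Date'].diff().dt.days

# True where a new run starts: the first row (NaN != 1) and any row after a gap.
new_run = gap_days.ne(1)

# A running count of run starts gives every row the id of the run it belongs to.
run_id = new_run.cumsum()

# Cumulative sum of Amount, restarted within each run.
df['Cumilative'] = df.groupby(run_id)['Amount'].cumsum()

print(df.assign(run_id=run_id))
```

For the sample data, run_id is 1 for the four rows up to 2021-01-04 and 2 for the three rows from 2021-01-06 onward, which is why the sum restarts at Poland.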

