Cumsum with groupby

Question

I have a DataFrame that contains:

                State     Country      Date      Cases
0                NaN    Afghanistan 2020-01-22      0
271              NaN    Afghanistan 2020-01-23      0
...              ...            ...        ...    ...
85093            NaN       Zimbabwe 2020-11-30   9950
85364            NaN       Zimbabwe 2020-12-01  10129

I am trying to create a new column of cumulative cases, grouped by country and state:

                State     Country      Date      Cases  Total Cases
231          California     USA     2020-01-22      5        5
342          California     USA     2020-01-23     10        15
233            Texas        USA     2020-01-22      4        4
322            Texas        USA     2020-01-23     12        16

I have been trying to follow "Pandas groupby cumulative sum" and have tried the following:

df['Total'] = df.groupby(['State','Country'])['Cases'].cumsum()

This returns a series of -1.

df['Total'] = df.groupby(['State', 'Country']).sum() \
                .groupby(level=0).cumsum().reset_index()

This returns the sum.

df['Total'] = df.groupby(['Country'])['Cases'].apply(lambda x: x.cumsum())

This does not separate the sums by state.

df_f['Total'] = df_f.groupby(['Region','State'])['Cases'].apply(lambda x: x.cumsum())

This works, except that where 'State' is NaN, 'Total' is also NaN.

Tags: python, pandas, dataframe, pandas-groupby

Solution


import pandas as pd

# Rebuild a small sample of the data: one inner list per column.
arrays = [['California', 'California', 'Texas', 'Texas'],
          ['USA', 'USA', 'USA', 'USA'],
          ['2020-01-22', '2020-01-23', '2020-01-22', '2020-01-23'],
          [5, 10, 4, 12]]
df = pd.DataFrame(list(zip(*arrays)), columns=['State', 'Country', 'Date', 'Cases'])
df
    State       Country Date        Cases
0   California  USA     2020-01-22  5
1   California  USA     2020-01-23  10
2   Texas       USA     2020-01-22  4
3   Texas       USA     2020-01-23  12

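# Sort by state, country and date, then take the running sum within each group.
temp = df.set_index(['State', 'Country', 'Date'], drop=True).sort_index()
# reset_index() yields a 0..n-1 index, so df must already be sorted this way with a default index.
df['Total Cases'] = temp.groupby(['State', 'Country']).cumsum().reset_index()['Cases']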
df
    State       Country Date        Cases   Total Cases
0   California  USA     2020-01-22  5       5
1   California  USA     2020-01-23  10      15
2   Texas       USA     2020-01-22  4       4
3   Texas       USA     2020-01-23  12      16
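
As a possible alternative, here is a minimal sketch that also covers the rows from the question where State is NaN, and that does not depend on the original row order or index. It assumes pandas 1.1 or later, where groupby accepts dropna=False so that NaN is kept as a group key. The sample rows and index values below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: country-level rows have no state (NaN), and the
# index is not a clean 0..n-1 range, similar to the data in the question.
df = pd.DataFrame(
    {
        "State": [np.nan, np.nan, "California", "California", "Texas", "Texas"],
        "Country": ["Afghanistan", "Afghanistan", "USA", "USA", "USA", "USA"],
        "Date": pd.to_datetime(
            ["2020-01-23", "2020-01-22", "2020-01-22",
             "2020-01-23", "2020-01-22", "2020-01-23"]
        ),
        "Cases": [2, 1, 5, 10, 4, 12],
    },
    index=[271, 0, 231, 342, 233, 322],
)

# Put each group in date order so the running total accumulates chronologically.
df = df.sort_values(["Country", "State", "Date"])

# dropna=False keeps NaN states as their own group instead of dropping them.
# cumsum() returns a Series with the same index as df, so the assignment
# lines up row by row regardless of the index values.
df["Total Cases"] = (
    df.groupby(["Country", "State"], dropna=False)["Cases"].cumsum()
)

print(df)
```

Because the grouped cumsum keeps the original index, there is no need for set_index or reset_index here; only the date ordering within each group matters.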
