首页 > 解决方案 > 如何在Python中进行分组处理、累计和上一行-当前行处理?

问题描述

我有以下数据:

df1 = pd.DataFrame({'AIRPORT': ['ORD','ORD','ORD','ORD','ORD','DCA','DCA','DCA','DCA','DCA','DCA','DCA'],
                    'MONTH': [3,4,5,6,7,1,2,3,4,5,6,7],
                    'VOLUME': [200, 500, 600, 900, 400, 44, 55, 66, 77, 88, 99, 77],
                    'MULT': [2, 3, 4, 5, 6, 2, 3, 4, 5, 6, 7, 8]})

它包含两组机场 ORD 和 DCA。

Groupwise 并且仅针对组的非第一行,我想创建三个新变量:

TRY1:这是前几个月的 VOLUME / 前几个月的 MULT
解决方案:

# TRY1
df['MULTIPLY'] = df['VOLUME']/df['MULT']
df['TRY1'] = df.groupby(['AIRPORT'])['MULTIPLY'].shift(1)

TRY2:这是过去 2 个月 VOLUME 和当月 VOLUME 的累积总和,如果 MONTH >= 6 计算出解决方案:

# TRY2
df['TRY2'] = \
        np.where( df['MONTH'] >= 6
                 ,pd.rolling_sum(df['VOLUME'], window=3).fillna(0)
                 ,np.nan)

TRY3:这是前几个月 VOLUME * 当前月份 MULT
想通了解决方案:

# TRY3
df['TRY3'] = df.groupby(['AIRPORT'])['VOLUME'].shift(1) *  df['MULT']

结果数据:

AIRPORT MONTH   VOLUME  MULT     TRY1                  TRY2          TRY3
ORD     3         200   2           
ORD     4         500   3     100.0(200/2)                         600(200*3)
ORD     5         600   4     166.7(500/3)                         2000(500*4)
ORD     6         900   5     150.0(600/4)  2000(500+600+900)       3000(600*5)
ORD     7         400   6     180.0(900/5)  1900(600+900+400)       5400(900*6)
DCA     1         44    2           
DCA     2         55    3      22.0(44/2)                              132
DCA     3         66    4      18.3(55/3)                              220
DCA     4         77    5      16.5(66/4)                              330
DCA     5         88    6      15.4(77/5)                              462
DCA     6         99    7      14.7(88/6)       264 (77+88+99)         616
DCA     7         77    8      14.1(99/7)       264 (88+99+77)         792

任何想法将不胜感激。谢谢。

标签: pythonpandasnumpy

解决方案


df = pd.DataFrame({'AIRPORT': ['ORD','ORD','ORD','ORD','ORD','DCA','DCA','DCA','DCA','DCA','DCA','DCA'],
                    'MONTH': [3,4,5,6,7,1,2,3,4,5,6,7],
                    'VOLUME': [200, 500, 600, 900, 400, 44, 55, 66, 77, 88, 99, 77],
                    'MULT': [2, 3, 4, 5, 6, 2, 3, 4, 5, 6, 7, 8]})

# TRY1
df['MULTIPLY'] = df['VOLUME']/df['MULT']
df['TRY1'] = df.groupby(['AIRPORT'])['MULTIPLY'].shift(1)

# TRY2
df['TRY2'] = \
        np.where( df['MONTH'] >= 6
                 ,pd.rolling_sum(df['VOLUME'], window=3).fillna(0)
                 ,np.nan)
# TRY3
df['TRY3'] = df.groupby(['AIRPORT'])['VOLUME'].shift(1) *  df['MULT']

推荐阅读