Group a pandas DataFrame by month and by hour of day within each month

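Question

I am trying to group a DataFrame by month and, within each month, by hour of day, so that I get the mean for each hour of the day in each month. So far I have run the following line, but it does not work: df=df.groupby([pd.Grouper(freq='M'),pd.Grouper(freq='h')]).mean() Any ideas on how to do this efficiently?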

    import pandas as pd

    date = ['2015-02-03 23:00:00','2015-02-03 23:30:00','2015-02-04 00:00:00','2015-02-04 00:30:00','2015-02-04 01:00:00','2015-02-04 01:30:00','2015-02-04 02:00:00','2015-02-04 02:30:00','2015-02-04 03:00:00','2015-02-04 03:30:00','2015-02-04 04:00:00','2015-02-04 04:30:00','2015-02-04 05:00:00','2015-02-04 05:30:00','2015-02-04 06:00:00','2015-02-04 06:30:00','2015-02-04 07:00:00','2015-02-04 07:30:00','2015-02-04 08:00:00','2015-02-04 08:30:00','2015-02-04 09:00:00','2015-02-04 09:30:00','2015-02-04 10:00:00','2015-02-04 10:30:00','2015-02-04 11:00:00','2015-02-04 11:30:00','2015-02-04 12:00:00','2015-02-04 12:30:00','2015-02-04 13:00:00','2015-02-04 13:30:00','2015-02-04 14:00:00','2015-02-04 14:30:00','2015-02-04 15:00:00','2015-02-04 15:30:00','2015-02-04 16:00:00','2015-02-04 16:30:00','2015-02-04 17:00:00','2015-02-04 17:30:00','2015-02-04 18:00:00','2015-02-04 18:30:00','2015-02-04 19:00:00','2015-02-04 19:30:00','2015-02-04 20:00:00','2015-02-04 20:30:00','2015-02-04 21:00:00','2015-02-04 21:30:00','2015-02-04 22:00:00','2015-02-04 22:30:00','2015-02-04 23:00:00','2015-02-04 23:30:00']
    value = [33.24, 31.71, 34.39, 34.49, 34.67, 34.46, 34.59, 34.83, 35.78, 33.03, 35.49, 33.79, 36.12, 37.09, 39.54, 41.19, 45.99, 50.23, 46.72, 47.47, 48.46, 48.38, 48.40, 48.13, 38.35, 38.19, 38.12, 38.05, 38.06, 37.83, 37.49, 37.41, 41.84, 42.26, 44.09, 48.85, 50.07, 50.94, 51.09, 50.60, 47.39, 45.57, 45.03, 44.98, 41.32, 40.37, 41.12, 39.33, 35.38, 33.44]
    df = pd.DataFrame({'value': value, 'index': date})
    # The strings include seconds, so the format must contain %S
    df.index = pd.to_datetime(df['index'], format='%Y-%m-%d %H:%M:%S')
    df = df.drop(columns='index')
    print(df)

                         value
    index                     
    2015-02-03 23:00:00  33.24
    2015-02-03 23:30:00  31.71
    2015-02-04 00:00:00  34.39
    2015-02-04 00:30:00  34.49
    2015-02-04 01:00:00  34.67
    2015-02-04 01:30:00  34.46

Tags: python, pandas, pandas-groupby

Answer

Use DataFrame.reset_index and DataFrame.groupby together with the Series.dt accessor:

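df2 = df.reset_index()
idx = df2['index']  # the datetime column produced by reset_index
keys = [idx.dt.year.rename('year'), idx.dt.month.rename('month'), idx.dt.hour.rename('hour')]
# Select 'value' so that only the numeric column is averaged
df3 = df2.groupby(keys)[['value']].mean()
print(df3)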

                   value
year month hour         
2015 2     0     34.4400
           1     34.5650
           2     34.7100
           3     34.4050
           4     34.6400
           5     36.6050
           6     40.3650
           7     48.1100
           8     47.0950
           9     48.4200
           10    48.2650
           11    38.2700
           12    38.0850
           13    37.9450
           14    37.4500
           15    42.0500
           16    46.4700
           17    50.5050
           18    50.8450
           19    46.4800
           20    45.0050
           21    40.8450
           22    40.2250
           23    33.4425
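
The same kind of grouping can probably be done without reset_index, by taking the keys straight from the DatetimeIndex. The sketch below uses a small made-up frame (not the data from the question) and assumes the index is already a DatetimeIndex. The names month, hour and result are only illustrative. Note that a frequency-based pd.Grouper bins rows into consecutive time intervals, whereas the .hour attribute gives the hour of day, which is what the question asks for.

```python
import pandas as pd

# Hypothetical sample data: six half-hourly readings across midnight
idx = pd.date_range('2015-02-03 23:00', periods=6, freq='30min')
df = pd.DataFrame({'value': [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]}, index=idx)

# Year-month period and hour of day, both derived from the index
month = df.index.to_period('M').rename('month')
hour = df.index.hour.rename('hour')

# Mean of 'value' for each (month, hour of day) pair
result = df.groupby([month, hour])['value'].mean()
print(result)
```

Using to_period('M') keeps the year and month together in one key, so the same month in different years stays separate.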

If you do not want to take the year into account, leave it out of the grouping keys:

df3 = df2.groupby([df2['index'].dt.month.rename('month'), df2['index'].dt.hour.rename('hour')])[['value']].mean()
print(df3)

              value
month hour         
2     0     34.4400
      1     34.5650
      2     34.7100
      3     34.4050
      4     34.6400
      5     36.6050
      6     40.3650
      7     48.1100
      8     47.0950
      9     48.4200
      10    48.2650
      11    38.2700
      12    38.0850
      13    37.9450
      14    37.4500
      15    42.0500
      16    46.4700
      17    50.5050
      18    50.8450
      19    46.4800
      20    45.0050
      21    40.8450
      22    40.2250
      23    33.4425
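
If a wide table is easier to read, the hour level of such a result can be moved into columns with unstack. This is a minimal sketch on made-up data spanning two months. The names by_month_hour and wide are illustrative and do not come from the original answer.

```python
import pandas as pd

# Hypothetical readings at 22:00 and 23:00 in January and February
stamps = pd.to_datetime([
    '2015-01-31 22:00', '2015-01-31 23:00',
    '2015-02-01 22:00', '2015-02-01 23:00', '2015-02-02 23:00',
])
df = pd.DataFrame({'value': [10.0, 20.0, 30.0, 40.0, 60.0]}, index=stamps)

# Mean per (month, hour of day), ignoring the year
by_month_hour = df.groupby(
    [df.index.month.rename('month'), df.index.hour.rename('hour')]
)['value'].mean()

# Move the 'hour' level into columns: one row per month, one column per hour
wide = by_month_hour.unstack('hour')
print(wide)
```

Each row of wide is then the average daily profile for one month.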
