In a Pandas dataframe, how do I calculate the median value for each decile within each month?

Problem description

I have a dataframe with 50 data points per month. I'd like to calculate the median value for each decile within each month. In my groupby call I lead with the date, then qcut. But qcut calculates the bins over the whole dataset, not by month. Here's what I have so far:

import numpy as np
import pandas as pd

# 13 month-end dates, each repeated 50 times (50 data points per month).
# Note: pandas 2.2+ prefers freq='ME' over freq='M' for month-end.
datecol = pd.date_range('12/31/2018', '12/31/2019', freq='M').repeat(50)
df = pd.DataFrame(np.random.randn(len(datecol), 1), index=datecol, columns=['Data'])

dfg = df.groupby([df.index, pd.qcut(df['Data'], 10)])['Data'].median()

I've tried to run a qcut on the monthly grouping, but that hasn't worked.

Tags: python, pandas, group-by

Solution


First, group by month to create decile labels within each month. Then group by month and decile label to find the median.

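# transform returns one label per row, aligned with df, so it can be assigned directly
# (on pandas 2.x, apply may prepend the group key to the index and break the assignment)
df['q'] = df.groupby(df.index)['Data'].transform(lambda x: pd.qcut(x, 10, labels=False))
df.groupby([df.index, 'q']).median()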

                 Data
           q          
2018-12-31 0 -1.592383
           1 -0.959931
           2 -0.662911
           3 -0.421994
           4 -0.098636
           5  0.394583
           6  0.578562
...                ...
2019-12-31 5  0.022384
           6  0.398127
           7  0.562900
           8  0.765605
           9  1.355345

[130 rows x 1 columns]
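
For reference, here is a self-contained sketch of the same two-step idea that can be run end to end. The helper name `decile_medians`, its parameters, and the seeded random generator are illustrative assumptions, not part of the original answer. It uses `transform` for the labelling step so that the labels line up with the original rows.

```python
import numpy as np
import pandas as pd


def decile_medians(df, col="Data", n_bins=10):
    """Median of `col` per within-month quantile bin (hypothetical helper)."""
    # Step 1: label each row with its quantile bin, computed inside its own month.
    # All rows of a month share the same index timestamp, so grouping by the
    # index groups by month. transform keeps the original row order.
    labels = df.groupby(df.index)[col].transform(
        lambda s: pd.qcut(s, n_bins, labels=False)
    )
    labelled = df.assign(q=labels)
    # Step 2: median per (month, bin) pair.
    return labelled.groupby([labelled.index, "q"])[col].median()


# Sample data shaped like the question: 13 month-ends, 50 points each.
rng = np.random.default_rng(0)
dates = pd.date_range("2018-12-31", periods=13, freq=pd.offsets.MonthEnd()).repeat(50)
df = pd.DataFrame({"Data": rng.standard_normal(len(dates))}, index=dates)

medians = decile_medians(df)
print(medians.head(10))
```

Because the bins are computed per month, the medians within any one month should increase with `q`, which is a quick sanity check that the deciles were not taken over the whole dataset.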
