python - In a Pandas dataframe how do I calculate the median value for each decile within each month
Problem description
I have a dataframe with 50 data points per month. I'd like to calculate the median value for each decile within each month. In my groupby call I lead with the date, then qcut. But qcut calculates the bins over the whole dataset, not by month. Here's what I have so far:
import numpy as np
import pandas as pd
datecol = pd.date_range('12/31/2018','12/31/2019', freq='M')
for ii in range(0,49):
    datecol = datecol.append(pd.date_range('12/31/2018','12/31/2019', freq='M'))
datecol = datecol.sort_values()
df = pd.DataFrame(np.random.randn(len(datecol), 1), index=datecol, columns=['Data'])
dfg = df.groupby([df.index, pd.qcut(df['Data'], 10)])['Data'].median()
I've tried to run a qcut on the monthly grouping, but that hasn't worked.
Solution
First, groupby the month to create the quantile labels within each month. Then groupby the month and the quantile label to find the median.
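# Decile label (0-9) per row, computed within each month; group_keys=False keeps the original index on recent pandas versions.
df['q'] = df.groupby(df.index, group_keys=False).Data.apply(lambda x: pd.qcut(x, 10, labels=False))
# Median of each decile within each month.
df.groupby([df.index, 'q']).median()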
                   Data
           q
2018-12-31 0  -1.592383
           1  -0.959931
           2  -0.662911
           3  -0.421994
           4  -0.098636
           5   0.394583
           6   0.578562
...                 ...
2019-12-31 5   0.022384
           6   0.398127
           7   0.562900
           8   0.765605
           9   1.355345
[130 rows x 1 columns]
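As a cross-check of the two-step approach above, here is a minimal self-contained sketch. It is a variant, not the answer's exact code: it uses `transform` instead of `apply` so the decile labels come back aligned with the original rows, and it builds the sample data with a seeded generator and `Index.repeat`. The seed, the `months` and `counts` names, and the data construction are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Assumed setup: 13 month-end dates (Dec 2018 to Dec 2019), 50 random points each.
rng = np.random.default_rng(0)
months = pd.date_range('2018-12-31', periods=13, freq=pd.offsets.MonthEnd())
index = months.repeat(50)
df = pd.DataFrame({'Data': rng.standard_normal(len(index))}, index=index)

# Step 1: decile label (0-9) for each row, with bin edges computed per month.
# transform returns a result aligned row-by-row with df.
df['q'] = df.groupby(level=0)['Data'].transform(
    lambda x: pd.qcut(x, 10, labels=False)
)

# Step 2: median of each decile within each month.
medians = df.groupby([df.index, 'q'])['Data'].median()

# Sanity check: with 50 points per month, each decile should hold 5 points.
counts = df.groupby([df.index, 'q']).size()
print(medians.head(10))
print(counts.unique())
```

If the bins were computed over the whole dataset instead, the per-month counts would generally not all be equal, which is a quick way to tell the two behaviours apart.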