pandas: average bin value over all index entries

Problem description

I have a data-wrangling problem that has me stumped. I am trying to group data into specified bins and take the mean of the cumsum:

import pandas as pd
import numpy as np

df = pd.DataFrame(data={
    'year': np.arange(1800, 2000, 1),
    'var1': np.random.randint(0, 20, 200),
    'var2': np.random.randint(0, 20, 200),
})
thresholds = np.arange(0,20,1)
bins = pd.cut(df.var2, thresholds)

grouped = df.groupby(['year', bins]).count()
grouped = grouped.fillna(0)
grouped = grouped.assign(Num_Events = grouped.groupby('var1').var2.cumsum())

grouped = grouped.unstack()

I want to take the mean of Num_Events for each bin (i.e. each column) over all the calendar years listed in the index. This is what grouped['Num_Events'].head() looks like:

var2  (0, 1]  (1, 2]  (2, 3]  (3, 4]  ...  (15, 16]  (16, 17]  (17, 18]  (18, 19]
year                                  ...                                        
1800     0.0     0.0     0.0     0.0  ...       0.0       0.0       0.0       0.0
1801     0.0     0.0     0.0     0.0  ...       0.0       0.0       0.0       0.0
1802     0.0     0.0     0.0     0.0  ...       0.0       0.0       0.0       2.0
1803     0.0     0.0     0.0     0.0  ...       0.0       0.0       0.0       0.0
1804     0.0     0.0     0.0     0.0  ...       0.0       0.0       0.0       0.0

The output I would like to get looks like this:

var2  (0, 1]                (1, 2]              (2, 3]              (3, 4]            ...       (15, 16]                (16, 17]                (17, 18]              (18, 19]
year                                  ...                                        
1800   <avg bin [0,1]>     <avg bin [1,2]>     <avg bin [2,3]>     <avg bin [3,4]>  ...       <avg bin [15,16]>       <avg bin [16,17]>       <avg bin [17,18]>      <avg bin [18,19]>

Thanks!

Tags: pandas

Solution
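
No answer was recorded for this question, so what follows is only a minimal sketch of one possible approach, not the accepted solution. It assumes that "average over all calendar years" means a plain column-wise mean of the year-by-bin table. The small frame `num_events` and its values are made up for illustration and stand in for `grouped['Num_Events']`. The names `bin_means` and `bin_means_row` and the row label `'mean over years'` are likewise hypothetical.

```python
import pandas as pd

# Toy stand-in for grouped['Num_Events']: years as rows, bins as columns.
# The numbers are invented for illustration; they are not the question's data.
num_events = pd.DataFrame(
    {'(0, 1]': [0.0, 2.0, 1.0], '(1, 2]': [1.0, 1.0, 4.0]},
    index=pd.Index([1800, 1801, 1802], name='year'),
)
num_events.columns.name = 'var2'

# Mean over the index (all years), computed separately for each bin column.
bin_means = num_events.mean(axis=0)

# Optional: turn the resulting Series into a one-row frame,
# similar in shape to the desired output in the question.
bin_means_row = bin_means.to_frame(name='mean over years').T

print(bin_means_row)
```

Applied to the frame from the question, the same idea would read grouped['Num_Events'].mean(), which returns one value per bin. Note that with thresholds starting at 0, pd.cut uses right-closed intervals such as (0, 1], so rows where var2 equals 0 fall outside every bin and do not contribute to any column.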

