首页 > 解决方案 > 为每个分位数添加一个支持列,其中包含 qty 的总和

问题描述

我有一张带有月份和数量的表格:

原始数据

而且我还在一个单独的表格中计算了百分位数:

q = data2.groupby('Month').quantile([0.05, 0.25, 0.5, 0.75, 0.95, 1])

百分表

现在我需要在 percentile 表中添加另一列,它应该显示记录的数量(计数)低于 percentile ,我已经尝试过:

q['Count'] = q['Qty2'].count()

结果显示每一行的计数相同:

结果表

结果表应该喜欢;

预期结果

标签: pythonpandas

解决方案


我认为您需要cut合并列Qty、tehn 聚合总和和size、重塑DataFrame.stackSeries.unstack最后计数新行和列。因为使用MultiIndexccolumns 是由元组选择的:

df = pd.read_excel('sample data.xlsx')

lab = ['<10km','10-25km','25-50km','50-75km','75-100km','>100km']
df['bins'] = (pd.cut(df['Qty'], 
                     bins=[-np.inf, 10,25,50,75,100,np.inf], 
                     labels=lab).astype(str))

# df = df.sort_values('Date')
df = (df.groupby([pd.Grouper(freq='MS', key='Date'), 'bins'], sort=False)
        .agg(Qty=('Qty','sum'), Count=('Qty', 'size'))
        .stack()
        .unstack([1,2])
        )
df = df.set_index(df.index.strftime('%b-%y'))

df[('','Total Qty')] = df.xs('Qty', axis=1, level=1).sum(axis=1)
df[('','Total Count')] = df.xs('Count', axis=1, level=1).sum(axis=1)
df.loc['Grand Total'] = df.sum()
df.loc['% share'] = (df.loc['Grand Total'].div(df.loc['Grand Total',('','Total Count')])
                       .mul(100).round())

df[('','%')] = (df[('','Total Count')].drop(['Grand Total','% share'])
                                      .div(df.loc['Grand Total',('','Total Count')])
                                      .mul(100).round())

print (df)
bins           50-75km          75-100km            >100km           25-50km  \
                   Qty  Count        Qty  Count        Qty  Count        Qty   
Date                                                                           
Jan-20        2252.515   36.0   2931.099   34.0   1963.314   16.0   2365.221   
Feb-20        3201.651   51.0   1640.793   19.0   4085.809   30.0   1370.316   
Mar-20        2098.092   34.0   1401.169   16.0   1539.441   13.0   1266.176   
Apr-20         996.734   16.0    703.785    8.0    450.147    4.0   1054.756   
May-20        1665.223   27.0   1074.167   12.0   1615.029   12.0   2645.278   
Jun-20        3924.892   65.0   2132.259   25.0   2461.037   20.0   5364.342   
Jul-20        3867.246   64.0   3588.282   41.0   3768.105   29.0   4004.760   
Aug-20        3926.835   65.0   2620.992   31.0   3431.889   26.0   3269.309   
Sep-20        2302.843   37.0   2012.938   24.0   4651.756   35.0    773.813   
Grand Total  24236.031  395.0  18105.484  210.0  23966.527  185.0  22113.971   
% share       1327.000   22.0    991.000   11.0   1312.000   10.0   1210.000   

bins                   10-25km           <10km                                 \
             Count         Qty  Count      Qty Count    Total Qty Total Count   
Date                                                                            
Jan-20        64.0   490.34800   29.0   21.014   4.0  10023.51100       183.0   
Feb-20        39.0   693.42200   38.0   11.019   2.0  11003.01000       179.0   
Mar-20        35.0   516.79800   30.0   27.866   8.0   6849.54200       136.0   
Apr-20        30.0   283.63600   16.0   17.933   3.0   3506.99100        77.0   
May-20        75.0   497.96000   27.0   29.593   4.0   7527.25000       157.0   
Jun-20       148.0  1477.66547   81.0   17.297   2.0  15377.49247       341.0   
Jul-20       110.0  1642.40900   94.0   42.065   6.0  16912.86700       344.0   
Aug-20        89.0   776.63000   43.0   77.330  13.0  14102.98500       267.0   
Sep-20        21.0   351.31300   23.0   20.144   3.0  10112.80700       143.0   
Grand Total  611.0  6730.18147  381.0  264.261  45.0  95416.45547      1827.0   
% share       33.0   368.00000   21.0   14.000   2.0   5223.00000       100.0   

bins               
                %  
Date               
Jan-20       10.0  
Feb-20       10.0  
Mar-20        7.0  
Apr-20        4.0  
May-20        9.0  
Jun-20       19.0  
Jul-20       19.0  
Aug-20       15.0  
Sep-20        8.0  
Grand Total   NaN  
% share       NaN  

推荐阅读