python - Pandas - MultiIndex mean

Problem description

I'm quite new to pandas and I'm struggling to get the right averages in a MultiIndex DataFrame. It currently looks like this:

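import pandas as pd

# Row index with two levels: ID and Account Number
idx = pd.MultiIndex.from_tuples([('foo', 111), ('foo', 222),
                                 ('bar', 111), ('bar', 222), ('bar', 333),
                                 ('baz', 111),
                                 ('qux', 111), ('qux', 222)],
                                names=['ID', 'Account Number'])

# Every row carries the same costs
df = pd.DataFrame(index=idx, data={'Service 1': 18, 'Service 2': 22, 'Total cost': 40})
# Put all columns under a top-level 'Cost' header (column MultiIndex)
df = pd.concat([df], keys=['Cost'], axis=1)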

                        Cost                     
                   Service 1 Service 2 Total cost
ID  Account Number                               
foo 111                   18        22         40
    222                   18        22         40
bar 111                   18        22         40
    222                   18        22         40
    333                   18        22         40
baz 111                   18        22         40
qux 111                   18        22         40
    222                   18        22         40

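The table all this data is pulled from applies the costs for Service 1 and Service 2 at the account-number level. What it really needs to do is apply the costs at the ID level and split them evenly across that ID's account numbers, so the result should look like this: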

                        Cost                      
                   Service 1  Service 2 Total cost
ID  Account Number                                
foo 111                  9.0  11.000000  20.000000
    222                  9.0  11.000000  20.000000
bar 111                  6.0   7.333333  13.333333
    222                  6.0   7.333333  13.333333
    333                  6.0   7.333333  13.333333
baz 111                 18.0  22.000000  40.000000
qux 111                  9.0  11.000000  20.000000
    222                  9.0  11.000000  20.000000
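Each target value is the original cost divided by the number of account numbers under the same ID: foo and qux have 2, bar has 3 and baz has 1, so for example bar's Service 2 share is 22 / 3 ≈ 7.333333. A minimal sketch of that arithmetic, reusing only the index from the question; the name `accounts_per_id` is illustrative:

```python
import pandas as pd

# Same row index as in the question
idx = pd.MultiIndex.from_tuples([('foo', 111), ('foo', 222),
                                 ('bar', 111), ('bar', 222), ('bar', 333),
                                 ('baz', 111),
                                 ('qux', 111), ('qux', 222)],
                                names=['ID', 'Account Number'])

# How many account numbers each ID has
accounts_per_id = idx.get_level_values('ID').value_counts()

# Each account's share of the ID-level cost: cost / number of accounts
for id_, n in accounts_per_id.sort_index().items():
    print(id_, n, 18 / n, 22 / n, 40 / n)
```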

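I've tried df.groupby(['ID']).transform('mean'), but that just gives me back the original data: every row within an ID already holds the same values, so the group mean equals each row. I'm not sure how to get to what I need.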

It feels like I'm overcomplicating this, so any help would be greatly appreciated.
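A quick check of why the `mean` attempt changes nothing, assuming `df` is built exactly as in the question's constructor code: within each ID every row already holds 18, 22 and 40, so the group mean that `transform` broadcasts back matches the original values element by element.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples([('foo', 111), ('foo', 222),
                                 ('bar', 111), ('bar', 222), ('bar', 333),
                                 ('baz', 111),
                                 ('qux', 111), ('qux', 222)],
                                names=['ID', 'Account Number'])
df = pd.DataFrame(index=idx, data={'Service 1': 18, 'Service 2': 22, 'Total cost': 40})
df = pd.concat([df], keys=['Cost'], axis=1)

# The attempt from the question: mean per ID, broadcast back to every row
means = df.groupby(['ID']).transform('mean')

# Element-wise comparison: every value equals the original
print((means == df).all().all())  # True
```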

Tags: python, pandas, group-by, pandas-groupby, multi-index

Solution

Thanks to @ALollz for the edit. With a MultiIndex it always helps to have the full DataFrame constructor code.

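You can group by the first index level (ID), use transform('count') to broadcast the number of rows in each ID back onto every row, and then divide the original values by that count: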

df.div(df.groupby(level=0).transform('count'))

                        Cost                      
                   Service 1  Service 2 Total cost
ID  Account Number                                
foo 111                  9.0  11.000000  20.000000
    222                  9.0  11.000000  20.000000
bar 111                  6.0   7.333333  13.333333
    222                  6.0   7.333333  13.333333
    333                  6.0   7.333333  13.333333
baz 111                 18.0  22.000000  40.000000
qux 111                  9.0  11.000000  20.000000
    222                  9.0  11.000000  20.000000
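To show what the divisor looks like, the sketch below (rebuilding `df` from the question) keeps the intermediate `counts` frame: on every row it holds the number of rows sharing that row's ID, e.g. 3 for all `bar` rows. One caveat: `count` skips NaN, so with missing cells the divisor could differ per column. The `per_row` variant is an alternative not taken from the answer: it divides by the group length via `len`, which counts rows whether or not they contain NaN.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples([('foo', 111), ('foo', 222),
                                 ('bar', 111), ('bar', 222), ('bar', 333),
                                 ('baz', 111),
                                 ('qux', 111), ('qux', 222)],
                                names=['ID', 'Account Number'])
df = pd.DataFrame(index=idx, data={'Service 1': 18, 'Service 2': 22, 'Total cost': 40})
df = pd.concat([df], keys=['Cost'], axis=1)

# Non-NaN row count per ID, broadcast back to the shape of df
counts = df.groupby(level=0).transform('count')

# The answer's approach: divide each cost by its ID's row count
result = df.div(counts)

# Alternative (not from the answer): divide by the number of rows in the group
per_row = df.groupby(level='ID').transform(lambda s: s / len(s))

print(counts.loc['bar'])
print(result)
```

With no missing values, as here, both approaches give the same frame.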
