pandas.core.base.SpecificationError: Function names must be unique, when using partials as aggregation functions

Question

To reproduce the issue:

import pandas as pd
from functools import partial


def quantile_builder(portion, x):
    print(x)
    return x.quantile(portion)

q90 = partial(quantile_builder, 0.90)
q95 = partial(quantile_builder, 0.95)


data = [('a', 1), ('a', 1),('b', 1),('a', 3),('b', 2),('c', 1),('a', 2),('b', 3),('a', 2)]
df = pd.DataFrame(data, columns=['project', 'duration'])


df_agg = df.groupby(['project']).agg({'duration': ['median', q90, q95]})

It raises:

Traceback (most recent call last):
  File "test_pandas_bug.py", line 23, in <module>
    df_agg = df.groupby(['project']).agg({'duration': ['median', q90, q95]})
  File "/data/software/miniconda3/lib/python3.7/site-packages/pandas/core/groupby/generic.py", line 1315, in aggregate
    return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
  File "/data/software/miniconda3/lib/python3.7/site-packages/pandas/core/groupby/generic.py", line 186, in aggregate
    result, how = self._aggregate(arg, _level=_level, *args, **kwargs)
  File "/data/software/miniconda3/lib/python3.7/site-packages/pandas/core/base.py", line 503, in _aggregate
    result = _agg(arg, _agg_2dim)
  File "/data/software/miniconda3/lib/python3.7/site-packages/pandas/core/base.py", line 449, in _agg
    result[fname] = func(fname, agg_how)
  File "/data/software/miniconda3/lib/python3.7/site-packages/pandas/core/base.py", line 440, in _agg_2dim
    return colg.aggregate(how, _level=None)
  File "/data/software/miniconda3/lib/python3.7/site-packages/pandas/core/groupby/generic.py", line 1315, in aggregate
    return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
  File "/data/software/miniconda3/lib/python3.7/site-packages/pandas/core/groupby/generic.py", line 186, in aggregate
    result, how = self._aggregate(arg, _level=_level, *args, **kwargs)
  File "/data/software/miniconda3/lib/python3.7/site-packages/pandas/core/base.py", line 559, in _aggregate
    _axis=_axis), None
  File "/data/software/miniconda3/lib/python3.7/site-packages/pandas/core/base.py", line 605, in _aggregate_multiple_funcs
    results.append(colg.aggregate(arg))
  File "/data/software/miniconda3/lib/python3.7/site-packages/pandas/core/groupby/generic.py", line 766, in aggregate
    (_level or 0) + 1)
  File "/data/software/miniconda3/lib/python3.7/site-packages/pandas/core/groupby/generic.py", line 832, in _aggregate_multiple_funcs
    '{}'.format(name))
pandas.core.base.SpecificationError: Function names must be unique, found multiple named quantile_builder

Environment:

Python 3.7
pandas 0.24.2

Is this a pandas bug? Is there a workaround in the meantime?

Tags: python, pandas

Answer


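It works with pandas 1.0.1 on my machine. On older versions, however, one workaround is to aggregate with (name, function) pairs, so that each output column gets an explicit, unique name: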

df_agg = (df.groupby(['project'])['duration']
            .agg([('mean','mean'),('q90',q90),('q95',q95)])
         )

Output:

             mean  q90  q95
project                    
a             1.8  2.6  2.8
b             2.0  2.8  2.9
c             1.0  1.0  1.0
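
As for why explicit names help: the error message reports the name quantile_builder for both aggregations, which suggests pandas labels a partial by the function it wraps. The sketch below uses only the standard library to show that both partials share that underlying name. How pandas derives the label internally is an assumption inferred from the error message, not something shown in the traceback.

```python
from functools import partial


def quantile_builder(portion, x):
    return x.quantile(portion)


q90 = partial(quantile_builder, 0.90)
q95 = partial(quantile_builder, 0.95)

# Both partials wrap the same function, so any label derived from the
# wrapped function is identical for the two of them.
underlying_names = [q90.func.__name__, q95.func.__name__]
print(underlying_names)

# The partials themselves are distinct: only their bound arguments differ.
print(q90.args, q95.args)
```

Passing (name, function) pairs sidesteps this because the column label no longer has to be inferred from the callable.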

Another option is to override the name of each partial:

q90 = partial(quantile_builder, 0.90)
q90.__name__ = 'q90'
q95 = partial(quantile_builder, 0.95)
q95.__name__ = 'q95'

# should work now
df_agg = df.groupby(['project']).agg({'duration': ['median', q90, q95]})

Output:

        duration          
          median  q90  q95
project                   
a              2  2.6  2.8
b              2  2.8  2.9
c              1  1.0  1.0
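
If many quantiles are needed, setting each name by hand gets repetitive. A minimal sketch of a variation on the same idea is a small factory that returns a plain function with its name already set. Note that make_quantile is a hypothetical helper introduced here for illustration, not part of pandas or of the original answer.

```python
import pandas as pd


def make_quantile(portion):
    """Hypothetical helper: build a quantile aggregator with a unique name."""
    def agg(x):
        return x.quantile(portion)
    # Name the function after its quantile, e.g. 0.90 -> 'q90'.
    agg.__name__ = "q{:d}".format(int(round(portion * 100)))
    return agg


data = [('a', 1), ('a', 1), ('b', 1), ('a', 3), ('b', 2),
        ('c', 1), ('a', 2), ('b', 3), ('a', 2)]
df = pd.DataFrame(data, columns=['project', 'duration'])

# Each aggregator carries its own name, so the column labels do not clash.
df_agg = df.groupby('project')['duration'].agg(
    ['median', make_quantile(0.90), make_quantile(0.95)]
)
print(df_agg)
```

The resulting columns are median, q90 and q95, with the same values as in the output above.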
