python - pandas.core.base.SpecificationError: Function names must be unique when using partial as an aggregation function
Problem description
To reproduce:
import pandas as pd
from functools import partial
def quantile_builder(portion, x):
    print(x)
    return x.quantile(portion)
q90 = partial(quantile_builder, 0.90)
q95 = partial(quantile_builder, 0.95)
data = [('a', 1), ('a', 1),('b', 1),('a', 3),('b', 2),('c', 1),('a', 2),('b', 3),('a', 2)]
df = pd.DataFrame(data, columns=['project', 'duration'])
df_agg = df.groupby(['project']).agg({'duration': ['median', q90, q95]})
It raises:
Traceback (most recent call last):
  File "test_pandas_bug.py", line 23, in <module>
    df_agg = df.groupby(['project']).agg({'duration': ['median', q90, q95]})
  File "/data/software/miniconda3/lib/python3.7/site-packages/pandas/core/groupby/generic.py", line 1315, in aggregate
    return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
  File "/data/software/miniconda3/lib/python3.7/site-packages/pandas/core/groupby/generic.py", line 186, in aggregate
    result, how = self._aggregate(arg, _level=_level, *args, **kwargs)
  File "/data/software/miniconda3/lib/python3.7/site-packages/pandas/core/base.py", line 503, in _aggregate
    result = _agg(arg, _agg_2dim)
  File "/data/software/miniconda3/lib/python3.7/site-packages/pandas/core/base.py", line 449, in _agg
    result[fname] = func(fname, agg_how)
  File "/data/software/miniconda3/lib/python3.7/site-packages/pandas/core/base.py", line 440, in _agg_2dim
    return colg.aggregate(how, _level=None)
  File "/data/software/miniconda3/lib/python3.7/site-packages/pandas/core/groupby/generic.py", line 1315, in aggregate
    return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
  File "/data/software/miniconda3/lib/python3.7/site-packages/pandas/core/groupby/generic.py", line 186, in aggregate
    result, how = self._aggregate(arg, _level=_level, *args, **kwargs)
  File "/data/software/miniconda3/lib/python3.7/site-packages/pandas/core/base.py", line 559, in _aggregate
    _axis=_axis), None
  File "/data/software/miniconda3/lib/python3.7/site-packages/pandas/core/base.py", line 605, in _aggregate_multiple_funcs
    results.append(colg.aggregate(arg))
  File "/data/software/miniconda3/lib/python3.7/site-packages/pandas/core/groupby/generic.py", line 766, in aggregate
    (_level or 0) + 1)
  File "/data/software/miniconda3/lib/python3.7/site-packages/pandas/core/groupby/generic.py", line 832, in _aggregate_multiple_funcs
    '{}'.format(name))
pandas.core.base.SpecificationError: Function names must be unique, found multiple named quantile_builder
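The last line of the traceback reports two functions named quantile_builder. A quick check using only the standard library is consistent with that: a functools.partial object has no __name__ attribute of its own, and the only name reachable from it is that of the wrapped function, via .func. That pandas falls back to the wrapped function's name when labelling the result is an inference from the error message, not something verified against the pandas 0.24.2 source here.

```python
from functools import partial


def quantile_builder(portion, x):
    return x.quantile(portion)


q90 = partial(quantile_builder, 0.90)
q95 = partial(quantile_builder, 0.95)

# A partial object does not carry a __name__ of its own.
print(hasattr(q90, "__name__"))  # False

# The wrapped function is exposed as .func and is the same object for both
# partials, so any name derived from it is identical.
print(q90.func.__name__, q95.func.__name__)  # quantile_builder quantile_builder
```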
Environment:
Python 3.7
pandas 0.24.2
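Is this a pandas bug? Is there any workaround for now?
Solution
It works on my computer with pandas 1.0.1. However, another workaround is to pass (name, function) pairs to agg: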
df_agg = (df.groupby(['project'])['duration']
            .agg([('mean', 'mean'), ('q90', q90), ('q95', q95)])
         )
Output:
         mean  q90  q95
project
a         1.8  2.6  2.8
b         2.0  2.8  2.9
c         1.0  1.0  1.0
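The (name, function) pair approach also scales to any number of quantiles if the pairs are built in a loop. This is a sketch that assumes a reasonably recent pandas (it was not checked against 0.24.2) and drops the debugging print from quantile_builder; the list name portions and the "q90"/"q95" label format are illustrative choices, not part of the original answer. It uses 'median' rather than 'mean' to match the question.

```python
import pandas as pd
from functools import partial


def quantile_builder(portion, x):
    return x.quantile(portion)


data = [('a', 1), ('a', 1), ('b', 1), ('a', 3), ('b', 2),
        ('c', 1), ('a', 2), ('b', 3), ('a', 2)]
df = pd.DataFrame(data, columns=['project', 'duration'])

# Build (label, function) pairs; the label, not the function's name,
# becomes the output column name, so the partials never need a __name__.
portions = [0.90, 0.95]
named_funcs = [('median', 'median')] + [
    (f"q{round(p * 100)}", partial(quantile_builder, p)) for p in portions
]

df_agg = df.groupby('project')['duration'].agg(named_funcs)
print(df_agg)
```

Adding another quantile only means appending to portions; the labels stay unique as long as the rounded percentages differ.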
Another option is to overwrite the __name__ attribute of each partial:
q90 = partial(quantile_builder, 0.90)
q90.__name__ = 'q90'
q95 = partial(quantile_builder, 0.95)
q95.__name__ = 'q95'
# should work now
df_agg = df.groupby(['project']).agg({'duration': ['median', q90, q95]})
Output:
        duration
          median  q90  q95
project
a              2  2.6  2.8
b              2  2.8  2.9
c              1  1.0  1.0
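If setting attributes on a partial object feels awkward, a closure factory is an alternative way to get the same effect: each returned function is an ordinary Python function, so its __name__ can be set to something unique before it is handed to agg. make_quantile is a hypothetical helper, not a pandas API, and the sketch assumes a reasonably recent pandas rather than 0.24.2.

```python
import pandas as pd


def make_quantile(portion):
    """Hypothetical helper: return a per-group quantile function with a unique __name__."""
    def q(x):
        return x.quantile(portion)

    # Without this line every returned function would be named "q",
    # which is the same kind of name collision as with the partials.
    q.__name__ = f"q{round(portion * 100)}"
    return q


data = [('a', 1), ('a', 1), ('b', 1), ('a', 3), ('b', 2),
        ('c', 1), ('a', 2), ('b', 3), ('a', 2)]
df = pd.DataFrame(data, columns=['project', 'duration'])

df_agg = df.groupby(['project']).agg(
    {'duration': ['median', make_quantile(0.90), make_quantile(0.95)]}
)
print(df_agg)
```

The design choice is the same as in the __name__ workaround: pandas labels each column from the function's name, so the only requirement is that those names differ.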