pandas Resampler.agg cannot be applied with a list of functions

Problem description

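I ran into a problem with pandas.Resampler.agg when there is a list of functions to apply: r.apply({"price" : vwap, "qty": sum_qty, "quoteQty": sum_quoteQty}). It always raises an error such as AttributeError: 'Series' object has no attribute 'price'. It does work, however, with a single function: r.apply(vwap).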

My DataFrame does have the attributes price and qty.

My dtypes

I defined the list of functions to apply on the Resampler and added some prints for debugging:

Functions

If I use the list of functions, the Resampler cannot find the price attribute of my DataFrame:

r.agg(list)

AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_939/4117684543.py in <module>
----> 1 r.apply({"price" : vwap, "qty": sum_qty, "quoteQty": sum_quoteQty})

/SSD/lime/conda/lib/python3.9/site-packages/pandas/core/resample.py in aggregate(self, func, *args, **kwargs)
    332     def aggregate(self, func, *args, **kwargs):
    333 
--> 334         result = ResamplerWindowApply(self, func, args=args, kwargs=kwargs).agg()
    335         if result is None:
    336             how = func

/SSD/lime/conda/lib/python3.9/site-packages/pandas/core/apply.py in agg(self)
    159 
    160         if is_dict_like(arg):
--> 161             return self.agg_dict_like()
    162         elif is_list_like(arg):
    163             # we require a list, but not a 'str'

/SSD/lime/conda/lib/python3.9/site-packages/pandas/core/apply.py in agg_dict_like(self)
    433         else:
    434             # key used for column selection and output
--> 435             results = {
    436                 key: obj._gotitem(key, ndim=1).agg(how) for key, how in arg.items()
    437             }

/SSD/lime/conda/lib/python3.9/site-packages/pandas/core/apply.py in <dictcomp>(.0)
    434             # key used for column selection and output
    435             results = {
--> 436                 key: obj._gotitem(key, ndim=1).agg(how) for key, how in arg.items()
    437             }
    438 

/SSD/lime/conda/lib/python3.9/site-packages/pandas/core/groupby/generic.py in aggregate(self, func, engine, engine_kwargs, *args, **kwargs)
    263 
    264             try:
--> 265                 return self._python_agg_general(func, *args, **kwargs)
    266             except KeyError:
    267                 # TODO: KeyError is raised in _python_agg_general,

/SSD/lime/conda/lib/python3.9/site-packages/pandas/core/groupby/groupby.py in _python_agg_general(self, func, *args, **kwargs)
   1308             try:
   1309                 # if this function is invalid for this dtype, we will ignore it.
-> 1310                 result = self.grouper.agg_series(obj, f)
   1311             except TypeError:
   1312                 warnings.warn(

/SSD/lime/conda/lib/python3.9/site-packages/pandas/core/groupby/ops.py in agg_series(self, obj, func, preserve_dtype)
   1026 
   1027         else:
-> 1028             result = self._aggregate_series_fast(obj, func)
   1029 
   1030         npvalues = lib.maybe_convert_objects(result, try_float=False)

/SSD/lime/conda/lib/python3.9/site-packages/pandas/core/groupby/ops.py in _aggregate_series_fast(self, obj, func)
   1249         #  - len(self.bins) > 0
   1250         sbg = libreduction.SeriesBinGrouper(obj, func, self.bins)
-> 1251         result, _ = sbg.get_result()
   1252         return result
   1253 

/SSD/lime/conda/lib/python3.9/site-packages/pandas/_libs/reduction.pyx in pandas._libs.reduction.SeriesBinGrouper.get_result()

/SSD/lime/conda/lib/python3.9/site-packages/pandas/_libs/reduction.pyx in pandas._libs.reduction._BaseGrouper._apply_to_group()

/SSD/lime/conda/lib/python3.9/site-packages/pandas/core/groupby/groupby.py in <lambda>(x)
   1294     def _python_agg_general(self, func, *args, **kwargs):
   1295         func = com.is_builtin_func(func)
-> 1296         f = lambda x: func(x, *args, **kwargs)
   1297 
   1298         # iterate through "columns" ex exclusions to populate output dict

/tmp/ipykernel_939/2003501728.py in vwap(x)
      2     print("it's vwap")
      3     print(x)
----> 4     p = x.price
      5     print("it's p")
      6     print(p)

/SSD/lime/conda/lib/python3.9/site-packages/pandas/core/generic.py in __getattr__(self, name)
   5485         ):
   5486             return self[name]
-> 5487         return object.__getattribute__(self, name)
   5488 
   5489     def __setattr__(self, name: str, value) -> None:

AttributeError: 'Series' object has no attribute 'price'

However, it works fine with a single function, r.apply(vwap), which can retrieve the attributes price and qty:

r.agg(f)

The debug prints confirm my assumption:

2021-09-01 00:00:00.023    1391646824
2021-09-01 00:00:00.093    1391646825
2021-09-01 00:00:00.994    1391646826
2021-09-01 00:00:00.994    1391646827
2021-09-01 00:00:00.994    1391646828
2021-09-01 00:00:00.994    1391646829
Name: trade Id, dtype: int64
it's vwap
                           trade Id     price    qty  quoteQty  isBuyerMaker
time                                                                        
2021-09-01 00:00:00.023  1391646824  47150.32  0.002     94.30          True
2021-09-01 00:00:00.093  1391646825  47150.33  0.002     94.30         False
2021-09-01 00:00:00.994  1391646826  47150.33  0.021    990.15         False
2021-09-01 00:00:00.994  1391646827  47150.33  0.021    990.15         False
2021-09-01 00:00:00.994  1391646828  47152.97  0.002     94.30         False
2021-09-01 00:00:00.994  1391646829  47153.48  0.006    282.92         False
it's p
time
2021-09-01 00:00:00.023    47150.32
2021-09-01 00:00:00.093    47150.33
2021-09-01 00:00:00.994    47150.33
2021-09-01 00:00:00.994    47150.33
2021-09-01 00:00:00.994    47152.97
2021-09-01 00:00:00.994    47153.48
Name: price, dtype: float64
it's q
time
2021-09-01 00:00:00.023    0.002
2021-09-01 00:00:00.093    0.002
2021-09-01 00:00:00.994    0.021
2021-09-01 00:00:00.994    0.021
2021-09-01 00:00:00.994    0.002
2021-09-01 00:00:00.994    0.006
Name: qty, dtype: float64
it's vwap
Empty DataFrame
Columns: [trade Id, price, qty, quoteQty, isBuyerMaker]
Index: []
it's p
Series([], Name: price, dtype: float64)
it's q
Series([], Name: qty, dtype: float64)
it's vwap
                           trade Id     price    qty  quoteQty  isBuyerMaker
time                                                                        
2021-09-01 00:00:02.050  1391646830  47153.47  0.006    282.92          True
2021-09-01 00:00:02.889  1391646831  47153.47  0.054   2546.28          True
2021-09-01 00:00:02.889  1391646832  47153.47  0.050   2357.67          True
2021-09-01 00:00:02.889  1391646833  47153.47  0.050   2357.67          True
it's p
time
2021-09-01 00:00:02.050    47153.47
2021-09-01 00:00:02.889    47153.47
2021-09-01 00:00:02.889    47153.47
2021-09-01 00:00:02.889    47153.47
Name: price, dtype: float64
it's q
time
2021-09-01 00:00:02.050    0.006
2021-09-01 00:00:02.889    0.054
2021-09-01 00:00:02.889    0.050
2021-09-01 00:00:02.889    0.050
Name: qty, dtype: float64

But when I try the examples from the official documentation, everything works, even with a list of functions:

Doc 1, Doc 2

So I really don't know where the problem is...

Tags: python, pandas, dataframe, pandas-groupby

Solution


Hold on.

The hint is in the error message:

'Series' object has no attribute 'price'

That is because the x your vwap function is called with is a Series. It is the price column, since that is what you said vwap should receive ({"price": vwap, ...}).

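When given a dict, apply() and agg() work column by column: each key selects one column, and the function is called with that column's Series for every bin, never with the whole DataFrame (the agg_dict_like frame in the traceback shows this selection). The axis=1 option belongs to DataFrame.apply(), where it switches to row-wise application; it does not change what a resampler dict passes to each function.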
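To see what a function receives when it is passed in a dict, a small probe can record the type of its argument. This is only a sketch: the trade data is made up, and `probe` is a hypothetical helper, not something from the question.

```python
import pandas as pd

# Made-up trades; only the column names follow the question.
idx = pd.to_datetime([
    "2021-09-01 00:00:00.023",
    "2021-09-01 00:00:00.994",
    "2021-09-01 00:00:01.500",
])
df = pd.DataFrame(
    {"price": [100.0, 102.0, 101.0], "qty": [1.0, 3.0, 2.0]},
    index=idx,
)

seen_types = set()  # type names of every argument probe() receives


def probe(x):
    """Hypothetical helper: record the argument type, then reduce it."""
    seen_types.add(type(x).__name__)
    return x.sum()


# The key "price" selects that column; probe() is called once per 1-second bin.
result = df.resample("1s").agg({"price": probe})

print(seen_types)
print(result)
```

Because the argument is the price column itself, an expression such as x.price inside the function has nothing to look up, which is where the AttributeError comes from.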

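You will probably need to change the dict you pass to apply(): a function that needs several columns, like vwap, has to be passed on its own, as in r.apply(vwap), which the question shows receiving the whole DataFrame for each bin. This should get you going again :)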
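One way to restructure the dict, sketched here under assumptions, is to precompute the per-trade product so that every dict entry needs only its own column, and to derive the VWAP after aggregation. The data is made up, `pq` is a hypothetical helper column, and the VWAP definition sum(price * qty) / sum(qty) is assumed, since the question's vwap function is not shown in full.

```python
import pandas as pd

# Made-up trades; only the column names follow the question.
idx = pd.to_datetime([
    "2021-09-01 00:00:00.023",
    "2021-09-01 00:00:00.994",
    "2021-09-01 00:00:01.500",
])
df = pd.DataFrame(
    {"price": [100.0, 102.0, 101.0], "qty": [1.0, 3.0, 2.0]},
    index=idx,
)

# Hypothetical helper column: price * qty for each trade.
trades = df.assign(pq=df["price"] * df["qty"])

# Each entry reduces a single column, so per-column dispatch is enough.
bars = trades.resample("1s").agg({"pq": "sum", "qty": "sum"})

# Assumed definition: VWAP = sum(price * qty) / sum(qty) within each bin.
bars["vwap"] = bars["pq"] / bars["qty"]

print(bars)
```

Bins without trades sum to zero in both columns, so their VWAP comes out as NaN rather than raising an error.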

