Trouble using pandas df.rolling() with my own function

Problem description

I have a pandas dataframe raw_data with two columns, "T" and "BP":

              T       BP
0        -0.500  115.790
1        -0.499  115.441
2        -0.498  115.441
3        -0.497  115.441
4        -0.496  115.790
...         ...      ...
647163  646.663  105.675
647164  646.664  105.327
647165  646.665  105.327
647166  646.666  105.327
647167  646.667  104.978

[647168 rows x 2 columns]

I want to apply the Hodges-Lehmann mean (a robust average) over a rolling window and create a new column. Here is the function:

def hodgesLehmannMean(x): 
    m = np.add.outer(x, x)
    ind = np.tril_indices(len(x), 0)
    return 0.5 * np.median(m[ind])
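
For reference, here is a small sketch of what this function computes: the median of all pairwise averages (x[i] + x[j]) / 2 with j <= i, often called Walsh averages. The sample values and the names `sample`, `walsh` and `hl` are made up purely for illustration.

```python
import numpy as np

def hodgesLehmannMean(x):
    # Pairwise sums x[i] + x[j] for every pair (i, j)
    m = np.add.outer(x, x)
    # Keep the lower triangle including the diagonal, i.e. pairs with j <= i
    ind = np.tril_indices(len(x), 0)
    # Half the median of the pairwise sums = median of the pairwise averages
    return 0.5 * np.median(m[ind])

sample = np.array([1.0, 2.0, 10.0])  # made-up values for illustration

# The same pairwise averages written out explicitly, for comparison
walsh = sorted(
    float((a + b) / 2) for i, a in enumerate(sample) for b in sample[: i + 1]
)
hl = hodgesLehmannMean(sample)

print(walsh)  # [1.0, 1.5, 2.0, 5.5, 6.0, 10.0]
print(hl)     # 3.75
```

The outlier 10.0 pulls the ordinary mean up to about 4.33, while the Hodges-Lehmann value stays at 3.75.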

So I wrote:

raw_data[new_col] = raw_data['BP'].rolling(21, min_periods=1, center=True, 
                             win_type=None, axis=0, closed=None).agg(hodgesLehmannMean)

But I get a string of error messages:

Traceback (most recent call last):
  File "C:\Users\tkpme\miniconda3\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\tkpme\miniconda3\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "c:\Users\tkpme\.vscode\extensions\ms-python.python-2020.8.101144\pythonFiles\lib\python\debugpy\__main__.py", line 45, in <module>
    cli.main()
  File "c:\Users\tkpme\.vscode\extensions\ms-python.python-2020.8.101144\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 430, in main
    run()
  File "c:\Users\tkpme\.vscode\extensions\ms-python.python-2020.8.101144\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 267, in run_file
    runpy.run_path(options.target, run_name=compat.force_str("__main__"))
  File "C:\Users\tkpme\miniconda3\lib\runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "C:\Users\tkpme\miniconda3\lib\runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "C:\Users\tkpme\miniconda3\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "c:\Users\tkpme\OneDrive\Documents\Work\CMC\BP Satya and Suresh\Code\Naveen_peak_detect test.py", line 227, in <module>
    main()
  File "c:\Users\tkpme\OneDrive\Documents\Work\CMC\BP Satya and Suresh\Code\Naveen_peak_detect test.py", line 75, in main
    raw_data[new_col] = raw_data['BP'].rolling(FILTER_WINDOW, min_periods=1, center=True, win_type=None,
  File "C:\Users\tkpme\miniconda3\lib\site-packages\pandas\core\window\rolling.py", line 1961, in aggregate
    return super().aggregate(func, *args, **kwargs)
  File "C:\Users\tkpme\miniconda3\lib\site-packages\pandas\core\window\rolling.py", line 523, in aggregate
    return self.apply(func, raw=False, args=args, kwargs=kwargs)
  File "C:\Users\tkpme\miniconda3\lib\site-packages\pandas\core\window\rolling.py", line 1987, in apply
    return super().apply(
  File "C:\Users\tkpme\miniconda3\lib\site-packages\pandas\core\window\rolling.py", line 1300, in apply
    return self._apply(
  File "C:\Users\tkpme\miniconda3\lib\site-packages\pandas\core\window\rolling.py", line 507, in _apply
    result = calc(values)
  File "C:\Users\tkpme\miniconda3\lib\site-packages\pandas\core\window\rolling.py", line 495, in calc
    return func(x, start, end, min_periods)
  File "C:\Users\tkpme\miniconda3\lib\site-packages\pandas\core\window\rolling.py", line 1326, in apply_func
    return window_func(values, begin, end, min_periods)
  File "pandas\_libs\window\aggregations.pyx", line 1375, in pandas._libs.window.aggregations.roll_generic_fixed
  File "c:\Users\tkpme\OneDrive\Documents\Work\CMC\BP Satya and Suresh\Code\Naveen_peak_detect test.py", line 222, in hodgesLehmannMean
    m = np.add.outer(x, x)
  File "C:\Users\tkpme\miniconda3\lib\site-packages\pandas\core\series.py", line 705, in __array_ufunc__
    return construct_return(result)
  File "C:\Users\tkpme\miniconda3\lib\site-packages\pandas\core\series.py", line 694, in construct_return
    raise NotImplementedError
NotImplementedError

This seems to be triggered by the line

m = np.add.outer(x, x)

and suggests that something in numpy is not implemented or is missing. However, I imported numpy at the very start, as follows:

import numpy  as np
import pandas as pd 

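The function itself works fine if I pass it a list or a numpy array, so I am not sure what the problem is. Interestingly, if I use the median instead of the Hodges-Lehmann mean, it runs like a charm: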

raw_data[new_col] = raw_data['BP'].rolling(21, min_periods=1, center=True, 
                             win_type=None, axis=0, closed=None).median()

What is the cause of my problem, and how can I fix it?

Sincerely,

Thomas Philips

Tags: pandas, rolling-computation

Solution


I tried your code with a small dataframe and it ran fine, so there may be something in your dataframe that needs to be cleaned or converted.
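
The traceback in the question shows .agg forwarding to apply(func, raw=False), so the function receives a pandas Series for each window, and the NotImplementedError is raised inside Series.__array_ufunc__ when np.add.outer is called on it. One thing that may be worth trying, offered as a sketch rather than a confirmed fix, is to pass plain NumPy arrays to the function by calling .apply(..., raw=True), and to convert the input with np.asarray as a safeguard. The function name `hodges_lehmann_mean`, the column name `BP_HL`, the window size of 3 and the seven sample values are placeholders chosen for illustration.

```python
import numpy as np
import pandas as pd

def hodges_lehmann_mean(x):
    # Work on a plain ndarray so np.add.outer never sees a pandas Series
    x = np.asarray(x, dtype=float)
    m = np.add.outer(x, x)
    ind = np.tril_indices(len(x), 0)
    return 0.5 * np.median(m[ind])

# Small stand-in for raw_data, using a few BP values from the question
raw_data = pd.DataFrame(
    {"BP": [115.790, 115.441, 115.441, 115.441, 115.790, 105.675, 105.327]}
)

# raw=True makes pandas hand each window to the function as an ndarray
raw_data["BP_HL"] = (
    raw_data["BP"]
    .rolling(3, min_periods=1, center=True)
    .apply(hodges_lehmann_mean, raw=True)
)

print(raw_data)
```

With raw=True the function is called the same way as in the standalone tests on lists and arrays that already worked, so the behaviour inside the rolling window should match. Note that the outer sum builds an n x n matrix per window, which is small for a window of 21 but is evaluated once per row of the 647168-row column.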

