Python: count consecutive values above a certain level

Problem description

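I'm looking for a function that takes a pandas Series as input and returns, for each element, how many consecutive values up to and including that element are above a certain level. Example: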

Input: myfunction([6,7,6,4,2,6,9,8,6], level=5)

Output: [1,2,3,0,0,1,2,3,4]
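To make the intended semantics concrete, here is a minimal pure-Python reference sketch. The name `consecutive_above_loop` is hypothetical and does not come from the question. It assumes "above" means strictly greater than `level`, which is what the expected output implies. A plain loop is slow on large inputs, so it is mostly useful for checking vectorized versions.

```python
def consecutive_above_loop(values, level):
    """For each position, return the length of the current run of values > level (0 otherwise)."""
    out = []
    run = 0
    for v in values:
        # Extend the run when the value is strictly above the level, otherwise reset it
        run = run + 1 if v > level else 0
        out.append(run)
    return out


print(consecutive_above_loop([6, 7, 6, 4, 2, 6, 9, 8, 6], level=5))
# [1, 2, 3, 0, 0, 1, 2, 3, 4]
```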

Tags: python, pandas, list, numpy

Solution


Here's one approach:

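def myfunction(s, level):
    m = s > level                              # True where the value exceeds the level
    p = m.ne(m.shift(1)).cumsum()              # run id: increments whenever m flips
    return m * (p.groupby(p).cumcount() + 1)   # 1-based position in run, zeroed where m is False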

If you also want negative ramp values for the stretches at or below the threshold/level, just replace the last step with:

(2*m-1)*(p.groupby(p).cumcount()+1)
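Put together, the signed variant might look like the sketch below. The name `myfunction_signed` is hypothetical. Because the mask uses a strict `>`, values equal to `level` fall into the negative runs.

```python
import pandas as pd


def myfunction_signed(s, level):
    m = s > level                          # True where the value exceeds the level
    p = m.ne(m.shift(1)).cumsum()          # run id: increments whenever m flips
    ramp = p.groupby(p).cumcount() + 1     # 1-based position inside each run
    return (2 * m - 1) * ramp              # +ramp above the level, -ramp at or below it


s = pd.Series([6, 7, 6, 4, 2, 6, 9, 8, 6])
print(myfunction_signed(s, level=5).tolist())
# [1, 2, 3, -1, -2, 1, 2, 3, 4]
```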

Sample run:

In [153]: s
Out[153]: 
0    6
1    7
2    6
3    4
4    2
5    6
6    9
7    8
8    6
dtype: int64

In [154]: myfunction(s, level=5)
Out[154]: 
0    1
1    2
2    3
3    0
4    0
5    1
6    2
7    3
8    4
dtype: int64
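To see why `myfunction` works, it helps to look at the intermediate Series for the sample input. In this minimal trace the DataFrame columns exist only for display. For this input the run labels `p` come out as 1,1,1,2,2,3,3,3,3 and the in-run positions as 1,2,3,1,2,1,2,3,4. Multiplying by the boolean mask then zeroes out the below-level run.

```python
import pandas as pd

s = pd.Series([6, 7, 6, 4, 2, 6, 9, 8, 6])
level = 5

m = s > level                        # above-level mask
p = m.ne(m.shift(1)).cumsum()        # run labels: a new label starts wherever m flips
pos = p.groupby(p).cumcount() + 1    # 1-based position within each run

steps = pd.DataFrame({"s": s, "m": m, "p": p, "pos": pos, "result": m * pos})
print(steps)
```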

For better performance, we can drop down to NumPy:

import numpy as np

# https://stackoverflow.com/a/46183637/ @Divakar
def intervaled_ramp(a, thresh=1):
    mask = a > thresh

    # Get start, stop indices of each run of True in mask
    mask_ext = np.concatenate(([False], mask, [False]))
    idx = np.flatnonzero(mask_ext[1:] != mask_ext[:-1])
    s0, s1 = idx[::2], idx[1::2]

    # 1 at every above-threshold element; right after each run, a negative
    # value equal to minus the run length so the cumsum restarts at 0
    out = mask.astype(int)
    valid_stop = s1[s1 < len(a)]
    out[valid_stop] = s0[:len(valid_stop)] - valid_stop
    return out.cumsum()

out = intervaled_ramp(s.values, thresh=level)
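The NumPy version avoids `groupby`. It places a negative correction on the element just after each run, so one `cumsum` both builds the ramps and resets them. The sketch below traces the intermediate arrays for the sample input, assuming a threshold of 5 as in the question. As an optional extra step, it wraps the plain array back into a Series with the original index, because `intervaled_ramp` returns a NumPy array rather than a Series.

```python
import numpy as np
import pandas as pd

s = pd.Series([6, 7, 6, 4, 2, 6, 9, 8, 6])
a = s.to_numpy()
mask = a > 5

# Pad with False so every run of True has both a start and a stop edge
mask_ext = np.concatenate(([False], mask, [False]))
idx = np.flatnonzero(mask_ext[1:] != mask_ext[:-1])   # [0 3 5 9]
s0, s1 = idx[::2], idx[1::2]          # run starts [0 5], exclusive run stops [3 9]

steps = mask.astype(int)              # [1 1 1 0 0 1 1 1 1]
valid_stop = s1[s1 < len(a)]          # stops that fall inside the array: [3]
steps[valid_stop] = s0[:len(valid_stop)] - valid_stop   # -3 cancels the run of length 3
ramp = steps.cumsum()                 # [1 2 3 0 0 1 2 3 4]

# Wrap back into a Series aligned with the original index
result = pd.Series(ramp, index=s.index)
```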

Timings

Approaches posted so far:

# @jezrael's soln
def myfunction_jezrael(s, level):
    a = s > level
    b = a.cumsum()
    return b-b.mask(a).ffill().fillna(0).astype(int)

# Solution from earlier
def myfunction_div1(s, level):
    m = s>level
    p = m.ne(m.shift(1)).cumsum()
    return m*(p.groupby(p).cumcount()+1)
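jezrael's version uses a different idea from the run-labelling approach. The steps below trace it on the sample input with a threshold of 5 as a minimal sketch. `b` counts all above-level values seen so far. Masking `b` wherever the value is above the level and forward-filling freezes that count at the most recent at-or-below-level position. Subtracting the frozen count therefore restarts the ramp after every gap.

```python
import pandas as pd

s = pd.Series([6, 7, 6, 4, 2, 6, 9, 8, 6])
a = s > 5
b = a.cumsum()                    # running count of above-level values: 1 2 3 3 3 4 5 6 7
# Count reached at the most recent at-or-below-level value (0 before any): 0 0 0 3 3 3 3 3 3
base = b.mask(a).ffill().fillna(0).astype(int)
result = b - base                 # 1 2 3 0 0 1 2 3 4
```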

Using the given sample, scaled up 10000x:

In [223]: a = [6,7,6,4,2,6,9,8,6]
     ...: a = np.array(a)
     ...: s = pd.Series(a)

In [224]: s = pd.concat([s]*10000)

In [225]: %timeit myfunction_jezrael(s, level=5)
     ...: %timeit myfunction_div1(s, level=5)
     ...: %timeit intervaled_ramp(s.values, thresh=5)
2.98 ms ± 76.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
8.99 ms ± 40 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
586 µs ± 11.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
