Counting consecutive occurrences of numbers in a series

Question

I have a series (a 1-D array) of numbers, for example 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, ...

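Is there an elegant (and preferably the fastest) way to count how many times 1 or 0 occurs consecutively before the value changes? The result would be (0, 2), (1, 3), (0, 1), (1, 4), ...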

Tags: python, numpy

Answer


Here is another NumPy approach, specifically one that uses array slicing -

import numpy as np

def islands_info(a):
    # Compare consecutive elements for changes. Use `True` as sentinels to detect edges
    idx = np.flatnonzero(np.r_[True, a[:-1] != a[1:], True])

    # Index the input with every edge except the last to get each run's value,
    # and take differences of consecutive edges to get the run lengths
    return np.column_stack((a[idx[:-1]], np.diff(idx)))

Sample run -

In [51]: a = np.array([0, 0, 1, 1, 1, 0, 1, 1, 1, 1])

In [52]: islands_info(a)
Out[52]: 
array([[0, 2],
       [1, 3],
       [0, 1],
       [1, 4]])
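
To make the two steps inside `islands_info` easier to follow, here is a minimal sketch that prints the intermediate arrays for the same sample input. The names `changes`, `edges`, `values` and `lengths` are introduced here only for illustration; they do not appear in the answer's function.

```python
import numpy as np

a = np.array([0, 0, 1, 1, 1, 0, 1, 1, 1, 1])

# True wherever an element differs from its predecessor (length len(a) - 1)
changes = a[:-1] != a[1:]

# Pad with True on both ends so the start of the first run and the end
# of the array are also treated as edges
edges = np.r_[True, changes, True]

# Positions of the edges: the start index of every run, plus len(a)
idx = np.flatnonzero(edges)

values = a[idx[:-1]]    # value at the start of each run
lengths = np.diff(idx)  # gap between consecutive edges = run length

print(idx)      # [ 0  2  5  6 10]
print(values)   # [0 1 0 1]
print(lengths)  # [2 3 1 4]
```

Stacking `values` and `lengths` as columns gives the array shown in the sample run.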

If you need the output as a list of tuples -

In [56]: list(zip(*islands_info(a).T))
Out[56]: [(0, 2), (1, 3), (0, 1), (1, 4)]
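
For small inputs, or as a cross-check on the NumPy result, the same list of tuples can also be built with `itertools.groupby` from the standard library. This is a different technique from the slicing approach in this answer, it is not part of the timings, and the helper name `runs` is hypothetical.

```python
from itertools import groupby

def runs(seq):
    # `runs` is a hypothetical helper name used only for this sketch.
    # groupby yields one (value, group) pair per run of equal adjacent values;
    # counting the items in each group gives the run length.
    return [(value, sum(1 for _ in group)) for value, group in groupby(seq)]

print(runs([0, 0, 1, 1, 1, 0, 1, 1, 1, 1]))  # [(0, 2), (1, 3), (0, 1), (1, 4)]
```

Because it loops in pure Python, this version is likely to be slower than the vectorized function on large arrays, but it also works on plain lists and on empty input.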

Timings -

Comparison with the other NumPy-based solution, by @yatu -

In [43]: np.random.seed(a)

In [44]: a = np.random.choice([0,1], 1000000)

In [45]: %timeit yatu(a)
11.7 ms ± 428 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [46]: %timeit islands_info(a)
8.98 ms ± 40.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [47]: np.random.seed(a)

In [48]: a = np.random.choice([0,1], 10000000)

In [49]: %timeit yatu(a)
232 ms ± 3.71 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [50]: %timeit islands_info(a)
152 ms ± 933 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
