How to find the maximum count of consecutive zeros in a pandas column?

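Problem description

I have a DataFrame and want to find the maximum count of consecutive zero values in column B.

Example input and output: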

df = pd.DataFrame({'B':[1,3,4,0,0,11,1,15,0,0,0,87]})

df_out = pd.DataFrame({'max_count':[3]})

How can this be done?

Tags: python-3.x, pandas, numpy, pandas-groupby

Solution


A NumPy approach:

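import numpy as np
a = df['B'].values
# Zero mask, padded with False on both ends so every run has a start and an end
m1 = np.r_[False, a==0, False]
# Indices where the mask flips: alternating starts and ends of each run of zeros
idx = np.flatnonzero(m1[:-1] != m1[1:])
# Run length = end - start; take the longest (raises ValueError if there are no zeros)
out = (idx[1::2]-idx[::2]).max()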

Step by step:

# Input data as array
In [83]: a
Out[83]: array([ 1,  3,  4,  0,  0, 11,  1, 15,  0,  0,  0, 87])

# Mask of starts and ends for each island of 0s
In [193]: m1
Out[193]: 
array([False, False, False, False,  True,  True, False, False, False,
        True,  True,  True, False, False])

# Indices of those starts and ends
In [85]: idx
Out[85]: array([ 3,  5,  8, 11])

# Finally the differencing between starts and ends and max for final o/p
In [86]: out
Out[86]: 3
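
The steps above can be wrapped into a small reusable function. This is only a sketch: the name `max_consecutive_zeros` is hypothetical, and returning 0 when the column has no zeros is an assumption added here, because calling `.max()` on an empty array would otherwise raise a `ValueError`.

```python
import numpy as np
import pandas as pd

def max_consecutive_zeros(values):
    """Return the length of the longest run of zeros in a 1-D array-like."""
    a = np.asarray(values)
    # Pad the zero mask with False on both sides so every run has a start and an end.
    mask = np.r_[False, a == 0, False]
    # Positions where the mask flips: even entries are run starts, odd entries are run ends.
    idx = np.flatnonzero(mask[:-1] != mask[1:])
    if idx.size == 0:
        return 0  # assumption: report 0 when there are no zeros at all
    return int((idx[1::2] - idx[::2]).max())

df = pd.DataFrame({'B': [1, 3, 4, 0, 0, 11, 1, 15, 0, 0, 0, 87]})
df_out = pd.DataFrame({'max_count': [max_consecutive_zeros(df['B'])]})
print(df_out)
```

Building `df_out` this way gives the one-row frame requested in the question.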

This can be condensed into a one-liner:

np.diff(np.flatnonzero(np.diff(np.r_[0,a==0,0])).reshape(-1,2),axis=1).max()
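
For comparison, the same count can probably be obtained with pandas alone by grouping on a run identifier. This is an alternative sketch rather than part of the original answer; the names `is_zero`, `run_id`, and `max_count` are illustrative, and it assumes the column is non-empty.

```python
import pandas as pd

df = pd.DataFrame({'B': [1, 3, 4, 0, 0, 11, 1, 15, 0, 0, 0, 87]})

is_zero = df['B'].eq(0)
# Every non-zero value bumps the counter, so all zeros in one run share an id.
run_id = (~is_zero).cumsum()
# Summing the boolean mask per run counts its zeros; the largest sum is the longest run.
max_count = int(is_zero.groupby(run_id).sum().max())
print(max_count)
```

The NumPy version avoids the groupby overhead, so it is likely the faster choice on large columns; the pandas version may be easier to read if the rest of the code already works with Series.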
