Identify the longest stretch of values between two nulls in a DataFrame column

Problem description

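I have a DataFrame with a datetime column and a value column X. I need to find the longest stretch of values between two nulls. In the example below, the longest stretch between two nulls is 4, running from timestamp '02-01-2018 00:05' to '02-01-2018 00:20'.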

Here is my sample data:

Datetime            X
01-01-2018 00:00    1
01-01-2018 00:05    NaN
01-01-2018 00:10    2
01-01-2018 00:15    3
01-01-2018 00:20    2
01-01-2018 00:25    NaN
01-01-2018 00:30    NaN
01-01-2018 00:35    NaN
01-01-2018 00:40    4
02-01-2018 00:00    NaN
02-01-2018 00:05    2
02-01-2018 00:10    2
02-01-2018 00:15    2
02-01-2018 00:20    2
02-01-2018 00:25    NaN
02-01-2018 00:30    NaN
02-01-2018 00:35    3
02-01-2018 00:40    NaN
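To make the example reproducible, the sample can be built directly as a DataFrame. This sketch assumes the dates are day-first (DD-MM-YYYY), which matches the ordering of the rows, and stores the missing values as real NaNs rather than the string "Nan" (pandas does not treat "Nan" as missing by default when reading text):

```python
import numpy as np
import pandas as pd

# Sample data from the question; np.nan marks the missing values.
# Assumption: dates are day-first (DD-MM-YYYY).
times = [
    "01-01-2018 00:00", "01-01-2018 00:05", "01-01-2018 00:10",
    "01-01-2018 00:15", "01-01-2018 00:20", "01-01-2018 00:25",
    "01-01-2018 00:30", "01-01-2018 00:35", "01-01-2018 00:40",
    "02-01-2018 00:00", "02-01-2018 00:05", "02-01-2018 00:10",
    "02-01-2018 00:15", "02-01-2018 00:20", "02-01-2018 00:25",
    "02-01-2018 00:30", "02-01-2018 00:35", "02-01-2018 00:40",
]
values = [1, np.nan, 2, 3, 2, np.nan, np.nan, np.nan, 4,
          np.nan, 2, 2, 2, 2, np.nan, np.nan, 3, np.nan]

df = pd.DataFrame({
    # Parse the strings into real timestamps
    "Datetime": pd.to_datetime(times, format="%d-%m-%Y %H:%M"),
    "X": values,
})
print(df)
```

With Datetime parsed, printed frames show ISO timestamps (for example 2018-01-02 00:05:00) instead of the strings above.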

Tags: python, pandas, dataframe

Solution


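Assuming you only want the length of the longest stretch between two nulls, you can use Series.isnull() to find the index labels of the null values and a list comprehension to compute the gaps between consecutive ones: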

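# Index labels of the rows where X is null (assumes the default RangeIndex)
indexes = df[df.X.isnull()].index
# Count the non-null rows between each pair of consecutive nulls and keep the largest
max([(indexes[i+1] - indexes[i] - 1) for i in range(len(indexes) - 1)])
>> 4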
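The same count can be computed without a Python-level loop. The sketch below is an alternative to the list comprehension above, not part of the original answer: it takes the positions of the nulls with np.flatnonzero and subtracts 1 from each gap given by np.diff. Because it works on positions rather than index labels, it also works when the DataFrame has a non-default index. The column is rebuilt inline so the snippet runs on its own, and returning 0 when there are fewer than two nulls is an assumption:

```python
import numpy as np
import pandas as pd

# Column X from the question, with real NaNs for the missing values
x = pd.Series([1, np.nan, 2, 3, 2, np.nan, np.nan, np.nan, 4,
               np.nan, 2, 2, 2, 2, np.nan, np.nan, 3, np.nan])

# 0-based positions of the null values
null_pos = np.flatnonzero(x.isna().to_numpy())

# Number of non-null values strictly between consecutive nulls
gaps = np.diff(null_pos) - 1

# Assumption: fewer than two nulls means no bounded stretch, reported as 0
longest = int(gaps.max()) if gaps.size else 0
print(longest)  # 4
```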

If you also want the timestamps:

indexes = df[df.X.isnull()].index
# (gap length, label of opening null, label of closing null) for each pair; keep the longest
max_nulls = max([((indexes[i+1] - indexes[i] - 1), indexes[i], indexes[i+1]) for i in range(len(indexes) - 1)], key=lambda x: x[0])
max_nulls
>>(4, 9, 14)

df.loc[max_nulls[1]:max_nulls[2]]
     Datetime             X
9   02-01-2018 00:00    NaN
10  02-01-2018 00:05    2.0
11  02-01-2018 00:10    2.0
12  02-01-2018 00:15    2.0
13  02-01-2018 00:20    2.0
14  02-01-2018 00:25    NaN
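The (length, opening null, closing null) tuple in max_nulls can also be found with np.argmax over the gaps; like max(..., key=...), np.argmax returns the first occurrence when several stretches tie. This is a sketch, not the original answer's method: it rebuilds the question's frame with pd.date_range (assuming day-first dates in 5-minute steps), works on positions (equal to the labels under the default RangeIndex), and assumes at least two nulls and a non-empty longest stretch:

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame: two days, 00:00 to 00:40 in 5-minute steps
times = pd.date_range("2018-01-01", periods=9, freq="5min").append(
    pd.date_range("2018-01-02", periods=9, freq="5min"))
df = pd.DataFrame({
    "Datetime": times,
    "X": [1, np.nan, 2, 3, 2, np.nan, np.nan, np.nan, 4,
          np.nan, 2, 2, 2, 2, np.nan, np.nan, 3, np.nan],
})

null_pos = np.flatnonzero(df["X"].isna().to_numpy())
gaps = np.diff(null_pos) - 1
k = int(np.argmax(gaps))  # first longest stretch, as with max(..., key=...)

start, end = null_pos[k], null_pos[k + 1]  # positions of the bounding nulls
print(int(gaps[k]), start, end)  # 4 9 14

# First and last non-null timestamps inside the stretch
first_ts = df["Datetime"].iloc[start + 1]
last_ts = df["Datetime"].iloc[end - 1]
print(first_ts, last_ts)  # 2018-01-02 00:05:00 2018-01-02 00:20:00
```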

If you only want the timestamps of the two nulls that bound the longest non-null stretch, use:

df.loc[[max_nulls[1], max_nulls[2]]]
    Datetime             X
9   02-01-2018 00:00    NaN
14  02-01-2018 00:25    NaN

Or, for the first and last non-null rows of the stretch:

df.loc[[max_nulls[1]+1, max_nulls[2]-1]]

      Datetime           X
10  02-01-2018 00:05    2.0
13  02-01-2018 00:20    2.0
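If you want every stretch at once rather than only the longest, a common pandas idiom (swapped in here, not part of the original answer) is to number the groups with a cumulative sum of the null mask: each null opens a new group and the non-null rows after it share its number. The sketch below rebuilds the question's frame (assuming day-first dates in 5-minute steps), keeps only groups closed by a null on both sides, and assumes at least two nulls. Stretches of length 0 (adjacent nulls) simply do not appear in the result:

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame: two days, 00:00 to 00:40 in 5-minute steps
times = pd.date_range("2018-01-01", periods=9, freq="5min").append(
    pd.date_range("2018-01-02", periods=9, freq="5min"))
df = pd.DataFrame({
    "Datetime": times,
    "X": [1, np.nan, 2, 3, 2, np.nan, np.nan, np.nan, 4,
          np.nan, 2, 2, 2, 2, np.nan, np.nan, 3, np.nan],
})

is_null = df["X"].isna()
# Every null opens a new group; the non-null rows after it share its number
group = is_null.cumsum()
n_nulls = int(is_null.sum())

# Group 0 lies before the first null and group n_nulls after the last one,
# so only groups 1..n_nulls-1 are closed by a null on both sides
mask = ~is_null & group.between(1, n_nulls - 1)

# One row per non-empty bounded stretch: its length and first/last timestamps
stretches = df.loc[mask, "Datetime"].groupby(group[mask]).agg(["count", "first", "last"])
print(stretches)

# Group number of the longest stretch (first one wins on ties)
best = stretches["count"].idxmax()
print(stretches.loc[best])
```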
