Counting consecutive NaN values in a pandas time series

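Problem description

I am working with time series in Python 3 and pandas, and I want to summarize the periods of contiguous missing values, but so far I can only find the indices of the NaN values...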

Sample data:
                     Valeurs
2018-01-01 00:00:00      1.0
2018-01-01 04:00:00      NaN
2018-01-01 08:00:00      2.0
2018-01-01 12:00:00      NaN
2018-01-01 16:00:00      NaN
2018-01-01 20:00:00      5.0
2018-01-02 00:00:00      6.0
2018-01-02 04:00:00      7.0
2018-01-02 08:00:00      8.0
2018-01-02 12:00:00      9.0
2018-01-02 16:00:00      5.0
2018-01-02 20:00:00      NaN
2018-01-03 00:00:00      NaN
2018-01-03 04:00:00      NaN
2018-01-03 08:00:00      1.0
2018-01-03 12:00:00      2.0
2018-01-03 16:00:00      NaN

Expected results:
       Start_Date      number of contiguous missing values 
2018-01-01 04:00:00      1
2018-01-01 12:00:00      2
2018-01-02 20:00:00      3
2018-01-03 16:00:00      1

How can I get this kind of result with pandas (shift(), cumsum(), groupby()?)?

Thanks for your input!

Sylvain

Tags: python, python-3.x, pandas, time-series, continuous

Solution

Use groupby and agg:

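# True where the value is missing
mask = df.Valeurs.isna()
# (~mask).cumsum() stays constant across each run of NaN, so it works as a group key
d = df.index.to_series()[mask].groupby((~mask).cumsum()[mask]).agg(['first', 'size'])
d.rename(columns=dict(size='num of contig null', first='Start_Date')).reset_index(drop=True)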

            Start_Date  num of contig null
0  2018-01-01 04:00:00                   1
1  2018-01-01 12:00:00                   2
2  2018-01-02 20:00:00                   3
3  2018-01-03 16:00:00                   1
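
To try this end to end, here is a minimal self-contained sketch of the same groupby/agg approach. It rebuilds the sample data from the question, assuming a regular 4-hour index, and wraps the steps in a helper. The helper name nan_runs and the variable key are hypothetical and only used for illustration; they do not come from the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data from the question (assumed: 17 points, 4 hours apart).
idx = pd.date_range("2018-01-01 00:00:00", periods=17, freq=pd.Timedelta(hours=4))
values = [1.0, np.nan, 2.0, np.nan, np.nan, 5.0, 6.0, 7.0, 8.0,
          9.0, 5.0, np.nan, np.nan, np.nan, 1.0, 2.0, np.nan]
df = pd.DataFrame({"Valeurs": values}, index=idx)


def nan_runs(series):
    """Hypothetical helper: one row per run of consecutive NaN values."""
    mask = series.isna()
    # The cumulative count of non-NaN values only increases on valid rows,
    # so every NaN inside the same run shares one key value.
    key = (~mask).cumsum()[mask]
    # For each run, keep the first timestamp and the number of rows.
    runs = series.index.to_series()[mask].groupby(key).agg(["first", "size"])
    runs = runs.rename(columns={"first": "Start_Date", "size": "num of contig null"})
    return runs.reset_index(drop=True)


result = nan_runs(df["Valeurs"])
print(result)
```

On the sample data this should give the same four runs as the table above. A series that starts with NaN is handled as well, since the leading run simply gets key 0.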
