Find missing timestamps in a pandas DataFrame

Problem description

I have the following dataset in a DataFrame:

        Time_stamp           x        y
    '2012-01-01 00:00:00'   8.97    1310.03
    '2012-01-01 00:10:00'   9.91    1684.52
    '2012-01-01 00:40:00'   9.64    1532.05
    '2012-01-01 00:50:00'   11.84   1997.87
    '2012-01-01 00:60:00'   11.69   2135.76
    '2012-01-01 01:00:00'   12.14   2149.54
    '2012-01-01 01:10:00'   13.43   2056.35
    '2012-01-01 01:20:00'   9.88    1633.45
    '2012-01-01 01:30:00'   9.01    1315.85
    '2012-01-01 01:50:00'   8.33    1141.84
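To make the question reproducible, here is a minimal sketch that rebuilds the sample as a DataFrame. It assumes Time_stamp holds strings that still include the surrounding single quotes, which is what the answer's .str.strip("'") implies; the name df matches the answer's code.

```python
import pandas as pd

# Sample data from the question. Timestamps are kept as raw strings,
# single quotes included (an assumption based on the answer's .str.strip("'")).
df = pd.DataFrame({
    "Time_stamp": [
        "'2012-01-01 00:00:00'", "'2012-01-01 00:10:00'",
        "'2012-01-01 00:40:00'", "'2012-01-01 00:50:00'",
        "'2012-01-01 00:60:00'", "'2012-01-01 01:00:00'",
        "'2012-01-01 01:10:00'", "'2012-01-01 01:20:00'",
        "'2012-01-01 01:30:00'", "'2012-01-01 01:50:00'",
    ],
    "x": [8.97, 9.91, 9.64, 11.84, 11.69, 12.14, 13.43, 9.88, 9.01, 8.33],
    "y": [1310.03, 1684.52, 1532.05, 1997.87, 2135.76,
          2149.54, 2056.35, 1633.45, 1315.85, 1141.84],
})
print(df)
```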

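As you can see, the data is recorded every 10 minutes. However, some timestamps and their corresponding values are missing, for example '2012-01-01 00:20:00' and '2012-01-01 00:30:00'. I want to find such missing timestamps and fill their corresponding values with NaN, something like this: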

        timestamp            x        y
    '2012-01-01 00:20:00'   nan      nan
    '2012-01-01 00:30:00'   nan      nan

Any ideas how to do this efficiently without too many lines of code?

Tags: python, pandas, datetime, timestamp

Solution


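First convert the values to datetimes. Minute 60 in 2012-01-01 00:60:00 is not a valid time (minutes run from 0 to 59), so errors='coerce' turns it into NaT. Drop the rows with NaT (this discards that row's x and y values), then create a DatetimeIndex and insert the missing timestamps with DataFrame.asfreq: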

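import pandas as pd

# Strip the literal quotes, then parse; invalid values such as 00:60:00 become NaT
df['Time_stamp'] = pd.to_datetime(df['Time_stamp'].str.strip("'"), errors='coerce')

# Drop NaT rows, index by time, and conform to a regular 10-minute grid (gaps become NaN)
df = df.dropna(subset=['Time_stamp']).set_index('Time_stamp').asfreq('10min')
print(df)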
                         x        y
Time_stamp                         
2012-01-01 00:00:00   8.97  1310.03
2012-01-01 00:10:00   9.91  1684.52
2012-01-01 00:20:00    NaN      NaN
2012-01-01 00:30:00    NaN      NaN
2012-01-01 00:40:00   9.64  1532.05
2012-01-01 00:50:00  11.84  1997.87
2012-01-01 01:00:00  12.14  2149.54
2012-01-01 01:10:00  13.43  2056.35
2012-01-01 01:20:00   9.88  1633.45
2012-01-01 01:30:00   9.01  1315.85
2012-01-01 01:40:00    NaN      NaN
2012-01-01 01:50:00   8.33  1141.84
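If you only need the list of missing timestamps rather than a reindexed frame, a minimal alternative sketch (not part of the answer above) is to build the full 10-minute grid with pd.date_range and take Index.difference against the observed timestamps. The series raw is a hypothetical four-row subset of the question's data; it also shows the errors='coerce' step turning the invalid 00:60:00 into NaT.

```python
import pandas as pd

# Hypothetical subset of the question's data, quotes included as in the post.
raw = pd.Series([
    "'2012-01-01 00:10:00'",
    "'2012-01-01 00:40:00'",
    "'2012-01-01 00:60:00'",  # minute 60 is not a valid time
    "'2012-01-01 01:00:00'",
])

# errors='coerce' turns unparseable strings into NaT instead of raising.
parsed = pd.to_datetime(raw.str.strip("'"), errors="coerce")
print(parsed.isna().tolist())

# Build the complete 10-minute grid over the observed range...
observed = pd.DatetimeIndex(parsed.dropna())
grid = pd.date_range(observed.min(), observed.max(), freq="10min")

# ...and keep only the grid points that never occur in the data.
missing = grid.difference(observed)
print(missing)
```

Unlike asfreq, this leaves the original frame untouched and only reports the gaps, which can be handy for logging or data-quality checks.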
