首页 > 解决方案 > 为多个用户填补缺失的时间段

问题描述

我正在尝试填充 CSV 文件中缺少的插槽,该文件具有日期和时间作为字符串。

我的输入是:

time_slot              User       Location
2017-10-26 00:00:00      1           156 
2017-10-26 10:00:00      1           55
2017-10-26 12:00:00      1           848
2017-10-27 02:00:00      1           79
2017-10-27 16:00:00      1           846
2017-10-27 23:00:00      1           648
2017-10-26 00:00:00      2           75
2017-10-26 02:00:00      2           32
2017-10-26 10:00:00      2           18
2017-10-27 01:00:00      2           874
2017-10-27 04:00:00      2           46
2017-10-27 18:00:00      2           96
2017-10-26 07:00:00      3           25
2017-10-26 09:00:00      3           463
2017-10-26 14:00:00      3           85
2017-10-27 06:00:00      3           95
2017-10-27 23:00:00      3           12

输出应该是

time_slot              User       Location
2017-10-26 00:00:00      1           156
.
.
.
.
2017-10-26 09:00:00      1           156 
2017-10-26 10:00:00      1           55
2017-10-26 11:00:00      1           55
2017-10-26 12:00:00      1           848
.
.          848 for all slots in between
.
2017-10-26 24:00:00      1           848
.
.           848

2017-10-27 02:00:00      1           79
.
.          79
.

2017-10-27 16:00:00      1           846

             846
Same as above

2017-10-27 23:00:00      1           648
2017-10-26 00:00:00      2           75
2017-10-26 02:00:00      2           32
2017-10-26 10:00:00      2           18
2017-10-27 01:00:00      2           874
2017-10-27 04:00:00      2           46
2017-10-27 18:00:00      2           96
2017-10-26 07:00:00      3           25
2017-10-26 09:00:00      3           463
2017-10-26 14:00:00      3           85
2017-10-27 06:00:00      3           95
2017-10-27 23:00:00      3           12

日期时间频率为 1 小时。我们不是在缺失的时隙中填充 0,而是填充先前时隙的位置点

标签: pythonpandas

解决方案


用于 :DataFrame.asfreq_DataFrame.groupby

df1 = (df.groupby('User')['Location']
        .apply(lambda x: x.asfreq(freq='H',method='ffill'))
        .reset_index())
print (df1.head(10))
   User           time_slot  Location
0     1 2017-10-26 00:00:00       156
1     1 2017-10-26 01:00:00       156
2     1 2017-10-26 02:00:00       156
3     1 2017-10-26 03:00:00       156
4     1 2017-10-26 04:00:00       156
5     1 2017-10-26 05:00:00       156
6     1 2017-10-26 06:00:00       156
7     1 2017-10-26 07:00:00       156
8     1 2017-10-26 08:00:00       156
9     1 2017-10-26 09:00:00       156

详情

print (df.index)
DatetimeIndex(['2017-10-26 00:00:00', '2017-10-26 10:00:00',
               '2017-10-26 12:00:00', '2017-10-27 02:00:00',
               '2017-10-27 16:00:00', '2017-10-27 23:00:00',
               '2017-10-26 00:00:00', '2017-10-26 02:00:00',
               '2017-10-26 10:00:00', '2017-10-27 01:00:00',
               '2017-10-27 04:00:00', '2017-10-27 18:00:00',
               '2017-10-26 07:00:00', '2017-10-26 09:00:00',
               '2017-10-26 14:00:00', '2017-10-27 06:00:00',
               '2017-10-27 23:00:00'],
              dtype='datetime64[ns]', name='time_slot', freq=None)

推荐阅读