python - 为多个用户填补缺失的时间段
问题描述
我正在尝试填充 CSV 文件中缺少的插槽,该文件具有日期和时间作为字符串。
我的输入是:
time_slot User Location
2017-10-26 00:00:00 1 156
2017-10-26 10:00:00 1 55
2017-10-26 12:00:00 1 848
2017-10-27 02:00:00 1 79
2017-10-27 16:00:00 1 846
2017-10-27 23:00:00 1 648
2017-10-26 00:00:00 2 75
2017-10-26 02:00:00 2 32
2017-10-26 10:00:00 2 18
2017-10-27 01:00:00 2 874
2017-10-27 04:00:00 2 46
2017-10-27 18:00:00 2 96
2017-10-26 07:00:00 3 25
2017-10-26 09:00:00 3 463
2017-10-26 14:00:00 3 85
2017-10-27 06:00:00 3 95
2017-10-27 23:00:00 3 12
输出应该是
time_slot User Location
2017-10-26 00:00:00 1 156
.
.
.
.
2017-10-26 09:00:00 1 156
2017-10-26 10:00:00 1 55
2017-10-26 11:00:00 1 55
2017-10-26 12:00:00 1 848
.
. 848 for all slots in between
.
2017-10-26 24:00:00 1 848
.
. 848
2017-10-27 02:00:00 1 79
.
. 79
.
2017-10-27 16:00:00 1 846
846
Same as above
2017-10-27 23:00:00 1 648
2017-10-26 00:00:00 2 75
2017-10-26 02:00:00 2 32
2017-10-26 10:00:00 2 18
2017-10-27 01:00:00 2 874
2017-10-27 04:00:00 2 46
2017-10-27 18:00:00 2 96
2017-10-26 07:00:00 3 25
2017-10-26 09:00:00 3 463
2017-10-26 14:00:00 3 85
2017-10-27 06:00:00 3 95
2017-10-27 23:00:00 3 12
日期时间频率为 1 小时。我们不是在缺失的时隙中填充 0,而是填充先前时隙的位置点
解决方案
用于 :DataFrame.asfreq
_DataFrame.groupby
df1 = (df.groupby('User')['Location']
.apply(lambda x: x.asfreq(freq='H',method='ffill'))
.reset_index())
print (df1.head(10))
User time_slot Location
0 1 2017-10-26 00:00:00 156
1 1 2017-10-26 01:00:00 156
2 1 2017-10-26 02:00:00 156
3 1 2017-10-26 03:00:00 156
4 1 2017-10-26 04:00:00 156
5 1 2017-10-26 05:00:00 156
6 1 2017-10-26 06:00:00 156
7 1 2017-10-26 07:00:00 156
8 1 2017-10-26 08:00:00 156
9 1 2017-10-26 09:00:00 156
详情:
print (df.index)
DatetimeIndex(['2017-10-26 00:00:00', '2017-10-26 10:00:00',
'2017-10-26 12:00:00', '2017-10-27 02:00:00',
'2017-10-27 16:00:00', '2017-10-27 23:00:00',
'2017-10-26 00:00:00', '2017-10-26 02:00:00',
'2017-10-26 10:00:00', '2017-10-27 01:00:00',
'2017-10-27 04:00:00', '2017-10-27 18:00:00',
'2017-10-26 07:00:00', '2017-10-26 09:00:00',
'2017-10-26 14:00:00', '2017-10-27 06:00:00',
'2017-10-27 23:00:00'],
dtype='datetime64[ns]', name='time_slot', freq=None)
推荐阅读
- python - 这与下一次之间的天数列值为 True?
- python-3.x - 使用 Python 模块 xml.etree.ElementTree 解析有点复杂的 XML 并将值存储在列表中
- mysql - 2个表列之间的SQL差异
- javascript - Math.random() 结果似乎颠倒了
- javascript - Js fetch api post undefined as value not string
- python - 如何在 selenium 中正确设置 firefox 配置文件?
- flutter - 来自flutter的小部件ios14
- angular - 当我尝试将数组添加到我的角度同步融合网格时出现此错误
- android-studio - 单击单选按钮时如何显示图像
- c# - 如何提取第一个字符和之后的数字,直到在字符串中找到一个字符(az) - C# 2.0