python - 熊猫每个给定的时间间隔插入行
问题描述
具有以下DF:
id init_time event_time value
0 0 2020-10-21 00:21:25 2020-10-21 00:21:30 1
1 0 2020-10-21 00:21:25 2020-10-21 00:21:35 3
2 0 2020-10-21 00:21:25 2020-10-21 00:23:25 7
3 1 2020-10-21 15:21:25 2020-10-21 15:21:30 3
4 1 2020-10-21 15:21:25 2020-10-21 15:22:25 6
5 1 2020-10-21 15:21:25 2020-10-21 16:21:25 9
6 1 2020-10-21 15:21:25 2020-10-21 18:21:25 10
7 1 2020-10-21 15:21:25 2020-10-21 18:30:25 10
8 2 2020-10-21 08:11:20 2020-10-21 08:21:25 1
9 2 2020-10-21 08:11:20 2020-10-21 09:11:20 2
10 2 2020-10-21 08:11:20 2020-10-21 09:21:25 3
11 2 2020-10-21 08:11:20 2020-10-21 10:12:12 4
12 2 2020-10-21 08:11:20 2020-10-21 10:30:00 5
13 2 2020-10-21 08:11:20 2020-10-21 11:50:48 6
14 2 2020-10-21 08:11:20 2020-10-21 11:59:15 7
我想按 分组id
,然后基于init_time每隔 X 间隔插入一行并重复 Y 次。请注意,init_time
这对于所有行都是相同的,并且event_time
是创建行的时间。
例如,如果X
is1h
和Y
is5
次,则结果 DF 将是:
id init_time event_time value
0 0 2020-10-21 00:21:25 2020-10-21 00:21:25 -- ⎫
1 0 2020-10-21 00:21:25 2020-10-21 00:21:30 1 ⎪
2 0 2020-10-21 00:21:25 2020-10-21 00:21:35 3 ⎬ 1H interval (00:21:25 - 01:21:25)
3 0 2020-10-21 00:21:25 2020-10-21 00:23:25 7 ⎪
4 0 2020-10-21 00:21:25 2020-10-21 01:21:25 -- ⎭
5 0 2020-10-21 00:21:25 2020-10-21 02:21:25 --
6 0 2020-10-21 00:21:25 2020-10-21 03:21:25 --
7 0 2020-10-21 00:21:25 2020-10-21 04:21:25 --
8 1 2020-10-21 15:21:25 2020-10-21 15:21:25 -- ⎫
9 1 2020-10-21 15:21:25 2020-10-21 15:21:30 3 ⎪
10 1 2020-10-21 15:21:25 2020-10-21 15:22:25 6 ⎬ 1H interval (15:21:25 - 16:21:25)
11 1 2020-10-21 15:21:25 2020-10-21 16:21:25 -- ⎭
12 1 2020-10-21 15:21:25 2020-10-21 16:21:25 9
13 1 2020-10-21 15:21:25 2020-10-21 17:21:25 --
14 1 2020-10-21 15:21:25 2020-10-21 18:21:25 --
15 1 2020-10-21 15:21:25 2020-10-21 18:21:25 10
16 1 2020-10-21 15:21:25 2020-10-21 18:30:25 10
17 1 2020-10-21 15:21:25 2020-10-21 19:21:25 --
18 2 2020-10-21 08:11:20 2020-10-21 08:11:20 -- ⎫
19 2 2020-10-21 08:11:20 2020-10-21 08:21:25 1 ⎬ 1H interval (08:11:20 - 09:11:20)
20 2 2020-10-21 08:11:20 2020-10-21 09:11:20 -- ⎭
21 2 2020-10-21 08:11:20 2020-10-21 09:11:20 2
22 2 2020-10-21 08:11:20 2020-10-21 09:21:25 3
23 2 2020-10-21 08:11:20 2020-10-21 10:11:20 --
24 2 2020-10-21 08:11:20 2020-10-21 10:12:12 4
25 2 2020-10-21 08:11:20 2020-10-21 10:30:00 5
26 2 2020-10-21 08:11:20 2020-10-21 11:11:20 --
27 2 2020-10-21 08:11:20 2020-10-21 11:50:48 6
28 2 2020-10-21 08:11:20 2020-10-21 11:59:15 7
29 2 2020-10-21 08:11:20 2020-10-21 12:11:20 --
任何帮助,将不胜感激 :)
更新: 这是我到目前为止所做的:
df2 = pd.DataFrame()
groups = df.groupby('id')['init_time'].first()
for g in groups:
tmp = pd.DataFrame(index=pd.date_range(g, periods=5, freq="H"))
tmp['value'] = -1
df2 = df2.append(tmp,ignore_index=False)
df2 = df2.reset_index(drop=False)
df2['event_time'] = df2.pop('index')
代码创建缺失的行:
value event_time
0 -1 2020-10-21 00:21:25
1 -1 2020-10-21 01:21:25
2 -1 2020-10-21 02:21:25
3 -1 2020-10-21 03:21:25
4 -1 2020-10-21 04:21:25
5 -1 2020-10-21 15:21:25
6 -1 2020-10-21 16:21:25
7 -1 2020-10-21 17:21:25
8 -1 2020-10-21 18:21:25
9 -1 2020-10-21 19:21:25
10 -1 2020-10-21 08:11:20
11 -1 2020-10-21 09:11:20
12 -1 2020-10-21 10:11:20
13 -1 2020-10-21 11:11:20
14 -1 2020-10-21 12:11:20
现在我需要弄清楚如何合并 DF,或者找到更好的选择
Update2:我正在寻找优化@Serge Ballesta 给出的解决方案,以获得性能时间。
解决方案
我找不到聪明的方法。所以我只提议:
- 每列获取一行数据(假设id是唯一的)
id
init_time
- 添加一个
event_time
从 a 构建的新列date_range
并连接所有内容 - 将其与初始数据框连接,排序并构建一个新索引
代码可能是(假设数据框是df
并且xxx_time
列具有datetime64[ns]
dtypes):
# extract init_time per id
df1 = df[['id', 'init_time']].drop_duplicates()
# build the new event_time data
df2 = pd.concat(pd.DataFrame({'id': [s[1]['id']] * 5,
'init_time': [s[1]['init_time']] * 5,
'event_time': pd.date_range(s[1]['init_time'],
periods=5,
freq='1H')})
for s in df1.iterrows())
# concat with the initial dataframe
final = pd.concat((df, df2)).sort_values(['id', 'event_time']).reset_index(
drop=True)
使用您的示例数据,它提供:
id init_time event_time value
0 0 2020-10-21 00:21:25 2020-10-21 00:21:25 NaN
1 0 2020-10-21 00:21:25 2020-10-21 00:21:30 1.0
2 0 2020-10-21 00:21:25 2020-10-21 00:21:35 3.0
3 0 2020-10-21 00:21:25 2020-10-21 00:23:25 7.0
4 0 2020-10-21 00:21:25 2020-10-21 01:21:25 NaN
5 0 2020-10-21 00:21:25 2020-10-21 02:21:25 NaN
6 0 2020-10-21 00:21:25 2020-10-21 03:21:25 NaN
7 0 2020-10-21 00:21:25 2020-10-21 04:21:25 NaN
8 1 2020-10-21 15:21:25 2020-10-21 15:21:25 NaN
9 1 2020-10-21 15:21:25 2020-10-21 15:21:30 3.0
10 1 2020-10-21 15:21:25 2020-10-21 15:22:25 6.0
11 1 2020-10-21 15:21:25 2020-10-21 16:21:25 9.0
12 1 2020-10-21 15:21:25 2020-10-21 16:21:25 NaN
13 1 2020-10-21 15:21:25 2020-10-21 17:21:25 NaN
14 1 2020-10-21 15:21:25 2020-10-21 18:21:25 10.0
15 1 2020-10-21 15:21:25 2020-10-21 18:21:25 NaN
16 1 2020-10-21 15:21:25 2020-10-21 18:30:25 10.0
17 1 2020-10-21 15:21:25 2020-10-21 19:21:25 NaN
18 2 2020-10-21 08:11:20 2020-10-21 08:11:20 NaN
19 2 2020-10-21 08:11:20 2020-10-21 08:21:25 1.0
20 2 2020-10-21 08:11:20 2020-10-21 09:11:20 2.0
21 2 2020-10-21 08:11:20 2020-10-21 09:11:20 NaN
22 2 2020-10-21 08:11:20 2020-10-21 09:21:25 3.0
23 2 2020-10-21 08:11:20 2020-10-21 10:11:20 NaN
24 2 2020-10-21 08:11:20 2020-10-21 10:12:12 4.0
25 2 2020-10-21 08:11:20 2020-10-21 10:30:00 5.0
26 2 2020-10-21 08:11:20 2020-10-21 11:11:20 NaN
27 2 2020-10-21 08:11:20 2020-10-21 11:50:48 6.0
28 2 2020-10-21 08:11:20 2020-10-21 11:59:15 7.0
29 2 2020-10-21 08:11:20 2020-10-21 12:11:20 NaN
推荐阅读
- mysql - 具有多个帐户的应用程序的数据库结构应该是什么,每个帐户具有相似类型的数据?
- java - 无法识别错误推理 - 与 ArrayList 和 FOR 循环相关,我该如何修复我的代码?
- c++ - 在 qwebview qt 上使用 qwebelement 进行虚假点击
- mysql - Mysql 上的 Gtfs 数据库 - 搜索区域内的路线时查询缓慢
- php - 如何将变量与 Curl 代码 Php 连接起来
- file - 批处理文件将子文件夹内容从映射位置移动到服务器
- node.js - 如何在打字稿中触发变量值更改的事件 - nodejs
- quill - 文本搜索后如何设置选择?
- c# - 为什么我的自定义 MapRoute 会出现“参数字典包含参数的空条目”
- java - 如何以更好的方式解决 OutOfMemory 错误