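python - Compare timestamps across 2 columns and 2 rows
Problem description
I have a DataFrame with several engines, a start and end datetime for each row, and an info code. (Sample below.)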
   engine                start                  end  duration  info  energy
20      a  2020-04-16 09:40:00  2020-04-17 00:00:00       860     1    1982
21      a  2020-04-17 00:01:00  2020-04-18 00:00:00      1439     1    3254
22      a  2020-04-18 00:01:00  2020-04-19 00:00:00      1439     1    3258
23      a  2020-04-19 00:01:00  2020-04-20 00:00:00      1439     1    3353
24      a  2020-04-20 00:01:00  2020-04-21 00:00:00      1439     1    3253
25      a  2020-04-21 00:01:00  2020-04-22 00:00:00      1439     1    3053
26      a  2020-04-22 00:01:00  2020-04-22 06:27:00       386     1     820
27      a  2020-04-23 15:26:00  2020-04-24 00:00:00       514     1    1836
28      a  2020-04-24 00:01:00  2020-04-25 00:00:00      1439     1    4797
29      a  2020-04-25 00:01:00  2020-04-26 00:00:00      1439     1    4265
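I want to group my df so that, for each engine, I can see where there is a break in the code or in the dates.
I would like to compare consecutive rows, something like: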
    if dt.loc[i+1, 'start'] - dt.loc[i, 'end'] <= day:
        dt['diff'] = False
    else:
        dt['diff'] = True
Then I could take the cumsum of df['diff'] to number the groups.
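The flag-then-cumsum idea can be sketched without an explicit loop by comparing each row's start with the previous row's end via `shift`. This is a minimal sketch on made-up toy data; the frame `toy`, the column `run`, and the one-minute `tolerance` are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical toy data: the third interval starts long after the second ends.
toy = pd.DataFrame({
    "start": pd.to_datetime(["2020-04-21 00:01", "2020-04-22 00:01", "2020-04-23 15:26"]),
    "end": pd.to_datetime(["2020-04-22 00:00", "2020-04-22 06:27", "2020-04-24 00:00"]),
})

tolerance = pd.Timedelta(minutes=1)  # assumed threshold for "contiguous"

# Gap between this row's start and the previous row's end (NaT on the first row).
gap = toy["start"] - toy["end"].shift(1)

# True where a new run begins; a comparison with NaT is False, so row 0 is not a break.
toy["diff"] = gap > tolerance

# The running count of breaks gives every row the id of the run it belongs to.
toy["run"] = toy["diff"].cumsum()
print(toy)
```

Rows that share a value in `run` form one uninterrupted stretch and can be aggregated together.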
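For example, here I would like to get the sum of energy and the info code for the engine from 2020-04-16 9:40 to 2020-04-22 6:27, and again for the engine from 2020-04-23 15:26 to 2020-04-26 0:00.
I tried this, but it doesn't work: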
    df = df.sort_values(['engine','start','code'])
    dt = df[['start','end']]
    diff = df['codea']
    day = pd.Timedelta('1M')
    breaks = dt.start.diff().apply(lambda x: False if x <= day else True)
    breaks2 = diff.diff() != 0
    groups = breaks.cumsum()
    groups2 = breaks2.cumsum()
    df['follow'] = groups
    df['follow2'] = groups2
    dfend = (df.groupby(['engine','follow','follow2','code'])
               .agg(
                   begin=('start','first'),
                   end=('end','last'),
                   ensum=('energy','sum')
               )
            )
Thanks a lot
Solution
You can do this with groupby, as follows:
    import pandas as pd

    df = pd.DataFrame({"engine": ["a"]*10,
                       "start": pd.to_datetime(pd.Series(["2020-04-16 09:40:00", "2020-04-17 00:01:00",
                                                          "2020-04-18 00:01:00", "2020-04-19 00:01:00",
                                                          "2020-04-20 00:01:00", "2020-04-21 00:01:00",
                                                          "2020-04-22 00:01:00", "2020-04-23 15:26:00",
                                                          "2020-04-24 00:01:00", "2020-04-25 00:01:00"])),
                       "end": pd.to_datetime(pd.Series(["2020-04-17 00:00:00", "2020-04-18 00:00:00",
                                                        "2020-04-19 00:00:00", "2020-04-20 00:00:00",
                                                        "2020-04-21 00:00:00", "2020-04-22 00:00:00",
                                                        "2020-04-22 06:27:00", "2020-04-24 00:00:00",
                                                        "2020-04-25 00:00:00", "2020-04-26 00:00:00"])),
                       "duration": [860, 1439, 1439, 1439, 1439, 1439, 386, 514, 1439, 1439],
                       "info": [1]*10,
                       "energy": [1982, 3254, 3258, 3353, 3253, 3053, 820, 1838, 4797, 4265]})

    # A row opens a new run when it starts more than one minute after the previous row ends.
    # pd.Timedelta(minutes=1) is used instead of the ambiguous '1M' string.
    minute = pd.Timedelta(minutes=1)

    # Outer groupby: one block per (engine, info) pair.
    # Inner groupby: the cumulative count of gaps numbers the runs inside each block.
    result = (df.groupby(["engine", "info"])
                .apply(lambda x:
                       x.groupby(((x["start"] - x["end"].shift(1)) > minute).cumsum())
                        .agg({"start": "min", "end": "max", "energy": "sum"})))
    print(result)
The output is:
                            start                 end  energy
engine info
a      1    0 2020-04-16 09:40:00 2020-04-22 06:27:00   18973
            1 2020-04-23 15:26:00 2020-04-26 00:00:00   10900
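The question also asks for a new group whenever the info code changes within an engine. The nested groupby above splits by info code globally, so the following is one possible flat variant that treats both kinds of break the same way. It is only a sketch: the helper name `summarize_runs`, the `run` column, the one-minute default tolerance, and the sample frame are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

def summarize_runs(df, tolerance=pd.Timedelta(minutes=1)):
    """Hypothetical helper: collapse consecutive rows into runs per engine.

    A new run starts when the gap to the previous row's end exceeds
    `tolerance` or when the info code differs from the previous row's.
    """
    df = df.sort_values(["engine", "start"]).reset_index(drop=True)

    # Previous row's values within the same engine (NaT/NaN on each engine's first row).
    prev_end = df.groupby("engine")["end"].shift(1)
    prev_info = df.groupby("engine")["info"].shift(1)

    date_break = (df["start"] - prev_end) > tolerance
    code_break = prev_info.notna() & (df["info"] != prev_info)

    # Cumulative count of breaks, restarted for every engine.
    is_break = (date_break | code_break).astype(int)
    df = df.assign(run=is_break.groupby(df["engine"]).cumsum())

    return (df.groupby(["engine", "run", "info"])
              .agg(begin=("start", "min"), end=("end", "max"), ensum=("energy", "sum"))
              .reset_index())

# Made-up sample: engine "a" has one date break and then one code change.
sample = pd.DataFrame({
    "engine": ["a", "a", "a", "a", "b"],
    "start": pd.to_datetime(["2020-04-21 00:01", "2020-04-22 00:01", "2020-04-23 15:26",
                             "2020-04-24 00:01", "2020-04-21 00:01"]),
    "end": pd.to_datetime(["2020-04-22 00:00", "2020-04-22 06:27", "2020-04-24 00:00",
                           "2020-04-25 00:00", "2020-04-22 00:00"]),
    "info": [1, 1, 1, 2, 1],
    "energy": [10, 20, 30, 40, 5],
})

out = summarize_runs(sample)
print(out)
```

Because the run counter restarts per engine, the first run of every engine is numbered 0, and each output row covers one uninterrupted stretch with a single info code.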