python - In SQL select query, remove first and last 0 values but not in the middle
Problem description
Here is my problem statement. I got some data from a sensor in this format:
ts |i|p |idx
2019-10-28 06:00:01|0| 0|75522
2019-10-28 06:10:00|0| 0|75522
2019-10-28 06:20:00|0| 0|75522
2019-10-28 06:30:00|0| 0|75522
2019-10-28 06:40:00|0| 0|75522
2019-10-28 06:50:00|0| 0|75522
2019-10-28 07:00:00|0| 0|75522
2019-10-28 07:10:00|0|103|75526
2019-10-28 07:20:00|0|114|75535
2019-10-28 07:30:00|1|141|75550
2019-10-28 07:40:00|1|203|75575
2019-10-28 07:50:00|1|203|75575
2019-10-28 08:00:00|1|203|75575
...
2019-10-28 15:30:00|1|144|79397
2019-10-28 15:40:00|1|127|79414
2019-10-28 15:50:00|0|113|79427
2019-10-28 16:00:00|0|106|79437
2019-10-28 16:10:00|0| 99|79443
2019-10-28 16:20:00|0| 96|79445
2019-10-28 16:30:01|0| 96|79446
2019-10-28 16:40:00|0| 0|79446
2019-10-28 16:50:00|0| 0|79446
2019-10-28 17:00:00|0| 0|79446
2019-10-28 17:10:00|0| 0|79446
For a given day, I would like to extract values like this, removing data where idx has not changed BUT only at beginning and end of the day:
2019-10-28 07:00:00|0|0|75522 -- Remove all unchanged values before
2019-10-28 07:10:00|0|103|75526
2019-10-28 07:20:00|0|114|75535
2019-10-28 07:30:00|1|141|75550
2019-10-28 07:40:00|1|203|75575
2019-10-28 07:50:00|1|203|75575 -- Keep this
2019-10-28 08:00:00|1|203|75575 -- Keep this
...
2019-10-28 15:30:00|1|144|79397
2019-10-28 15:40:00|1|127|79414
2019-10-28 15:50:00|0|113|79427
2019-10-28 16:00:00|0|106|79437
2019-10-28 16:10:00|0|99|79443
2019-10-28 16:20:00|0|96|79445
2019-10-28 16:30:01|0|96|79446 -- Remove all unchanged values after
I tried using LAG to compute the idx delta between each row and the previous row:
SELECT ts, i, p, idx, idx - LAG (idx, 1, idx) OVER (ORDER BY ts)
FROM my_table
WHERE DATE(ts) = '2019-10-28'
ORDER BY ts ASC
and the result is promising:
2019-10-28 06:00:01|0|0|75522|0 -- To be removed
2019-10-28 06:10:00|0|0|75522|0 -- To be removed
2019-10-28 06:20:00|0|0|75522|0 -- To be removed
2019-10-28 06:30:00|0|0|75522|0 -- To be removed
2019-10-28 06:40:00|0|0|75522|0 -- To be removed
2019-10-28 06:50:00|0|0|75522|0 -- To be removed
2019-10-28 07:00:00|0|0|75522|0 -- Keep this
2019-10-28 07:10:00|0|103|75526|4
2019-10-28 07:20:00|0|114|75535|9
2019-10-28 07:30:00|1|141|75550|15
2019-10-28 07:40:00|1|203|75575|25
2019-10-28 07:50:00|1|203|75575|0 -- Keep this
2019-10-28 08:00:00|1|203|75575|0 -- Keep this
...
2019-10-28 15:30:00|1|144|79397|20
2019-10-28 15:40:00|1|127|79414|17
2019-10-28 15:50:00|0|113|79427|13
2019-10-28 16:00:00|0|106|79437|10
2019-10-28 16:10:00|0|99|79443|6
2019-10-28 16:20:00|0|96|79445|2
2019-10-28 16:30:01|0|96|79446|1
2019-10-28 16:40:00|0|0|79446|0 -- Keep this
2019-10-28 16:50:00|0|0|79446|0 -- To be removed
2019-10-28 17:00:00|0|0|79446|0 -- To be removed
2019-10-28 17:10:00|0|0|79446|0 -- To be removed
Now, how can I tweak the query to remove the leading and trailing rows whose delta is 0, while keeping the last of the leading zeros and the first of the trailing zeros (if you follow me!), and leaving the rows in the middle untouched even when their delta is 0?
Is it better to do it through a post treatment in my Python code?
Edit 31-Oct-19: The idx column holds a value that always increases, except when the source being read is changed. In my case the data comes from a consumption meter, and the electricity company decided to replace the meter, so the count now restarts from 0. It would therefore be better to analyse days from the timestamp plus the value changes from the first/last record of the day!
Solution
I think you can do this directly from your initial data:
SELECT * FROM
(
    SELECT T1.*,
           -- first and last timestamp of each idx value
           MIN(TS) OVER (PARTITION BY IDX) AS MIN_TS,
           MAX(TS) OVER (PARTITION BY IDX) AS MAX_TS,
           -- first and last timestamp of the whole data set
           MIN(TS) OVER () AS MIN_GLOBAL_TS,
           MAX(TS) OVER () AS MAX_GLOBAL_TS
    FROM TABLENAME T1
) T2
WHERE ((TS = MIN_TS OR TS = MAX_TS)
       AND TS != MIN_GLOBAL_TS
       AND TS != MAX_GLOBAL_TS)
   OR MIN_TS = MAX_TS  -- idx values that occur only once are always kept
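Here, you find the first and last timestamp for any given idx, then select the rows whose timestamp is that first or last one, while checking against the start and end of the day (the global minimum and maximum) and removing the rows that correspond to them.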
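To see which rows this query keeps, here is a minimal sketch that runs it against a shortened copy of the question's sample in an in-memory SQLite database through Python's `sqlite3` module. The table name `my_table`, the shortened row list, and the choice of SQLite are assumptions made for illustration (the question does not name its database), and window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Shortened copy of the question's sample; the rows elided there are left out.
ROWS = [
    ("2019-10-28 06:40:00", 0, 0, 75522),
    ("2019-10-28 06:50:00", 0, 0, 75522),
    ("2019-10-28 07:00:00", 0, 0, 75522),
    ("2019-10-28 07:10:00", 0, 103, 75526),
    ("2019-10-28 07:20:00", 0, 114, 75535),
    ("2019-10-28 07:30:00", 1, 141, 75550),
    ("2019-10-28 07:40:00", 1, 203, 75575),
    ("2019-10-28 07:50:00", 1, 203, 75575),
    ("2019-10-28 08:00:00", 1, 203, 75575),
    ("2019-10-28 16:20:00", 0, 96, 79445),
    ("2019-10-28 16:30:01", 0, 96, 79446),
    ("2019-10-28 16:40:00", 0, 0, 79446),
    ("2019-10-28 16:50:00", 0, 0, 79446),
]

# The answer's query, with only the table name changed (assumption).
QUERY = """
SELECT ts, i, p, idx FROM
(
    SELECT t1.*,
           MIN(ts) OVER (PARTITION BY idx) AS min_ts,
           MAX(ts) OVER (PARTITION BY idx) AS max_ts,
           MIN(ts) OVER () AS min_global_ts,
           MAX(ts) OVER () AS max_global_ts
    FROM my_table t1
) t2
WHERE ((ts = min_ts OR ts = max_ts)
       AND ts != min_global_ts
       AND ts != max_global_ts)
   OR min_ts = max_ts
ORDER BY ts
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (ts TEXT, i INTEGER, p INTEGER, idx INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", ROWS)

# Keep only the HH:MM part of each surviving timestamp for display.
kept = [row[0][11:16] for row in conn.execute(QUERY)]
print(kept)
```

On this sample the leading and trailing runs are trimmed as intended, but the 07:50 row is dropped as well, because only the first and last row of each idx value survive. If repeated rows in the middle of the day must be kept, as the question asks, the filter needs a different condition.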
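I assume your timestamps are in a date format.
Edited the query so that it returns the row when there are no repeated values at the start or end of the day.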
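If the repeated rows in the middle of the day must be preserved, one possible alternative is to build on the LAG idea from the question: compute the idx change relative to both the previous and the next row, then keep everything from the last row before the first change up to the row of the last change. The sketch below is an illustration under stated assumptions, not the query from this answer: it uses SQLite through Python's `sqlite3` module (window functions need SQLite 3.25 or newer), the table name `my_table` from the question, a shortened sample, and the hypothetical aliases `delta_prev` and `delta_next`.

```python
import sqlite3

# Shortened copy of the question's sample; the rows elided there are left out.
ROWS = [
    ("2019-10-28 06:40:00", 0, 0, 75522),
    ("2019-10-28 06:50:00", 0, 0, 75522),
    ("2019-10-28 07:00:00", 0, 0, 75522),
    ("2019-10-28 07:10:00", 0, 103, 75526),
    ("2019-10-28 07:20:00", 0, 114, 75535),
    ("2019-10-28 07:30:00", 1, 141, 75550),
    ("2019-10-28 07:40:00", 1, 203, 75575),
    ("2019-10-28 07:50:00", 1, 203, 75575),
    ("2019-10-28 08:00:00", 1, 203, 75575),
    ("2019-10-28 16:20:00", 0, 96, 79445),
    ("2019-10-28 16:30:01", 0, 96, 79446),
    ("2019-10-28 16:40:00", 0, 0, 79446),
    ("2019-10-28 16:50:00", 0, 0, 79446),
]

QUERY = """
WITH d AS (
    SELECT ts, i, p, idx,
           -- change compared with the previous row (0 for the first row)
           idx - LAG(idx, 1, idx) OVER (ORDER BY ts) AS delta_prev,
           -- change compared with the next row (0 for the last row)
           LEAD(idx, 1, idx) OVER (ORDER BY ts) - idx AS delta_next
    FROM my_table
    WHERE DATE(ts) = ?
)
SELECT ts, i, p, idx
FROM d
-- start at the last unchanged row before the first change ...
WHERE ts >= (SELECT MIN(ts) FROM d WHERE delta_next <> 0)
-- ... and stop at the row where the last change happened
  AND ts <= (SELECT MAX(ts) FROM d WHERE delta_prev <> 0)
ORDER BY ts
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (ts TEXT, i INTEGER, p INTEGER, idx INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", ROWS)

# Keep only the HH:MM part of each surviving timestamp for display.
kept = [row[0][11:16] for row in conn.execute(QUERY, ("2019-10-28",))]
print(kept)
```

Because the comparison is `<> 0` rather than `> 0`, a meter that restarts from 0 also counts as a change. On a day where idx never changes, both subqueries return NULL and the query returns no rows; whether that is the desired behaviour depends on the use case.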