In SQL select query, remove first and last 0 values but not in the middle

Problem description

Here is my problem statement. I got some data from a sensor in this format:

ts                 |i|p  |idx
2019-10-28 06:00:01|0|  0|75522
2019-10-28 06:10:00|0|  0|75522
2019-10-28 06:20:00|0|  0|75522
2019-10-28 06:30:00|0|  0|75522
2019-10-28 06:40:00|0|  0|75522
2019-10-28 06:50:00|0|  0|75522
2019-10-28 07:00:00|0|  0|75522
2019-10-28 07:10:00|0|103|75526
2019-10-28 07:20:00|0|114|75535
2019-10-28 07:30:00|1|141|75550
2019-10-28 07:40:00|1|203|75575
2019-10-28 07:50:00|1|203|75575
2019-10-28 08:00:00|1|203|75575
...
2019-10-28 15:30:00|1|144|79397
2019-10-28 15:40:00|1|127|79414
2019-10-28 15:50:00|0|113|79427
2019-10-28 16:00:00|0|106|79437
2019-10-28 16:10:00|0| 99|79443
2019-10-28 16:20:00|0| 96|79445
2019-10-28 16:30:01|0| 96|79446
2019-10-28 16:40:00|0|  0|79446
2019-10-28 16:50:00|0|  0|79446
2019-10-28 17:00:00|0|  0|79446
2019-10-28 17:10:00|0|  0|79446

For a given day, I would like to extract the values like this, removing rows where idx has not changed, BUT only at the beginning and end of the day:

2019-10-28 07:00:00|0|0|75522   -- Remove all unchanged values before
2019-10-28 07:10:00|0|103|75526
2019-10-28 07:20:00|0|114|75535
2019-10-28 07:30:00|1|141|75550
2019-10-28 07:40:00|1|203|75575 
2019-10-28 07:50:00|1|203|75575 -- Keep this
2019-10-28 08:00:00|1|203|75575 -- Keep this
...
2019-10-28 15:30:00|1|144|79397
2019-10-28 15:40:00|1|127|79414
2019-10-28 15:50:00|0|113|79427
2019-10-28 16:00:00|0|106|79437
2019-10-28 16:10:00|0|99|79443
2019-10-28 16:20:00|0|96|79445
2019-10-28 16:30:01|0|96|79446  -- Remove all unchanged values after

I tried to play with LAG to compute the idx delta between each row and the previous row.

SELECT ts, i, p, idx, idx - LAG (idx, 1, idx) OVER (ORDER BY ts) 
FROM my_table 
WHERE DATE(ts) = '2019-10-28'
ORDER BY ts ASC

and the result is promising:

2019-10-28 06:00:01|0|0|75522|0   -- To be removed 
2019-10-28 06:10:00|0|0|75522|0   -- To be removed 
2019-10-28 06:20:00|0|0|75522|0   -- To be removed 
2019-10-28 06:30:00|0|0|75522|0   -- To be removed 
2019-10-28 06:40:00|0|0|75522|0   -- To be removed 
2019-10-28 06:50:00|0|0|75522|0   -- To be removed 
2019-10-28 07:00:00|0|0|75522|0   -- Keep this
2019-10-28 07:10:00|0|103|75526|4
2019-10-28 07:20:00|0|114|75535|9
2019-10-28 07:30:00|1|141|75550|15
2019-10-28 07:40:00|1|203|75575|15 
2019-10-28 07:50:00|1|203|75575|0 -- Keep this
2019-10-28 08:00:00|1|203|75575|0 -- Keep this
...
2019-10-28 15:30:00|1|144|79397|20
2019-10-28 15:40:00|1|127|79414|17
2019-10-28 15:50:00|0|113|79427|13
2019-10-28 16:00:00|0|106|79437|10
2019-10-28 16:10:00|0|99|79443|6
2019-10-28 16:20:00|0|96|79445|2
2019-10-28 16:30:01|0|96|79446|1
2019-10-28 16:40:00|0|0|79446|0   -- Keep this
2019-10-28 16:50:00|0|0|79446|0   -- To be removed 
2019-10-28 17:00:00|0|0|79446|0   -- To be removed 
2019-10-28 17:10:00|0|0|79446|0   -- To be removed  

Now, how can I tweak the query to remove all the leading 0s and all the trailing 0s, but keep the last of the leading 0s and the first of the trailing 0s (you follow me :-!), without touching the values in the middle, even when they are 0?

Is it better to do it through a post treatment in my Python code?


Edit 31-Oct-19: The idx column contains values that are always increasing, except when the data source is changed. In my case, the data comes from a consumption meter, but the electricity company decided to replace the meter, so the counter now restarts from 0. So it would be better to analyse each day from the timestamp plus the value changes relative to the first/last record of the day!
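
On the question of post-processing in Python: a minimal sketch of that route, assuming the rows for one day have already been fetched as (ts, i, p, idx) tuples sorted by ts. The function name trim_unchanged_edges and the idx_pos parameter are hypothetical, and the sample rows are a subset of the data shown above. It compares idx values for equality rather than relying on a positive delta, so a meter reset to 0 still counts as a change, and it follows the expected output listed above (ending at 16:30:01).

```python
# Each row is (ts, i, p, idx), already sorted by ts for a single day.
SAMPLE_ROWS = [
    ("2019-10-28 06:40:00", 0, 0, 75522),
    ("2019-10-28 06:50:00", 0, 0, 75522),
    ("2019-10-28 07:00:00", 0, 0, 75522),
    ("2019-10-28 07:10:00", 0, 103, 75526),
    ("2019-10-28 07:40:00", 1, 203, 75575),
    ("2019-10-28 07:50:00", 1, 203, 75575),
    ("2019-10-28 08:00:00", 1, 203, 75575),
    ("2019-10-28 16:20:00", 0, 96, 79445),
    ("2019-10-28 16:30:01", 0, 96, 79446),
    ("2019-10-28 16:40:00", 0, 0, 79446),
    ("2019-10-28 16:50:00", 0, 0, 79446),
]


def trim_unchanged_edges(rows, idx_pos=3):
    """Drop the leading and trailing runs of unchanged idx values.

    Keeps the last row of the leading run and the first row of the
    trailing run; rows in between are never touched. If idx never
    changes during the day, a single row (the last one) is returned.
    """
    if not rows:
        return []
    first_idx = rows[0][idx_pos]
    last_idx = rows[-1][idx_pos]

    # Advance to the last row that still carries the day's first idx value.
    start = 0
    while start + 1 < len(rows) and rows[start + 1][idx_pos] == first_idx:
        start += 1

    # Step back to the first row that already carries the day's last idx value.
    end = len(rows) - 1
    while end - 1 > start and rows[end - 1][idx_pos] == last_idx:
        end -= 1

    return rows[start:end + 1]


trimmed = trim_unchanged_edges(SAMPLE_ROWS)
for row in trimmed:
    print(row)
```

With the sample rows, the result starts at 07:00:00 and ends at 16:30:01, and the repeated 07:50:00 and 08:00:00 readings are kept.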

Tags: python, sql, sqlite

Solution


I think you can do this starting from your initial data:

SELECT * FROM
    (
    SELECT T1.*, 
        MIN(TS) OVER (PARTITION BY IDX) AS MIN_TS,
        MAX(TS) OVER (PARTITION BY IDX) AS MAX_TS,
        MIN(TS) OVER () AS MIN_GLOBAL_TS,
        MAX(TS) OVER () AS MAX_GLOBAL_TS
    FROM TABLENAME T1
    ) T2
WHERE ((TS = MIN_TS OR TS = MAX_TS) 
    and TS !=  MIN_GLOBAL_TS 
    and TS !=  MAX_GLOBAL_TS)
    or MIN_TS = MAX_TS

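Here, you find the first and last timestamp for any given idx, then select the rows whose timestamp is that first or last one, while also checking against the start and end of the day (the global minimum and maximum) and removing the rows that correspond to them.

I am assuming your timestamps are in a date format.

Edited the query so that it also returns the row when there are no duplicate values at the start or end of the day.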
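
Note that the query above keeps only the first and last row of each idx value, so a repeated reading in the middle of the day (07:50:00 in the question's sample) is dropped as well. If those rows must be kept, one possible alternative is to compute the two cut points for the day and select everything between them. The sketch below is an illustration under stated assumptions, not the method of the query above: it runs from Python against an in-memory SQLite database, uses the table name my_table from the question, assumes ts is stored as 'YYYY-MM-DD HH:MM:SS' text, and the names day_rows, edges, bounds and trimmed_day are made up for the example. It compares idx with <> instead of using a delta, so a meter reset still counts as a change.

```python
import sqlite3

# Subset of the rows shown in the question: (ts, i, p, idx).
SAMPLE_ROWS = [
    ("2019-10-28 06:40:00", 0, 0, 75522),
    ("2019-10-28 06:50:00", 0, 0, 75522),
    ("2019-10-28 07:00:00", 0, 0, 75522),
    ("2019-10-28 07:10:00", 0, 103, 75526),
    ("2019-10-28 07:40:00", 1, 203, 75575),
    ("2019-10-28 07:50:00", 1, 203, 75575),
    ("2019-10-28 08:00:00", 1, 203, 75575),
    ("2019-10-28 16:20:00", 0, 96, 79445),
    ("2019-10-28 16:30:01", 0, 96, 79446),
    ("2019-10-28 16:40:00", 0, 0, 79446),
    ("2019-10-28 16:50:00", 0, 0, 79446),
]

QUERY = """
WITH day_rows AS (            -- all rows of the requested day
    SELECT ts, i, p, idx FROM my_table WHERE DATE(ts) = :day
),
edges AS (                    -- idx of the first and last record of the day
    SELECT
        (SELECT idx FROM day_rows ORDER BY ts ASC  LIMIT 1) AS first_idx,
        (SELECT idx FROM day_rows ORDER BY ts DESC LIMIT 1) AS last_idx
),
bounds AS (                   -- first and last timestamps where idx differs
    SELECT
        (SELECT MIN(ts) FROM day_rows, edges WHERE idx <> first_idx) AS first_change,
        (SELECT MAX(ts) FROM day_rows, edges WHERE idx <> last_idx)  AS last_change
)
SELECT d.ts, d.i, d.p, d.idx
FROM day_rows AS d, bounds AS b
-- keep from the row just before the first change ...
WHERE d.ts >= (SELECT MAX(x.ts) FROM day_rows AS x WHERE x.ts < b.first_change)
-- ... up to the row just after the last change
  AND d.ts <= (SELECT MIN(x.ts) FROM day_rows AS x WHERE x.ts > b.last_change)
ORDER BY d.ts
"""


def trimmed_day(conn, day):
    """Return the rows of `day` without the unchanged leading/trailing runs."""
    return conn.execute(QUERY, {"day": day}).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (ts TEXT, i INTEGER, p INTEGER, idx INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", SAMPLE_ROWS)

result = trimmed_day(conn, "2019-10-28")
for row in result:
    print(row)
```

On the sample rows this keeps everything from 07:00:00 through 16:30:01, including the repeated 07:50:00 and 08:00:00 readings. If idx never changes during the day, both bounds are NULL and the query returns no rows.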

