Getting a sample of rows from PostgreSQL

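Problem description

I have a table with a timestamp column and some data columns. Rows are inserted at arbitrary intervals, i.e. the difference between the timestamps of consecutive rows is not constant and ranges from a few seconds to a few hours.

I need to select one row from each fixed-length time interval, if that interval contains any rows.

Example

To get the intervals to select from, I have a function that generates a table of time intervals, for example: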

start_time              end_time
'2021-05-31 10:00:00'   '2021-05-31 10:10:00'
'2021-05-31 10:10:00'   '2021-05-31 10:20:00'
'2021-05-31 10:20:00'   '2021-05-31 10:30:00'

Then, for this source data:

timestamp                data
'2021-05-31 10:01:00'    1
'2021-05-31 10:02:00'    2
'2021-05-31 10:05:00'    3
'2021-05-31 10:21:00'    4

I need this result:

timestamp                data
'2021-05-31 10:01:00'    1
'2021-05-31 10:21:00'    4

Thanks

Tags: sql, postgresql

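Solution

You can try joining the table with the time intervals, ordering by interval, and using DISTINCT ON (start_time, end_time), which keeps only the first row of each interval in the given sort order. The example below derives 10-minute (600-second) intervals from the timestamps themselves and joins them with table t: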

SELECT DISTINCT ON (start_time,end_time) tm, data
FROM t 
JOIN (SELECT 
       to_timestamp(floor((extract('epoch' from tm) / 600 )) * 600) AT TIME ZONE 'UTC' AS start_time,
       to_timestamp(floor((extract('epoch' from tm) / 600 )) * 600) AT TIME ZONE 'UTC' + interval '10 minutes' AS end_time
      FROM t) j ON tm BETWEEN start_time AND end_time
ORDER BY start_time,end_time,tm;
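
Because the intervals in the query above are computed from the rows of t themselves, the self-join can probably be avoided. The following is a minimal sketch of that idea, not the answer's original method: it computes the bucket number once per row and lets DISTINCT ON keep the earliest row of each bucket. The column alias bucket and the subquery alias s are names chosen here for illustration.

```sql
-- Sample data from the question, inlined so the sketch is self-contained.
WITH t (tm, data) AS (
  VALUES ('2021-05-31 10:01:00'::timestamp, 1),
         ('2021-05-31 10:02:00'::timestamp, 2),
         ('2021-05-31 10:05:00'::timestamp, 3),
         ('2021-05-31 10:21:00'::timestamp, 4)
)
SELECT DISTINCT ON (bucket) tm, data
FROM (
  -- bucket is the index of the 600-second (10-minute) window containing tm
  SELECT tm, data, floor(extract(epoch FROM tm) / 600) AS bucket
  FROM t
) s
-- DISTINCT ON keeps the first row per bucket, so sort by tm within each bucket
ORDER BY bucket, tm;
```

Each row falls into exactly one bucket here, so a timestamp that lies exactly on a boundary such as 10:10:00 belongs only to the interval that starts at that instant.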

Demo: db<>fiddle

WITH t (tm,data) AS(
  VALUES ('2021-05-31 10:01:00'::timestamp,1),
         ('2021-05-31 10:02:00'::timestamp,2),
         ('2021-05-31 10:05:00'::timestamp,3),
         ('2021-05-31 10:21:00'::timestamp,4)
) 
SELECT DISTINCT ON (start_time,end_time) tm, data
FROM t 
JOIN (SELECT 
       to_timestamp(floor((extract('epoch' from tm) / 600 )) * 600) AT TIME ZONE 'UTC' AS start_time,
       to_timestamp(floor((extract('epoch' from tm) / 600 )) * 600) AT TIME ZONE 'UTC' + interval '10 minutes' AS end_time
      FROM t) j ON tm BETWEEN start_time AND end_time
ORDER BY start_time,end_time,tm;

         tm          | data
---------------------+-----
 2021-05-31 10:01:00 |   1
 2021-05-31 10:21:00 |   4
(2 rows)
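
If the intervals come from a separate source, as with the interval-generating function mentioned in the question, a LATERAL join is one possible way to pick the first row of each interval. The sketch below is written under assumptions: generate_series stands in for the asker's function (which is not shown in the question), the CTE name intervals and the alias f are illustrative, and intervals are treated as half-open, [start_time, end_time).

```sql
WITH t (tm, data) AS (
  VALUES ('2021-05-31 10:01:00'::timestamp, 1),
         ('2021-05-31 10:02:00'::timestamp, 2),
         ('2021-05-31 10:05:00'::timestamp, 3),
         ('2021-05-31 10:21:00'::timestamp, 4)
),
-- Stand-in for the interval-generating function from the question (assumption).
intervals AS (
  SELECT start_time, start_time + interval '10 minutes' AS end_time
  FROM generate_series('2021-05-31 10:00:00'::timestamp,
                       '2021-05-31 10:20:00'::timestamp,
                       interval '10 minutes') AS g(start_time)
)
SELECT f.tm, f.data
FROM intervals i
CROSS JOIN LATERAL (
  -- earliest row inside the half-open interval [start_time, end_time)
  SELECT t.tm, t.data
  FROM t
  WHERE t.tm >= i.start_time
    AND t.tm <  i.end_time
  ORDER BY t.tm
  LIMIT 1
) f
ORDER BY i.start_time;
```

With CROSS JOIN LATERAL, intervals that contain no rows (10:10 to 10:20 in the sample data) simply drop out of the result, which matches the "if any" requirement in the question. On a real table, an index on the timestamp column would likely help the inner lookup.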
