Removing consecutive duplicates in a PostgreSQL database where the data is in a json column

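Problem description

I have a PostgreSQL table named state_data with two columns: datetime and state. The state column is of type jsonb and holds various state data for a given datetime. Here is a sample of the table: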

datetime            | state
================================================
2018-10-31 08:00:00 | {"temp":75.0,"location":1}
2018-10-31 08:01:00 | {"temp":75.0,"location":1}
2018-10-31 08:02:00 | {"temp":75.0,"location":1}
2018-10-31 08:03:00 | {"temp":75.0,"location":2}
2018-10-31 08:04:00 | {"temp":74.8,"location":1}
2018-10-31 08:05:00 | {"temp":74.8,"location":2}
2018-10-31 08:06:00 | {"temp":74.7,"location":1}
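
To try the queries on this page against the sample, a minimal setup might look like the sketch below. The question only says that state is jsonb. The timestamp type for datetime and the absence of a primary key are assumptions made here for illustration.

```sql
-- Assumed schema: the question names the two columns but gives no type for datetime.
create table state_data (
    datetime timestamp not null,
    state    jsonb     not null
);

-- The seven sample rows shown above.
insert into state_data (datetime, state) values
    ('2018-10-31 08:00:00', '{"temp":75.0,"location":1}'),
    ('2018-10-31 08:01:00', '{"temp":75.0,"location":1}'),
    ('2018-10-31 08:02:00', '{"temp":75.0,"location":1}'),
    ('2018-10-31 08:03:00', '{"temp":75.0,"location":2}'),
    ('2018-10-31 08:04:00', '{"temp":74.8,"location":1}'),
    ('2018-10-31 08:05:00', '{"temp":74.8,"location":2}'),
    ('2018-10-31 08:06:00', '{"temp":74.7,"location":1}');
```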

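Over time this table will get very large, especially as I increase the sampling frequency, and I really only want to keep rows whose temperature differs from that of the previous row. So the table above would reduce to: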

datetime            | state
================================================
2018-10-31 08:00:00 | {"temp":75.0,"location":1}
2018-10-31 08:04:00 | {"temp":74.8,"location":1}
2018-10-31 08:06:00 | {"temp":74.7,"location":1}

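If the temperature were in its own column I would know how to do this, but is there a straightforward way to remove all consecutive duplicates based on an item inside a json column?

What if I want to remove duplicates based on two json items? For example: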

datetime            | state
================================================
2018-10-31 08:00:00 | {"temp":75.0,"location":1}
2018-10-31 08:03:00 | {"temp":75.0,"location":2}
2018-10-31 08:04:00 | {"temp":74.8,"location":1}
2018-10-31 08:05:00 | {"temp":74.8,"location":2}
2018-10-31 08:06:00 | {"temp":74.7,"location":1}

Tags: database, postgresql, jsonb

Solution


Use the window function lag():

select datetime, state
from (
    select datetime, state, lag(state) over (order by datetime) as prev
    from state_data
    ) s
where state->>'temp' is distinct from prev->>'temp'
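
The question also asks about deduplicating on two json items, which the query above does not cover. One possible sketch, assuming the keys are temp and location as in the sample data, compares both values at once with a row-wise IS DISTINCT FROM:

```sql
-- Keep a row when either temp or location differs from the previous row.
-- The first row has no predecessor (prev is null), so it is always kept.
select datetime, state
from (
    select datetime, state, lag(state) over (order by datetime) as prev
    from state_data
    ) s
where (state->>'temp', state->>'location')
      is distinct from (prev->>'temp', prev->>'location');
```

On the sample table this should keep the 08:00, 08:03, 08:04, 08:05 and 08:06 rows, matching the second expected result in the question.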

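If the table has a primary key, you should use it in the delete command. Without a primary key, you can match rows on the pair (datetime, state), casting state to jsonb. The cast matters when the column is of type json, which has no equality operator; on a jsonb column it is harmless: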

delete from state_data
where (datetime, state::jsonb) not in (
    select datetime, state::jsonb
    from (
        select datetime, state, lag(state) over (order by datetime) as prev
        from state_data
        ) s
    where state->>'temp' is distinct from prev->>'temp'
)
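
For the primary-key case mentioned above, the delete could be sketched as follows. The id column is hypothetical (the question's table has no such column) and stands in for whatever primary key the table actually has. The condition is inverted here, so the subquery selects the rows to delete instead of the rows to keep.

```sql
-- Assumes a hypothetical primary key column named id.
-- Deletes every row whose temp equals the temp of the row just before it.
delete from state_data
where id in (
    select id
    from (
        select id, state, lag(state) over (order by datetime) as prev
        from state_data
        ) s
    -- The first row has prev = null, so it is never selected for deletion.
    where state->>'temp' is not distinct from prev->>'temp'
);
```

Selecting the rows to delete avoids NOT IN, which returns no rows at all if the subquery yields a null.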
