首页 > 解决方案 > 如何在插入之前识别要解压缩的块?

问题描述

Timescaledb 文档展示了如何解压缩特定的块:

SELECT decompress_chunk('chunk_name');

或给定超表的所有块:

SELECT decompress_chunk(show_chunks('hypertable_name'));

但是,这意味着您要么需要知道要插入哪个块,要么可以解压缩整个表。我正在使用一个大表(> 100 GB 未压缩)。在这种情况下,解压缩整个表是不切实际的,尤其是它有一个额外的维度(与时间戳一起用于分块)。

给定日期时间和维度范围,是否可以找到与我的查询相关的块?

标签: compressiontimescaledb

解决方案


更新:答案在 TimescaleDB 1.7 上进行了测试。

与时间和空间维度值匹配的特定块可以在 中的show_chunk公共信息视图和在 中的timescaledb_information内部目录表的帮助下找到_timescaledb_catalog

首先,show_chunk具有可选参数older_thannewer_than,它允许查找比具有给定时间戳的块更旧或新的块,然后从所有块中减去。例如:

SELECT c.chunk_name
FROM (SELECT show_chunks('hyper') AS chunk_name
    EXCEPT (SELECT show_chunks('hyper', older_than => '2018-07-02 06:01'::timestamptz))
    EXCEPT (SELECT show_chunks('hyper', newer_than => '2018-07-02 06:01'::timestamptz))) AS c

仅检索压缩块compression_status = 'Compressed'timescaledb_information.compressed_chunk_stats有所帮助。

如果在超表上还定义了空间维度,则上述查询将返回与空间维度上的分区数相同的块数。要找到正确的维度,有必要检查空间维度值属于哪个块以及空间维度范围存储在 中_timescaledb_catalog.dimension_slice最终查询的示例在末尾。

举个例子:

CREATE TABLE hyper(
    time timestamptz NOT NULL, 
    device int, 
    value float
);
SELECT * FROM create_hypertable('hyper', 'time', 'device', 2);

ALTER TABLE hyper SET (timescaledb.compress, 
                       timescaledb.compress_segmentby='device', 
                       timescaledb.compress_orderby = 'time DESC');

INSERT INTO hyper VALUES
       ('2017-01-01 06:01', 1, 1.2),
       ('2017-01-01 09:11', 3, 4.3),
       ('2017-01-01 08:01', 1, 7.3),
       ('2017-01-02 08:01', 2, 0.23),
       ('2018-07-02 08:01', 87, 0.0),
       ('2018-07-01 06:01', 13, 3.1),
       ('2018-07-01 09:11', 90, 10303.12),
       ('2018-07-01 08:01', 29, 64),
       ('2019-07-02 08:01', 87, 0.0),
       ('2019-07-01 06:01', 13, 3.1),
       ('2019-07-01 09:11', 90, 10303.12),
       ('2019-07-01 08:01', 29, 64);

SELECT compress_chunk(show_chunks('hyper'));

最后一个查询压缩所有块并给出结果:

            compress_chunk
-----------------------------------------
 _timescaledb_internal._hyper_3_13_chunk
 _timescaledb_internal._hyper_3_14_chunk
 _timescaledb_internal._hyper_3_15_chunk
 _timescaledb_internal._hyper_3_16_chunk
 _timescaledb_internal._hyper_3_17_chunk
 _timescaledb_internal._hyper_3_18_chunk
(6 rows)

让我们的目标是插入以下值:

INSERT INTO hyper VALUES ('2018-07-02 06:01', 12, 5.1);

失败了:

ERROR:  insert/update/delete not permitted on chunk "_hyper_3_16_chunk"
HINT:  Make sure the chunk is not compressed.

以下查询允许找到满足时间值的块:

SELECT c.chunk_name
FROM (SELECT show_chunks('hyper') AS chunk_name
    EXCEPT (SELECT show_chunks('hyper', older_than => '2018-07-02 06:01'::timestamptz))
    EXCEPT (SELECT show_chunks('hyper', newer_than => '2018-07-02 06:01'::timestamptz))) AS c
JOIN timescaledb_information.compressed_chunk_stats i ON i.chunk_name = c.chunk_name;

结果为 2 个块,因为有一个空间维度和 2 个分区:

               chunk_name
-----------------------------------------
 _timescaledb_internal._hyper_3_15_chunk
 _timescaledb_internal._hyper_3_16_chunk
(2 rows)

更新更多细节device为给定值 进一步选择一个块可以通过检查存储在_timescaledb_catalog.dimension_slice. _timescaledb_catalog.chunk通过与on chunk_name_timescaledb_catalog.chunk_constraintonchunk_id和最后与_timescaledb_catalog.dimension_sliceon连接来将选定的块与其维度切片匹配dimension_slice_id。维度切片的选择是在使用散列值的范围内完成的。这个条件和块表的约束是一样的。例如,使用d _chunk_name

\d _timescaledb_internal._hyper_1_1_chunk
           Table "_timescaledb_internal._hyper_1_1_chunk"
 Column |           Type           | Collation | Nullable | Default
--------+--------------------------+-----------+----------+---------
 time   | timestamp with time zone |           | not null |
 device | integer                  |           |          |
 value  | double precision         |           |          |
Indexes:
    "_hyper_1_1_chunk_hyper_device_time_idx" btree (device, "time" DESC)
    "_hyper_1_1_chunk_hyper_time_idx" btree ("time" DESC)
Check constraints:
    "constraint_1" CHECK ("time" >= '2016-12-29 01:00:00+01'::timestamp with time zone AND "time" < '2017-01-05 01:00:00+01'::timestamp with time zone)
    "constraint_2" CHECK (_timescaledb_internal.get_partition_hash(device) < 1073741823)
Triggers:
    compressed_chunk_insert_blocker BEFORE INSERT ON _timescaledb_internal._hyper_1_1_chunk FOR EACH ROW EXECUTE PROCEDURE _timescaledb_internal.chunk_dml_blocker()
Inherits: hyper

以下查询演示了如何在上述查询结果中使用内部目录来获取要解压缩的确切块:

SELECT ch.chunk_name
FROM (SELECT c.chunk_name
      FROM (SELECT show_chunks('hyper') AS chunk_name
           EXCEPT (SELECT show_chunks('hyper', older_than => '2018-07-02 06:01'::timestamptz))
           EXCEPT (SELECT show_chunks('hyper', newer_than => '2018-07-02 06:01'::timestamptz))) AS c
        JOIN timescaledb_information.compressed_chunk_stats i ON i.chunk_name = c.chunk_name
      WHERE i.compression_status = 'Compressed') ch
  JOIN _timescaledb_catalog.chunk cc ON chunk_name::text = schema_name||'.'||table_name
  JOIN _timescaledb_catalog.chunk_constraint ON cc.id = chunk_id
  JOIN _timescaledb_catalog.dimension_slice ds ON dimension_slice_id = ds.id
WHERE range_start <= _timescaledb_internal.get_partition_hash(12) 
    AND range_end > _timescaledb_internal.get_partition_hash(12);

查询的结果是:

               chunk_name
-----------------------------------------
 _timescaledb_internal._hyper_3_16_chunk
(1 row)

这条语句可以变成一个函数,它将timedevice值作为输入。

回答问题的最终查询,现在只需修改查询以调用 decompress_chunk:

SELECT decompress_chunk(ch.chunk_name)
FROM (SELECT c.chunk_name
      FROM (SELECT show_chunks('hyper') AS chunk_name
           EXCEPT (SELECT show_chunks('hyper', older_than => '2018-07-02 06:01'::timestamptz))
           EXCEPT (SELECT show_chunks('hyper', newer_than => '2018-07-02 06:01'::timestamptz))) AS c
        JOIN timescaledb_information.compressed_chunk_stats i ON i.chunk_name = c.chunk_name
WHERE i.compression_status = 'Compressed') ch
JOIN _timescaledb_catalog.chunk cc ON chunk_name::text = schema_name||'.'||table_name
JOIN _timescaledb_catalog.chunk_constraint ON cc.id = chunk_id
JOIN _timescaledb_catalog.dimension_slice ds ON dimension_slice_id = ds.id
WHERE range_start <= _timescaledb_internal.get_partition_hash(12) 
    AND range_end > _timescaledb_internal.get_partition_hash(12);

并且插入将成功工作:

INSERT INTO hyper VALUES ('2018-07-02 06:01', 12, 5.1);
-- INSERT 0 1

回填用例:如果插入是回填数据的一部分,那么timescaledb-extras 项目decompress_backfill中有一个过程,它解压缩必要的块并从源表中回填数据。

请注意,回答问题的查询可能会停止在新版本的 TimescaleDB 中工作,因为它使用内部目录。

我不知道是否可以仅使用公共接口来实现相同的目的。


推荐阅读