Populating a materialized view in ClickHouse exceeds the memory limit

Problem description

I am trying to create a materialized view with the ReplicatedAggregatingMergeTree engine on top of a table that uses the ReplicatedMergeTree engine.

After a few million rows I get DB::Exception: Memory limit (for query) exceeded. Is there a way around this?

CREATE MATERIALIZED VIEW IF NOT EXISTS shared.aggregated_calls_1h
ENGINE = ReplicatedAggregatingMergeTree('/clickhouse/tables/{shard}/shared/aggregated_calls_1h', '{replica}')
PARTITION BY toRelativeDayNum(retained_until_date)
ORDER BY (
  client_id,
  t,
  is_synthetic,
  source_application_ids,
  source_service_id,
  source_endpoint_id,
  destination_application_ids,
  destination_service_id,
  destination_endpoint_id,
  boundary_application_ids,
  process_snapshot_id,
  docker_snapshot_id,
  host_snapshot_id,
  cluster_snapshot_id,
  http_status
)
SETTINGS index_granularity = 8192
POPULATE
AS
SELECT
  client_id,
  toUInt64(floor(t / (60000 * 60)) * (60000 * 60)) AS t,
  date,
  toDate(retained_until_timestamp / 1000) AS retained_until_date,
  is_synthetic,
  source_application_ids,
  source_service_id,
  source_endpoint_id,
  destination_application_ids,
  destination_service_id,
  destination_endpoint_id,
  boundary_application_ids,
  http_status,
  process_snapshot_id,
  docker_snapshot_id,
  host_snapshot_id,
  cluster_snapshot_id,
  any(destination_endpoint) AS destination_endpoint,
  any(destination_endpoint_type) AS destination_endpoint_type,
  groupUniqArrayArrayState(destination_technologies) AS destination_technologies_state,
  minState(ingestion_time) AS min_ingestion_time_state,
  sumState(batchCount) AS sum_call_count_state,
  sumState(errorCount) AS sum_error_count_state,
  sumState(duration) AS sum_duration_state,
  minState(toUInt64(ceil(duration/batchCount))) AS min_duration_state,
  maxState(toUInt64(ceil(duration/batchCount))) AS max_duration_state,
  quantileTimingWeightedState(0.25)(toUInt64(ceil(duration/batchCount)), batchCount) AS latency_p25_state,
  quantileTimingWeightedState(0.50)(toUInt64(ceil(duration/batchCount)), batchCount) AS latency_p50_state,
  quantileTimingWeightedState(0.75)(toUInt64(ceil(duration/batchCount)), batchCount) AS latency_p75_state,
  quantileTimingWeightedState(0.90)(toUInt64(ceil(duration/batchCount)), batchCount) AS latency_p90_state,
  quantileTimingWeightedState(0.95)(toUInt64(ceil(duration/batchCount)), batchCount) AS latency_p95_state,
  quantileTimingWeightedState(0.98)(toUInt64(ceil(duration/batchCount)), batchCount) AS latency_p98_state,
  quantileTimingWeightedState(0.99)(toUInt64(ceil(duration/batchCount)), batchCount) AS latency_p99_state,
  quantileTimingWeightedState(0.25)(toUInt64(ceil(duration/batchCount)/100), batchCount) AS latency_p25_large_state,
  quantileTimingWeightedState(0.50)(toUInt64(ceil(duration/batchCount)/100), batchCount) AS latency_p50_large_state,
  quantileTimingWeightedState(0.75)(toUInt64(ceil(duration/batchCount)/100), batchCount) AS latency_p75_large_state,
  quantileTimingWeightedState(0.90)(toUInt64(ceil(duration/batchCount)/100), batchCount) AS latency_p90_large_state,
  quantileTimingWeightedState(0.95)(toUInt64(ceil(duration/batchCount)/100), batchCount) AS latency_p95_large_state,
  quantileTimingWeightedState(0.98)(toUInt64(ceil(duration/batchCount)/100), batchCount) AS latency_p98_large_state,
  quantileTimingWeightedState(0.99)(toUInt64(ceil(duration/batchCount)/100), batchCount) AS latency_p99_large_state,
  sumState(minSelfTime) AS sum_min_self_time_state
FROM shared.calls_v2
WHERE sample_type != 'user_selected'
GROUP BY
  client_id,
  t,
  date,
  retained_until_date,
  is_synthetic,
  source_application_ids,
  source_service_id,
  source_endpoint_id,
  destination_application_ids,
  destination_service_id,
  destination_endpoint_id,
  boundary_application_ids,
  process_snapshot_id,
  docker_snapshot_id,
  host_snapshot_id,
  cluster_snapshot_id,
  http_status
HAVING destination_endpoint_type != 'INTERNAL'

Tags: clickhouse

Solution


You can try increasing the limit with the --max_memory_usage option of clickhouse-client.

--max_memory_usage arg  "Maximum memory usage for processing a single query. Zero means unlimited."

https://clickhouse.yandex/docs/en/operations/settings/query_complexity/#settings_max_memory_usage
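
For example, a minimal sketch (the 20000000000-byte value, roughly 20 GB, is only an illustrative figure, not a recommendation; pick a limit that fits the free RAM on your server):

clickhouse-client --max_memory_usage=20000000000

-- or, inside an already open client session, before running the CREATE ... POPULATE statement:
SET max_memory_usage = 20000000000;

max_memory_usage is a per-query limit, so it only needs to be raised for the session that runs the POPULATE.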

Or, instead of using POPULATE, copy the data into the view's implicit inner table manually:

INSERT INTO shared.`.inner.aggregated_calls_1h`
SELECT 
  client_id,
  toUInt64(floor(t / (60000 * 60)) * (60000 * 60)) AS t,
  ...
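
The implicit inner table lives in the same database as the view and is named .inner.<view name>, hence the backquoted identifier above. To keep each individual INSERT within the memory limit, you can also backfill the history in slices rather than in one statement. A minimal sketch, assuming the view is created without POPULATE first and that shared.calls_v2 can be filtered on its date column (the date boundaries below are placeholders you would loop over):

INSERT INTO shared.`.inner.aggregated_calls_1h`
SELECT
  client_id,
  toUInt64(floor(t / (60000 * 60)) * (60000 * 60)) AS t,
  ...                                               -- same columns and -State aggregates as in the view's SELECT
FROM shared.calls_v2
WHERE sample_type != 'user_selected'
  AND date >= '2018-01-01' AND date < '2018-02-01'  -- one illustrative slice; repeat per day or month
GROUP BY
  ...                                               -- same GROUP BY keys as in the view definition
HAVING destination_endpoint_type != 'INTERNAL'

Rows written to shared.calls_v2 after the view is created are aggregated by the view automatically; only the historical range has to be backfilled this way.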
