ClickHouse: index on a column whose values are 0 or 1

Problem description

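I am trying to improve the performance of queries whose WHERE clause filters on a UInt8 column that only ever contains 0 or 1. I broke the problem down to make sure nothing else (partitioning, primary key, ...) is causing it. I created a simple table, index_text, with a single column and a set of indexes like this: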

CREATE TABLE default.index_text (
  `columnX` UInt8,
  INDEX indexX1 columnX TYPE minmax GRANULARITY 1,
  INDEX indexX2 columnX TYPE set(0) GRANULARITY 1,
  INDEX indexX3 columnX TYPE set(1) GRANULARITY 1
) ENGINE = MergeTree()
ORDER BY tuple()
SETTINGS index_granularity = 8192

After that, I filled the table with about 25 million random values (0 or 1). I expected the indexes to drop granules for this query, but they do not:

SELECT COUNT(*) FROM index_text WHERE columnX = 0

SELECT COUNT(*)
FROM index_text
WHERE columnX = 0

[JWDebian] 2020.10.19 07:48:26.511085 [ 584 ] {af7615f0-f32b-47c5-87a2-e8acc8e27f5e} <Debug> executeQuery: (from [::1]:40088) SELECT COUNT(*) FROM index_text WHERE columnX = 0
[JWDebian] 2020.10.19 07:48:26.511384 [ 584 ] {af7615f0-f32b-47c5-87a2-e8acc8e27f5e} <Trace> ContextAccess (default): Access granted: SELECT(columnX) ON default.index_text
[JWDebian] 2020.10.19 07:48:26.511440 [ 584 ] {af7615f0-f32b-47c5-87a2-e8acc8e27f5e} <Debug> default.index_text (SelectExecutor): Key condition: unknown
[JWDebian] 2020.10.19 07:48:26.512611 [ 584 ] {af7615f0-f32b-47c5-87a2-e8acc8e27f5e} <Debug> default.index_text (SelectExecutor): Index `indexX1` has dropped 0 / 3050 granules.
[JWDebian] 2020.10.19 07:48:26.522601 [ 584 ] {af7615f0-f32b-47c5-87a2-e8acc8e27f5e} <Debug> default.index_text (SelectExecutor): Index `indexX2` has dropped 0 / 3050 granules.
[JWDebian] 2020.10.19 07:48:26.523699 [ 584 ] {af7615f0-f32b-47c5-87a2-e8acc8e27f5e} <Debug> default.index_text (SelectExecutor): Index `indexX3` has dropped 0 / 3050 granules.
[JWDebian] 2020.10.19 07:48:26.523722 [ 584 ] {af7615f0-f32b-47c5-87a2-e8acc8e27f5e} <Debug> default.index_text (SelectExecutor): Selected 1 parts by date, 1 parts by key, 3050 marks by primary key, 3050 marks to read from 1 ranges
[JWDebian] 2020.10.19 07:48:26.523764 [ 584 ] {af7615f0-f32b-47c5-87a2-e8acc8e27f5e} <Trace> default.index_text (SelectExecutor): Reading approx. 24985600 rows with 2 streams
[JWDebian] 2020.10.19 07:48:26.523823 [ 584 ] {af7615f0-f32b-47c5-87a2-e8acc8e27f5e} <Trace> InterpreterSelectQuery: FetchColumns -> Complete
[JWDebian] 2020.10.19 07:48:26.525061 [ 620 ] {af7615f0-f32b-47c5-87a2-e8acc8e27f5e} <Trace> AggregatingTransform: Aggregating
[JWDebian] 2020.10.19 07:48:26.525087 [ 620 ] {af7615f0-f32b-47c5-87a2-e8acc8e27f5e} <Trace> Aggregator: Aggregation method: without_key
[JWDebian] 2020.10.19 07:48:26.530850 [ 621 ] {af7615f0-f32b-47c5-87a2-e8acc8e27f5e} <Trace> AggregatingTransform: Aggregating
[JWDebian] 2020.10.19 07:48:26.530893 [ 621 ] {af7615f0-f32b-47c5-87a2-e8acc8e27f5e} <Trace> Aggregator: Aggregation method: without_key
[JWDebian] 2020.10.19 07:48:26.598438 [ 620 ] {af7615f0-f32b-47c5-87a2-e8acc8e27f5e} <Trace> AggregatingTransform: Aggregated. 6509826 to 1 rows (from 6.21 MiB) in 0.074525217 sec. (87350648.03635526 rows/sec., 83.30 MiB/sec.)
[JWDebian] 2020.10.19 07:48:26.598976 [ 621 ] {af7615f0-f32b-47c5-87a2-e8acc8e27f5e} <Trace> AggregatingTransform: Aggregated. 6109074 to 1 rows (from 5.83 MiB) in 0.075064427 sec. (81384408.62274216 rows/sec., 77.61 MiB/sec.)
[JWDebian] 2020.10.19 07:48:26.598994 [ 621 ] {af7615f0-f32b-47c5-87a2-e8acc8e27f5e} <Trace> Aggregator: Merging aggregated data
┌──COUNT()─┐
│ 12618900 │
└──────────┘
[JWDebian] 2020.10.19 07:48:26.599322 [ 584 ] {af7615f0-f32b-47c5-87a2-e8acc8e27f5e} <Information> executeQuery: Read 24979658 rows, 23.82 MiB in 0.088181578 sec., 283275243 rows/sec., 270.15 MiB/sec.
[JWDebian] 2020.10.19 07:48:26.599356 [ 584 ] {af7615f0-f32b-47c5-87a2-e8acc8e27f5e} <Debug> MemoryTracker: Peak memory usage (for query): 0.00 B.

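What am I doing wrong here? Have I misunderstood the concept of an INDEX, or chosen the wrong index type or parameters? I am using ClickHouse server version 20.9.2 revision 54439, so I assume the allow_experimental_data_skipping_indices setting no longer matters. Out of desperation, I set it to 1 anyway and ran OPTIMIZE TABLE index_text FINAL after filling the table, but the result was the same.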

Tags: performance, indexing, database-performance, clickhouse

Solution
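The page carries no answer text, so what follows is only a hedged sketch of one likely explanation, not a confirmed diagnosis. A data skipping index stores one summary per block of granules, and a granule can be dropped only when that summary proves no row in it can match. With 0 and 1 scattered at random, each granule of 8192 rows almost certainly contains both values, which would be consistent with the "dropped 0 / 3050 granules" lines in the log. The Python below is a simplified model of a minmax index with GRANULARITY 1, not ClickHouse's implementation. The helper names `granule_minmax` and `droppable_granules` are hypothetical, and the row count is reduced so it runs quickly.

```python
import random

GRANULE_ROWS = 8192  # index_granularity from the CREATE TABLE statement


def granule_minmax(values, granule_rows=GRANULE_ROWS):
    """Return one (min, max) pair per granule, modelling a minmax index."""
    return [
        (min(values[i:i + granule_rows]), max(values[i:i + granule_rows]))
        for i in range(0, len(values), granule_rows)
    ]


def droppable_granules(values, wanted, granule_rows=GRANULE_ROWS):
    """Count granules whose (min, max) range excludes `wanted`.

    Returns a tuple (dropped, total). Only such granules could be skipped
    for a predicate of the form `column = wanted`.
    """
    summaries = granule_minmax(values, granule_rows)
    dropped = sum(1 for lo, hi in summaries if not (lo <= wanted <= hi))
    return dropped, len(summaries)


random.seed(0)  # fixed seed so repeated runs behave the same
n_rows = 50 * GRANULE_ROWS  # assumption: far fewer rows than in the question

# Random 0/1 values, as in the question's test table.
random_column = [random.randint(0, 1) for _ in range(n_rows)]
# The same values clustered together, as they would be if the table were sorted by the column.
sorted_column = sorted(random_column)

print("random order (dropped, total):", droppable_granules(random_column, 0))
print("sorted order (dropped, total):", droppable_granules(sorted_column, 0))
```

In this model the randomly ordered column lets no granule be dropped, while the sorted copy allows every granule that holds only 1s to be dropped. A set index that records the distinct values per granule would reach the same conclusion for the same reason. Whether sorting by the column (for example through ORDER BY) is practical for a real table is a separate design question that this sketch does not settle.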

