sqlite - 加快 SQLite 中的 COUNT DISTINCT 查询
问题描述
更新:添加背景信息和更多解释
给定下表中的约 30 万条记录
CREATE TABLE t (
id INTEGER PRIMARY KEY,
a TEXT,
b TEXT,
c TEXT,
deleted INTEGER DEFAULT 0
);
CREATE INDEX ix_t ON t (deleted) WHERE deleted = 0;
1) SELECT Count(id) FROM t WHERE deleted = 0;
2) SELECT Count(DISTINCT a) FROM t WHERE deleted = 0;
3) SELECT Count(DISTINCT b) FROM t WHERE deleted = 0;
4) SELECT Count(DISTINCT c) FROM t WHERE deleted = 0;
查询 1 需要 4-5 毫秒。其他三个查询COUNT (DISTINCT <col>)
需要 600-900 毫秒。如何将这些查询加快到与第一个查询相同的顺序?我创建了以下索引,但没有帮助
CREATE INDEX ix_t_a ON t (a, deleted) WHERE deleted = 0;
-- (and so on the columns b and c as well)
EXPLAIN QUERY PLAN SELECT Count(DISTINCT a) FROM t WHERE deleted = 0;
-- output: SEARCH TABLE t USING INDEX ix_t (deleted=?)
正如我们在上面看到的,新的索引没有被使用。
该deleted
列是一个标志,用于跟踪要从所有查询中排除的行。在大多数情况下,这样的行将很少,但重要的是它们不会在计数和选择中使用。
最终行数可能会增长到原来的 3 倍,比如说 1M。即使那样,deleted
行数也将是最少的。
计数是COUNT (DISTINCT column)
创建方面。假设有人搜索所有行WHERE a = 'foo'
。我需要返回匹配的行,我还需要返回这些行中有多少DISTINCT
b
和c
存在。所以,我会做类似的事情
-- number of rows in the result set
SELECT COUNT(id) FROM t WHERE deleted = 0 AND a = 'foo';
-- the actual result set
SELECT * FROM t WHERE deleted = 0 AND a = 'foo' LIMIT 30 OFFSET 0;
-- facet counts
SELECT COUNT (DISTINCT b) FROM t WHERE deleted = 0 AND a = 'foo';
SELECT COUNT (DISTINCT c) FROM t WHERE deleted = 0 AND a = 'foo';
在第一个实例中,由于没有WHERE
子句,上述查询将是
-- number of rows in the result set
SELECT COUNT(id) FROM t WHERE deleted = 0;
-- the actual result set
SELECT * FROM t WHERE deleted = 0 LIMIT 30 OFFSET 0;
-- facet counts
SELECT COUNT (DISTINCT a) FROM t WHERE deleted = 0;
SELECT COUNT (DISTINCT b) FROM t WHERE deleted = 0;
SELECT COUNT (DISTINCT c) FROM t WHERE deleted = 0;
表中构成方面的实际列大约是 8 或 9。所以,我必须做大约 8 或 9 SELECT COUNT (DISTINCT col)
。每个大约需要 600-900 毫秒,也就是将近 6-10 秒。从用户的角度来看,查询太慢了。将选择减少一半或 ⅔ 会产生巨大的影响。
实际上,我也有一个查询缓存,所以任何一次查询的结果都会被缓存,第二次会很快,只要是完全相同的查询。在这里,通过查询,我的意思是来自用户的 pov 的查询。当然,来自用户 pov 的每个查询都会导致 9-10 个数据库查询。然而,速度对于创建高性能应用程序非常重要。
解决方案
如何将这些查询加快到与第一个查询相同的顺序?
第1部分
我不确定您是否可以做到这一点,如果可以,这很可能取决于数据本身,例如,如果分析做了一些惊人的事情。
DISTINCT 有相当大的影响。所以这可能是造成差异的唯一原因
如果您只对没有 DISTINCT 的第一个查询使用 EXPLAIN,则结果是:-
对于其他查询,EXPLAIN 结果是:-
附加的(而不是突出不同的操作,即还有更多工作要做)
您可能希望查看SQLite 字节码引擎 - 操作码
如何将这些查询加快到与第一个查询相同的顺序?
第2部分
也许考虑修改以下曾经有一点作用的代码,并得出结论 ANALYZE 可能不会改善问题,也没有(一点点)玩弄索引似乎有很大的不同:-
DROP TABLE IF EXISTS t;
CREATE TABLE t (id INTEGER PRIMARY KEY, a TEXT, b TEXT, c TEXT, deleted INTEGER DEFAULT 0);
WITH
alphabet(letters) AS (SELECT 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'),
cte1(counter,deleted) AS (SELECT 1, abs(random()) % 2 UNION ALL SELECT counter+1, abs(random()) % 2 FROM cte1 LIMIT 300000)
INSERT INTO t (deleted,a,b,c)
SELECT deleted,
substr((SELECT letters FROM alphabet),(abs(random()) % ((SELECT max(length(letters)) FROM alphabet)-10)) + 1,(abs(random()) % 10) + 1),
substr((SELECT letters FROM alphabet),(abs(random()) % ((SELECT max(length(letters)) FROM alphabet)-10)) + 1,(abs(random()) % 10) + 1),
substr((SELECT letters FROM alphabet),(abs(random()) % ((SELECT max(length(letters)) FROM alphabet)-10)) + 1,(abs(random()) % 10) + 1)
FROM cte1;
SELECT * FROM t; /* COMMENT/UNCOMMENT WITH/WITHOUT -- To HIDE/SHOW TABLE */
CREATE INDEX ix_t ON t (deleted) WHERE deleted = 0;
-- EXPLAIN /* UNCOMMENT BY REMOVING -- ON THIS LINE TO JUST DO EXPLAIN */
-- QUERY PLAN /* TO DO EXPLAIN QUERY PLAN UNCOMMNET LINE ABOVE AS ABOVE and UNCOMMENT THIS SAME WAY */
SELECT Count(id) FROM t WHERE deleted = 0;
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT a) FROM t WHERE deleted = 0;
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT b) FROM t WHERE deleted = 0;
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT c) FROM t WHERE deleted = 0;
ANALYZE;
SELECT Count(id) FROM t WHERE deleted = 0;
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT a) FROM t WHERE deleted = 0;
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT b) FROM t WHERE deleted = 0;
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT c) FROM t WHERE deleted = 0;
/*<<<<<<<<<< ATTEMPT 2 >>>>>>>>>>*/
DROP TABLE IF EXISTS t;
CREATE TABLE t (id INTEGER PRIMARY KEY, a TEXT, b TEXT, c TEXT, deleted INTEGER DEFAULT 0);
WITH
alphabet(letters) AS (SELECT 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'),
cte1(counter,deleted) AS (SELECT 1, abs(random()) % 2 UNION ALL SELECT counter+1, abs(random()) % 2 FROM cte1 LIMIT 300000)
INSERT INTO t (deleted,a,b,c)
SELECT deleted,
substr((SELECT letters FROM alphabet),(abs(random()) % ((SELECT max(length(letters)) FROM alphabet)-10)) + 1,(abs(random()) % 10) + 1),
substr((SELECT letters FROM alphabet),(abs(random()) % ((SELECT max(length(letters)) FROM alphabet)-10)) + 1,(abs(random()) % 10) + 1),
substr((SELECT letters FROM alphabet),(abs(random()) % ((SELECT max(length(letters)) FROM alphabet)-10)) + 1,(abs(random()) % 10) + 1)
FROM cte1;
-- SELECT * FROM t;
CREATE INDEX ix_t ON t (deleted,a,b,c) WHERE deleted = 0;
-- EXPLAIN
-- QUERY PLAN
SELECT Count(id) FROM t WHERE deleted = 0;
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT a) FROM t WHERE deleted = 0;
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT b) FROM t WHERE deleted = 0;
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT c) FROM t WHERE deleted = 0;
ANALYZE;
SELECT Count(id) FROM t WHERE deleted = 0;
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT a) FROM t WHERE deleted = 0;
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT b) FROM t WHERE deleted = 0;
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT c) FROM t WHERE deleted = 0;
这在运行时(假设使用的工具(在 Navicat 中可以)允许一系列查询)。将要 :-
- 删除并创建表 t
- 用随机数据填充 t(rowid 除外)
- 创建索引(参见代码,因为第二部分不同)
- 运行 4 个查询(请参阅打开或关闭解释或解释查询计划的注释)
- 进行分析。
- 在分析后运行相同的 4 个查询。
- 重复 1-6(使用不同的索引)。
示例日志:-
DROP TABLE IF EXISTS t
> OK
> Time: 1.229s
CREATE TABLE t (id INTEGER PRIMARY KEY, a TEXT, b TEXT, c TEXT, deleted INTEGER DEFAULT 0)
> OK
> Time: 0.132s
WITH
alphabet(letters) AS (SELECT 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'),
cte1(counter,deleted) AS (SELECT 1, abs(random()) % 2 UNION ALL SELECT counter+1, abs(random()) % 2 FROM cte1 LIMIT 300000)
INSERT INTO t (deleted,a,b,c)
SELECT deleted,
substr((SELECT letters FROM alphabet),(abs(random()) % ((SELECT max(length(letters)) FROM alphabet)-10)) + 1,(abs(random()) % 10) + 1),
substr((SELECT letters FROM alphabet),(abs(random()) % ((SELECT max(length(letters)) FROM alphabet)-10)) + 1,(abs(random()) % 10) + 1),
substr((SELECT letters FROM alphabet),(abs(random()) % ((SELECT max(length(letters)) FROM alphabet)-10)) + 1,(abs(random()) % 10) + 1)
FROM cte1
> Affected rows: 300000
> Time: 0.75s
-- SELECT * FROM t;
CREATE INDEX ix_t ON t (deleted) WHERE deleted = 0
> OK
> Time: 0.195s
EXPLAIN
-- QUERY PLAN
SELECT Count(id) FROM t WHERE deleted = 0
> OK
> Time: 0s
EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT a) FROM t WHERE deleted = 0
> OK
> Time: 0s
EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT b) FROM t WHERE deleted = 0
> OK
> Time: 0s
EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT c) FROM t WHERE deleted = 0
> OK
> Time: 0s
ANALYZE
> OK
> Time: 0.137s
SELECT Count(id) FROM t WHERE deleted = 0
> OK
> Time: 0.031s
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT a) FROM t WHERE deleted = 0
> OK
> Time: 0.057s
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT b) FROM t WHERE deleted = 0
> OK
> Time: 0.054s
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT c) FROM t WHERE deleted = 0
> OK
> Time: 0.055s
/*<<<<<<<<<< ATTEMPT 2 >>>>>>>>>>*/
DROP TABLE IF EXISTS t
> OK
> Time: 0.891s
CREATE TABLE t (id INTEGER PRIMARY KEY, a TEXT, b TEXT, c TEXT, deleted INTEGER DEFAULT 0)
> OK
> Time: 0.153s
WITH
alphabet(letters) AS (SELECT 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'),
cte1(counter,deleted) AS (SELECT 1, abs(random()) % 2 UNION ALL SELECT counter+1, abs(random()) % 2 FROM cte1 LIMIT 300000)
INSERT INTO t (deleted,a,b,c)
SELECT deleted,
substr((SELECT letters FROM alphabet),(abs(random()) % ((SELECT max(length(letters)) FROM alphabet)-10)) + 1,(abs(random()) % 10) + 1),
substr((SELECT letters FROM alphabet),(abs(random()) % ((SELECT max(length(letters)) FROM alphabet)-10)) + 1,(abs(random()) % 10) + 1),
substr((SELECT letters FROM alphabet),(abs(random()) % ((SELECT max(length(letters)) FROM alphabet)-10)) + 1,(abs(random()) % 10) + 1)
FROM cte1
> Affected rows: 300000
> Time: 0.643s
-- SELECT * FROM t;
CREATE INDEX ix_t ON t (deleted,a,b,c) WHERE deleted = 0
> OK
> Time: 0.583s
-- EXPLAIN
-- QUERY PLAN
SELECT Count(id) FROM t WHERE deleted = 0
> OK
> Time: 0.029s
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT a) FROM t WHERE deleted = 0
> OK
> Time: 0.041s
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT b) FROM t WHERE deleted = 0
> OK
> Time: 0.029s
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT c) FROM t WHERE deleted = 0
> OK
> Time: 0.031s
ANALYZE
> OK
> Time: 0.121s
SELECT Count(id) FROM t WHERE deleted = 0
> OK
> Time: 0.038s
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT a) FROM t WHERE deleted = 0
> OK
> Time: 0.046s
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT b) FROM t WHERE deleted = 0
> OK
> Time: 0.029s
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT c) FROM t WHERE deleted = 0
> OK
> Time: 0.031s
第 3 部分
我没有做太多的时间安排,但也许考虑一下(使用仅删除 = 0 行的诱惑表):-
DROP TABLE IF EXISTS t;
CREATE TABLE t (id INTEGER PRIMARY KEY, a TEXT, b TEXT, c TEXT, deleted INTEGER DEFAULT 0);
WITH
alphabet(letters) AS (SELECT 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'),
cte1(counter,deleted) AS (SELECT 1, abs(random()) % 2 UNION ALL SELECT counter+1, abs(random()) % 2 FROM cte1 LIMIT 300000)
INSERT INTO t (deleted,a,b,c)
SELECT deleted,
substr((SELECT letters FROM alphabet),(abs(random()) % ((SELECT max(length(letters)) FROM alphabet)-10)) + 1,(abs(random()) % 10) + 1),
substr((SELECT letters FROM alphabet),(abs(random()) % ((SELECT max(length(letters)) FROM alphabet)-10)) + 1,(abs(random()) % 10) + 1),
substr((SELECT letters FROM alphabet),(abs(random()) % ((SELECT max(length(letters)) FROM alphabet)-10)) + 1,(abs(random()) % 10) + 1)
FROM cte1;
-- SELECT * FROM t; /* COMMENT/UNCOMMENT WITH/WITHOUT -- To HIDE/SHOW TABLE */
CREATE TEMP TABLE trimmedt AS SELECT * FROM t WHERE deleted = 0;
-- EXPLAIN /* UNCOMMENT BY REMOVING -- ON THIS LINE TO JUST DO EXPLAIN */
-- QUERY PLAN /* TO DO EXPLAIN QUERY PLAN UNCOMMNET LINE ABOVE AS ABOVE and UNCOMMENT THIS SAME WAY */
SELECT Count(id) FROM trimmedt;
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT a) FROM trimmedt;
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT b) FROM trimmedt;
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT c) FROM trimmedt;
消息日志:-
DROP TABLE IF EXISTS t
> OK
> Time: 1.242s
CREATE TABLE t (id INTEGER PRIMARY KEY, a TEXT, b TEXT, c TEXT, deleted INTEGER DEFAULT 0)
> OK
> Time: 0.103s
WITH
alphabet(letters) AS (SELECT 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'),
cte1(counter,deleted) AS (SELECT 1, abs(random()) % 2 UNION ALL SELECT counter+1, abs(random()) % 2 FROM cte1 LIMIT 300000)
INSERT INTO t (deleted,a,b,c)
SELECT deleted,
substr((SELECT letters FROM alphabet),(abs(random()) % ((SELECT max(length(letters)) FROM alphabet)-10)) + 1,(abs(random()) % 10) + 1),
substr((SELECT letters FROM alphabet),(abs(random()) % ((SELECT max(length(letters)) FROM alphabet)-10)) + 1,(abs(random()) % 10) + 1),
substr((SELECT letters FROM alphabet),(abs(random()) % ((SELECT max(length(letters)) FROM alphabet)-10)) + 1,(abs(random()) % 10) + 1)
FROM cte1
> Affected rows: 300000
> Time: 0.722s
-- SELECT * FROM t; /* COMMENT/UNCOMMENT WITH/WITHOUT -- To HIDE/SHOW TABLE */
CREATE TEMP TABLE trimmedt AS SELECT * FROM t WHERE deleted = 0
> OK
> Time: 0.091s
-- EXPLAIN /* UNCOMMENT BY REMOVING -- ON THIS LINE TO JUST DO EXPLAIN */
-- QUERY PLAN /* TO DO EXPLAIN QUERY PLAN UNCOMMNET LINE ABOVE AS ABOVE and UNCOMMENT THIS SAME WAY */
SELECT Count(id) FROM trimmedt
> OK
> Time: 0.009s
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT a) FROM trimmedt
> OK
> Time: 0.03s
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT b) FROM trimmedt
> OK
> Time: 0.03s
-- EXPLAIN
-- QUERY PLAN
SELECT Count(DISTINCT c) FROM trimmedt
> OK
> Time: 0.031s
推荐阅读
- python - 如何在 Linux 中获得空闲时间?
- laravel - 如何在laravel 7中从http重定向到https并删除带有子域的www
- kubernetes - Prometheus 速率函数输出为整数
- python - How to store and use data from a text file in python for sentiment-analysis?
- r - compute mean with x range for many variables
- python - How to prevent user input into console when program is running in python?
- java - Google Dataproc API (through Java) does not submit Job to cluster
- excel - VBA: Exporting charts from Excel to PowerPoint, but my PowerPoint Application crashes without an error message
- ios - 当我添加 Google 移动广告 SDK 时,Flutter ios pod install 出现此错误。关于如何解决的任何想法?
- axios - NestJS FilesInterceptor 不解析来自 Axios 请求的文件