postgresql - PostgreSQL: select count and max from two tables
Question
I have two tables linked by a common ID column, as follows:
CREATE TABLE IF NOT EXISTS names (
uid BIGSERIAL,
name VARCHAR(255) NOT NULL,
PRIMARY KEY (uid)
);
CREATE TABLE IF NOT EXISTS texts (
name_uid BIGINT NOT NULL REFERENCES names,
timestamp TIMESTAMP NOT NULL,
some_value TEXT NULL
);
Here is some data to play with:
INSERT INTO names VALUES ( 0, '1/a' );
INSERT INTO names VALUES ( 1, '1/b' );
INSERT INTO names VALUES ( 2, '2/c' );
INSERT INTO names VALUES ( 3, '3/d' );
INSERT INTO names VALUES ( 4, '3/e' );
INSERT INTO names VALUES ( 5, '3/f' );
INSERT INTO texts VALUES ( 0, '2018-01-01 00:00:00', 'text...' );
INSERT INTO texts VALUES ( 1, '2018-01-02 00:00:00', 'text...' );
INSERT INTO texts VALUES ( 2, '2018-02-01 00:00:00', 'text...' );
INSERT INTO texts VALUES ( 2, '2018-02-02 00:00:00', 'text...' );
INSERT INTO texts VALUES ( 3, '2018-03-01 00:00:00', 'text...' );
INSERT INTO texts VALUES ( 3, '2018-06-01 00:00:00', 'text...' );
INSERT INTO texts VALUES ( 4, '2018-06-02 00:00:00', 'text...' );
INSERT INTO texts VALUES ( 5, '2018-06-03 00:00:00', 'text...' );
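What I need now is to apply the following rules:
- Select names.uid and names.name based on a SIMILAR TO pattern on the column name in table names, and group the rows by prefix
- For the selected rows in names: get the latest timestamp entry from texts (regardless of when it occurred)
- For the selected rows in names: count the corresponding rows in table texts for that name prefix that are newer than a specific date
This can be achieved with the following query: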
SELECT
substring(names.name, '[^/]+' ) AS name_prefix,
COALESCE( sum( text_counts.count ), 0) AS counter,
max(text_timestamps.timestamp) AS timestamp
FROM names
LEFT JOIN (
SELECT texts.name_uid, count(*)
FROM texts
WHERE texts.timestamp > '2018-05-01 00:00:00'
GROUP BY texts.name_uid
) text_counts ON text_counts.name_uid = names.uid
LEFT JOIN(
SELECT texts.name_uid, max(texts.timestamp) AS timestamp
FROM texts
GROUP BY texts.name_uid
) text_timestamps ON text_timestamps.name_uid = names.uid
WHERE names.name SIMILAR TO '1%|3%'
GROUP BY name_prefix
However, this query is slow. So I tried to come up with a better solution, but have failed so far. What I have is this:
SELECT name_info.name_prefix, count(*) AS counter, max(timestamp) AS timestamp
FROM texts
RIGHT JOIN (
SELECT names.uid, substring(names.name, '[^/]+' ) AS name_prefix
FROM names
WHERE names.name SIMILAR TO '1%|3%'
) name_info ON name_info.uid = texts.name_uid
WHERE texts.timestamp > '2018-05-01 00:00:00'
GROUP BY name_info.name_prefix
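Compared to the first solution, this is very fast. The problem is that rows with a count of zero are now missing from the result.
My question is how to write a query that performs close to query 2 but still includes the rows with a count of zero in the result.
Some context: I am using PostgreSQL 10, and table texts has roughly a million times more rows than table names. In fact, texts is even partitioned in the real world, but I decided to skip that in the example here.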
Solution
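Because of the timestamp condition in the WHERE clause, the right join in the second query behaves like an inner join: rows from name_info without a matching texts row have a NULL timestamp and are filtered out. Remove the condition and use the count(*) aggregate with FILTER instead: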
SELECT
name_info.name_prefix,
count(*) FILTER (WHERE texts.timestamp > '2018-05-01 00:00:00') AS counter,
max(timestamp) AS timestamp
FROM texts
RIGHT JOIN (
SELECT names.uid, substring(names.name, '[^/]+' ) AS name_prefix
FROM names
WHERE names.name SIMILAR TO '1%|3%'
) name_info ON name_info.uid = texts.name_uid
GROUP BY name_info.name_prefix;
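As a minimal sketch of the difference, assuming the sample data from the question is loaded, the following query counts the texts rows for uid 0 and 1 (prefix 1) with and without the FILTER condition. The column aliases all_rows and recent_rows are illustrative only.

```sql
-- FILTER restricts what a single aggregate sees; the rows themselves
-- stay in the result, so other aggregates still see all of them.
SELECT
count(*) AS all_rows,
count(*) FILTER (WHERE timestamp > '2018-05-01 00:00:00') AS recent_rows
FROM texts
WHERE name_uid IN (0, 1);
```

With the sample rows, both texts are from January 2018, so all_rows is 2 and recent_rows is 0. Had the date condition been placed in WHERE instead, max(timestamp) would not see any rows for this prefix.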
You can also try two-stage grouping, for example:
select
name_prefix,
sum(counter) as counter,
max(timestamp) as timestamp
from (
select
substring(name, '[^/]+' ) as name_prefix,
sum((timestamp > '2018-05-01 00:00:00')::int) as counter,
max(timestamp) as timestamp
from texts
join names on name_uid = uid
where name similar to '1%|3%'
group by uid
) s
group by name_prefix
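Note that the inner join in the two-stage query only returns prefixes whose names have at least one row in texts. If names without any texts must also appear with a zero count, a LEFT JOIN variant might look like the following. This is an untested sketch and not part of the original answer; the table aliases n and t are introduced here for illustration.

```sql
-- Two-stage grouping that keeps names without any matching texts rows.
select
    name_prefix,
    sum(counter) as counter,
    max(timestamp) as timestamp
from (
    select
        substring(n.name, '[^/]+') as name_prefix,
        -- unmatched names yield one row with a NULL timestamp,
        -- which fails the filter and therefore counts as 0
        count(*) filter (where t.timestamp > '2018-05-01 00:00:00') as counter,
        max(t.timestamp) as timestamp
    from names n
    left join texts t on t.name_uid = n.uid
    where n.name similar to '1%|3%'
    group by n.uid  -- uid is the primary key, so n.name may appear in the select list
) s
group by name_prefix;
```

Whether this is as fast as the inner-join version depends on the real data and indexes, so it is worth comparing both with EXPLAIN ANALYZE.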