首页 > 解决方案 > 过滤电子邮件和姓名,然后在 PostgreSQL 12 上使用 JSON 在两列中重复数据删除

问题描述

我有emails一个有列sender的表。reporter我想在这些列中搜索给定参数并返回唯一值。

让我用样本来解释。这是我的表格和记录:

CREATE TABLE public.emails (
  id                bigint NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY
    (MAXVALUE 9223372036854775807),
  sender            jsonb NOT NULL,
  reporter      jsonb not null
);

insert into emails (sender, reporter) VALUES ('[{"email": "dennis1@example.com", "name": "dennis1"}]', '[]');
insert into emails (sender, reporter) VALUES ('[{"email": "dennis2@example.com", "name": "dennis1"}]', '[{"email": "john@example.com", "name": "john"}, {"email": "dennis1@example.com", "name": "dennis1"}, {"email": "dennis2@example.com", "name": "dennis2"}]');
insert into emails (sender, reporter) VALUES ('[{"email": "dennis1@example.com", "name": "dennis1"}]', '[]');
insert into emails (sender, reporter) VALUES ('[{"email": "dennis1@example.com", "name": "dennis1"}]', '[]');

我想获得电子邮件地址和姓名。我也想避免上当。只有一封电子邮件和一个名字。我也不想将其作为数组获取,而是每行一封电子邮件和姓名。

搜索john

SELECT
* /* i don't know what to put here pr merge with reporters */
FROM "emails" AS "e"
WHERE (EXISTS (SELECT
    *
  FROM JSONB_ARRAY_ELEMENTS_TEXT("e"."sender") AS "e" ("email")
  WHERE ("e"."email" ~* 'john' or "e"."name" ~* 'john'))
);

预期结果john

email                 name
john@example.com      john

搜索 ``(空):

SELECT
* /* i don't know what to put here pr merge with reporters */
FROM "emails" AS "e"
WHERE (EXISTS (SELECT
    *
  FROM JSONB_ARRAY_ELEMENTS_TEXT("e"."sender") AS "e" ("email")
  WHERE ("e"."email" ~* '' or "e"."name" ~* ''))
);

``(空)的预期结果:

email                 name
john@example.com      john
dennis1@example.com   dennis1
dennis2@example.com   dennis2

dennis2两者都存在senderreporter因此只需要其中之一。没有骗子。

事实上,这里有一个问题。如果senderorreporter列至少有一个 json 对象(不是 json 数组),那么这个查询也会失败。

错误:cannot extract elements from an object

这是另一个故事,寿。

在这种情况下,我怎样才能实现我的目标?

演示:https ://dbfiddle.uk/?rdbms=postgres_12&fiddle=1bf9c5f83f5104e2392c31984cb4e939

标签: sqlpostgresqlpostgresql-12

解决方案


在搜索之前规范化您的数据,然后使用distinct on ()子句删除重复项:

with cte as (select x ->> 'name' as name, x ->> 'email' as email
from emails as e, jsonb_array_elements(e.sender || e.reporter) as x)
select distinct on (email) * from cte where 
  name ~* '' or email ~* ''
  --name ~* 'john' or email ~* 'john'
order by email;

Demo

请注意,它将始终扫描整个表,在这种情况下没有适用的索引。考虑模式规范化


推荐阅读