首页 > 解决方案 > 优化与 lpath 相关的查询的问题

问题描述

我正在开发的应用程序有一个名为folders. 文件夹可能包含其他文件夹,归用户所有,并且可以与其他用户共享。

该功能已经实现并且一直运行良好,直到最近某个特定的 API 端点开始表现不佳。此 API 端点返回调用用户拥有的所有文件夹。此外,对于每个文件夹,将返回有权访问该文件夹的用户列表(拥有用户将是其中之一)。

文件夹表的架构如下所示:

id
title
parent_id
path (of ltree type)

包含共享位的表如下所示,folder_visibility 表:

id
folder_id
profile_id

给定 profile_id 对文件夹的访问由 2 个规则决定:

  1. (folder_id, profile_id)folder_visibility 表中存在一条记录,
  2. 一个文件夹是另一个文件夹的后代,其中存在(folder_id, profile_id)记录。

或者,第一条和第二条规则可以这样表达:如果文件夹与用户共享,则用户可以看到该文件夹​​及其所有后代。

问题是有效地实施规则 2 对我来说似乎很难。

例子。以下是我的思考过程,以及测试数据:

create table folders (id uuid, title text, path ltree);
create table folder_visibility (id uuid, profile_id uuid, folder_id uuid);
insert into folders (id, title, path) values ('9ca5ed28-83c3-45c3-98a1-31ebf0466e63', 'Folder 1', '9ca5ed28_83c3_45c3_98a1_31ebf0466e63');
insert into folders (id, title, path) values ('80ba97ae-b21b-49d7-aedb-3004cdee0c21', 'Folder 2', '9ca5ed28_83c3_45c3_98a1_31ebf0466e63.80ba97ae_b21b_49d7_aedb_3004cdee0c21');
insert into folders (id, title, path) values ('a7abb686-2a69-44d1-b2cd-60825478c4c8', 'Folder 3', '9ca5ed28_83c3_45c3_98a1_31ebf0466e63.80ba97ae_b21b_49d7_aedb_3004cdee0c21.a7abb686_2a69_44d1_b2cd_60825478c4c8');
insert into folder_visibility (id, profile_id, folder_id) values (gen_random_uuid(), 'e63533e4-eb10-48f6-ae69-ddf5f9242510', '80ba97ae-b21b-49d7-aedb-3004cdee0c21');
insert into folder_visibility (id, profile_id, folder_id) values (gen_random_uuid(), '7cbb5f24-61d0-430e-ab10-225554b0725b', '9ca5ed28-83c3-45c3-98a1-31ebf0466e63');
insert into profiles values ('7cbb5f24-61d0-430e-ab10-225554b0725b', 'owner@server.com');
insert into profiles values ('e63533e4-eb10-48f6-ae69-ddf5f9242510', 'user@server.com');
create index on folder_visibility (profile_id);
create index on folder_visibility (folder_id);
create index on folders (folder_id);
create index on folders (id);
create index on profiles (id);
  1. 我获取给定配置文件的所有文件夹,这相对容易:

    select f2.id folder_id,
           p.id profile_id,
           p.email
      from folders f1
      join folders f2 on f2.path <@ f1.path
      join folder_visibility fv1 on fv1.folder_id = f1.id
      join profiles p on p.id = fv1.profile_id
     where fv1.profile_id = '7cbb5f24-61d0-430e-ab10-225554b0725b';
                  folder_id               |              profile_id              |      email
    --------------------------------------+--------------------------------------+------------------
     9ca5ed28-83c3-45c3-98a1-31ebf0466e63 | 7cbb5f24-61d0-430e-ab10-225554b0725b | owner@server.com
     80ba97ae-b21b-49d7-aedb-3004cdee0c21 | 7cbb5f24-61d0-430e-ab10-225554b0725b | owner@server.com
     a7abb686-2a69-44d1-b2cd-60825478c4c8 | 7cbb5f24-61d0-430e-ab10-225554b0725b | owner@server.com
    (3 rows)
    
  2. 从这里,我从上面完成规则 1,如下所示:

    select f2.id folder_id,
           p.id profile_id,
           p.email
      from folders f1
      join folders f2 on f2.path <@ f1.path
      join folder_visibility fv1 on fv1.folder_id = f1.id
      join folder_visibility fv2 on (fv2.folder_id = f1.id or fv2.folder_id = f2.id)
      join profiles p on p.id = fv2.profile_id
     where fv1.profile_id = '7cbb5f24-61d0-430e-ab10-225554b0725b';
                  folder_id               |              profile_id              |      email
    --------------------------------------+--------------------------------------+------------------
     9ca5ed28-83c3-45c3-98a1-31ebf0466e63 | 7cbb5f24-61d0-430e-ab10-225554b0725b | owner@server.com
     80ba97ae-b21b-49d7-aedb-3004cdee0c21 | 7cbb5f24-61d0-430e-ab10-225554b0725b | owner@server.com
     a7abb686-2a69-44d1-b2cd-60825478c4c8 | 7cbb5f24-61d0-430e-ab10-225554b0725b | owner@server.com
     80ba97ae-b21b-49d7-aedb-3004cdee0c21 | e63533e4-eb10-48f6-ae69-ddf5f9242510 | user@server.com
    (4 rows)
    
  3. 为了实现规则 2,我天真地添加了另一组连接,最后得到了必要行的列表:

    select distinct on (f3.id, p.email)
           f3.id folder_id,
           p.email,
           fv3.profile_id
      from folders f1
      join folders f2 on f2.path operator(public.<@) f1.path
      join folder_visibility fv1 on fv1.folder_id = f1.id
      join folder_visibility fv2 on (fv2.folder_id = f1.id or fv2.folder_id = f2.id)
      join folders f3 on f3.path operator(public.<@) f2.path
      join folder_visibility fv3 on (fv3.folder_id = f2.id or fv3.folder_id = f3.id)
      join profiles p on p.id = fv3.profile_id
     where fv1.profile_id = '7cbb5f24-61d0-430e-ab10-225554b0725b';
                  folder_id               |      email       |              profile_id
    --------------------------------------+------------------+--------------------------------------
     80ba97ae-b21b-49d7-aedb-3004cdee0c21 | owner@server.com | 7cbb5f24-61d0-430e-ab10-225554b0725b
     80ba97ae-b21b-49d7-aedb-3004cdee0c21 | user@server.com  | e63533e4-eb10-48f6-ae69-ddf5f9242510
     9ca5ed28-83c3-45c3-98a1-31ebf0466e63 | owner@server.com | 7cbb5f24-61d0-430e-ab10-225554b0725b
     a7abb686-2a69-44d1-b2cd-60825478c4c8 | owner@server.com | 7cbb5f24-61d0-430e-ab10-225554b0725b
     a7abb686-2a69-44d1-b2cd-60825478c4c8 | user@server.com  | e63533e4-eb10-48f6-ae69-ddf5f9242510
    (5 rows)
    

我的示例中最后一步的问题是执行计划变得非常庞大:

                                                                           QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=7.38..7.39 rows=1 width=64) (actual time=0.103..0.111 rows=5 loops=1)
   ->  Sort  (cost=7.38..7.39 rows=1 width=64) (actual time=0.103..0.106 rows=8 loops=1)
         Sort Key: f3.id, p.email
         Sort Method: quicksort  Memory: 25kB
         ->  Nested Loop  (cost=0.00..7.37 rows=1 width=64) (actual time=0.031..0.087 rows=8 loops=1)
               Join Filter: (fv3.profile_id = p.id)
               Rows Removed by Join Filter: 8
               ->  Nested Loop  (cost=0.00..6.33 rows=1 width=32) (actual time=0.028..0.066 rows=8 loops=1)
                     Join Filter: ((fv3.folder_id = f2.id) OR (fv3.folder_id = f3.id))
                     Rows Removed by Join Filter: 8
                     ->  Nested Loop  (cost=0.00..5.28 rows=1 width=32) (actual time=0.025..0.046 rows=8 loops=1)
                           Join Filter: (f3.path OPERATOR(public.<@) f2.path)
                           Rows Removed by Join Filter: 4
                           ->  Nested Loop  (cost=0.00..4.21 rows=1 width=48) (actual time=0.022..0.032 rows=4 loops=1)
                                 Join Filter: ((fv2.folder_id = f1.id) OR (fv2.folder_id = f2.id))
                                 Rows Removed by Join Filter: 2
                                 ->  Nested Loop  (cost=0.00..3.16 rows=1 width=64) (actual time=0.018..0.022 rows=3 loops=1)
                                       Join Filter: (f2.path OPERATOR(public.<@) f1.path)
                                       ->  Nested Loop  (cost=0.00..2.09 rows=1 width=48) (actual time=0.014..0.016 rows=1 loops=1)
                                             Join Filter: (f1.id = fv1.folder_id)
                                             Rows Removed by Join Filter: 2
                                             ->  Seq Scan on folder_visibility fv1  (cost=0.00..1.02 rows=1 width=16) (actual time=0.009..0.010 rows=1 loops=1)
                                                   Filter: (profile_id = '7cbb5f24-61d0-430e-ab10-225554b0725b'::uuid)
                                                   Rows Removed by Filter: 1
                                             ->  Seq Scan on folders f1  (cost=0.00..1.03 rows=3 width=48) (actual time=0.002..0.003 rows=3 loops=1)
                                       ->  Seq Scan on folders f2  (cost=0.00..1.03 rows=3 width=48) (actual time=0.002..0.002 rows=3 loops=1)
                                 ->  Seq Scan on folder_visibility fv2  (cost=0.00..1.02 rows=2 width=16) (actual time=0.001..0.001 rows=2 loops=3)
                           ->  Seq Scan on folders f3  (cost=0.00..1.03 rows=3 width=48) (actual time=0.001..0.001 rows=3 loops=4)
                     ->  Seq Scan on folder_visibility fv3  (cost=0.00..1.02 rows=2 width=32) (actual time=0.001..0.001 rows=2 loops=8)
               ->  Seq Scan on profiles p  (cost=0.00..1.02 rows=2 width=48) (actual time=0.001..0.001 rows=2 loops=8)
 Planning Time: 0.588 ms
 Execution Time: 0.151 ms
(32 rows)

在大型数据集(给定配置文件的数千个文件夹)上,此查询会执行几秒钟。根据上述两条规则,获取所有文件夹以及每个文件夹的所有“可见性”的更有效方法是什么?

标签: postgresqloptimizationquery-optimizationltree

解决方案


推荐阅读