ActiveRecord OR operator slows a query down 10x. Why?

Problem description

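I have an ActiveRecord query that uses the OR operator to chain two queries together. The results come back fine, but the combined query takes about 10 times longer to execute than either of the two queries on its own.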

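We have an Event model and an Invitation model. A User can be invited to an Event either by being targeted through an invite filter, or individually, by having an Invitation record.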

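So, when determining how many users are invited to a particular event, we have to look at all the Invitations and at all the users who match the filter. We do that here: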

@invited_count = @invited_by_individual.or(@invited_by_filter).distinct.count(:id)

It is important to note that both the @invited_by_individual and @invited_by_filter relations have references and includes statements in them.

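Now, the problem is that this query takes about 1200 ms to execute. If we run the queries individually, each takes only about 80 ms. So @invited_by_filter.distinct.count and @invited_by_individual.distinct.count both return in around 80 ms, but neither of them is complete on its own.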

Is there any way to speed up the query with the OR operator? Why is this happening?

Here is the SQL generated by the ActiveRecord queries:

Fast, single query:

(79.7ms)  
SELECT COUNT(DISTINCT "users"."id") 
FROM "users" 
LEFT OUTER JOIN "invitations" 
ON "invitations"."user_id" = "users"."id" 
WHERE "invitations"."event_id" = $1  [["event_id", 732]]

Slow, combined query:

(1220.7ms)  
SELECT COUNT(DISTINCT "users"."id") 
FROM "users" 
LEFT OUTER JOIN "invitations" 
ON "invitations"."user_id" = "users"."id" 
WHERE ("invitations"."event_id" = $1 OR "users"."organization_id" = $2)  [["event_id", 732], ["organization_id", 13]]

Update, here is the EXPLAIN:

(1418.2ms)  SELECT COUNT(DISTINCT "users"."id") FROM "users" LEFT OUTER JOIN "invitations" ON "invitations"."user_id" = "users"."id" WHERE ("users"."root_organization_id" = $1 OR "invitations"."event_id" = $2)  [["root_organization_id", -1], ["event_id", 749]]
 => 
EXPLAIN for: SELECT COUNT(DISTINCT "users"."id") FROM "users" LEFT OUTER JOIN "invitations" ON "invitations"."user_id" = "users"."id" WHERE ("users"."root_organization_id" = $1 OR "invitations"."event_id" = $2) [["root_organization_id", -1], ["event_id", 749]]

 #=> QUERY PLAN
                                                     
 Aggregate  (cost=121781.56..121781.57 rows=1 width=8)
   ->  Hash Right Join  (cost=113248.88..121778.64 rows=1165 width=8)
         Hash Cond: (invitations.user_id = users.id)
         Filter: ((users.root_organization_id = '-1'::integer) OR (invitations.event_id = 749))
         ->  Seq Scan on invitations  (cost=0.00..1299.70 rows=63470 width=8)
         ->  Hash  (cost=93513.28..93513.28 rows=1135328 width=12)
               ->  Seq Scan on users  (cost=0.00..93513.28 rows=1135328 width=12)
(7 rows)

Update 2, EXPLAIN for the queries run individually, which do use indexes:

(91.5ms)  SELECT COUNT(*) FROM "users" INNER JOIN "invitations" ON "invitations"."user_id" = "users"."id" WHERE "users"."root_organization_id" = $1  [["root_organization_id", -1]]
 => 
EXPLAIN for: SELECT COUNT(*) FROM "users" INNER JOIN "invitations" ON "invitations"."user_id" = "users"."id" WHERE "users"."root_organization_id" = $1 [["root_organization_id", -1]]

 #=> QUERY PLAN

 Aggregate  (cost=19.05..19.06 rows=1 width=8)
   ->  Nested Loop  (cost=0.72..19.05 rows=1 width=0)
         ->  Index Scan using index_users_on_root_organization_id on users  (cost=0.43..4.45 rows=1 width=8)
               Index Cond: (root_organization_id = '-1'::integer)
         ->  Index Only Scan using index_invitations_on_user_id on invitations  (cost=0.29..14.57 rows=3 width=4)
               Index Cond: (user_id = users.id)
(6 rows)

EXPLAIN for: SELECT COUNT(DISTINCT "users"."id") FROM "users" LEFT OUTER JOIN "invitations" ON "invitations"."user_id" = "users"."id" WHERE "invitations"."event_id" = $1 [["event_id", 749]]

 #=> QUERY PLAN

 Aggregate  (cost=536.34..536.35 rows=1 width=8)
   ->  Nested Loop  (cost=0.72..536.19 rows=62 width=8)
         ->  Index Scan using index_invitations_on_event_id on invitations  (cost=0.29..11.98 rows=62 width=4)
               Index Cond: (event_id = 749)
         ->  Index Only Scan using users_pkey on users  (cost=0.43..8.45 rows=1 width=8)
               Index Cond: (id = invitations.user_id)
(6 rows)

Tags: sql, ruby-on-rails, database, postgresql, activerecord

Solution


This is your query that uses OR:

SELECT COUNT(DISTINCT "users"."id") 
FROM "users" 
LEFT OUTER JOIN "invitations" 
ON "invitations"."user_id" = "users"."id" 
WHERE ("invitations"."event_id" = $1 OR "users"."organization_id" = $2)  

If you try the following query in Postgres, I expect it to produce the same result but run faster:

SELECT
    COUNT(DISTINCT id) AS cc
FROM
    (
        SELECT
            "invitations"."user_id" AS id
        FROM
            "invitations"
        WHERE
            ("invitations"."event_id" = $1)

        UNION ALL

        SELECT
            "users"."id"
        FROM
            "users" 
        WHERE
            ("users"."organization_id" = $2)
    ) AS T
;

If you have an index on "invitations"."event_id" and an index on "users"."organization_id", the engine should use them. If you don't have such indexes, create them.

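The query with OR is slow because the optimizer is not smart enough to perform this transformation and split the original query into two parts. When you run each part separately, the engine sees that it can use the appropriate index. When the query joins two tables and has an OR condition in the WHERE filter, no single index can return the required rows, so the engine does not try to use any index. It reads all 1135328 rows from the users table and all 63470 rows from the invitations table. Naturally, that is slow.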
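To make the difference concrete, here is a toy in-memory model in plain Ruby. It is only an illustration, not how Postgres actually executes anything: the data is made up, hashes stand in for the indexes named in the EXPLAIN output, and the ids 732 and 13 are borrowed from the logged queries. It shows that the OR filter can only be evaluated after every user has been joined, while the UNION form needs just two index lookups.

```ruby
require "set"

# Toy data (made up for illustration): 1000 users, 10 of them in organization 13.
users = (1..1000).map { |id| { id: id, organization_id: (id % 100).zero? ? 13 : 7 } }
# Users 1-5 are invited to event 732, users 6-10 to some other event.
invitations = (1..10).map { |id| { user_id: id, event_id: id <= 5 ? 732 : 1 } }

# Hashes standing in for database indexes.
invitations_by_user  = invitations.group_by { |i| i[:user_id] }
invitations_by_event = invitations.group_by { |i| i[:event_id] }
users_by_org         = users.group_by { |u| u[:organization_id] }

# OR version: the filter mentions both tables, so it can only be checked
# on joined rows, which means visiting every user.
or_ids = Set.new
or_rows_examined = 0
users.each do |user|
  # LEFT OUTER JOIN: a user without invitations still yields one row (nil).
  invitations_by_user.fetch(user[:id], [nil]).each do |inv|
    or_rows_examined += 1
    matches = user[:organization_id] == 13 || (!inv.nil? && inv[:event_id] == 732)
    or_ids << user[:id] if matches
  end
end

# UNION version: each branch is answered by a single "index" lookup.
by_event = invitations_by_event.fetch(732, []).map { |i| i[:user_id] }
by_org   = users_by_org.fetch(13, []).map { |u| u[:id] }
union_ids = Set.new(by_event + by_org)
union_rows_examined = by_event.size + by_org.size

puts "same users: #{or_ids == union_ids}"
puts "rows examined with OR: #{or_rows_examined}, with UNION: #{union_rows_examined}"
```

The two forms agree here because every invitation in the toy data points at an existing user. If invitations.user_id could reference missing users, the UNION query would count them while the join would not.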

I don't know how to translate this query into ActiveRecord syntax.
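One possible way to run the UNION query from a Rails app is to build the SQL string and send it through the connection directly. The sketch below is an assumption, not a tested translation of the original relations: union_count_sql is a hypothetical helper, the Invitation and User models and their columns are inferred from the logged SQL, and the ids 732 and 13 are the ones from the question's logs. The helper itself is plain Ruby; the ActiveRecord calls appear only in comments.

```ruby
# Hypothetical helper: wraps any number of single-column id SELECTs in
# the UNION ALL / COUNT(DISTINCT ...) shape of the query above.
def union_count_sql(*id_selects)
  union = id_selects.map { |sql| "(#{sql})" }.join(" UNION ALL ")
  "SELECT COUNT(DISTINCT id) AS cc FROM (#{union}) AS t"
end

# In a Rails app the sub-SELECTs could come from relations (assumed models):
#   by_invitation = Invitation.where(event_id: 732).select("user_id AS id").to_sql
#   by_filter     = User.where(organization_id: 13).select(:id).to_sql
#   count = ActiveRecord::Base.connection.select_value(
#     union_count_sql(by_invitation, by_filter)
#   )
# Here the same SQL is written out by hand so the snippet runs without Rails.
by_invitation = %(SELECT "invitations"."user_id" AS id FROM "invitations" WHERE "invitations"."event_id" = 732)
by_filter     = %(SELECT "users"."id" FROM "users" WHERE "users"."organization_id" = 13)

sql = union_count_sql(by_invitation, by_filter)
puts sql
```

Because to_sql inlines the bound values, the resulting string can be pasted into psql with EXPLAIN in front of it to check whether both branches use their indexes.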

