首页 > 解决方案 > PostgreSQL: Drop in performance with multiple running functions without hardware bottleneck

问题描述

I have simple function, if I run it it takes around 40 second to finish.

select * from f_cyklus1(100000000) 

but if I run this function 8 times in 8 separated instances, meaning all 8 function are running in parallel it takes around 210 to 260 seconds to finish for each of it's instances. Which is a massive drop in performance. I tried to compile it as 8 individual functions and run it again but it had no change in performance.

select * from f_cyklus1(100000000); 
select * from f_cyklus2(100000000); 
select * from f_cyklus3(100000000); 
select * from f_cyklus4(100000000); 
select * from f_cyklus5(100000000); 
select * from f_cyklus6(100000000); 
select * from f_cyklus7(100000000); 
select * from f_cyklus8(100000000); 

So why it takes 40s compare to 210-260s to finish? Our virtual machine has 16 CPUs and physical hardware was at low usage. I was also the only one using the Postgre database at the time of testing.

create or replace function f_cyklus1 (p_rozsah int) returns bigint as -- drop function f_cyklus(int)
$body$
declare 

declare 
  v_exc_context        TEXT;
  v_result             INTEGER;

  p_soucet bigint :=0;
begin

for i in 0..p_rozsah
loop
p_soucet = p_soucet + i;
end loop;

return p_soucet;

EXCEPTION
  WHEN OTHERS THEN
    GET STACKED DIAGNOSTICS v_exc_context = PG_EXCEPTION_CONTEXT;
    PERFORM main.ut_log('ERR', SQLERRM || ' [SQL State: ' || SQLSTATE || '] Context: ' || v_exc_context );
    RAISE;
END;
$body$ LANGUAGE plpgsql

PostgreSQL 11.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit

Virtual machine: Centos 7 + KVM

HW: 2x AMD EPYC 7351 + 256 GB RAM

Note: I already asked similar question where I thought it was due to asynchronous processing, but this shows the problem is actually in raw Postgres performance, therefore I deleted my former question and asked this new one.

标签: postgresqlperformancefunction

解决方案


p_soucet = p_soucet + i;

每次执行此操作时,它都必须获取运行语句的“快照”,因为它在后台使用常规 SQL 引擎,并且始终需要在快照中运行。获取快照需要系统范围的锁定。您同时运行的进程越多,它们花费更多时间来获取快照,而不是做有用的工作。

如果您在设置为“可重复读取”的事务中运行该函数,您会发现它们的扩展性更好,因为它们在持续时间内保持相同的快照并继续重复使用它。当然,这可能会干扰您的实际用例。

plpgsql 并不是很适合这种工作,暂时搁置。您可以使用其他 pl 语言之一,例如 plperl 或 plpythonu。

在https://www.postgresql.org/docs/current/plpgsql-expressions.html中描述了主 SQL 引擎如何评估表达式

从https://www.postgresql.org/docs/current/mvcc.html开始的文档中一般会讨论快照

我不知道最终用户在任何地方都记录了两者之间的交互。


推荐阅读