Why would long-running SELECT queries cause replication lag on a MySQL slave database?

Problem description

We have a MySQL server with a replica (MySQL 5.7 with row-based replication).

The master performs about 3000 inserts per second at peak, and the replica normally keeps up with that just fine. However, we sometimes execute long-running SELECT queries on the replica (taking 10 to 20 seconds each), and while those queries run the replication lag grows very large.

What I do not understand is how ordinary MySQL threads that execute SELECTs (without locking any tables) can slow down the replication thread, so that it applies about 2.5K inserts per second instead of the 3K the master produces. What exactly would I need to tune?
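To make the numbers in the question concrete, here is a small arithmetic sketch of how a backlog builds up while the replica applies 2.5K rows per second against 3K written on the master. The function names are made up for illustration, and the idle apply rate of 3200 rows per second is an assumption, not a figure from the question.

```python
def replication_backlog(master_rate, replica_rate, duration_s):
    """Rows the replica falls behind while it applies slower than the master writes."""
    return max(0, master_rate - replica_rate) * duration_s


def catch_up_seconds(backlog, master_rate, replica_peak_rate):
    """Seconds needed to drain the backlog once the replica is back at full speed.

    Returns None when the replica has no spare capacity over the master's rate.
    """
    spare = replica_peak_rate - master_rate
    if spare <= 0:
        return None
    return backlog / spare


# Figures from the question: 3000 inserts/s on the master, about 2500/s applied
# on the replica while a 20-second SELECT is running.
backlog = replication_backlog(3000, 2500, 20)
print(backlog)  # 10000

# ASSUMPTION: the replica can apply 3200 rows/s when idle (not from the question).
print(catch_up_seconds(backlog, 3000, 3200))  # 50.0
```

The point of the sketch is that a replica with little headroom over the master's write rate takes much longer to recover than the query that caused the lag took to run.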

I checked the slave status and the problem is not the IO thread, which manages to read events from the master just fine. It's the SQL slave thread that somehow does not manage to catch up. The isolation level is Read Committed, so the SELECT queries could potentially lock some records and make the slave thread wait, but I'm not sure about that.
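The check described above can be sketched roughly as follows. The dictionary keys are the column names of `SHOW SLAVE STATUS` in MySQL 5.7; the function name, the returned strings, and the sample values are hypothetical. The sketch assumes binary log file names share a fixed-width numeric suffix so that they compare correctly as strings.

```python
def sql_thread_state(status):
    """Compare what the IO thread has fetched with what the SQL thread has applied.

    `status` is one row of SHOW SLAVE STATUS as a dict keyed by column name.
    This cannot detect IO thread lag, which needs the master's own position.
    """
    if status["Slave_IO_Running"] != "Yes":
        return "io thread stopped"
    if status["Slave_SQL_Running"] != "Yes":
        return "sql thread stopped"
    # Position in the master's binary log that the IO thread has read up to.
    read_pos = (status["Master_Log_File"], int(status["Read_Master_Log_Pos"]))
    # Position in the master's binary log that the SQL thread has executed up to.
    exec_pos = (status["Relay_Master_Log_File"], int(status["Exec_Master_Log_Pos"]))
    if exec_pos < read_pos:
        return "sql thread behind"
    return "sql thread caught up"


# Made-up sample row: events are fetched but not yet applied.
sample = {
    "Slave_IO_Running": "Yes",
    "Slave_SQL_Running": "Yes",
    "Master_Log_File": "mysql-bin.000042",
    "Read_Master_Log_Pos": "987654",
    "Relay_Master_Log_File": "mysql-bin.000042",
    "Exec_Master_Log_Pos": "123456",
}
print(sql_thread_state(sample))  # sql thread behind
```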

UPDATE. I have checked again, and it turns out that even a single heavy query on the slave (one that scans an entire table, for example) produces the lag. It seems like the slave SQL thread is blocked, but I do not understand why.

UPDATE 2. I finally found the solution. First I increased slave_parallel_workers to 4 and set slave_parallel_type to LOGICAL_CLOCK. However, and this is important, that alone gave me no improvement at all, since the transactions were dependent. But after I increased binlog_group_commit_sync_delay on the master to 10000 (the value is in microseconds, so 10 milliseconds), the lag disappeared.
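A rough way to see why the delay mattered: with LOGICAL_CLOCK, the replica can apply in parallel only transactions that committed together on the master, and a commit delay lets more of them share one group. The toy model below is a simplification, not how MySQL actually computes anything. It assumes commits arrive evenly at the master's rate and that with no delay each commits alone, which matches the "dependent transactions" observation above. Both function names are made up.

```python
def estimated_group_size(commits_per_second, sync_delay_us):
    """Rough number of transactions that share one binlog group commit.

    ASSUMPTION: commits arrive evenly, so a delay window of `sync_delay_us`
    microseconds collects about rate * window transactions (at least one).
    """
    return max(1, round(commits_per_second * sync_delay_us / 1_000_000))


def usable_workers(group_size, parallel_workers):
    """Upper bound on the replica workers that can be busy at the same time."""
    return min(group_size, parallel_workers)


# binlog_group_commit_sync_delay = 0: one transaction per group in this model,
# so only one of the 4 workers has anything to do.
print(usable_workers(estimated_group_size(3000, 0), 4))  # 1

# binlog_group_commit_sync_delay = 10000 (10 ms) at 3000 commits/s.
print(estimated_group_size(3000, 10000))  # 30
print(usable_workers(estimated_group_size(3000, 10000), 4))  # 4
```

The trade-off is that every commit on the master may wait up to the configured delay, so the value is worth keeping as small as still removes the lag.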

Tags: mysql

Solution


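There can be many causes of replication lag on a MySQL slave database. But as you mentioned:

It's the SQL slave thread that somehow does not manage to catch up.

Assuming the IO thread is working fine, Percona says (emphasis mine):

[...] when the slave SQL_THREAD is the source of replication delays, it is probably because queries coming from the replication stream are taking too long to execute on the slave. This is sometimes because of different hardware between master and slave, different schema indexes, or workload. Moreover, the slave OLTP workload sometimes causes replication delays because of locking.
For instance, a long-running read against a MyISAM table blocks the SQL thread, or any transaction against an InnoDB table creates an IX lock and blocks DDL in the SQL thread. Also, take into account that the slave is single-threaded prior to MySQL 5.6, which would be another reason for delays on the slave SQL_THREAD.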

