database - PostgreSQL 逻辑复制应该使用多少 CPU/带宽?
问题描述
我按照此处的说明使用 PostgreSQL 11 设置逻辑复制:https ://www.digitalocean.com/community/tutorials/how-to-set-up-logical-replication-with-postgresql-10-on-ubuntu-18 -04
一切都很好,在测试后,更改被复制了。
然而,一个月后……变化似乎没有被复制,而且 Postgres 似乎使用了大量的 CPU 和带宽。
- 在 2vCPU/4GB DigitalOcean 服务器上,平均负载约为 2.5。
- 带宽约为 1MB/s。
- 目前此服务器和数据库上的活动基本上为零。
这提出了一些问题,例如:
- 对于具有逻辑流复制的非活动数据库使用如此多的资源,这是否正常?
- 关于为什么复制似乎已经停止的任何想法?(更改主服务器上的记录不再影响副本)
- 是否有一些用于监视和查看复制状态的专业提示?
Postgres 服务器的主日志充满了这些类型的消息:
2019-04-22 06:26:16.986 UTC [20371] replica_user@server_prod LOG: logical decoding found consistent point at 0/1EC21198
2019-04-22 06:26:16.986 UTC [20371] replica_user@server_prod DETAIL: There are no running transactions.
2019-04-22 06:26:17.010 UTC [20372] replica_user@server_prod LOG: logical decoding found consistent point at 0/1EC211D0
2019-04-22 06:26:17.010 UTC [20372] replica_user@server_prod DETAIL: There are no running transactions.
2019-04-22 06:26:17.055 UTC [20373] replica_user@server_prod LOG: logical decoding found consistent point at 0/1EC21208
2019-04-22 06:26:17.055 UTC [20373] replica_user@server_prod DETAIL: There are no running transactions.
2019-04-22 06:26:17.078 UTC [20374] replica_user@server_prod LOG: logical decoding found consistent point at 0/1EC21240
2019-04-22 06:26:17.078 UTC [20374] replica_user@server_prod DETAIL: There are no running transactions.
2019-04-22 06:26:17.114 UTC [20375] replica_user@server_prod LOG: logical decoding found consistent point at 0/1EC21278
2019-04-22 06:26:17.114 UTC [20375] replica_user@server_prod DETAIL: There are no running transactions.
2019-04-22 06:26:17.154 UTC [20376] replica_user@server_prod LOG: logical decoding found consistent point at 0/1EC212B0
2019-04-22 06:26:17.154 UTC [20376] replica_user@server_prod DETAIL: There are no running transactions.
2019-04-22 06:26:17.186 UTC [20377] replica_user@server_prod LOG: logical decoding found consistent point at 0/1EC212E8
2019-04-22 06:26:17.186 UTC [20377] replica_user@server_prod DETAIL: There are no running transactions.
2019-04-22 06:26:17.229 UTC [20378] replica_user@server_prod LOG: logical decoding found consistent point at 0/1EC21320
2019-04-22 06:26:17.229 UTC [20378] replica_user@server_prod DETAIL: There are no running transactions.
2019-04-22 06:26:17.235 UTC [20378] replica_user@server_prod LOG: could not send data to client: Connection reset by peer
2019-04-22 06:26:17.235 UTC [20378] replica_user@server_prod STATEMENT: COPY public.class_registrations TO STDOUT
2019-04-22 06:26:17.235 UTC [20378] replica_user@server_prod FATAL: connection to client lost
2019-04-22 06:26:17.235 UTC [20378] replica_user@server_prod STATEMENT: COPY public.class_registrations TO STDOUT
2019-04-22 06:26:17.259 UTC [20379] replica_user@server_prod LOG: logical decoding found consistent point at 0/1EC21358
2019-04-22 06:26:17.259 UTC [20379] replica_user@server_prod DETAIL: There are no running transactions.
2019-04-22 06:26:21.327 UTC [20418] replica_user@server_prod LOG: logical decoding found consistent point at 0/1EC21390
2019-04-22 06:26:21.327 UTC [20418] replica_user@server_prod DETAIL: There are no running transactions.
2019-04-22 06:26:21.341 UTC [20419] replica_user@server_prod LOG: logical decoding found consistent point at 0/1EC213C8
2019-04-22 06:26:21.341 UTC [20419] replica_user@server_prod DETAIL: There are no running transactions.
副本服务器充满了这些类型的消息:
2019-04-21 06:26:07.619 UTC [2967] LOG: logical replication table synchronization worker for subscription "replica_subscription", table "messages" has started
2019-04-21 06:26:07.645 UTC [2966] ERROR: duplicate key value violates unique constraint "account_locations_pkey"
2019-04-21 06:26:07.645 UTC [2966] DETAIL: Key (id)=(1) already exists.
2019-04-21 06:26:07.645 UTC [2966] CONTEXT: COPY account_locations, line 1
2019-04-21 06:26:07.648 UTC [16353] LOG: background worker "logical replication worker" (PID 2966) exited with exit code 1
2019-04-21 06:26:07.652 UTC [2968] LOG: logical replication table synchronization worker for subscription "replica_subscription", table "user_photos" has started
2019-04-21 06:26:07.663 UTC [2967] ERROR: duplicate key value violates unique constraint "messages_pkey"
2019-04-21 06:26:07.663 UTC [2967] DETAIL: Key (id)=(1) already exists.
2019-04-21 06:26:07.663 UTC [2967] CONTEXT: COPY messages, line 1
这是过去 6 小时内的平均负载(您可以看到我何时在副本服务器上删除了订阅者)。
这是带宽:
这也是iftop
仅约 10-15 秒监控的结果:
解决方案
根据 Laurenz 的建议查看日志后,我最初加载的数据似乎没有对所有表都正确的主 ID 序列。(不知道这是怎么发生的)
为了解决复制问题,我执行了以下操作:
- 从副本服务器中删除了订阅
- 删除所有表
- 重新加载所有表 - 仅模式(无数据)
- 再次创建订阅
这导致所有数据同步,一切都恢复正常。我通过更新数据并在副本服务器中看到它更新来确认。
高 CPU 负载和带宽似乎是在出现复制错误时,Postgres 只是尽可能地一遍又一遍地尝试。
推荐阅读
- winapi - 访问比 LongPointer 更多的窗口额外字节
- c# - 使用 C# 以编程方式刷新 Power Apps 门户中的实体的数据
- angular - 如何在角度可观察订阅中对代码进行单元测试
- .htaccess - Symfony 5 EasyAdmin 2.x 'admin' 路由不工作
- c++ - 神经网络似乎无法正常工作
- scala - 我从 alpakka kafka 消费时收到了死信
- python - 带字节的串行通信(0x00 问题)
- python - 如何在模型或视图中请求用户登录
- c++ - C++ Gapful Numbers 崩溃
- php - 无论如何,他们是否要向使用 group by 返回特定列的最新行的查询添加连接