postgresql - PostgreSQL/PostDock: automatic recovery fails on the master node
Problem description
I use Docker services and Docker swarm to deploy a PostDock cluster.
Here is my docker-compose.yml:
version: "3.3"
networks:
  postdock:
    external: true
services:
  pgmaster:
    image: postdock/postgres
    environment:
      PARTNER_NODES: "pgmaster,pgslave1"
      CLUSTER_NODE_NETWORK_NAME: pgmaster
      NODE_PRIORITY: 100
      NODE_ID: 1
      NODE_NAME: pgmaster
      POSTGRES_PASSWORD: 123
      POSTGRES_USER: postgres
      POSTGRES_DB: postgres
      CONFIGS: "listen_addresses:'*'"
      CLUSTER_NAME: pg_cluster
      REPLICATION_DB: replication_db
      REPLICATION_USER: replication_user
      REPLICATION_PASSWORD: replication_pass
    ports:
      - 4000:5432
    volumes:
      - /data/master_slave:/var/lib/postgresql/data
    networks:
      - postdock
    deploy:
      placement:
        constraints:
          - node.role == manager
          - node.hostname == 192.168.1.161
  pgslave1:
    image: postdock/postgres
    environment:
      PARTNER_NODES: "pgmaster,pgslave1"
      REPLICATION_PRIMARY_HOST: pgmaster
      NODE_ID: 2
      NODE_NAME: pgslave1
      CLUSTER_NODE_NETWORK_NAME: pgslave1
      REPLICATION_PRIMARY_PORT: 5432
      CONFIGS: "max_replication_slots:10"
    ports:
      - 4001:5432
    volumes:
      - /data/slave_1:/var/lib/postgresql/data
    networks:
      - postdock
    deploy:
      placement:
        constraints:
          - node.role == manager
          - node.hostname == 192.168.1.161
  pgslave2:
    image: postdock/postgres
    environment:
      PARTNER_NODES: "pgmaster,pgslave1,pgslave2"
      REPLICATION_PRIMARY_HOST: pgmaster
      NODE_ID: 3
      NODE_NAME: pgslave2
      CLUSTER_NODE_NETWORK_NAME: pgslave2
      REPLICATION_PRIMARY_PORT: 5432
      CONFIGS: "max_replication_slots:10"
    ports:
      - 4002:5432
    volumes:
      - /data/slave_2:/var/lib/postgresql/data
    networks:
      - postdock
    deploy:
      placement:
        constraints:
          - node.role == manager
          - node.hostname == 192.168.1.161
  db:
    image: postdock/pgpool
    environment:
      PCP_USER: pcp_user
      PCP_PASSWORD: pcp_pass
      WAIT_BACKEND_TIMEOUT: 60
      CHECK_USER: postgres
      CHECK_PASSWORD: 123
      CHECK_PGCONNECT_TIMEOUT: 3
      DB_USERS: postgres:123
      BACKENDS: "0:pgmaster:5432:1:/var/lib/postgresql/data:ALLOW_TO_FAILOVER,1:pgslave1::::,2:pgslave2::::,"
      REQUIRE_MIN_BACKENDS: 1
      CONFIGS: "num_init_children:250,max_pool:4"
    ports:
      - 4003:5432
      - 9899:9898
    networks:
      - postdock
    deploy:
      placement:
        constraints:
          - node.role == manager
          - node.hostname == 192.168.1.161
I run:
docker network create -d overlay postdock
docker stack deploy -c docker-compose.yml postdock
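Everything works fine at first.
However, after I updated the services several times, automatic failover failed on the master node. In the master node's log, I noticed that the recovery process could not detect the database replication_db or the schema replication_db.public: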
>>> Waiting for local postgres server start...,
expr: non-integer argument,
>>> Wait schema . on pgmaster:5432(user: public,password: *******), will try times with delay 10 seconds (TIMEOUT=)
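As you can see, no schema is specified, only a dot ".", and the user is wrong as well: it should be replication_user, not public. The retry count and TIMEOUT are also empty.
This leads to the following errors: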
2018-11-16 04:45:33.310 UTC [122] FATAL: password authentication failed for user "public",
2018-11-16 04:45:33.310 UTC [122] DETAIL: Role "public" does not exist.,
Connection matched pg_hba.conf line 95: "host all all all md5",
psql: FATAL: password authentication failed for user "public",
2018-11-16 04:45:37.974 UTC [125] FATAL: no PostgreSQL user name specified in startup packet,
2018-11-16 04:45:39.345 UTC [127] FATAL: no PostgreSQL user name specified in startup packet,
2018-11-16 04:45:40.374 UTC [128] FATAL: no PostgreSQL user name specified in startup packet,
2018-11-16 04:45:41.386 UTC [129] FATAL: no PostgreSQL user name specified in startup packet,
2018-11-16 04:45:42.421 UTC [130] FATAL: no PostgreSQL user name specified in startup packet,
>>>>>> Host pgmaster:5432 is not accessible (will try times more),
expr: non-integer argument,
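One plausible reading of the malformed "Wait schema ." line is that the replication variables were empty when the wait step ran, and that they were passed unquoted as positional arguments. The sketch below is not PostDock's actual script. The function wait_schema_msg and its argument order are hypothetical, chosen only to show how empty unquoted variables disappear from the argument list, so that the literal "public" slides into the user position and the schema and timeout come out empty.

```shell
#!/bin/sh
# Hypothetical stand-in for a wait-for-schema helper (NOT PostDock's real code).
# Assumed argument order: host port user password db schema timeout
wait_schema_msg() {
    host=${1:-}; port=${2:-}; user=${3:-}
    db=${5:-}; schema=${6:-}; timeout=${7:-}
    echo ">>> Wait schema $db.$schema on $host:$port(user: $user,password: *******) (TIMEOUT=$timeout)"
}

# Case 1: every variable has a value, so each argument lands in its slot.
REPLICATION_USER=replication_user
REPLICATION_PASSWORD=replication_pass
REPLICATION_DB=replication_db
TIMEOUT=90
good=$(wait_schema_msg pgmaster 5432 $REPLICATION_USER $REPLICATION_PASSWORD $REPLICATION_DB public $TIMEOUT)
echo "$good"

# Case 2: the variables are empty. Unquoted empty expansions vanish, so the
# call collapses to: wait_schema_msg pgmaster 5432 public
REPLICATION_USER=""
REPLICATION_PASSWORD=""
REPLICATION_DB=""
TIMEOUT=""
bad=$(wait_schema_msg pgmaster 5432 $REPLICATION_USER $REPLICATION_PASSWORD $REPLICATION_DB public $TIMEOUT)
echo "$bad"
```

In the second case the message contains "schema ." and "user: public", which resembles the log above. An empty timeout fed into an integer calculation with expr would also be consistent with the "expr: non-integer argument" lines, although the log alone does not confirm this.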
As far as I understand, when automatic failover succeeds, the recovery log should look like this:
>>> Waiting for local postgres server start...,
>>> Wait schema replication_db.public on pgmaster:5432(user: replication_user,password: *******), will try 9 times with delay 10 seconds (TIMEOUT=90),
>>>>>> Schema replication_db.public exists on host pgmaster:5432!,
>>> Registering node with role master
Does anyone know the root cause of this problem?
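As a diagnostic aid, a small check like the following could be run in the master container's environment to see whether the replication settings from the compose file are actually present. This is a minimal sketch: the function name missing_vars is hypothetical, and the values assigned below only simulate a partially populated environment.

```shell
#!/bin/sh
# Print the names of the given variables that are unset or empty.
missing_vars() {
    missing=""
    for name in "$@"; do
        # Indirect lookup: read the value of the variable called $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    # Strip the leading space before printing.
    echo "${missing# }"
}

# Simulated environment: one variable set, one empty, one unset.
REPLICATION_DB=replication_db
REPLICATION_USER=""
unset REPLICATION_PASSWORD

report=$(missing_vars REPLICATION_DB REPLICATION_USER REPLICATION_PASSWORD)
echo "missing: $report"
```

If any of these names are reported inside the running container, the environment seen by the startup scripts differs from the compose file, which would point toward the service updates rather than PostgreSQL itself.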