MHA Common Problems and Solutions

pythonx 2020-01-10 18:24


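1. After passwordless SSH is configured

Run the SSH check. When the keys are set up correctly, every connection test passes: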

masterha_check_ssh --conf=/etc/masterha/app1.cnf

Wed Jan 8 18:00:57 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Jan 8 18:00:57 2020 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Wed Jan 8 18:00:57 2020 - [info] Updating application default configuration from /etc/masterha/pm/load_cnf..
Can't exec "/etc/masterha/pm/load_cnf": No such file or directory at /usr/share/perl5/vendor_perl/MHA/Config.pm line 365.
Wed Jan 8 18:00:57 2020 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Wed Jan 8 18:00:57 2020 - [info] Starting SSH connection tests..
Wed Jan 8 18:00:58 2020 - [debug] 
Wed Jan 8 18:00:57 2020 - [debug] Connecting via SSH from root@192.168.1.147(192.168.1.147:22) to root@192.168.1.58(192.168.1.58:22)..
Wed Jan 8 18:00:57 2020 - [debug] ok.
Wed Jan 8 18:00:58 2020 - [debug] 
Wed Jan 8 18:00:58 2020 - [debug] Connecting via SSH from root@192.168.1.58(192.168.1.58:22) to root@192.168.1.147(192.168.1.147:22)..
Wed Jan 8 18:00:58 2020 - [debug] ok.
Wed Jan 8 18:00:58 2020 - [info] All SSH connection tests passed successfully.


2. Problems reported by the replication check

Problem 1:

[root@dataexa-ccb-test-58 masterha]# masterha_check_repl --conf=/etc/masterha/app1.cnf 
Thu Jan 9 10:36:19 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Jan 9 10:36:19 2020 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Thu Jan 9 10:36:19 2020 - [info] Updating application default configuration from /etc/masterha/pm/load_cnf..
Thu Jan 9 10:36:19 2020 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Thu Jan 9 10:36:19 2020 - [info] MHA::MasterMonitor version 0.58.
Thu Jan 9 10:36:20 2020 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln188] There is no alive server. We can't do failover
Thu Jan 9 10:36:20 2020 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. at /usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 329.
Thu Jan 9 10:36:20 2020 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Thu Jan 9 10:36:20 2020 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!

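Cause: app1.cnf does not set the MySQL user, password, and port that MHA uses to connect. Add them (in MHA these settings normally go under [server default]):

vi app1.cnf
# MySQL account used by MHA
user=manager
password=123456
port=3306
# System SSH user
ssh_user=root
ssh_port=22

# Replication user
repl_user=salve
repl_password=123456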
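
Before rerunning masterha_check_repl, a quick check can confirm that the settings above are really present in the file. This is only a sketch: check_mha_keys is a hypothetical helper, not part of MHA, and the key list is simply the settings named above.

```shell
# check_mha_keys FILE: print every key from the list below that FILE does not set.
# Hypothetical helper, not part of MHA; the key list mirrors the settings above.
check_mha_keys() {
    cnf=$1
    for key in user password ssh_user repl_user repl_password; do
        # A key counts as set when a line starts with "key=" (spaces allowed).
        if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$cnf"; then
            echo "missing: $key"
        fi
    done
}

# Example: check_mha_keys /etc/masterha/app1.cnf
```

No output means all five keys were found. The pattern is anchored at the start of the line, so ssh_user= is not mistaken for user=.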


Problem 2: the check can also fail like this:
[root@dataexa-ccb-test-58 masterha]# masterha_check_repl --conf=/etc/masterha/app1.cnf 
Thu Jan 9 10:54:03 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Jan 9 10:54:03 2020 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Thu Jan 9 10:54:03 2020 - [info] Updating application default configuration from /etc/masterha/pm/load_cnf..
Thu Jan 9 10:54:03 2020 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Thu Jan 9 10:54:03 2020 - [info] MHA::MasterMonitor version 0.58.
Thu Jan 9 10:54:04 2020 - [error][/usr/share/perl5/vendor_perl/MHA/Server.pm, ln180] Got MySQL error when connecting 192.168.1.58(192.168.1.58:31061) :1045:Access denied for user 'root'@'master' (using password: YES), but this is not a MySQL crash. Check MySQL server settings.
Thu Jan 9 10:54:04 2020 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln301] at /usr/share/perl5/vendor_perl/MHA/ServerManager.pm line 297.
Thu Jan 9 10:54:04 2020 - [error][/usr/share/perl5/vendor_perl/MHA/Server.pm, ln180] Got MySQL error when connecting 192.168.1.147(192.168.1.147:31061) :1045:Access denied for user 'root'@'master' (using password: YES), but this is not a MySQL crash. Check MySQL server settings.
Thu Jan 9 10:54:04 2020 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln301] at /usr/share/perl5/vendor_perl/MHA/ServerManager.pm line 297.
Thu Jan 9 10:54:05 2020 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln309] Got fatal error, stopping operations
Thu Jan 9 10:54:05 2020 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. at /usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 329.
Thu Jan 9 10:54:05 2020 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Thu Jan 9 10:54:05 2020 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!

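Cause: the MySQL account or password is wrong. Reconfigure the MySQL account and password so that the values in app1.cnf match an account that can log in to every server.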
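
One way to confirm the credentials before rerunning the check is to try the same login by hand from the manager host. A minimal sketch, assuming the standard mysql client is installed there; try_login is a hypothetical helper, and the password shown in the example is a placeholder.

```shell
# try_login HOST PORT USER PASSWORD: report whether the account can log in.
# Hypothetical helper; it only wraps the standard mysql client.
try_login() {
    if mysql -h "$1" -P "$2" -u "$3" -p"$4" -e 'SELECT 1' >/dev/null 2>&1; then
        echo "ok: $3@$1:$2"
    else
        echo "FAILED: $3@$1:$2"
    fi
}

# Example with the hosts from the log above (use your own password):
# try_login 192.168.1.58 31061 root 'your-password'
# try_login 192.168.1.147 31061 root 'your-password'
```

If either host reports FAILED, fix the account on that server first; MHA needs the same user and password to work on all of them.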


Problem 3:
[root@dataexa-ccb-test-58 masterha]# masterha_check_repl --conf=/etc/masterha/app1.cnf 
Thu Jan 9 11:06:25 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Jan 9 11:06:25 2020 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Thu Jan 9 11:06:25 2020 - [info] Updating application default configuration from /etc/masterha/pm/load_cnf..
Thu Jan 9 11:06:25 2020 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Thu Jan 9 11:06:25 2020 - [info] MHA::MasterMonitor version 0.58.
Thu Jan 9 11:06:26 2020 - [info] GTID failover mode = 1
Thu Jan 9 11:06:26 2020 - [info] Dead Servers:
Thu Jan 9 11:06:26 2020 - [info] Alive Servers:
Thu Jan 9 11:06:26 2020 - [info] 192.168.1.147(192.168.1.147:31061)
Thu Jan 9 11:06:26 2020 - [info] 192.168.1.58(192.168.1.58:31061)
Thu Jan 9 11:06:26 2020 - [info] Alive Slaves:
Thu Jan 9 11:06:26 2020 - [info] 192.168.1.58(192.168.1.58:31061) Version=5.7.26-log (oldest major version between slaves) log-bin:enabled
Thu Jan 9 11:06:26 2020 - [info] GTID ON
Thu Jan 9 11:06:26 2020 - [info] Replicating from 192.168.1.147(192.168.1.147:31061)
Thu Jan 9 11:06:26 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Jan 9 11:06:26 2020 - [info] Current Alive Master: 192.168.1.147(192.168.1.147:31061)
Thu Jan 9 11:06:26 2020 - [info] Checking slave configurations..
Thu Jan 9 11:06:26 2020 - [info] Checking replication filtering settings..
Thu Jan 9 11:06:26 2020 - [info] binlog_do_db= , binlog_ignore_db= 
Thu Jan 9 11:06:26 2020 - [info] Replication filtering check ok.
Thu Jan 9 11:06:26 2020 - [error][/usr/share/perl5/vendor_perl/MHA/Server.pm, ln398] 192.168.1.58(192.168.1.58:31061): User salve does not exist or does not have REPLICATION SLAVE privilege! Other slaves can not start replication from this host.
Thu Jan 9 11:06:26 2020 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. at /usr/share/perl5/vendor_perl/MHA/ServerManager.pm line 1403.
Thu Jan 9 11:06:26 2020 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Thu Jan 9 11:06:26 2020 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!

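Check the privileges of the replication user:

SELECT host,user,authentication_string,Grant_priv,Super_priv,Repl_slave_priv AS Value FROM mysql.user;

If Value (the Repl_slave_priv column) is N for the replication user, change it to Y, then reload the privilege tables:

FLUSH PRIVILEGES;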
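
Instead of editing mysql.user by hand, the privilege can also be given with a GRANT statement, which takes effect without FLUSH PRIVILEGES. A minimal sketch: repl_grant_sql is a hypothetical helper that only prints the SQL, and the account 'salve'@'%' in the example is an assumption (use the host value returned by the query above).

```shell
# repl_grant_sql USER HOST: print SQL that grants REPLICATION SLAVE
# to the existing account 'USER'@'HOST' and then shows its grants.
# Hypothetical helper; it prints SQL and does not connect to MySQL itself.
repl_grant_sql() {
    printf "GRANT REPLICATION SLAVE ON *.* TO '%s'@'%s';\n" "$1" "$2"
    printf "SHOW GRANTS FOR '%s'@'%s';\n" "$1" "$2"
}

# Example (assumed account 'salve'@'%'), run on every server:
# repl_grant_sql salve '%' | mysql -uroot -p
```

Printing the statements first makes it easy to review them before piping them into mysql.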

Problem 4:
[root@dataexa-ccb-test-58 masterha]# masterha_check_repl --conf=/etc/masterha/app1.cnf 
Thu Jan 9 12:03:07 2020 - [info] Checking master_ip_failover_script status:
Thu Jan 9 12:03:07 2020 - [info] /etc/masterha/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.1.147 --orig_master_ip=192.168.1.147 --orig_master_port=31061 
Gateway: 0.0.0.0 can't reached!!!Thu Jan 9 12:03:09 2020 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln229] Failed to get master_ip_failover_script status with return code 10:0.
Thu Jan 9 12:03:09 2020 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. at /usr/bin/masterha_check_repl line 48.
Thu Jan 9 12:03:09 2020 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Thu Jan 9 12:03:09 2020 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!
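Cause: the gateway used by /etc/masterha/master_ip_failover was set to 0.0.0.0. Change it to the gateway of your own network.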
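
If you are not sure what the gateway is, it can be read from the routing table on the master. A small sketch, assuming a Linux host with the ip command; default_gateway is a hypothetical helper, and how the value is then written into master_ip_failover depends on your own copy of that script.

```shell
# default_gateway: read `ip route` output on stdin and print the
# address of the default gateway (the word after "default via").
default_gateway() {
    awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# Example:
# gw=$(ip route show default | default_gateway)
# ping -c 1 -W 2 "$gw"    # confirm the gateway answers before using it
```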

 

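3. Master and slave out of sync

Replication can fall out of sync because the data on the two servers differs, so always confirm that the databases are consistent.

Another possible cause is that binlog files were deleted by hand.
Solution 1: reinitialize the slave.
Solution 2: stop the slave and resynchronize.
On the master: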
flush logs;

show master status\G

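On the slave (take master_log_file and master_log_pos from the show master status output on the master):
stop slave;
change master to master_host='192.168.1.147', master_port=31061, master_user='slave', master_password='Aipf@123456', master_log_file='master-bin.0000004', master_log_pos=346;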

start slave;

show slave status\G
Solution 3:
stop slave;
reset slave;
start slave;
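
Whichever solution is used, replication is healthy only when show slave status\G reports Yes for both Slave_IO_Running and Slave_SQL_Running. The check can be scripted roughly as follows; slave_running is a hypothetical helper, and the root login in the example is an assumption.

```shell
# slave_running: read `show slave status\G` output on stdin and
# succeed only if both replication threads report Yes.
slave_running() {
    awk -F': ' '
        $1 ~ /Slave_IO_Running$/  { io = $2 }
        $1 ~ /Slave_SQL_Running$/ { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}

# Example (assumed root login):
# mysql -uroot -p -e 'show slave status\G' | slave_running && echo "replication is running"
```

The patterns end with $ so that the Slave_SQL_Running_State line is not mistaken for Slave_SQL_Running.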

 

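If the master and slave are out of sync because the data is inconsistent:
On the master, lock the tables and back up the whole database, then import the backup into the slave. The lock lasts only while the session that took it stays open, so run the backup from a second session and release the lock with unlock tables; when it finishes.
flush tables with read lock;
On the slave: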
stop slave;
source sqlfile;
change master to master_auto_position=0;
reset slave;
start slave;


4. Database backup

Full backup of all databases:
mysqldump -uroot -pAipf@123 -A --master-data | gzip > ./all.sql.gz
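
Because the dump is piped into gzip, a failed mysqldump can still leave an archive on disk, so it is worth checking the file before relying on it. A minimal sketch: backup_ok is a hypothetical helper, and it assumes the dump was made with comments enabled (the default), in which case mysqldump ends the file with a "-- Dump completed" line.

```shell
# backup_ok FILE: succeed if FILE is an intact gzip archive whose last
# line is the "-- Dump completed" trailer that mysqldump writes on success.
backup_ok() {
    gzip -t "$1" 2>/dev/null &&
        gunzip -c "$1" | tail -n 1 | grep -q '^-- Dump completed'
}

# Example: backup_ok ./all.sql.gz && echo "backup looks complete"
# Restore (assumed root login): gunzip < ./all.sql.gz | mysql -uroot -p
```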

 

Backups can also be made with xtrabackup from Percona, for example the package:
percona-xtrabackup-24-2.4.11-1.el7.x86_64.rpm
