MHA Online Master Switch Process - 白红宇


Published: 2019-06-19


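Besides automatic monitoring and failover, MHA provides a second mode: the online master switch. It is mostly used for planned work such as hardware upgrades or MySQL database migrations. This mode switches quickly and blocks writes gracefully without shutting down the original server; the whole switch takes roughly 0.5 to 2 seconds, which greatly reduces downtime. An online master switch starts only when all of the following conditions are met: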

1. IO threads on all slaves are running.
2. SQL threads on all slaves are running.
3. Seconds_Behind_Master on all slaves is less than or equal to --running_updates_limit seconds.
4. On the master, no update query in the SHOW PROCESSLIST output has been running for more than --running_updates_limit seconds.

These restrictions exist for safety, and so that the switch to the new master completes as quickly as possible.
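The three slave-side conditions can be checked mechanically. The following is a minimal sketch under stated assumptions, not MHA's own implementation: check_slave_ready is a hypothetical helper that reads SHOW SLAVE STATUS\G text on stdin and compares Seconds_Behind_Master with a limit (in seconds) passed as its first argument. The fourth condition, long-running updates on the master, is not covered here.

```shell
# Hypothetical helper: reads `SHOW SLAVE STATUS\G` text on stdin.
# $1 is the allowed replication lag in seconds.
# Prints "ready" and returns 0 only if all three slave-side conditions hold.
check_slave_ready() {
    awk -v limit="$1" '
        # Compare whole field names so that Slave_SQL_Running_State is ignored.
        $1 == "Slave_IO_Running:"      { io  = $2 }
        $1 == "Slave_SQL_Running:"     { sql = $2 }
        $1 == "Seconds_Behind_Master:" { lag = $2 }
        END {
            if (io != "Yes")  { print "IO thread not running";  exit 1 }
            if (sql != "Yes") { print "SQL thread not running"; exit 1 }
            # NULL or a missing field means the lag is unknown: not ready.
            if (lag !~ /^[0-9]+$/)   { print "lag unknown"; exit 1 }
            if (lag + 0 > limit + 0) { print "lag " lag "s exceeds limit"; exit 1 }
            print "ready"
        }'
}
```

For example, piping the output of mysql -h192.168.0.60 -e 'show slave status \G' into check_slave_ready 10000 prints ready when the slave meets all three conditions (credentials omitted here). With several slaves, the check would have to be repeated for each one.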

1. Check whether masterha_manager is currently running (stopping it is recommended)

[root@DBproxy app2]# masterha_check_status --conf=/data/masterha/app1/app1.cnf
app1 (pid:6769) is running(0:PING_OK), master:192.168.0.50
[root@DBproxy app2]#
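This check can be scripted so that the manager is stopped before a switch is attempted. A minimal sketch: manager_is_running is a hypothetical helper that only looks for the "is running(0:PING_OK)" text shown in the output above; it does not rely on the command's exit code.

```shell
# Hypothetical helper: reads masterha_check_status output on stdin and
# returns 0 when it reports a running manager, non-zero otherwise.
manager_is_running() {
    # -F treats the pattern as a fixed string, so the parentheses are literal.
    grep -qF 'is running(0:PING_OK)'
}

# Intended use (commented out because it needs a live MHA setup):
#   conf=/data/masterha/app1/app1.cnf
#   if masterha_check_status --conf="$conf" | manager_is_running; then
#       masterha_stop --conf="$conf"
#   fi
```

Matching on the status text keeps the sketch tied to what this article actually shows; a different MHA version might word the line differently, in which case the pattern would need adjusting.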

2. Check the slave's IO thread, SQL thread, and Seconds_Behind_Master

[mysql@MyDB02 masterha]$ mysql -uroot -p123456 -h192.168.0.60 -e 'show slave status \G'|grep -E "Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master"
Warning: Using a password on the command line interface can be insecure.
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
        Seconds_Behind_Master: 0
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
[mysql@MyDB02 masterha]$

3. Perform the online switch

[root@DBproxy masterha]# masterha_master_switch --conf=/data/masterha/app1/app1.cnf --master_state=alive --new_master_host=192.168.0.60 --orig_master_is_new_slave --running_updates_limit=10000 --interactive=0
Sat Jul 16 09:11:00 2016 - [info] MHA::MasterRotate version 0.56.
Sat Jul 16 09:11:00 2016 - [info] Starting online master switch..
Sat Jul 16 09:11:00 2016 - [info]
Sat Jul 16 09:11:00 2016 - [info] * Phase 1: Configuration Check Phase..
Sat Jul 16 09:11:00 2016 - [info]
Sat Jul 16 09:11:00 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Jul 16 09:11:00 2016 - [info] Reading application default configuration from /data/masterha/app1/app1.cnf..
Sat Jul 16 09:11:00 2016 - [info] Reading server configuration from /data/masterha/app1/app1.cnf..
Sat Jul 16 09:11:00 2016 - [info] GTID failover mode = 0
Sat Jul 16 09:11:00 2016 - [info] Current Alive Master: 192.168.0.50(192.168.0.50:3306)
Sat Jul 16 09:11:00 2016 - [info] Alive Slaves:
Sat Jul 16 09:11:00 2016 - [info]   192.168.0.60(192.168.0.60:3306)  Version=5.6.29-log (oldest major version between slaves) log-bin:enabled
Sat Jul 16 09:11:00 2016 - [info]     Replicating from 192.168.0.50(192.168.0.50:3306)
Sat Jul 16 09:11:00 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Jul 16 09:11:00 2016 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Sat Jul 16 09:11:00 2016 - [info]  ok.
Sat Jul 16 09:11:00 2016 - [info] Checking MHA is not monitoring or doing failover..
Sat Jul 16 09:11:00 2016 - [error][/usr/share/perl5/vendor_perl/MHA/MasterRotate.pm, ln142] Getting advisory lock failed on the current master. MHA Monitor runs on the current master. Stop MHA Manager/Monitor and try again.
Sat Jul 16 09:11:00 2016 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR:  at /usr/bin/masterha_master_switch line 53
[root@DBproxy masterha]#

The switch is refused because the MHA manager is still running. Stop MHA and test again:

[root@DBproxy masterha]# masterha_stop  --conf=/data/masterha/app1/app1.cnf
Stopped app1 successfully.
[2]-  Exit 1                  nohup masterha_manager --conf=/data/masterha/app1/app1.cnf 2>&1  (wd: /data/masterha/app2)
(wd now: /data/masterha)
[root@DBproxy masterha]#

4. Perform the online switch again

[root@DBproxy masterha]# masterha_master_switch --conf=/data/masterha/app1/app1.cnf --master_state=alive --new_master_host=192.168.0.60 --orig_master_is_new_slave --running_updates_limit=10000 --interactive=0
Sat Jul 16 09:15:03 2016 - [info] MHA::MasterRotate version 0.56.
Sat Jul 16 09:15:03 2016 - [info] Starting online master switch..
Sat Jul 16 09:15:03 2016 - [info]
Sat Jul 16 09:15:03 2016 - [info] * Phase 1: Configuration Check Phase..
Sat Jul 16 09:15:03 2016 - [info]
Sat Jul 16 09:15:03 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Jul 16 09:15:03 2016 - [info] Reading application default configuration from /data/masterha/app1/app1.cnf..
Sat Jul 16 09:15:03 2016 - [info] Reading server configuration from /data/masterha/app1/app1.cnf..
Sat Jul 16 09:15:03 2016 - [info] GTID failover mode = 0
Sat Jul 16 09:15:03 2016 - [info] Current Alive Master: 192.168.0.50(192.168.0.50:3306)
Sat Jul 16 09:15:03 2016 - [info] Alive Slaves:
Sat Jul 16 09:15:03 2016 - [info]   192.168.0.60(192.168.0.60:3306)  Version=5.6.29-log (oldest major version between slaves) log-bin:enabled
Sat Jul 16 09:15:03 2016 - [info]     Replicating from 192.168.0.50(192.168.0.50:3306)
Sat Jul 16 09:15:03 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Jul 16 09:15:03 2016 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Sat Jul 16 09:15:03 2016 - [info]  ok.
Sat Jul 16 09:15:03 2016 - [info] Checking MHA is not monitoring or doing failover..
Sat Jul 16 09:15:03 2016 - [info] Checking replication health on 192.168.0.60..
Sat Jul 16 09:15:03 2016 - [info]  ok.
Sat Jul 16 09:15:03 2016 - [info] 192.168.0.60 can be new master.
Sat Jul 16 09:15:03 2016 - [info]
From:
192.168.0.50(192.168.0.50:3306) (current master)
 +--192.168.0.60(192.168.0.60:3306)

To:
192.168.0.60(192.168.0.60:3306) (new master)
 +--192.168.0.50(192.168.0.50:3306)
Sat Jul 16 09:15:03 2016 - [info] Checking whether 192.168.0.60(192.168.0.60:3306) is ok for the new master..
Sat Jul 16 09:15:03 2016 - [info]  ok.
Sat Jul 16 09:15:03 2016 - [info] 192.168.0.50(192.168.0.50:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Sat Jul 16 09:15:03 2016 - [info] 192.168.0.50(192.168.0.50:3306): Resetting slave pointing to the dummy host.
Sat Jul 16 09:15:03 2016 - [info] ** Phase 1: Configuration Check Phase completed.
Sat Jul 16 09:15:03 2016 - [info]
Sat Jul 16 09:15:03 2016 - [info] * Phase 2: Rejecting updates Phase..
Sat Jul 16 09:15:03 2016 - [info]
Sat Jul 16 09:15:03 2016 - [warning] master_ip_online_change_script is not defined. Skipping disabling writes on the current master.
Sat Jul 16 09:15:03 2016 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Sat Jul 16 09:15:03 2016 - [info] Executing FLUSH TABLES WITH READ LOCK..
Sat Jul 16 09:15:03 2016 - [info]  ok.
Sat Jul 16 09:15:03 2016 - [info] Orig master binlog:pos is mysql-bin.000009:40355591.
Sat Jul 16 09:15:03 2016 - [info]  Waiting to execute all relay logs on 192.168.0.60(192.168.0.60:3306)..
Sat Jul 16 09:15:03 2016 - [info]  master_pos_wait(mysql-bin.000009:40355591) completed on 192.168.0.60(192.168.0.60:3306). Executed 0 events.
Sat Jul 16 09:15:03 2016 - [info]   done.
Sat Jul 16 09:15:03 2016 - [info] Getting new master's binlog name and position..
Sat Jul 16 09:15:03 2016 - [info]  mysql-bin.000006:120
Sat Jul 16 09:15:03 2016 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.0.60', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000006', MASTER_LOG_POS=120, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Sat Jul 16 09:15:03 2016 - [info]
Sat Jul 16 09:15:03 2016 - [info] * Switching slaves in parallel..
Sat Jul 16 09:15:03 2016 - [info]
Sat Jul 16 09:15:03 2016 - [info] Unlocking all tables on the orig master:
Sat Jul 16 09:15:03 2016 - [info] Executing UNLOCK TABLES..
Sat Jul 16 09:15:03 2016 - [info]  ok.
Sat Jul 16 09:15:03 2016 - [info] Starting orig master as a new slave..
Sat Jul 16 09:15:03 2016 - [info]  Resetting slave 192.168.0.50(192.168.0.50:3306) and starting replication from the new master 192.168.0.60(192.168.0.60:3306)..
Sat Jul 16 09:15:03 2016 - [info]  Executed CHANGE MASTER.
Sat Jul 16 09:15:14 2016 - [error][/usr/share/perl5/vendor_perl/MHA/Server.pm, ln784] Slave could not be started on 192.168.0.50(192.168.0.50:3306)! Check slave status.
Sat Jul 16 09:15:14 2016 - [error][/usr/share/perl5/vendor_perl/MHA/Server.pm, ln862] Starting slave IO/SQL thread on 192.168.0.50(192.168.0.50:3306) failed!
Sat Jul 16 09:15:14 2016 - [error][/usr/share/perl5/vendor_perl/MHA/MasterRotate.pm, ln573]  Failed!
Sat Jul 16 09:15:14 2016 - [error][/usr/share/perl5/vendor_perl/MHA/MasterRotate.pm, ln602] Switching master to 192.168.0.60(192.168.0.60:3306) done, but switching slaves partially failed.
[root@DBproxy masterha]#

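Judging from the logs on the master and slave themselves, the failure was probably caused by missing IP-to-hostname mappings on the two machines. Fix /etc/hosts: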

/etc/hosts on the master (before):
127.0.0.1 MyDB01

/etc/hosts on the slave (before):
127.0.0.1 MyDB02

/etc/hosts on both machines after the change:
[root@MyDB02 ~]# more /etc/hosts
192.168.0.60  MyDB02
192.168.0.50  MyDB01
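A quick way to catch this kind of misconfiguration before switching is to check that each database hostname maps to a routable address rather than only to loopback. The following is a rough sketch; hosts_lookup and has_routable_mapping are hypothetical helpers written for this article's setup, and they inspect a hosts-format file only (DNS is not consulted).

```shell
# Hypothetical helper: prints every address that a hosts-format file
# maps the given hostname to. $1 = hosts file, $2 = hostname.
hosts_lookup() {
    awk -v name="$2" '
        { sub(/#.*/, "") }    # strip comments, whole-line or trailing
        { for (i = 2; i <= NF; i++) if ($i == name) print $1 }
    ' "$1"
}

# Returns 0 only if the hostname has at least one mapping that is not
# in 127.0.0.0/8; returns non-zero if it is unmapped or loopback-only.
has_routable_mapping() {
    hosts_lookup "$1" "$2" | grep -qv '^127\.'
}
```

With the original files, has_routable_mapping /etc/hosts MyDB01 would fail on the master, because the only entry was 127.0.0.1 MyDB01; after the change it succeeds on both machines.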

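Because the previous operation did not fully succeed, the two machines were left in a dual-master configuration. After a manual switch, the topology was restored to the original one-master, one-slave layout. Run a check before the online switch: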

[root@DBproxy app1]# masterha_check_repl --conf=/data/masterha/app1/app1.cnf
Sat Jul 16 10:24:49 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Jul 16 10:24:49 2016 - [info] Reading application default configuration from /data/masterha/app1/app1.cnf..
Sat Jul 16 10:24:49 2016 - [info] Reading server configuration from /data/masterha/app1/app1.cnf..
Sat Jul 16 10:24:49 2016 - [info] MHA::MasterMonitor version 0.56.
Sat Jul 16 10:24:49 2016 - [info] GTID failover mode = 0
Sat Jul 16 10:24:49 2016 - [info] Dead Servers:
Sat Jul 16 10:24:49 2016 - [info] Alive Servers:
Sat Jul 16 10:24:49 2016 - [info]   192.168.0.50(192.168.0.50:3306)
Sat Jul 16 10:24:49 2016 - [info]   192.168.0.60(192.168.0.60:3306)
Sat Jul 16 10:24:49 2016 - [info] Alive Slaves:
Sat Jul 16 10:24:49 2016 - [info]   192.168.0.60(192.168.0.60:3306)  Version=5.6.29-log (oldest major version between slaves) log-bin:enabled
Sat Jul 16 10:24:49 2016 - [info]     Replicating from 192.168.0.50(192.168.0.50:3306)
Sat Jul 16 10:24:49 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Jul 16 10:24:49 2016 - [info] Current Alive Master: 192.168.0.50(192.168.0.50:3306)
Sat Jul 16 10:24:49 2016 - [info] Checking slave configurations..
Sat Jul 16 10:24:49 2016 - [info]  read_only=1 is not set on slave 192.168.0.60(192.168.0.60:3306).
Sat Jul 16 10:24:49 2016 - [info] Checking replication filtering settings..
Sat Jul 16 10:24:49 2016 - [info]  binlog_do_db= , binlog_ignore_db=
Sat Jul 16 10:24:49 2016 - [info]  Replication filtering check ok.
Sat Jul 16 10:24:49 2016 - [info] GTID (with auto-pos) is not supported
Sat Jul 16 10:24:49 2016 - [info] Starting SSH connection tests..
Sat Jul 16 10:24:50 2016 - [info] All SSH connection tests passed successfully.
Sat Jul 16 10:24:50 2016 - [info] Checking MHA Node version..
Sat Jul 16 10:24:51 2016 - [info]  Version check ok.
Sat Jul 16 10:24:51 2016 - [info] Checking SSH publickey authentication settings on the current master..
Sat Jul 16 10:24:51 2016 - [info] HealthCheck: SSH to 192.168.0.50 is reachable.
Sat Jul 16 10:24:51 2016 - [info] Master MHA Node version is 0.56.
Sat Jul 16 10:24:51 2016 - [info] Checking recovery script configurations on 192.168.0.50(192.168.0.50:3306)..
Sat Jul 16 10:24:51 2016 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql/3306/binlog --output_file=/data/masterha/app1/save_binary_logs_test --manager_version=0.56 --start_file=mysql-bin.000010
Sat Jul 16 10:24:51 2016 - [info]   Connecting to root@192.168.0.50(192.168.0.50:22)..
  Creating /data/masterha/app1 if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /data/mysql/3306/binlog, up to mysql-bin.000010
Sat Jul 16 10:24:52 2016 - [info] Binlog setting check done.
Sat Jul 16 10:24:52 2016 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Sat Jul 16 10:24:52 2016 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=192.168.0.60 --slave_ip=192.168.0.60 --slave_port=3306 --workdir=/data/masterha/app1 --target_version=5.6.29-log --manager_version=0.56 --relay_log_info=/data/mysql/3306/data/relay-log.info  --relay_dir=/data/mysql/3306/data/  --slave_pass=xxx
Sat Jul 16 10:24:52 2016 - [info]   Connecting to root@192.168.0.60(192.168.0.60:22)..
  Checking slave recovery environment settings..
    Opening /data/mysql/3306/data/relay-log.info ... ok.
    Relay log found at /data/mysql/3306/binlog, up to relay-bin.000002
    Temporary relay log file is /data/mysql/3306/binlog/relay-bin.000002
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Sat Jul 16 10:24:53 2016 - [info] Slaves settings check done.
Sat Jul 16 10:24:53 2016 - [info]
192.168.0.50(192.168.0.50:3306) (current master)
 +--192.168.0.60(192.168.0.60:3306)

Sat Jul 16 10:24:53 2016 - [info] Checking replication health on 192.168.0.60..
Sat Jul 16 10:24:53 2016 - [info]  ok.
Sat Jul 16 10:24:53 2016 - [warning] master_ip_failover_script is not defined.
Sat Jul 16 10:24:53 2016 - [warning] shutdown_script is not defined.
Sat Jul 16 10:24:53 2016 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.
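When this check is run from a script, only the final verdict and any warnings usually matter. A minimal sketch, assuming the verdict appears on its own last line as "MySQL Replication Health is OK." exactly as in the output above; repl_health_ok and repl_warnings are hypothetical helper names.

```shell
# Hypothetical helper: reads masterha_check_repl output on stdin and
# returns 0 only if the last non-empty line is the OK verdict.
repl_health_ok() {
    awk 'NF { last = $0 }
         END { exit !(last == "MySQL Replication Health is OK.") }'
}

# Hypothetical helper: prints only the [warning] lines, e.g. the
# "master_ip_failover_script is not defined." notice.
repl_warnings() {
    grep -F '[warning]'
}

# Intended use (commented out because it needs a live MHA setup):
#   masterha_check_repl --conf=/data/masterha/app1/app1.cnf 2>&1 | repl_health_ok \
#       || echo "replication check failed; do not switch"
```

Testing the verdict line rather than searching the whole log avoids being misled by an "ok." that belongs to an intermediate step.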

5. Perform the switch

[root@DBproxy app1]# masterha_master_switch --conf=/data/masterha/app1/app1.cnf --master_state=alive --new_master_host=192.168.0.60 --orig_master_is_new_slave --running_updates_limit=10000 --interactive=0
Sat Jul 16 10:26:59 2016 - [info] MHA::MasterRotate version 0.56.
Sat Jul 16 10:26:59 2016 - [info] Starting online master switch..
Sat Jul 16 10:26:59 2016 - [info]
Sat Jul 16 10:26:59 2016 - [info] * Phase 1: Configuration Check Phase..
Sat Jul 16 10:26:59 2016 - [info]
Sat Jul 16 10:26:59 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Jul 16 10:26:59 2016 - [info] Reading application default configuration from /data/masterha/app1/app1.cnf..
Sat Jul 16 10:26:59 2016 - [info] Reading server configuration from /data/masterha/app1/app1.cnf..
Sat Jul 16 10:26:59 2016 - [info] GTID failover mode = 0
Sat Jul 16 10:26:59 2016 - [info] Current Alive Master: 192.168.0.50(192.168.0.50:3306)
Sat Jul 16 10:26:59 2016 - [info] Alive Slaves:
Sat Jul 16 10:26:59 2016 - [info]   192.168.0.60(192.168.0.60:3306)  Version=5.6.29-log (oldest major version between slaves) log-bin:enabled
Sat Jul 16 10:26:59 2016 - [info]     Replicating from 192.168.0.50(192.168.0.50:3306)
Sat Jul 16 10:26:59 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Jul 16 10:26:59 2016 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Sat Jul 16 10:26:59 2016 - [info]  ok.
Sat Jul 16 10:26:59 2016 - [info] Checking MHA is not monitoring or doing failover..
Sat Jul 16 10:26:59 2016 - [info] Checking replication health on 192.168.0.60..
Sat Jul 16 10:26:59 2016 - [info]  ok.
Sat Jul 16 10:26:59 2016 - [info] 192.168.0.60 can be new master.
Sat Jul 16 10:26:59 2016 - [info]
From:
192.168.0.50(192.168.0.50:3306) (current master)
 +--192.168.0.60(192.168.0.60:3306)

To:
192.168.0.60(192.168.0.60:3306) (new master)
 +--192.168.0.50(192.168.0.50:3306)
Sat Jul 16 10:26:59 2016 - [info] Checking whether 192.168.0.60(192.168.0.60:3306) is ok for the new master..
Sat Jul 16 10:26:59 2016 - [info]  ok.
Sat Jul 16 10:26:59 2016 - [info] 192.168.0.50(192.168.0.50:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Sat Jul 16 10:26:59 2016 - [info] 192.168.0.50(192.168.0.50:3306): Resetting slave pointing to the dummy host.
Sat Jul 16 10:26:59 2016 - [info] ** Phase 1: Configuration Check Phase completed.
Sat Jul 16 10:26:59 2016 - [info]
Sat Jul 16 10:26:59 2016 - [info] * Phase 2: Rejecting updates Phase..
Sat Jul 16 10:26:59 2016 - [info]
Sat Jul 16 10:26:59 2016 - [warning] master_ip_online_change_script is not defined. Skipping disabling writes on the current master.
Sat Jul 16 10:26:59 2016 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Sat Jul 16 10:26:59 2016 - [info] Executing FLUSH TABLES WITH READ LOCK..
Sat Jul 16 10:26:59 2016 - [info]  ok.
Sat Jul 16 10:26:59 2016 - [info] Orig master binlog:pos is mysql-bin.000010:120.
Sat Jul 16 10:26:59 2016 - [info]  Waiting to execute all relay logs on 192.168.0.60(192.168.0.60:3306)..
Sat Jul 16 10:27:00 2016 - [info]  master_pos_wait(mysql-bin.000010:120) completed on 192.168.0.60(192.168.0.60:3306). Executed 0 events.
Sat Jul 16 10:27:00 2016 - [info]   done.
Sat Jul 16 10:27:00 2016 - [info] Getting new master's binlog name and position..
Sat Jul 16 10:27:00 2016 - [info]  mysql-bin.000008:239
Sat Jul 16 10:27:00 2016 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.0.60', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000008', MASTER_LOG_POS=239, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Sat Jul 16 10:27:00 2016 - [info]
Sat Jul 16 10:27:00 2016 - [info] * Switching slaves in parallel..
Sat Jul 16 10:27:00 2016 - [info]
Sat Jul 16 10:27:00 2016 - [info] Unlocking all tables on the orig master:
Sat Jul 16 10:27:00 2016 - [info] Executing UNLOCK TABLES..
Sat Jul 16 10:27:00 2016 - [info]  ok.
Sat Jul 16 10:27:00 2016 - [info] Starting orig master as a new slave..
Sat Jul 16 10:27:00 2016 - [info]  Resetting slave 192.168.0.50(192.168.0.50:3306) and starting replication from the new master 192.168.0.60(192.168.0.60:3306)..
Sat Jul 16 10:27:00 2016 - [info]  Executed CHANGE MASTER.
Sat Jul 16 10:27:00 2016 - [info]  Slave started.
Sat Jul 16 10:27:00 2016 - [info] All new slave servers switched successfully.
Sat Jul 16 10:27:00 2016 - [info]
Sat Jul 16 10:27:00 2016 - [info] * Phase 5: New master cleanup phase..
Sat Jul 16 10:27:00 2016 - [info]
Sat Jul 16 10:27:00 2016 - [info]  192.168.0.60: Resetting slave info succeeded.
Sat Jul 16 10:27:00 2016 - [info] Switching master to 192.168.0.60(192.168.0.60:3306) completed successfully.
[root@DBproxy app1]#
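The two switch attempts in this article end with different final messages: "completed successfully" here, and "switching slaves partially failed" in the earlier run. A script that wraps masterha_master_switch could tell them apart from the captured log. The sketch below assumes those exact phrases; switch_result and orig_master_pos are hypothetical helpers, not part of MHA.

```shell
# Hypothetical helper: reads masterha_master_switch output on stdin and
# prints success, partial, or failed (exit status 0, 2, or 1).
switch_result() {
    awk '
        /completed successfully/ { ok = 1 }
        /partially failed/       { partial = 1 }
        END {
            if (partial) { print "partial"; exit 2 }
            if (ok)      { print "success"; exit 0 }
            print "failed"; exit 1
        }'
}

# Hypothetical helper: prints the old master's binlog file:position that
# was recorded while writes were blocked, e.g. mysql-bin.000010:120.
orig_master_pos() {
    sed -n 's/.*Orig master binlog:pos is \(.*\)\.$/\1/p'
}
```

A partial result deserves separate handling because, as the earlier attempt showed, the new master is already in place at that point and the old master still has to be attached as a slave by hand.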

Reposted from: http://rkoxx.baihongyu.com/
