Thursday, September 9, 2010
veritas: Remove a node from a cluster without interruptions
Before changing the VERITAS Cluster Server (VCS) configuration file, main.cf, make a good copy of the current main.cf. In this example, csvcs6 is removed from a two-node cluster. Execute these commands on csvcs5, the system that is not being removed.
1. Back up the configuration.
# cp -p /etc/VRTSvcs/conf/config/main.cf /etc/VRTSvcs/conf/config/main.cf.last_known.good
2. Check the current systems, group(s), and resource(s) status
# hastatus -sum
-- SYSTEM STATE
-- System State Frozen
A csvcs5 RUNNING 0
A csvcs6 RUNNING 0
-- GROUP STATE
-- Group System Probed AutoDisabled State
B test_A csvcs5 Y N ONLINE
B test_A csvcs6 Y N OFFLINE
B test_B csvcs6 Y N ONLINE
B wvcs csvcs5 Y N OFFLINE
B wvcs csvcs6 Y N ONLINE
Based on this output, csvcs5 and csvcs6 are the two nodes of the cluster. Service groups test_A and wvcs are configured to run on both nodes. Service group test_B is configured to run on csvcs6 only.
Service groups test_B and wvcs are both online on csvcs6. If wvcs is to stay online, switch it over to csvcs5 now.
hagrp -switch <service_group> -to <system>
# hagrp -switch wvcs -to csvcs5
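The group lines of hastatus -sum can also be filtered mechanically to see which groups are online on the node being removed. The following is a minimal sketch, not part of the original procedure: the function name groups_online_on is hypothetical, and the sample lines are copied from the status output above. It matches only the exact state ONLINE, and on older Solaris releases nawk may be needed for the -v option.

```shell
#!/bin/sh
# Hypothetical helper: print the groups reported ONLINE on a given system.
# Reads `hastatus -sum` output on stdin; group lines start with "B".
groups_online_on() {
    awk -v sys="$1" '$1 == "B" && $3 == sys && $NF == "ONLINE" { print $2 }'
}

# Sample group lines from the status output; on a live cluster, use:
#   hastatus -sum | groups_online_on csvcs6
sample='B test_A csvcs5 Y N ONLINE
B test_A csvcs6 Y N OFFLINE
B test_B csvcs6 Y N ONLINE
B wvcs csvcs5 Y N OFFLINE
B wvcs csvcs6 Y N ONLINE'

printf '%s\n' "$sample" | groups_online_on csvcs6
```

With the sample above, the groups listed are test_B and wvcs, which are the two groups that need a decision before csvcs6 is stopped.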
3. Check for service group dependency
# hagrp -dep
Parent Child Relationship
test_B test_A online global
4. Make VCS configuration writable
# haconf -makerw
5. Unlink the group dependency, if any. In this case, service group test_B requires test_A.
hagrp -unlink <parent_group> <child_group>
# hagrp -unlink test_B test_A
6. Stop VCS on csvcs6, the node to be removed.
hastop -sys <system>
# hastop -sys csvcs6
7. Check the status again, making sure csvcs6 is EXITED and the failover service group is online on the running node.
# hastatus -sum
-- SYSTEM STATE
-- System State Frozen
A csvcs5 RUNNING 0
A csvcs6 EXITED 0
-- GROUP STATE
-- Group System Probed AutoDisabled State
B test_A csvcs5 Y N ONLINE
B test_A csvcs6 Y N OFFLINE
B test_B csvcs6 Y N OFFLINE
B wvcs csvcs5 Y N ONLINE
B wvcs csvcs6 Y N OFFLINE
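The same check can be scripted so that later steps run only when the node is really out of service. This is a rough sketch under assumptions: node_is_clear is a hypothetical name, and the function relies only on the column layout shown in the hastatus -sum output above (system lines start with "A", group lines with "B").

```shell
#!/bin/sh
# Hypothetical check: succeed only if the system is EXITED and no group
# is ONLINE on it. Reads `hastatus -sum` output on stdin.
node_is_clear() {
    awk -v sys="$1" '
        $1 == "A" && $2 == sys { state = $3 }
        $1 == "B" && $3 == sys && $NF == "ONLINE" { busy = 1 }
        END { exit !(state == "EXITED" && !busy) }
    '
}

# Sample lines from the status output; on a live cluster, use:
#   hastatus -sum | node_is_clear csvcs6
sample='A csvcs5 RUNNING 0
A csvcs6 EXITED 0
B test_B csvcs6 Y N OFFLINE
B wvcs csvcs5 Y N ONLINE
B wvcs csvcs6 Y N OFFLINE'

if printf '%s\n' "$sample" | node_is_clear csvcs6; then
    echo "csvcs6 is clear"
else
    echo "csvcs6 is not clear"
fi
```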
8. Delete csvcs6 from the SystemList of wvcs and test_A.
hagrp -modify <service_group> SystemList -delete <system>
# hagrp -modify wvcs SystemList -delete csvcs6
# hagrp -modify test_A SystemList -delete csvcs6
9. List the resources belonging to service group test_B and delete all of them before removing the group.
# hagrp -resources test_B
jprocess
kprocess
# hares -delete jprocess
# hares -delete kprocess
# hagrp -delete test_B
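For a group with many resources, the delete commands can be generated from the resource list and reviewed before anything is run. This is only a sketch: print_delete_cmds is a hypothetical helper that prints commands rather than executing them, and it assumes a POSIX shell and one resource name per line, as in the hagrp -resources output above.

```shell
#!/bin/sh
# Hypothetical dry-run helper: read resource names (one per line) on stdin
# and print the delete commands for review instead of running them.
print_delete_cmds() {
    group=$1
    while read -r res; do
        [ -n "$res" ] && printf 'hares -delete %s\n' "$res"
    done
    # The group itself is deleted last, once it has no resources left.
    printf 'hagrp -delete %s\n' "$group"
}

# On a live cluster: hagrp -resources test_B | print_delete_cmds test_B
printf 'jprocess\nkprocess\n' | print_delete_cmds test_B
```

Printing first and running later keeps a typo in a group name from deleting the wrong resources.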
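10. Check the status again, making sure all the service groups are online on the other node, in this case csvcs5.
# hastatus -sum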
-- SYSTEM STATE
-- System State Frozen
A csvcs5 RUNNING 0
A csvcs6 EXITED 0
-- GROUP STATE
-- Group System Probed AutoDisabled State
B test_A csvcs5 Y N ONLINE
B wvcs csvcs5 Y N ONLINE
11. Delete system (node) from cluster, save the configuration, and make it read only.
# hasys -delete csvcs6
# haconf -dump -makero
12. Depending on how the cluster is defined and how many nodes remain, it might be necessary to reduce the number in " /sbin/gabconfig -c -n # " in the /etc/gabtab file on all the running nodes in the cluster. If the # is larger than the number of nodes in the cluster, GAB will not seed automatically.
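As an illustration of the gabtab change, the seed count after -n can be rewritten with sed. The sketch below is an assumption-laden example, not the original author's method: set_gab_seed is a hypothetical name, the function only prints the edited line, and the demonstration runs on a scratch file rather than the real /etc/gabtab.

```shell
#!/bin/sh
# Hypothetical helper: print a gabtab-style file with the seed count
# after -n replaced by $2; the file itself is not modified.
set_gab_seed() {
    sed "s/\(-n *\)[0-9][0-9]*/\1$2/" "$1"
}

# Demonstration on a scratch copy, not on /etc/gabtab.
tmp=$(mktemp)
echo '/sbin/gabconfig -c -n 2' > "$tmp"
set_gab_seed "$tmp" 1
rm -f "$tmp"
```

In the two-node example, one node remains after csvcs6 is removed, so the count drops from 2 to 1. Review the printed line before copying it into /etc/gabtab on each running node.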
To prevent VCS from starting after rebooting, do the following on the removed node (csvcs6):
1. Unconfigure and unload GAB
# /sbin/gabconfig -u
# modunload -i `modinfo | grep gab | awk '{print $1}'`
2. Unconfigure and unload LLT
# /sbin/lltconfig -U
# modunload -i `modinfo | grep llt | awk '{print $1}'`
3. Prevent LLT, GAB and VCS from starting up in the future
# mv /etc/rc2.d/S70llt /etc/rc2.d/s70llt
# mv /etc/rc2.d/S92gab /etc/rc2.d/s92gab
# mv /etc/rc3.d/S99vcs /etc/rc3.d/s99vcs
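The renames work because init runs only the rc scripts whose names begin with a capital S at startup. They can be wrapped in a small function that skips scripts that are not present. This is a minimal sketch assuming a POSIX shell: disable_rc is a hypothetical name, and the demonstration uses a scratch directory instead of /etc/rc2.d and /etc/rc3.d.

```shell
#!/bin/sh
# Hypothetical helper: rename an rc start script from S<nn>name to
# s<nn>name so it is skipped at boot; does nothing if it is absent.
disable_rc() {
    dir=$(dirname "$1")
    base=$(basename "$1")
    if [ -f "$1" ]; then
        mv "$1" "$dir/s${base#S}"
    fi
}

# Demonstration in a scratch directory; S99vcs is deliberately missing.
rc=$(mktemp -d)
touch "$rc/S70llt" "$rc/S92gab"
for f in S70llt S92gab S99vcs; do
    disable_rc "$rc/$f"
done
ls "$rc"
rm -rf "$rc"
```

On the removed node the real calls would be, for example, disable_rc /etc/rc2.d/S70llt. Renaming a script back to its capital-S name enables it again.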
4. If VCS is not to run on this node again, all the VCS-related packages and files can now be removed.
# pkgrm VRTSperl
# pkgrm VRTSvcs
# pkgrm VRTSgab
# pkgrm VRTSllt
# rm /etc/llttab
# rm /etc/gabtab
NOTE: Because VCS configurations are complex and varied, one technote cannot cover every possible situation and condition of a cluster configuration. The steps above are the essential ones for common VCS setups and give some idea of how to deal with more complex ones.