Thursday, September 9, 2010

veritas: Remove a node from a cluster without interruptions

Before changing the VERITAS Cluster Server (VCS) configuration file, main.cf, make a copy of the current known-good version. In this example, csvcs6 is removed from a two-node cluster. Execute these commands on csvcs5, the node that remains in the cluster.

1. Back up the configuration.


 # cp -p /etc/VRTSvcs/conf/config/main.cf    /etc/VRTSvcs/conf/config/main.cf.last_known.good


2. Check the current systems, group(s), and resource(s) status  


 # hastatus  -sum 
 -- SYSTEM STATE
 -- System               State                Frozen            

 A  csvcs5               RUNNING              0                    
 A  csvcs6               RUNNING              0                    

 -- GROUP STATE
 -- Group           System               Probed     AutoDisabled    State

 B  test_A          csvcs5               Y          N               ONLINE
 B  test_A          csvcs6               Y          N               OFFLINE
 B  test_B          csvcs6               Y          N               ONLINE
 B  wvcs            csvcs5               Y          N               OFFLINE
 B  wvcs            csvcs6               Y          N               ONLINE

Based on this output, csvcs5 and csvcs6 form a two-node cluster. Service groups test_A and wvcs are configured to run on both nodes. Service group test_B is configured to run on csvcs6 only.

Service groups test_B and wvcs are both online on csvcs6. If wvcs must stay online, switch it over to csvcs5 now.

hagrp  -switch  <service_group>  -to  <system>

 # hagrp  -switch  wvcs  -to  csvcs5
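
On a cluster with many service groups, the groups that are online on the node being removed can be picked out of the status output with awk. This is a minimal sketch, assuming the six-column GROUP STATE layout shown above. The function names groups_online_on and sample_status are made up for illustration, and the embedded sample stands in for real "hastatus -sum" output.

```shell
#!/bin/sh
# groups_online_on NODE: read "hastatus -sum" output on stdin and print
# every group whose state on NODE is ONLINE.
# Assumption: group lines start with "B" and have six fields, as in the
# sample output above. On Solaris, nawk or /usr/xpg4/bin/awk may be
# needed instead of awk for the -v option.
groups_online_on() {
    awk -v node="$1" '$1 == "B" && $3 == node && $6 == "ONLINE" { print $2 }'
}

# Sample input copied from the GROUP STATE section above.
sample_status() {
cat <<'EOF'
-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State

B  test_A          csvcs5               Y          N               ONLINE
B  test_A          csvcs6               Y          N               OFFLINE
B  test_B          csvcs6               Y          N               ONLINE
B  wvcs            csvcs5               Y          N               OFFLINE
B  wvcs            csvcs6               Y          N               ONLINE
EOF
}

# Prints test_B and wvcs, one per line.
sample_status | groups_online_on csvcs6
```

On a live cluster the input would come from "hastatus -sum | groups_online_on csvcs6" instead of the sample.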

3. Check for service group dependency

 # hagrp -dep
 Parent   Child     Relationship
 test_B    test_A    online global



4. Make VCS configuration writable

 # haconf -makerw


5. Unlink the group dependency, if there is one. In this case, service group test_B (the parent) requires test_A (the child).

hagrp  -unlink  <parent_group>  <child_group>


 # hagrp  -unlink  test_B  test_A
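
When there are several dependencies, the unlink commands can be generated from the dependency listing instead of being typed by hand. The sketch below only prints the commands so they can be reviewed first. It assumes the three-column "Parent Child Relationship" layout shown in step 3, and the function name unlink_cmds is hypothetical.

```shell
#!/bin/sh
# unlink_cmds GROUP: read "hagrp -dep" output on stdin and print (not run)
# one unlink command for every dependency in which GROUP is parent or child.
# Assumption: the header line starts with "Parent" (optionally "#Parent").
unlink_cmds() {
    awk -v g="$1" '
        $1 ~ /^#?Parent$/ { next }                 # skip the header line
        $1 == g || $2 == g { print "hagrp -unlink " $1 " " $2 }
    '
}

# Sample input copied from step 3.
printf '%s\n' \
    'Parent   Child     Relationship' \
    'test_B    test_A    online global' |
    unlink_cmds test_B
```

The printed lines can be checked and then pasted into the shell once the configuration is writable.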

6. Stop VCS on csvcs6, the node to be removed.

hastop  -sys  <system>

 # hastop  -sys  csvcs6

7. Check the status again, making sure csvcs6 is EXITED and the failover service groups are online on the running node.


 # hastatus -sum
 -- SYSTEM STATE
 -- System               State                Frozen           

 A  csvcs5               RUNNING              0                 
 A  csvcs6               EXITED               0                 

 -- GROUP STATE
 -- Group           System               Probed     AutoDisabled    State

 B  test_A          csvcs5               Y          N               ONLINE
 B  test_A          csvcs6               Y          N               OFFLINE
 B  test_B          csvcs6               Y          N               OFFLINE
 B  wvcs            csvcs5               Y          N               ONLINE
 B  wvcs            csvcs6               Y          N               OFFLINE
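
The same check can be scripted before going on to the deletion steps. This is a rough sketch, assuming the status output has been saved to a file and has the layout shown above. The function name safe_to_remove is invented for this example.

```shell
#!/bin/sh
# safe_to_remove FILE NODE: succeed only if NODE is in state EXITED and
# no service group is ONLINE on it, according to the saved
# "hastatus -sum" output in FILE.
# Assumption: system lines start with "A", group lines with "B".
safe_to_remove() {
    awk -v node="$2" '
        $1 == "A" && $2 == node { state = $3 }
        $1 == "B" && $3 == node && $6 == "ONLINE" { online++ }
        END { exit !(state == "EXITED" && online == 0) }
    ' "$1"
}

# Sample input taken from the output above, written to a temporary file.
status_file=$(mktemp)
cat > "$status_file" <<'EOF'
A  csvcs5               RUNNING              0
A  csvcs6               EXITED               0
B  test_A          csvcs5               Y          N               ONLINE
B  test_A          csvcs6               Y          N               OFFLINE
B  test_B          csvcs6               Y          N               OFFLINE
B  wvcs            csvcs5               Y          N               ONLINE
B  wvcs            csvcs6               Y          N               OFFLINE
EOF

if safe_to_remove "$status_file" csvcs6; then
    echo "csvcs6 can be removed"
else
    echo "csvcs6 is still in use"
fi
rm -f "$status_file"
```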

8. Delete csvcs6 from the SystemList of wvcs and test_A.

hagrp  -modify  <service_group>  SystemList  -delete  <system>




 # hagrp  -modify  wvcs  SystemList  -delete  csvcs6
 # hagrp  -modify  test_A  SystemList  -delete  csvcs6

9. List the resources that belong to service group test_B and delete them all before removing the group.

hagrp  -resources  <service_group>

 # hagrp  -resources  test_B
 jprocess
 kprocess

hares  -delete  <resource>



 # hares  -delete  jprocess
 # hares  -delete  kprocess

hagrp  -delete  <service_group>



 # hagrp  -delete  test_B
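
For a group with many resources, the delete commands can be generated from the resource list. The sketch below prints the commands without running them. It assumes one resource name per line, as in the "hagrp -resources" output above, and the function name delete_group_cmds is made up for this example.

```shell
#!/bin/sh
# delete_group_cmds GROUP: read resource names on stdin (one per line)
# and print (not run) the commands that delete each resource and then
# the group itself.
delete_group_cmds() {
    group="$1"
    while read -r res; do
        if [ -n "$res" ]; then
            echo "hares -delete $res"
        fi
    done
    echo "hagrp -delete $group"
}

# Sample input copied from the "hagrp -resources test_B" output above.
printf 'jprocess\nkprocess\n' | delete_group_cmds test_B
```

On a live cluster the input would come from "hagrp -resources test_B | delete_group_cmds test_B".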

10. Check the status again, making sure all the service groups are online on the remaining node, in this case csvcs5.




 # hastatus -sum
 -- SYSTEM STATE
 -- System               State                Frozen            

 A  csvcs5               RUNNING              0                 
 A  csvcs6               EXITED               0                  

 -- GROUP STATE
 -- Group           System               Probed     AutoDisabled    State

 B  test_A          csvcs5               Y          N               ONLINE
 B  wvcs            csvcs5               Y          N               ONLINE

11. Delete the system (node) from the cluster, save the configuration, and make it read-only.




 # hasys  -delete  csvcs6
 # haconf -dump -makero

12. Depending on how the cluster is defined and how many nodes remain, it might be necessary to reduce the number in "/sbin/gabconfig -c -n #" in the /etc/gabtab file on all running nodes in the cluster. If # is larger than the number of nodes in the cluster, GAB will not seed automatically.
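
The gabtab change can be previewed with sed before the real file is touched. This is a minimal sketch that works on a temporary copy. It assumes the file contains a single gabconfig line of the form shown above, and the function name set_gab_seed_count is hypothetical.

```shell
#!/bin/sh
# set_gab_seed_count FILE N: print the contents of FILE with the number
# after "-n" on the gabconfig line replaced by N. FILE is not modified.
set_gab_seed_count() {
    sed "s/\(gabconfig .*-n\) *[0-9][0-9]*/\1 $2/" "$1"
}

# Work on a temporary stand-in rather than the real /etc/gabtab.
gabtab_copy=$(mktemp)
echo '/sbin/gabconfig -c -n 2' > "$gabtab_copy"

# After removing one node from a two-node cluster, one node remains.
set_gab_seed_count "$gabtab_copy" 1

rm -f "$gabtab_copy"
```

Once the printed line looks right, the same edit can be applied to /etc/gabtab by hand on each remaining node.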

To prevent VCS from starting after rebooting, do the following on the removed node (csvcs6):

1. Unconfigure and unload GAB


 # /sbin/gabconfig  -u
 # modunload -i `modinfo | grep gab | awk '{print $1}'`
 

2. Unconfigure and unload LLT

 # /sbin/lltconfig  -U    
 # modunload -i `modinfo | grep llt | awk '{print $1}'`



3. Prevent LLT, GAB and VCS from starting up in the future

 # mv  /etc/rc2.d/S70llt   /etc/rc2.d/s70llt
 # mv  /etc/rc2.d/S92gab   /etc/rc2.d/s92gab
 # mv  /etc/rc3.d/S99vcs   /etc/rc3.d/s99vcs


4. If VCS will not run on this node again, all the VCS-related packages and files can now be removed.



 # pkgrm VRTSperl
 # pkgrm VRTSvcs
 # pkgrm VRTSgab
 # pkgrm VRTSllt

 # rm /etc/llttab
 # rm /etc/gabtab

 
NOTE: VCS configurations vary too widely for one technote to cover every possible situation. The steps above are the essential ones for common VCS setups and give an idea of how to approach more complex ones.


