Unix Tools

Testing Veritas Clusters

Commands to type are shown on their own lines.

0. Check Veritas Licenses - for FileSystem, Volume Manager AND Cluster

vxlicense -p

If any licenses are invalid or expired -- get them FIXED before continuing! All licenses should say "No expiration". If ANY license has an actual expiration date, the test failed. Permanent licenses do NOT have an expiration date. Non-essential licenses may be moved -- however, a senior admin should do this.
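The expiration check can be scripted rather than read by eye. The following is only a sketch: it assumes each license in the vxlicense -p output carries a line containing the word "expiration", and the function name check_license_expiry is made up for this example.

```shell
# Sketch: flag any license whose expiration line is not "No expiration".
# ASSUMPTION: every license block in `vxlicense -p` output has a line that
# contains the word "expiration"; adjust the pattern if yours differs.
check_license_expiry() {
    # Reads license output on stdin and prints the offending lines.
    # Exit status: 0 = all permanent, 1 = a dated license was found,
    #              2 = no expiration lines seen (format assumption is wrong).
    awk '
        { l = tolower($0) }
        l ~ /expiration/ {
            seen = 1
            if (l !~ /no expiration/) { print; bad = 1 }
        }
        END {
            if (!seen) exit 2
            exit (bad ? 1 : 0)
        }
    '
}

# Usage on a cluster node (2>&1 in case the tool writes to stderr):
#   vxlicense -p 2>&1 | check_license_expiry && echo "licenses OK"
```

A non-zero exit status means a human should read the full output before continuing.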

1. Hand check SystemList & AutoStartList

On either machine:

grep SystemList /etc/VRTSvcs/conf/config/main.cf

system1

system2

grep AutoStartList /etc/VRTSvcs/conf/config/main.cf

system1

system2

Each list should contain both machines. If not, many of the next tests will fail.
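If you would rather not eyeball the grep output, a small helper can confirm that both names are present. This is a sketch only: check_list is a made-up name, and it assumes each attribute sits on a single line of main.cf and that the file defines a single service group.

```shell
# Sketch: confirm a main.cf attribute line names every expected system.
# ASSUMPTION: the attribute is on one line, for example
#   SystemList = { system1 = 0, system2 = 1 }
check_list() {
    # $1 = main.cf path, $2 = attribute name, remaining args = system names
    file=$1; attr=$2; shift 2
    line=$(grep "$attr" "$file") || { echo "$attr: not found in $file"; return 1; }
    for sys in "$@"; do
        # -w matches whole words, so system1 does not match system10
        echo "$line" | grep -w "$sys" >/dev/null ||
            { echo "$attr: $sys missing"; return 1; }
    done
    echo "$attr: OK"
}

# Usage:
#   check_list /etc/VRTSvcs/conf/config/main.cf SystemList system1 system2
#   check_list /etc/VRTSvcs/conf/config/main.cf AutoStartList system1 system2
```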

2. Verify Cluster is Running

First verify that Veritas is up and running:

hastatus -summary

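If the command is not found, add the VCS binary directory (normally /opt/VRTSvcs/bin) to root's PATH in /.profile:

vi /.profile

Then reload the profile and run the command again:

. /.profile

hastatus -summary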

Here is the expected result (your SYSTEMs/GROUPs may vary).

The group should be ONLINE on one system and OFFLINE on the other, i.e.:
# hastatus -summary

  -- SYSTEM STATE
  -- System               State                Frozen              

  A  e4500a               RUNNING              0                    
  A  e4500b               RUNNING              0                    

  -- GROUP STATE
  -- Group           System               Probed     AutoDisabled    State          

  B  oragrp          e4500a               Y          N               ONLINE         
  B  oragrp          e4500b               Y          N               OFFLINE
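The "one ONLINE, one OFFLINE" rule can be checked by pulling the group rows out of the summary. The sketch below assumes group rows start with "B" and use the column order shown in the output above; online_systems is a made-up helper name.

```shell
# Sketch: list the systems on which a group is ONLINE.
# ASSUMPTION: group rows look like
#   B  group  system  probed  autodisabled  state
# On older Solaris, use nawk: the plain awk there lacks -v.
online_systems() {
    # $1 = group name; reads `hastatus -summary` output on stdin
    awk -v grp="$1" '$1 == "B" && $2 == grp && $6 == "ONLINE" { print $3 }'
}

# Usage:
#   hastatus -summary | online_systems oragrp
```

For the configuration described here, exactly one system name should come back. No output means the group is not online anywhere.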

If your systems do not show the above status, try these debugging steps:

If NO systems are up, run hastart on both systems and run hastatus -summary again.
If only one system is shown, start the other system with hastart. Note: the group should ALWAYS be OFFLINE on one system for the way we configure systems here. (If we ran Oracle Parallel Server, this could change -- but currently we run standard Oracle server.)

If both systems are up but the group is OFFLINE on both, hastart did NOT correct the problem, and the Oracle filesystems are not mounted on either system, the cluster needs to be reset. (This happens under strange network situations with GE Access.) [You ran hastart and that wasn't enough to get the full cluster to work.]

Before resetting, verify that the systems have the following EXACT status (though your machine names will vary for other customers):

gedb002# hastatus -summary

-- SYSTEM STATE
-- System               State                Frozen              

A  gedb001              RUNNING              0                    
A  gedb002              RUNNING              0                    

-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State

B  oragrp          gedb001              Y          N               OFFLINE
B  oragrp          gedb002              Y          N               OFFLINE

gedb002#  hares -display | grep  ONLINE
nic-qfe3  State           gedb001   ONLINE
nic-qfe3  State           gedb002   ONLINE

gedb002# vxdg list
NAME         STATE           ID
rootdg       enabled  957265489.1025.gedb002

gedb001# vxdg list
NAME         STATE           ID
rootdg       enabled  957266358.1025.gedb001
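The "only rootdg is imported" part of this check is easy to script. This sketch assumes the vxdg list layout shown above (a NAME header row, then one disk group per line); extra_diskgroups is a made-up name.

```shell
# Sketch: print any imported disk group other than rootdg.
# ASSUMPTION: `vxdg list` prints a "NAME STATE ID" header, then one
# disk group per line with the group name in the first column.
extra_diskgroups() {
    # Reads `vxdg list` output on stdin.
    awk 'NF > 0 && $1 != "NAME" && $1 != "rootdg" { print $1 }'
}

# Usage (run on each node; no output means only rootdg is imported):
#   vxdg list | extra_diskgroups
```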

Recovery Commands:

hastop -all

hastart

hastatus -summary

If none of these steps resolved the situation, contact Lorraine or Luke (possibly Russ Button or Jen Redman if they made it to Veritas Cluster class) or a Veritas Consultant.

3. Verify Services Can Switch Between Systems

Once hastatus -summary works, note the GROUP name used. Usually it will be "oragrp", but the installer can use any name, so please determine its name.

First check that the group can switch back and forth. On the system where the group is running (system1), switch it to the other system (system2):

hagrp -switch groupname -to system2

Watch the failover with hastatus -summary. Once it has failed over, switch it back:

hagrp -switch groupname -to system1
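Rather than re-running hastatus -summary by hand while the switch completes, you can poll for it. This is a sketch under assumptions: the group rows use the "B group system probed autodisabled state" layout, and HASTATUS, POLL_SECS and MAX_TRIES are hypothetical knobs invented for this example, not VCS settings.

```shell
# Sketch: poll until a group is ONLINE on a target system, or give up.
# HASTATUS, POLL_SECS and MAX_TRIES are hypothetical knobs for this example.
# On older Solaris, use nawk: the plain awk there lacks -v.
wait_online() {
    # $1 = group name, $2 = target system
    tries=0
    while [ "$tries" -lt "${MAX_TRIES:-30}" ]; do
        if "${HASTATUS:-hastatus}" -summary |
            awk -v g="$1" -v s="$2" \
                '$1 == "B" && $2 == g && $3 == s && $6 == "ONLINE" { ok = 1 }
                 END { exit !ok }'
        then
            echo "$1 is ONLINE on $2"
            return 0
        fi
        tries=$((tries + 1))
        sleep "${POLL_SECS:-10}"
    done
    echo "$1 did not come ONLINE on $2" >&2
    return 1
}

# Usage:
#   hagrp -switch oragrp -to system2 && wait_online oragrp system2
```

With the defaults this gives up after about five minutes (30 tries, 10 seconds apart), which you may want to lengthen for a slow database start.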

4. Verify OTHER System Can Go Up & Down Smoothly For Maintenance

On the system where the group is OFFLINE (should be system2 at this point), reboot the computer.

ssh system2

/usr/sbin/shutdown -i6 -g0 -y

Make sure the system comes up and rejoins the cluster after the reboot. That is, when the reboot is finished, hastatus should show the group as OFFLINE on the second system:

hastatus -summary

Once this is done, switch the group to system2 and repeat the reboot on the other system:

hagrp -switch groupname -to system2

ssh system1

/usr/sbin/shutdown -i6 -g0 -y

Verify that system1 is back in the cluster once rebooted:

hastatus -summary

5. Test Actual Failover For System 2 (and pray db is okay)

To do this, we will kill off the listener process, which should force a failover. This test SHOULD be okay for the db (that is why we chose LISTENER), but there is a very small chance things will go wrong ... hence the "pray" part :).

On the system that is online (should be system2), kill off the ORACLE LISTENER process:

ps -ef | grep LISTENER

Output should be like:

  root  1415   600  0 20:43:58 pts/0    0:00 grep LISTENER
  oracle   831     1  0 20:27:06 ?        0:00 /apps/oracle/product/8.1.5/bin/tnslsnr LISTENER -inherit

kill -9 process-id
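The process-id is the second column of the tnslsnr line, not the grep line. A small sketch for picking it out follows; listener_pid is a made-up helper name, and it assumes the listener shows up in ps -ef as tnslsnr, as in the sample output above.

```shell
# Sketch: extract the listener PID from `ps -ef` output.
# ASSUMPTION: the listener process appears as ".../tnslsnr LISTENER ...".
listener_pid() {
    # Reads `ps -ef` output on stdin; column 2 is the PID.
    # Lines containing "grep" are skipped so the grep itself is never picked.
    awk '/tnslsnr/ && !/grep/ { print $2 }'
}

# Usage (check the PID before killing anything):
#   ps -ef | listener_pid
```

Fed the sample output above, this prints 831.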

Failover will take a few minutes.

You will note that system2 is faulted -- and system1 is now online.

You need to CLEAR the fault before trying to fail back over:

hares -display | grep FAULT

hares -clear resource-name -sys faulted-system
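To avoid mistyping the resource and system names, the clear commands can be generated from the hares -display output. This is a sketch: it assumes the four-column layout seen earlier in the "hares -display | grep ONLINE" output (resource, attribute, system, value), and fault_clear_cmds is a made-up name.

```shell
# Sketch: turn faulted State rows into ready-to-run hares -clear commands.
# ASSUMPTION: rows look like "resource  State  system  VALUE", matching the
# `hares -display | grep ONLINE` sample earlier in this document.
fault_clear_cmds() {
    # Reads `hares -display` output on stdin; prints commands, runs nothing.
    awk '$2 == "State" && $4 ~ /FAULTED/ {
        print "hares -clear " $1 " -sys " $3
    }'
}

# Usage (review the printed commands, then run them by hand):
#   hares -display | fault_clear_cmds
```

The helper only prints the commands, so nothing is cleared until you have looked at what it found.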

6. Test Actual Failover For System 1 (and pray db is okay)

Now we do the same thing for the other system. First verify that the other system is NOT faulted:

hastatus -summary

Now do the same thing on this system: kill off the listener process, which should force a failover.

On the system that is online (should be system1), kill off the ORACLE LISTENER process:

ps -ef | grep LISTENER

Output should be like:

  oracle   987     1  0 20:49:19 ?        0:00 /apps/oracle/product/8.1.5/bin/tnslsnr LISTENER -inherit
  root  1330   631  0 20:58:29 pts/0    0:00 grep LISTENER

kill -9 process-id

Failover will take a few minutes.

You will note that system1 is faulted -- and system2 is now online.

You need to CLEAR the fault before trying to fail back over:

hares -display | grep FAULT

hares -clear resource-name -sys faulted-system

Run:

hastatus -summary

to make sure everything is okay.

An excellent reference book for Veritas Clusters is:

Shared Data Clusters: Scalable, Manageable, and Highly Available Systems (VERITAS Series) - this will provide you with the background knowledge of how clusters work, how to get the best performance from your cluster, and more.

www.UnixTools.com