Thursday, September 30, 2010

XSCF: Upgrade of XCP firmware

This section explains how to update the XCP firmware on Sun SPARC Enterprise M-Series servers.
The steps follow the preferred method for upgrading from XCP 1060 firmware to XCP 1093 firmware. The four major steps are:

- shut the domain down to the ok prompt (init 0)
- import the XCP file into the system
- update the XCP firmware (this includes an XSCF reset)
- boot the system

Note – XCP: Abbreviation for XSCF Control Package. XCP is a package that has the control programs of hardware that configures a computing system. The XSCF firmware and the OpenBoot PROM firmware are included in the XCP file. The firmware update functions provided by XSCF are used to manage XCP.

Firmware update using the XSCF Shell. Use the following commands to update the firmware:
getflashimage command: Imports firmware to this system.
flashupdate command: Downloads the firmware to flash memory and applies the XSCF firmware.
poweron command or reset command: Applies the OpenBoot PROM firmware.
version command: Displays the firmware version.


1. Once you have shut the system down to the ok prompt, switch to the XSCF prompt.

2. Before updating the firmware, be sure to check the XCP version in the current system. Be aware of which version you're upgrading from, as the steps will differ if there is a large version gap.


 XSCF> version -c xcp -v
 XSCF#0 (Active )
 XCP0 (Current): 1060
 OpenBoot PROM : 01.30.0000
 XSCF          : 01.06.0001
 XCP1 (Reserve): 1060
 OpenBoot PROM : 01.30.0000
 XSCF          : 01.06.0001

3. List the firmware program files that are currently stored on the system using the getflashimage command.


 XSCF> getflashimage -l
 Existing versions:
 Version                Size  Date
 FFXCP1060.tar.gz   49053148  Tue Feb 26 19:29:49 EST 2008

4. Use the following getflashimage command to specify the firmware program file and import XCP into the system.
Log in to a remote FTP server, specifying the user name and host name (you will be prompted for the password), then import the new firmware program file (tar.gz). Ensure that the firmware program file is located in the home directory of the user you're going to connect as.


 XSCF> getflashimage -u user-name ftp://ip-address/FFXCP1093.tar.gz
 Existing versions:
 Version                Size  Date
 FFXCP1060.tar.gz   49053148  Tue Feb 26 19:29:49 EST 2008
 Warning: About to delete existing versions.
 Continue? [y|n]: y
 Removing FFXCP1060.tar.gz.
 Password:
   0MB received
   1MB received
   2MB received
 .......
  39MB received
  40MB received
 Download successful: 41859 Kbytes in 56 secs (784.888 Kbytes/sec)
 Checking file...
 MD5: f2dc08a4bd43061ea84c0172a6380c94

5. Confirm that the firmware program file you downloaded is now on the system using the getflashimage command.


 XSCF> getflashimage -l
 Existing versions:
 Version                Size  Date
 FFXCP1093.tar.gz   42863796  Thu Sep 23 14:09:40 EST 2010

6. Use the flashupdate command to confirm whether you're able to update to the new firmware version.

 XSCF> flashupdate -c check -m xcp -s 1093
 XCP update is possible with domains up

7. Use the flashupdate command to update the firmware. Once it completes, the XSCF will reset and the current session will disconnect. Connect again once the XSCF is back up.


 XSCF> flashupdate -c update -m xcp -s 1093
 The XSCF will be reset. Continue? [y|n] :y
 Checking the XCP image file, please wait a minute
 XCP update is started (XCP version=1093:last version=1060)
 OpenBoot PROM update is started (OpenBoot PROM version=02160000)
 OpenBoot PROM update has been completed (OpenBoot PROM version=02160000)
 XSCF update is started (XSCFU=0,bank=1,XCP version=1093:last version=1060)
 XSCF download is started (XSCFU=0,bank=1,XCP version=1093:last version=1060, Firmware Element
 ID=00:version=01090003:last version=01060000)
 XSCF download has been completed (XSCFU=0,bank=1,XCP version=1093:last version=1060, Firmware
 Element ID=00:version=01090003:last version=01060000)
 XSCF download is started (XSCFU=0,bank=1,XCP version=1093:last version=1060, Firmware Element
 ID=01:version=01090003:last version=01060001)
 XSCF download has been completed (XSCFU=0,bank=1,XCP version=1093:last version=1060, Firmware
 Element ID=01:version=01090003:last version=01060001)
 XSCF download is started (XSCFU=0,bank=1,XCP version=1093:last version=1060, Firmware Element
 ID=02:version=01080001:last version=01060000)
 XSCF download has been completed (XSCFU=0,bank=1,XCP version=1093:last version=1060, Firmware
 Element ID=02:version=01080001:last version=01060000)
 XSCF download is started (XSCFU=0,bank=1,XCP version=1093:last version=1060, Firmware Element
 ID=03:version=01090002:last version=01060000)
 XSCF download has been completed (XSCFU=0,bank=1,XCP version=1093:last version=1060, Firmware
 Element ID=03:version=01090002:last version=01060000)
 XSCF download is started (XSCFU=0,bank=1,XCP version=1093:last version=1060, Firmware Element
 ID=04:version=01090003:last version=01060001)
 XSCF download has been completed (XSCFU=0,bank=1,XCP version=1093:last version=1060, Firmware
 Element ID=04:version=01090003:last version=01060001)
 XSCF download is started (XSCFU=0,bank=1,XCP version=1093:last version=1060, Firmware Element
 ID=05:version=01090002:last version=01050000)
 XSCF download has been completed (XSCFU=0,bank=1,XCP version=1093:last version=1060, Firmware
 Element ID=05:version=01090002:last version=01050000)
 XSCF download is started (XSCFU=0,bank=1,XCP version=1093:last version=1060, Firmware Element
 ID=07:version=01090001:last version=01060000)
 XSCF download has been completed (XSCFU=0,bank=1,XCP version=1093:last version=1060, Firmware
 Element ID=07:version=01090001:last version=01060000)
 XSCF update has been completed (XSCFU=0,bank=1,XCP version=1093:last version=1060)
 XSCF is rebooting to update the reserve bank

8. Reconnect to the XSCF and log in again. To confirm that the XSCF firmware update has finished, use the showlogs command with the monitor option. Ensure you see the "SCF:XCP update has been completed (XCP version=xxxx)" message.



 XSCF> showlogs monitor
 Sep 23 14:15:10 xscf1 monitor_msg: SCF:XCP update is started (XCP version=1093:last version=1060)
 Sep 23 14:15:49 xscf1 monitor_msg: SCF:XSCF download is started (XSCFU=0, bank=1, XCP
 version=1093:last version=1060, Firmware Element ID=00, version=01090003:last version=01060000)
 Sep 23 14:16:28 xscf1 monitor_msg: SCF:XSCF download has been completed (XSCFU=0, bank=1, XCP
 version=1093:last version=1060, Firmware Element ID=00, version=01090003:last version=01060000)
 Sep 23 14:16:41 xscf1 monitor_msg: SCF:XSCF download is started (XSCFU=0, bank=1, XCP
 version=1093:last version=1060, Firmware Element ID=01, version=01090003:last version=01060001)
 .......
 Sep 23 14:32:55 xscf1 monitor_msg: SCF:XCP update has been completed (XCP version=1093)

9. Confirm that the system firmware version now running matches the firmware you applied.


 XSCF> version -c xcp -v
 XSCF#0 (Active )
 XCP0 (Reserve): 1093
 OpenBoot PROM : 02.16.0000
 XSCF          : 01.09.0003
 XCP1 (Current): 1093
 OpenBoot PROM : 02.16.0000
 XSCF          : 01.09.0003
 OpenBoot PROM BACKUP
 #0: 01.30.0000
 #1: 02.16.0000

10. To complete the update, restart the domain so that the new OpenBoot PROM firmware is applied. The domain then begins its boot sequence.


 XSCF> reset -d 0 por
 DomainID to reset:00
 Continue? [y|n] :y
 00 :Reset
 XSCF> showdomainstatus -a
 DID         Domain Status
 00          Initialization Phase
 01          -
 02          -
 03          -
 XSCF> showdomainstatus -a
 DID         Domain Status
 00          Running
 01          -
 02          -
 03          -
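
The checks in steps 8 and 9 can also be scripted against saved command output. The sketch below is a minimal example, assuming the output of "version -c xcp -v" and "showlogs monitor" has been saved to text files (for example from a terminal log). The function names xcp_banks_at and xcp_update_completed are hypothetical helpers, not XSCF commands.

```shell
# Succeeds only if every XCP bank line (Current and Reserve) reports the
# expected version. Reads saved "version -c xcp -v" output on stdin.
xcp_banks_at() {
  awk -v want="$1" '
    /^ *XCP[0-9]+ \((Current|Reserve)\):/ {
      banks++
      if ($NF == want) ok++
    }
    END { exit !(banks > 0 && ok == banks) }'
}

# Succeeds if saved "showlogs monitor" output on stdin contains the
# completion message for the expected version.
xcp_update_completed() {
  grep -F "XCP update has been completed (XCP version=$1)" >/dev/null
}
```

For example, with the two outputs saved as version.txt and monitor.txt (hypothetical file names):

 xcp_banks_at 1093 < version.txt && xcp_update_completed 1093 < monitor.txt && echo "XCP 1093 applied"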

Tuesday, September 28, 2010

timezones: Daylight Savings

Daylight saving time starts this year on Sunday, October 3. Are your systems ready to handle the time change automatically?
To check any systems you may be interested in, the easiest way is to use the zdump command:








 # zdump -v $TZ | grep 2010

 Australia/NSW  Thu Sep 23 05:08:57 2010 UTC = Thu Sep 23 15:08:57 2010 EST isdst=0
 Australia/NSW  Sat Apr  3 15:59:59 2010 UTC = Sun Apr  4 02:59:59 2010 EST isdst=1
 Australia/NSW  Sat Apr  3 16:00:00 2010 UTC = Sun Apr  4 02:00:00 2010 EST isdst=0
 Australia/NSW  Sat Oct  2 15:59:59 2010 UTC = Sun Oct  3 01:59:59 2010 EST isdst=0
 Australia/NSW  Sat Oct  2 16:00:00 2010 UTC = Sun Oct  3 03:00:00 2010 EST isdst=1

The first line is just the current date/time and can be ignored. The remaining entries come from the time zone database and determine when the time changes for this year will occur. The output above shows that the system clock will go forward by 1 hour at 2:00 on Sunday Oct 3, which is correct.
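
To pick out only the moments where the clock actually changes, the zdump output can be filtered for lines whose isdst flag differs from the previous line. This is a minimal sketch; dst_transitions is a hypothetical helper name, and it assumes the input lines are in chronological order, as in the output above.

```shell
# Reads "zdump -v" output on stdin and prints the lines for the given year
# where the isdst flag differs from the preceding line, i.e. the first
# instant after each change.
dst_transitions() {
  awk -v year="$1" '
    {
      flag = ""
      for (i = 1; i <= NF; i++) if ($i ~ /^isdst=/) flag = $i
      if (flag == "") next
      if (prev != "" && flag != prev && index($0, " " year " UTC")) print
      prev = flag
    }'
}
```

Usage, dropping the leading current-time line first (omit the sed step on systems whose zdump does not print that line):

 # zdump -v $TZ | sed 1d | dst_transitions 2010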


Monday, September 27, 2010

veritas: Backing up the Veritas Cluster Server configuration

Veritas Cluster Server stores custom agents and its configuration data as a series of files in the /etc, /etc/VRTSvcs/conf/config and /opt/VRTSvcs/bin/ directories. Since these files are the lifeblood of the cluster engine, it is important to back them up to ensure cluster recovery should disaster hit. VCS comes with the hasnap utility to simplify cluster configuration backups. When it is run with the “-backup,” “-n,” “-f <file>,” and “-m <description>” options, a point-in-time snapshot of the cluster configuration is written to the file passed to the “-f” option:








 # hasnap -backup -f clusterbackup.zip -n -m "Backup from March 25th 2007"

 Starting Configuration Backup for Cluster foo

 Dumping the configuration...

 Registering snapshot "foo-2006.08.25-1156511358610"

 Contacting host lnode1...

 Error connecting to the remote host "lnode1"

 Starting backup of files on host lnode2
 "/etc/VRTSvcs/conf/config/types.cf" ----> 1.0
 "/etc/VRTSvcs/conf/config/main.cf" ----> 1.0
 "/etc/VRTSvcs/conf/config/vcsApacheTypes.cf" ----> 1.0
 "/etc/llthosts" ----> 1.0
 "/etc/gabtab" ----> 1.0
 "/etc/llttab" ----> 1.0
 "/opt/VRTSvcs/bin/vcsenv" ----> 1.0
 "/opt/VRTSvcs/bin/LVMVolumeGroup/monitor" ----> 1.0
 "/opt/VRTSvcs/bin/LVMVolumeGroup/offline" ----> 1.0
 "/opt/VRTSvcs/bin/LVMVolumeGroup/online" ----> 1.0
 "/opt/VRTSvcs/bin/LVMVolumeGroup/clean" ----> 1.0
 "/opt/VRTSvcs/bin/ScriptAgent" ----> 1.0
 "/opt/VRTSvcs/bin/LVMVolumeGroup/LVMVolumeGroup.xml" ----> 1.0
 "/opt/VRTSvcs/bin/RVGSnapshot/fdsched" ----> 1.0
 "/opt/VRTSvcs/bin/RVGSnapshot/monitor" ----> 1.0
 "/opt/VRTSvcs/bin/RVGSnapshot/fdsetup.vxg" ----> 1.0
 "/opt/VRTSvcs/bin/RVGSnapshot/open" ----> 1.0
 "/opt/VRTSvcs/bin/ScriptAgent" ----> 1.0
 "/opt/VRTSvcs/bin/RVGSnapshot/RVGSnapshotAgent.pm" ----> 1.0
 "/opt/VRTSvcs/bin/RVGSnapshot/RVGSnapshot.xml" ----> 1.0
 "/opt/VRTSvcs/bin/RVGSnapshot/offline" ----> 1.0
 "/opt/VRTSvcs/bin/RVGSnapshot/online" ----> 1.0
 "/opt/VRTSvcs/bin/RVGSnapshot/attr_changed" ----> 1.0
 "/opt/VRTSvcs/bin/RVGSnapshot/clean" ----> 1.0
 "/opt/VRTSvcs/bin/RVGPrimary/monitor" ----> 1.0
 "/opt/VRTSvcs/bin/RVGPrimary/open" ----> 1.0
 "/opt/VRTSvcs/bin/RVGPrimary/RVGPrimary.xml" ----> 1.0
 "/opt/VRTSvcs/bin/RVGPrimary/offline" ----> 1.0
 "/opt/VRTSvcs/bin/RVGPrimary/online" ----> 1.0
 "/opt/VRTSvcs/bin/RVGPrimary/clean" ----> 1.0
 "/opt/VRTSvcs/bin/ScriptAgent" ----> 1.0
 "/opt/VRTSvcs/bin/RVGPrimary/actions/fbsync" ----> 1.0
 "/opt/VRTSvcs/bin/triggers/violation" ----> 1.0
 "/opt/VRTSvcs/bin/CampusCluster/monitor" ----> 1.0
 "/opt/VRTSvcs/bin/CampusCluster/close" ----> 1.0
 "/opt/VRTSvcs/bin/ScriptAgent" ----> 1.0
 "/opt/VRTSvcs/bin/CampusCluster/open" ----> 1.0
 "/opt/VRTSvcs/bin/CampusCluster/CampusCluster.xml" ----> 1.0
 "/opt/VRTSvcs/bin/RVG/monitor" ----> 1.0
 "/opt/VRTSvcs/bin/RVG/info" ----> 1.0
 "/opt/VRTSvcs/bin/ScriptAgent" ----> 1.0
 "/opt/VRTSvcs/bin/RVG/RVG.xml" ----> 1.0
 "/opt/VRTSvcs/bin/RVG/offline" ----> 1.0
 "/opt/VRTSvcs/bin/RVG/online" ----> 1.0
 "/opt/VRTSvcs/bin/RVG/clean" ----> 1.0
 "/opt/VRTSvcs/bin/internal_triggers/cpuusage" ----> 1.0

 Backup of files on host lnode2 complete

 Backup succeeded partially

To check the contents of the snapshot, the unzip utility can be run with the “-t” option:


 # unzip -t clusterbackup.zip |more
 Archive: clusterbackup.zip
 testing: /cat_vcs.zip OK
 testing: /categorylist.xml.zip OK
 testing: _repository__data/vcs/foo/lnode2/etc/VRTSvcs/conf/config/types.cf.zip OK
 testing: _repository__data/vcs/foo/lnode2/etc/VRTSvcs/conf/config/main.cf.zip OK
 testing: _repository__data/vcs/foo/lnode2/etc/VRTSvcs/conf/config/vcsApacheTypes.cf.zip OK
 testing: _repository__data/vcs/foo/lnode2/etc/llthosts.zip OK
 testing: _repository__data/vcs/foo/lnode2/etc/gabtab.zip OK
 testing: _repository__data/vcs/foo/lnode2/etc/llttab.zip OK
 testing: _repository__data/vcs/foo/lnode2/opt/VRTSvcs/bin/vcsenv.zip OK
 testing: _repository__data/vcs/foo/lnode2/opt/VRTSvcs/bin/LVMVolumeGroup/monitor.zip OK
 testing: _repository__data/vcs/foo/lnode2/opt/VRTSvcs/bin/LVMVolumeGroup/offline.zip OK
 testing: _repository__data/vcs/foo/lnode2/opt/VRTSvcs/bin/LVMVolumeGroup/online.zip OK
 testing: _repository__data/vcs/foo/lnode2/opt/VRTSvcs/bin/LVMVolumeGroup/clean.zip OK
 testing: _repository__data/vcs/foo/lnode2/opt/VRTSvcs/bin/LVMVolumeGroup/LVMVolumeGroupAgent.zip OK
 testing: _repository__data/vcs/foo/lnode2/opt/VRTSvcs/bin/LVMVolumeGroup/LVMVolumeGroup.xml.zip OK
 testing: _repository__data/vcs/foo/lnode2/opt/VRTSvcs/bin/RVGSnapshot/fdsched.zip OK
 testing: _repository__data/vcs/foo/lnode2/opt/VRTSvcs/bin/RVGSnapshot/monitor.zip OK
 ......

Since parts of the cluster configuration reside in memory and not on disk, it is a good idea to run “haconf -dump -makero” prior to running hasnap. This ensures that the current configuration is what gets backed up, and allows hasnap “-restore” to restore the correct configuration if disaster hits.
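
The two commands can be combined in a small wrapper so the dump is never forgotten. This is a minimal sketch: backup_vcs_config and the RUN variable are hypothetical names, and the sketch assumes that a failed haconf (for instance because the configuration is already read-only, which is an assumption about its exit status) should be reported but should not stop the backup.

```shell
# Flush the in-memory cluster configuration to disk, then take a snapshot.
# $1: snapshot file, $2: description. Set RUN=echo to print the commands
# instead of running them.
backup_vcs_config() {
  out=$1
  note=$2
  run=${RUN:-}
  $run haconf -dump -makero ||
    echo "warning: haconf -dump -makero failed; continuing" >&2
  $run hasnap -backup -f "$out" -n -m "$note"
}
```

A dry run shows what would be executed:

 # RUN=echo backup_vcs_config clusterbackup.zip nightly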


Friday, September 24, 2010

veritas: Veritas Volume Replicator (VVR) Commands

Here are some links to basic Veritas Volume Replicator (VVR) commands for managing your data replication environments.

Basic VVR Commands:

Monday, September 20, 2010

veritas: rlinks in 'paused due to network disconnection' state



I had a case open with Symantec for the following issue.









 # vradmin -g dgprdap2 repstatus rvg_prdap2
 Replicated Data Set: rvg_prdap2
 Primary:
   Host name:                  10.150.150.13
   RVG name:                   rvg_prdap2
   DG name:                    dgprdap2
   RVG state:                  enabled for I/O
   Data volumes:               1
   VSets:                      0
   SRL name:                   srl_prdap2
   SRL size:                   25.00 G
   Total secondaries:          1

 Secondary:
   Host name:                  10.150.150.16
   RVG name:                   rvg_prdap2
   DG name:                    dgprdap2
   Data status:                consistent, behind
   Replication status:         paused due to network disconnection
   Current mode:               asynchronous
   Logging to:                 SRL (213278 Kbytes behind, 0% full)
   Timestamp Information:      behind by 2668h 22m 7s

'vradmin repstatus' shows the rlinks in the 'paused due to network disconnection' state.
Various attempts were made to continue, attach or resume the replication, but they all failed.

The following checks must be made to diagnose the problem.
Run the vrport command on both the primary and secondary nodes.

# /usr/sbin/vrport

Check communication on these ports, from primary to secondary and vice versa on each node: 

# ping -p port-number host-name

If the ping command succeeds, try restarting the VVR daemons (both on primary and secondary) in the correct sequence: 
 
Stop vradmin on secondary then on primary

# /usr/sbin/vxstart_vvr stop

Start vradmin on secondary then on primary

# /usr/sbin/vxstart_vvr start 
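
The ordering above (secondary first, then primary, for both the stop and the start) can be captured in a short script. This is only a sketch: restart_vvr and the RUN variable are hypothetical names, and running the commands remotely over ssh is an assumption; the same commands can be typed on each node by hand.

```shell
# Stop VVR on the secondary, then the primary; start it again in the same
# order. $1: primary host, $2: secondary host.
# Set RUN=echo to print the commands instead of running them.
restart_vvr() {
  primary=$1
  secondary=$2
  run=${RUN:-}
  for action in stop start; do
    for host in "$secondary" "$primary"; do
      $run ssh "$host" /usr/sbin/vxstart_vvr "$action" || return 1
    done
  done
}
```

A dry run with the two addresses from the repstatus output above:

 # RUN=echo restart_vvr 10.150.150.13 10.150.150.16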

If the above does not fix the issue, you may need to restart the primary and secondary nodes. If a bounce does not fix it either, there is a rare possibility that you will need to re-create the RVGs, as was the case for me. Another unsolved VVR mystery.



Friday, September 17, 2010

veritas: Displaying and changing the ports used by VVR

Use the vrport command to display, change or set the port numbers used by VVR. You may have to change the port numbers in the following cases:
  • To resolve a port number conflict with other applications.
  • To configure VVR to work in your firewall environment.
  • To configure VVR to work in your firewall environment when using UDP; to specify a restricted number of ports to replicate data between the Primary and the Secondary.

Port Used For Heartbeats

Use the vrport heartbeat command to display the port number used by VVR for heartbeats. To change the heartbeat port number on a host, specify the port number with the vrport heartbeat command.
Use the vradmin changeip command to update the RLINKs with the new port information, and then restart the vxnetd daemon on the required system for the changes to take effect.
To display the port number used for heartbeats

 # vrport heartbeat

To change the port number used for heartbeats

 # vrport heartbeat port
This example shows how to change the replication heartbeat port on host1. Follow the same steps to change the heartbeat port on the secondary (host2).
Note:
VVR supports a configuration with different heartbeat port numbers on the primary and secondary.

To change the replication heartbeat port on host1 from 4145 to 5000

 1.  Use the vrport command to change the heartbeat port to 5000 on the required host.

 # vrport heartbeat 5000

 2. Issue the vradmin changeip command without the newpri and newsec attributes.
 # vradmin -g hrdg changeip hr_rvg host2

 3. Verify the changes to the local RLINK by issuing the following command on the required host:
 # vxprint -g hrdg -l rlk_host2_hr_rvg

4. Stop the vxnetd daemon.
 # /usr/sbin/vxnetd stop

 5. Restart the vxnetd daemon.
 # /usr/sbin/vxnetd
Port Used By vradmind

To display the port number used by vradmind, use the vrport vradmind command. To change the vradmind port, specify the port number with the vrport vradmind command.
To display the port number used by vradmind

 # vrport vradmind

To change the port number used by vradmind

 # vrport vradmind port

 Note:
You must restart the server vradmind for this change to take effect. Make sure you change the port number on all the hosts in the RDS.
Port Used by in.vxrsyncd

To display the port numbers used by in.vxrsyncd, use the vrport vxrsyncd command. To change the port numbers used by in.vxrsyncd, specify the port number with the vrport vxrsyncd command.
To display the port number used by in.vxrsyncd

 # vrport vxrsyncd

To change the port number used by in.vxrsyncd

 # vrport vxrsyncd port

Note:
You must restart the server in.vxrsyncd for this change to take effect. Make sure you change the port number on all the hosts in the RDS.
Ports Used To Replicate Data Using UDP

To display the ports used to replicate data when using UDP, use the vrport data command. To change the ports used to replicate data when using UDP, specify the list of port numbers to use with the vrport data command.
Each RLINK requires one UDP port for replication. Specify an unused, reserved port number that is less than 32768 so that there is no port conflict with other applications. The number of ports specified must be equal to or greater than the number of RLINKs on the system.
Note:
For systems using the TCP protocol for replication, you are not required to select any data port, as the connection is established with the listener port on the remote host. The listener uses a port number that is numerically the same as the UDP port used for heartbeat messages.
To display ports used to replicate data when using UDP

 # vrport data

To change ports used to replicate data when using UDP
For a system configured with one RLINK, use the following command:

 # vrport data port

For a system configured with multiple RLINKs, you can specify either a range of port numbers or a list of port numbers or both.
To specify a list of port numbers, a range, or both, use the following command:

 # vrport data port1, port2, portlow-porthigh, .... 

For example:

 # vrport data 3400, 3405, 3500-3503, 5756-5760

Note:
To use the new port information, execute /usr/sbin/vxnetd, and then pause and resume all RLINKs.
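
Before passing a port list to vrport data, it can be checked against the two rules stated above: every port must be below 32768, and the list must contain at least as many ports as there are RLINKs. The sketch below is one way to do that; count_data_ports is a hypothetical helper, not a VVR command.

```shell
# Prints how many ports a spec such as "3400, 3405, 3500-3503" covers.
# Fails (non-zero exit, no output) if an entry is malformed, a range is
# reversed, or any port is 32768 or higher.
count_data_ports() {
  echo "$1" | tr -d ' ' | tr ',' '\n' | awk -F- '
    NF == 0 { next }
    NF == 1 { lo = $1; hi = $1 }
    NF == 2 { lo = $1; hi = $2 }
    NF > 2  { bad = 1; next }
    lo !~ /^[0-9]+$/ || hi !~ /^[0-9]+$/ || lo + 0 > hi + 0 || hi + 0 >= 32768 {
      bad = 1
      next
    }
    { n += hi - lo + 1 }
    END { if (bad) exit 1; print n + 0 }'
}
```

For the example list above, count_data_ports "3400, 3405, 3500-3503, 5756-5760" prints 11, so that list is enough for up to 11 RLINKs.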

Wednesday, September 15, 2010

veritas: Panic System On Disk Group Loss

Be wary of the following DiskGroup resource attribute "PanicSystemOnDGLoss"

If you enable I/O fencing and set the DiskGroup attribute PanicSystemOnDGLoss to true, you get the desired failover behavior. This behavior is by design and is intended to favor data integrity over availability.

The reason for halting the system is to ensure that a failover takes place and that there is no data corruption caused by two hosts wanting to write to the shared storage.

I experienced a panic on one of my production servers and found the following message in the messages file:


 2009/07/23 09:37:57 VCS CRITICAL V-16-10001-1073 (cluster2) DiskGroup:mydg:monitor:Disk Group: mydg is disabled on system: cluster2. System will panic to migrate all service groups to another VCS node in system list

By default the PanicSystemOnDGLoss attribute is set to 1 (true).

The attribute will cause VCS to panic the system on sudden loss of the diskgroup, when imported by VCS. The resource will also need to be marked as "Critical" for the panic to occur. VCS will not panic the system if the resource is not marked critical.

VCS will then perform an evacuation of the resource and related service group to the next surviving node. If the surviving node is unable to online the resource, no further panics are induced. The clean procedure is called and VCS stops trying to online the resource until the fault is cleared.
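
To find earlier occurrences of this event, the log can be searched for the message ID shown above (V-16-10001-1073). This is a minimal sketch; dg_panic_events is a hypothetical helper, and it assumes log lines have the same layout as the example message.

```shell
# Reads log text on stdin and prints "date time disk-group" for every
# disk group loss message (ID V-16-10001-1073).
dg_panic_events() {
  awk '/V-16-10001-1073/ {
    dg = $0
    sub(/.*DiskGroup:/, "", dg)   # drop everything up to the resource name
    sub(/:.*/, "", dg)            # keep only the name before the next colon
    print $1, $2, dg
  }'
}
```

Usage against the messages file (the path is an assumption and may differ on your system):

 # dg_panic_events < /var/adm/messages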