Thursday, September 30, 2010
XSCF: Upgrade of XCP firmware
The steps below follow the preferred method for upgrading from XCP 1060 firmware to XCP 1093 firmware. The major steps are:
- shut down the domain to the ok prompt (init 0)
- import the XCP image into the system
- upgrade the XCP firmware (this includes an XSCF reset)
- boot the domain
Note – XCP: Abbreviation for XSCF Control Package. XCP is a package that has the control programs of hardware that configures a computing system. The XSCF firmware and the OpenBoot PROM firmware are included in the XCP file. The firmware update functions provided by XSCF are used to manage XCP.
Firmware update is performed from the XSCF shell. Use the following commands to update the firmware:
■ getflashimage command: Imports firmware to this system.
■ flashupdate command: Downloads the firmware to flash memory and applies the XSCF firmware.
■ poweron command or reset command: Applies the OpenBoot PROM firmware.
■ version command: Displays the firmware version.
1. Once you have shut the system down to the ok prompt, connect to the XSCF prompt.
2. Before updating the firmware, be sure to check the XCP version in the current system. Be aware of which version you're upgrading from, as the steps will differ if there is a large version gap.
XSCF> version -c xcp -v
XSCF#0 (Active )
XCP0 (Current): 1060
OpenBoot PROM : 01.30.0000
XSCF : 01.06.0001
XCP1 (Reserve): 1060
OpenBoot PROM : 01.30.0000
XSCF : 01.06.0001
3. List the firmware image files currently imported on the system.
XSCF> getflashimage -l
Existing versions:
Version Size Date
FFXCP1060.tar.gz 49053148 Tue Feb 26 19:29:49 EST 2008
4. Use the following getflashimage command to specify the firmware program file and import XCP to the system.
XSCF> getflashimage -u user-name ftp://ip-address/FFXCP1093.tar.gz
Existing versions:
Version Size Date
FFXCP1060.tar.gz 49053148 Tue Feb 26 19:29:49 EST 2008
Warning: About to delete existing versions.
Continue? [y|n]: y
Removing FFXCP1060.tar.gz.
Password:
0MB received
1MB received
2MB received
.......
39MB received
40MB received
Download successful: 41859 Kbytes in 56 secs (784.888 Kbytes/sec)
Checking file...
MD5: f2dc08a4bd43061ea84c0172a6380c94
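The command reports an MD5 checksum for the imported image. If the XCP archive is also staged on a Solaris host, the checksum can be cross-checked there with the digest utility; the path below is only an example of where the archive might sit:
# digest -a md5 /var/tmp/FFXCP1093.tar.gz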
5. Confirm that the firmware program file you imported is now on the system using the getflashimage command.
XSCF> getflashimage -l
Existing versions:
Version Size Date
FFXCP1093.tar.gz 42863796 Thu Sep 23 14:09:40 EST 2010
6. Use the flashupdate command to confirm whether you're able to update to the new firmware version.
XSCF> flashupdate -c check -m xcp -s 1093
XCP update is possible with domains up
7. Use the flashupdate command to update the firmware. Once complete, the XSCF will reset and the current session will disconnect; reconnect once the XSCF has been restored.
XSCF> flashupdate -c update -m xcp -s 1093
The XSCF will be reset. Continue? [y|n] :y
Checking the XCP image file, please wait a minute
XCP update is started (XCP version=1093:last version=1060)
OpenBoot PROM update is started (OpenBoot PROM version=02160000)
OpenBoot PROM update has been completed (OpenBoot PROM version=02160000)
XSCF update is started (XSCFU=0,bank=1,XCP version=1093:last version=1060)
XSCF download is started (XSCFU=0,bank=1,XCP version=1093:last version=1060, Firmware Element
ID=00:version=01090003:last version=01060000)
XSCF download has been completed (XSCFU=0,bank=1,XCP version=1093:last version=1060, Firmware
Element ID=00:version=01090003:last version=01060000)
XSCF download is started (XSCFU=0,bank=1,XCP version=1093:last version=1060, Firmware Element
ID=01:version=01090003:last version=01060001)
XSCF download has been completed (XSCFU=0,bank=1,XCP version=1093:last version=1060, Firmware
Element ID=01:version=01090003:last version=01060001)
XSCF download is started (XSCFU=0,bank=1,XCP version=1093:last version=1060, Firmware Element
ID=02:version=01080001:last version=01060000)
XSCF download has been completed (XSCFU=0,bank=1,XCP version=1093:last version=1060, Firmware
Element ID=02:version=01080001:last version=01060000)
XSCF download is started (XSCFU=0,bank=1,XCP version=1093:last version=1060, Firmware Element
ID=03:version=01090002:last version=01060000)
XSCF download has been completed (XSCFU=0,bank=1,XCP version=1093:last version=1060, Firmware
Element ID=03:version=01090002:last version=01060000)
XSCF download is started (XSCFU=0,bank=1,XCP version=1093:last version=1060, Firmware Element
ID=04:version=01090003:last version=01060001)
XSCF download has been completed (XSCFU=0,bank=1,XCP version=1093:last version=1060, Firmware
Element ID=04:version=01090003:last version=01060001)
XSCF download is started (XSCFU=0,bank=1,XCP version=1093:last version=1060, Firmware Element
ID=05:version=01090002:last version=01050000)
XSCF download has been completed (XSCFU=0,bank=1,XCP version=1093:last version=1060, Firmware
Element ID=05:version=01090002:last version=01050000)
XSCF download is started (XSCFU=0,bank=1,XCP version=1093:last version=1060, Firmware Element
ID=07:version=01090001:last version=01060000)
XSCF download has been completed (XSCFU=0,bank=1,XCP version=1093:last version=1060, Firmware
Element ID=07:version=01090001:last version=01060000)
XSCF update has been completed (XSCFU=0,bank=1,XCP version=1093:last version=1060)
XSCF is rebooting to update the reserve bank
8. Re-connect to the XSCF and log in again. To confirm that the XSCF firmware update has finished, use the showlogs command with the monitor option. Ensure you see the "SCF:XCP update has been completed (XCP version=xxxx)" message.
XSCF> showlogs monitor
Sep 23 14:15:10 xscf1 monitor_msg: SCF:XCP update is started (XCP version=1093:last version=1060)
Sep 23 14:15:49 xscf1 monitor_msg: SCF:XSCF download is started (XSCFU=0, bank=1, XCP
version=1093:last version=1060, Firmware Element ID=00, version=01090003:last version=01060000)
Sep 23 14:16:28 xscf1 monitor_msg: SCF:XSCF download has been completed (XSCFU=0, bank=1, XCP
version=1093:last version=1060, Firmware Element ID=00, version=01090003:last version=01060000)
Sep 23 14:16:41 xscf1 monitor_msg: SCF:XSCF download is started (XSCFU=0, bank=1, XCP
version=1093:last version=1060, Firmware Element ID=01, version=01090003:last version=01060001)
.......
Sep 23 14:32:55 xscf1 monitor_msg: SCF:XCP update has been completed (XCP version=1093)
9. Confirm that the version of the system firmware now running matches the firmware you applied.
XSCF> version -c xcp -v
XSCF#0 (Active )
XCP0 (Reserve): 1093
OpenBoot PROM : 02.16.0000
XSCF : 01.09.0003
XCP1 (Current): 1093
OpenBoot PROM : 02.16.0000
XSCF : 01.09.0003
OpenBoot PROM BACKUP
#0: 01.30.0000
#1: 02.16.0000
10. To complete the update, restart the domain. Once the domain restarts it will commence its boot sequence.
XSCF> reset -d 0 por
DomainID to reset:00
Continue? [y|n] :y
00 :Reset
XSCF> showdomainstatus -a
DID Domain Status
00 Initialization Phase
01 -
02 -
03 -
XSCF> showdomainstatus -a
DID Domain Status
00 Running
01 -
02 -
03 -
Tuesday, September 28, 2010
timezones: Daylight Savings
Daylight Savings for this year starts on Sunday October 3. Are your systems ready to handle the time change automatically?
# zdump -v $TZ | grep 2010
Australia/NSW Thu Sep 23 05:08:57 2010 UTC = Thu Sep 23 15:08:57 2010 EST isdst=0
Australia/NSW Sat Apr 3 15:59:59 2010 UTC = Sun Apr 4 02:59:59 2010 EST isdst=1
Australia/NSW Sat Apr 3 16:00:00 2010 UTC = Sun Apr 4 02:00:00 2010 EST isdst=0
Australia/NSW Sat Oct 2 15:59:59 2010 UTC = Sun Oct 3 01:59:59 2010 EST isdst=0
Australia/NSW Sat Oct 2 16:00:00 2010 UTC = Sun Oct 3 03:00:00 2010 EST isdst=1
The first line is just the current date/time and can be ignored. The remaining lines show the entries from the time zone database, which determine when any time changes for this year will occur. The above shows that the system clock will go forward by 1 hour at 2:00 on Sunday Oct 3, which is correct.
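If $TZ is not set in the current shell, or you want to check a different zone, the zone name can be passed to zdump directly; Australia/NSW below is simply the zone from the output above:
# zdump -v Australia/NSW | grep 2010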
Monday, September 27, 2010
veritas: Backing up the Veritas Cluster Server configuration
Veritas Cluster Server stores custom agents and its configuration data as a series of files in the /etc, /etc/VRTSvcs/conf/config and /opt/VRTSvcs/bin/ directories. Since these files are the lifeblood of the cluster engine, it is important to back them up to ensure cluster recovery should disaster hit. VCS comes with the hasnap utility to simplify cluster configuration backups; when run with the "-backup", "-n", "-f" and "-m" options, a point-in-time snapshot of the cluster configuration is written to the file passed to the "-f" option:
# hasnap -backup -f clusterbackup.zip -n -m "Backup from March 25th 2007"
Starting Configuration Backup for Cluster foo
Dumping the configuration...
Registering snapshot "foo-2006.08.25-1156511358610"
Contacting host lnode1...
Error connecting to the remote host "lnode1"
Starting backup of files on host lnode2
"/etc/VRTSvcs/conf/config/types.cf" ----> 1.0
"/etc/VRTSvcs/conf/config/main.cf" ----> 1.0
"/etc/VRTSvcs/conf/config/vcsApacheTypes.cf" ----> 1.0
"/etc/llthosts" ----> 1.0
"/etc/gabtab" ----> 1.0
"/etc/llttab" ----> 1.0
"/opt/VRTSvcs/bin/vcsenv" ----> 1.0
"/opt/VRTSvcs/bin/LVMVolumeGroup/monitor" ----> 1.0
"/opt/VRTSvcs/bin/LVMVolumeGroup/offline" ----> 1.0
"/opt/VRTSvcs/bin/LVMVolumeGroup/online" ----> 1.0
"/opt/VRTSvcs/bin/LVMVolumeGroup/clean" ----> 1.0
"/opt/VRTSvcs/bin/ScriptAgent" ----> 1.0
"/opt/VRTSvcs/bin/LVMVolumeGroup/LVMVolumeGroup.xml" ----> 1.0
"/opt/VRTSvcs/bin/RVGSnapshot/fdsched" ----> 1.0
"/opt/VRTSvcs/bin/RVGSnapshot/monitor" ----> 1.0
"/opt/VRTSvcs/bin/RVGSnapshot/fdsetup.vxg" ----> 1.0
"/opt/VRTSvcs/bin/RVGSnapshot/open" ----> 1.0
"/opt/VRTSvcs/bin/ScriptAgent" ----> 1.0
"/opt/VRTSvcs/bin/RVGSnapshot/RVGSnapshotAgent.pm" ----> 1.0
"/opt/VRTSvcs/bin/RVGSnapshot/RVGSnapshot.xml" ----> 1.0
"/opt/VRTSvcs/bin/RVGSnapshot/offline" ----> 1.0
"/opt/VRTSvcs/bin/RVGSnapshot/online" ----> 1.0
"/opt/VRTSvcs/bin/RVGSnapshot/attr_changed" ----> 1.0
"/opt/VRTSvcs/bin/RVGSnapshot/clean" ----> 1.0
"/opt/VRTSvcs/bin/RVGPrimary/monitor" ----> 1.0
"/opt/VRTSvcs/bin/RVGPrimary/open" ----> 1.0
"/opt/VRTSvcs/bin/RVGPrimary/RVGPrimary.xml" ----> 1.0
"/opt/VRTSvcs/bin/RVGPrimary/offline" ----> 1.0
"/opt/VRTSvcs/bin/RVGPrimary/online" ----> 1.0
"/opt/VRTSvcs/bin/RVGPrimary/clean" ----> 1.0
"/opt/VRTSvcs/bin/ScriptAgent" ----> 1.0
"/opt/VRTSvcs/bin/RVGPrimary/actions/fbsync" ----> 1.0
"/opt/VRTSvcs/bin/triggers/violation" ----> 1.0
"/opt/VRTSvcs/bin/CampusCluster/monitor" ----> 1.0
"/opt/VRTSvcs/bin/CampusCluster/close" ----> 1.0
"/opt/VRTSvcs/bin/ScriptAgent" ----> 1.0
"/opt/VRTSvcs/bin/CampusCluster/open" ----> 1.0
"/opt/VRTSvcs/bin/CampusCluster/CampusCluster.xml" ----> 1.0
"/opt/VRTSvcs/bin/RVG/monitor" ----> 1.0
"/opt/VRTSvcs/bin/RVG/info" ----> 1.0
"/opt/VRTSvcs/bin/ScriptAgent" ----> 1.0
"/opt/VRTSvcs/bin/RVG/RVG.xml" ----> 1.0
"/opt/VRTSvcs/bin/RVG/offline" ----> 1.0
"/opt/VRTSvcs/bin/RVG/online" ----> 1.0
"/opt/VRTSvcs/bin/RVG/clean" ----> 1.0
"/opt/VRTSvcs/bin/internal_triggers/cpuusage" ----> 1.0
Backup of files on host lnode2 complete
Backup succeeded partially
To check the contents of the snapshot, the unzip utility can be run with the “-t” option:
# unzip -t clusterbackup.zip |more
Archive: clusterbackup.zip
testing: /cat_vcs.zip OK
testing: /categorylist.xml.zip OK
testing: _repository__data/vcs/foo/lnode2/etc/VRTSvcs/conf/config/types.cf.zip OK
testing: _repository__data/vcs/foo/lnode2/etc/VRTSvcs/conf/config/main.cf.zip OK
testing: _repository__data/vcs/foo/lnode2/etc/VRTSvcs/conf/config/vcsApacheTypes.cf.z ip OK
testing: _repository__data/vcs/foo/lnode2/etc/llthosts.zip OK
testing: _repository__data/vcs/foo/lnode2/etc/gabtab.zip OK
testing: _repository__data/vcs/foo/lnode2/etc/llttab.zip OK
testing: _repository__data/vcs/foo/lnode2/opt/VRTSvcs/bin/vcsenv.zip OK
testing: _repository__data/vcs/foo/lnode2/opt/VRTSvcs/bin/LVMVolumeGroup/monitor.zip OK
testing: _repository__data/vcs/foo/lnode2/opt/VRTSvcs/bin/LVMVolumeGroup/offline.zip OK
testing: _repository__data/vcs/foo/lnode2/opt/VRTSvcs/bin/LVMVolumeGroup/online.zip OK
testing: _repository__data/vcs/foo/lnode2/opt/VRTSvcs/bin/LVMVolumeGroup/clean.zip OK
testing: _repository__data/vcs/foo/lnode2/opt/VRTSvcs/bin/LVMVolumeGroup/LVMVolumeGro upAgent.zip OK
testing: _repository__data/vcs/foo/lnode2/opt/VRTSvcs/bin/LVMVolumeGroup/LVMVolumeGro up.xml.zip OK
testing: _repository__data/vcs/foo/lnode2/opt/VRTSvcs/bin/RVGSnapshot/fdsched.zip O K
testing: _repository__data/vcs/foo/lnode2/opt/VRTSvcs/bin/RVGSnapshot/monitor.zip O K
......
Since parts of the cluster configuration can reside in memory and not on disk, it is a good idea to run "haconf -dump -makero" prior to running hasnap. This will ensure that the current configuration is being backed up, and will allow hasnap "-restore" to restore the correct configuration if disaster hits.
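A minimal sketch of that sequence, plus a restore, is shown below; the snapshot file name and description are arbitrary, and the -restore options should be confirmed against hasnap(1M) for your VCS release before relying on them:
# haconf -dump -makero
# hasnap -backup -f clusterbackup.zip -n -m "Scheduled configuration backup"
# hasnap -restore -f clusterbackup.zip -n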
Friday, September 24, 2010
veritas: Veritas Volume Replicator (VVR) Commands
Here are some links to Basic Veritas Volume Replicator (VVR) commands for your data replication management environments.
Basic VVR Commands:
Monday, September 20, 2010
veritas: rlinks in 'paused due to network disconnection' state
# vradmin -g dgprdap2 repstatus rvg_prdap2
Replicated Data Set: rvg_prdap2
Primary:
Host name: 10.150.150.13
RVG name: rvg_prdap2
DG name: dgprdap2
RVG state: enabled for I/O
Data volumes: 1
VSets: 0
SRL name: srl_prdap2
SRL size: 25.00 G
Total secondaries: 1
Secondary:
Host name: 10.150.150.16
RVG name: rvg_prdap2
DG name: dgprdap2
Data status: consistent, behind
Replication status: paused due to network disconnection
Current mode: asynchronous
Logging to: SRL (213278 Kbytes behind, 0% full)
Timestamp Information: behind by 2668h 22m 7s
'vradmin repstatus' shows rlinks in 'paused due to network disconnection' state
Various attempts were made to continue, attach or resume the replication, but all failed.
The following checks must be made to diagnose the problem.
Run the vrport command on both the primary and secondary nodes.
# /usr/sbin/vrport
Check communication on these ports, from primary to secondary and vice versa on each node:
# ping -p port-number host-name
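If ping is inconclusive, a plain TCP connection test against the vradmind port (8199 unless it has been changed with vrport vradmind) from each node can also confirm reachability; the address below is just the secondary from the repstatus output above:
# telnet 10.150.150.16 8199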
If the ping command succeeds, try restarting the VVR daemons (both on primary and secondary) in the correct sequence:
Stop the VVR daemons on the secondary, then on the primary
# /usr/sbin/vxstart_vvr stop
Start the VVR daemons on the secondary, then on the primary
# /usr/sbin/vxstart_vvr start
If for any unknown reason the above does not fix the issue, you might need to restart the primary and secondary nodes. If a bounce still does not fix the issue, there is a rare possibility that you will need to re-create the RVGs, as was the case for me. Another unsolved VVR mystery.
Friday, September 17, 2010
veritas: Displaying and changing the ports used by VVR
Use the vrport command to display, change or set the port numbers used by VVR. You may have to change the port numbers in the following cases:
- To resolve a port number conflict with other applications.
- To configure VVR to work in your firewall environment.
- To configure VVR to work in your firewall environment when using UDP; to specify a restricted number of ports to replicate data between the Primary and the Secondary.
Port Used For Heartbeats
Use the vrport heartbeat command to display the port number used by VVR for heartbeats. To change the heartbeat port number on a host, specify the port number with the vrport heartbeat command.
Use the vradmin changeip command to update the RLINKs with the new port information, and then restart the vxnetd daemon on the required system for the changes to take effect.
To display the port number used for heartbeats
# vrport heartbeat
To change the port number used for heartbeats
# vrport heartbeat port
This example shows how to change the replication heartbeat port on host1. Follow the same steps to change the heartbeat port on the secondary (host2).
To change the replication heartbeat port on host1 from 4145 to 5000
1. Use the vrport command to change the heartbeat port to 5000 on the required host.
# vrport heartbeat 5000
2. Issue the vradmin changeip command without the newpri and newsec attributes.
# vradmin -g hrdg changeip hr_rvg host2
3. Verify the changes to the local RLINK by issuing the following command on the required host:
# vxprint -g hrdg -l rlk_host2_hr_rvg
4. Stop the vxnetd daemon.
# /usr/sbin/vxnetd stop
5. Restart the vxnetd daemon.
# /usr/sbin/vxnetd
Port Used by vradmind
To display the port number used by vradmind, use the vrport vradmind command. To change the vradmind port, specify the port number with the vrport vradmind command.
To display the port number used by vradmind
# vrport vradmind
To change the port number used by vradmind
# vrport vradmind port
Port Used by in.vxrsyncd
To display the port numbers used by in.vxrsyncd, use the vrport vxrsyncd command. To change the port numbers used by in.vxrsyncd, specify the port number with the vrport vxrsyncd command.
To display the port number used by in.vxrsyncd
# vrport vxrsyncd
To change the port number used by in.vxrsyncd
# vrport vxrsyncd port
Ports Used to Replicate Data
To display the ports used to replicate data when using UDP, use the vrport data command. To change the ports used to replicate data when using UDP, specify the list of port numbers to use with the vrport data command.
Each RLINK requires one UDP port for replication. Specify an unused, reserved port number that is less than 32768 so that there is no port conflict with other applications. The number of ports specified must be equal to or greater than the number of RLINKs on the system.
To change ports used to replicate data when using UDP
For a system configured with multiple RLINKs, you can specify either a range of port numbers or a list of port numbers or both.
To specify a range of port numbers, use the following command:
# vrport data port1, port2, portlow-porthigh, ....
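As an illustration only (choose unused ports below 32768 that suit your own firewall rules), a mix of individual ports and a range might look like:
# vrport data 4545, 4546, 5000-5009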
Wednesday, September 15, 2010
veritas: Panic System On Disk Group Loss
If you enable I/O fencing and set the DiskGroup attribute PanicSystemOnDGLoss to true, you'll get the desired failover behavior. The behavior you're seeing is by design and is intended to favor data integrity over availability.
The reason for halting the system is to ensure a failover takes place and there is no data corruption due to 2 hosts wanting to write to the shared storage.
I had experienced panic on one of my production servers and found the following example in the messages file:
2009/07/23 09:37:57 VCS CRITICAL V-16-10001-1073 (cluster2) DiskGroup:mydg:monitor:Disk Group: mydg is disabled on system: cluster2. System will panic to migrate all service groups to another VCS node in system list
By default the PanicSystemOnDGLoss attribute is set to 1 (true).
The attribute will cause VCS to panic the system on sudden loss of the diskgroup, when imported by VCS. The resource will also need to be marked as "Critical" for the panic to occur. VCS will not panic the system if the resource is not marked critical.
VCS will then perform an evacuation of the resource and related service group to the next surviving node. If the surviving node is unable to online the resource, no further panics are induced. The clean procedure is called and VCS stops trying to online the resource until the fault is cleared.
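A quick sketch of checking and, if desired, changing the attribute and the Critical flag with hares; the resource name mydg_dg is hypothetical:
# hares -value mydg_dg PanicSystemOnDGLoss
# hares -value mydg_dg Critical
# haconf -makerw
# hares -modify mydg_dg PanicSystemOnDGLoss 0
# haconf -dump -makero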
Monday, September 13, 2010
lsof: alloc: /: file system full
Have you seen this error?
Aug 9 00:17:41 server1 ufs: [ID 845546 kern.notice] NOTICE: alloc: /: file system full
When I looked at the top disk space consumers I found nothing useful.
# df -h | sort -rnk 5
/dev/md/dsk/d0 3.0G 2.9G 0K 100% /
/dev/md/dsk/d3 2.0G 1.5G 404M 80% /var
/dev/md/dsk/d30 469M 330M 93M 79% /opt
/dev/md/dsk/d6 992M 717M 215M 77% /home
/dev/md/dsk/d33 752M 494M 198M 72% /usr/local/install
[...]
After doing a du on the whole filesystem I could see it reporting only 2.5G used, while df showed 2.9G of consumed space.
# du -shd /
2.5G
I realized that a few days back I had come across the same issue on a ZFS filesystem hosting an Oracle DB, and the understanding below helped me there.
Normally, if a filesystem is full, look around in the directories that will be hidden by filesystems mounted in higher init states, or check whether any files are eating up the disk space. If that exercise turns up nothing useful, one of the things to check is open files, and consider what has been cleaned up. Sometimes, if an open file is emptied or unlinked from the directory tree, the disk space is not de-allocated until the owning process has been terminated or restarted. The result is an unexplainable loss of disk space. If this is the cause, a reboot would clear it up. If you can't reboot, consider any process that could be logging to that partition as a suspect, and check all of your logs for any entries that imply rapid errors in a process.
In my case a reboot was not possible, so I used lsof to find processes holding unlinked files open on the full filesystem:
# lsof +aL1 /
lsof WARNING: access /.lsof_server1: No such file or directory
lsof: WARNING: created device cache file: /.lsof_server1
lsof: WARNING: can't write to /.lsof_server1: No space left on device
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NLINK NODE NAME
scp 16472 root 4r VREG 85,0 238616064 0 119696 / (/dev/md/dsk/d0)
scp 22154 root 4r VREG 85,0 238213120 0 119677 / (/dev/md/dsk/d0)
``+L1'' selects open files that have been unlinked. A specification of the form ``+aL1 <filesystem>'' (as used above) selects unlinked open files on the specified file system.
I got the process IDs via lsof; after verifying the processes I killed them, and that immediately released ~450MB of space.
# df -kh | sort -rnk 5
/dev/md/dsk/d0 3.0G 2.5G 418M 86% /
/dev/md/dsk/d3 2.0G 1.5G 406M 80% /var
/dev/md/dsk/d30 469M 331M 91M 79% /opt
/dev/md/dsk/d6 992M 717M 215M 77% /home
/dev/md/dsk/d33 752M 494M 198M 72% /usr/local/install
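Before killing anything, it can be worth double-checking what a suspect process actually has open with pfiles; the PID below is one of the scp processes from the lsof output above:
# pfiles 16472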
Thursday, September 9, 2010
veritas: Remove a node from a cluster without interruptions
Before making changes to the VERITAS Cluster Server (VCS) configuration, the main.cf file, make a good copy of the current main.cf. In this example, csvcs6 is removed from a two node cluster. Execute these commands on csvcs5, the system not to be removed.
1. Backup the configuration.
# cp -p /etc/VRTSvcs/conf/config/main.cf /etc/VRTSvcs/conf/config/main.cf.last_known.good
2. Check the current systems, group(s), and resource(s) status
# hastatus -sum
-- SYSTEM STATE
-- System State Frozen
A csvcs5 RUNNING 0
A csvcs6 RUNNING 0
-- GROUP STATE
-- Group System Probed AutoDisabled State
B test_A csvcs5 Y N ONLINE
B test_A csvcs6 Y N OFFLINE
B test_B csvcs6 Y N ONLINE
B wvcs csvcs5 Y N OFFLINE
B wvcs csvcs6 Y N ONLINE
Based on the outputs, csvcs5 and csvcs6 form a two-node cluster. Service group test_A and service group wvcs are configured to run on both nodes. Service group test_B is configured to run on csvcs6 only.
Both service groups test_B and wvcs are online on csvcs6. Service group wvcs can now be failed over to csvcs5 so that it remains online after the node is removed.
hagrp -switch <group> -to <system>
# hagrp -switch wvcs -to csvcs5
3. Check for service group dependency
# hagrp -dep
Parent Child Relationship
test_B test_A online global
4. Make VCS configuration writable
# haconf -makerw
5. Unlink the group dependency if there is any. In this case, the service group test_B requires test_A.
hagrp -unlink <parent_group> <child_group>
# hagrp -unlink test_B test_A
6. Stop VCS on csvcs6, the node to be removed.
hastop -sys <system>
# hastop -sys csvcs6
7. Check the status again, making sure csvcs6 is EXITED and the failover service group is online on running node.
# hastatus -sum
-- SYSTEM STATE
-- System State Frozen
A csvcs5 RUNNING 0
A csvcs6 EXITED 0
-- GROUP STATE
-- Group System Probed AutoDisabled State
B test_A csvcs5 Y N ONLINE
B test_A csvcs6 Y N OFFLINE
B test_B csvcs6 Y N OFFLINE
B wvcs csvcs5 Y N ONLINE
B wvcs csvcs6 Y N OFFLINE
8. Delete csvcs6 from wvcs and test_A SystemList.
hagrp -modify <group> SystemList -delete <system>
# hagrp -modify wvcs SystemList -delete csvcs6
# hagrp -modify test_A SystemList -delete csvcs6
9. Check all the resources belonging to the service group and delete all the resources from group test_B before removing the group.
# hagrp -resources test_B
jprocess
kprocess
# hares -delete jprocess
# hares -delete kprocess
# hagrp -delete test_B
10. Check the status again, making sure all the service groups are online on the other node, in this case csvcs5.
# hastatus -sum
-- SYSTEM STATE
-- System State Frozen
A csvcs5 RUNNING 0
A csvcs6 EXITED 0
-- GROUP STATE
-- Group System Probed AutoDisabled State
B test_A csvcs5 Y N ONLINE
B wvcs csvcs5 Y N ONLINE
11. Delete system (node) from cluster, save the configuration, and make it read only.
# hasys -delete csvcs6
# haconf -dump -makero
12. Depending on how the cluster is defined or the number of nodes in the cluster, it might be necessary to reduce the number in "/sbin/gabconfig -c -n #" in the /etc/gabtab file on all the remaining nodes in the cluster. If the # is larger than the number of nodes in the cluster, GAB will not auto-seed.
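For example, after reducing a two-node cluster to a single node, /etc/gabtab on the remaining node would be adjusted along these lines (only the seed count changes):
# cat /etc/gabtab
/sbin/gabconfig -c -n1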
To prevent VCS from starting after rebooting, do the following on the removed node (csvcs6):
1. Unconfigure and unload GAB
# /sbin/gabconfig -u
# modunload -i `modinfo | grep gab | awk '{print $1}'`
2. Unconfigure and unload LLT
# /sbin/lltconfig -U
# modunload -i `modinfo | grep llt | awk '{print $1}'`
3. Prevent LLT, GAB and VCS from starting up in the future
# mv /etc/rc2.d/S70llt /etc/rc2.d/s70llt
# mv /etc/rc2.d/S92gab /etc/rc2.d/s92gab
# mv /etc/rc3.d/S99vcs /etc/rc3.d/s99vcs
4. If it ** is not ** desired to be running VCS on this particular node again, all the VCS related packages and files can now be removed.
# pkgrm VRTSperl
# pkgrm VRTSvcs
# pkgrm VRTSgab
# pkgrm VRTSllt
# rm /etc/llttab
# rm /etc/gabtab
NOTE: Due to the complexity and variation of VCS configurations, it is not possible to cover every situation and condition of a cluster configuration in one technote. The above steps cover the essentials for common configurations in most VCS setups and provide some idea of how to deal with more complex setups.
Wednesday, September 8, 2010
controlling cpu usage part 9: Setting capped-cpu for a Zone
Introduced in Solaris 10 05/08, CPU caps allow a fine-grained division of CPU resources. The administrator can allocate CPU resources in 1% increments of a single CPU. The allocation can go from 1% of a single CPU up to the total number of CPUs in the system.
The capped-cpu resource type has a single property, ncpus. This holds the amount of CPU allocation for the zone. It is expressed in units of a CPU, so 0.01 would be 1% of a single CPU, 1 would be one CPU, and 3.75 would be 3 3/4 CPUs.
If there are multiple CPUs in the system the allocation can come from any CPU, so multi-threaded code can still run threads in parallel if the scheduler so allocates.
However, unlike pools, there is no dynamic balancing. If capped-cpu is enabled, the CPU resources are statically divided; unused CPU cycles in a zone are not available to other zones which have capped-cpu in effect. To set a zone to use 18% of a single CPU we would enter the following:
# zonecfg -z test0z1
zonecfg:test0z1> add capped-cpu
zonecfg:test0z1:capped-cpu> set ncpus=0.18
zonecfg:test0z1:capped-cpu> end
zonecfg:test0z1> exit
One important point to remember is that a single-threaded process cannot utilize more than a single CPU irrespective of the value of the capped-cpu resource.
You can check to see the performance of the capped-cpu zone using the prstat -Z command.
The percentage (of the global CPU resource) utilized by each zone will be listed with each zone in its summary line.
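To confirm what was configured, the cap can be printed back with zonecfg, and overall zone CPU consumption watched with prstat (test0z1 is the zone from the example above):
# zonecfg -z test0z1 info capped-cpu
# prstat -Z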
Monday, September 6, 2010
controlling cpu usage part 8: Adding Pools to a Zone
Zones are able to use the pool subsystem directly. When a zone is defined it can be associated with a named pool by setting the zone's pool property to the name of an existing pool.
# zonecfg -z zone set pool=pool_web
Multiple zones may share the same pool. In this case each zone should set the cpu-shares resource type to arbitrate between the relative use of CPU for each zone in the pool.
# zonecfg -z test0z1 set cpu-shares=20
# zonecfg -z test0z2 set cpu-shares=30
Solaris 10 11/06 introduces the concept of anonymous pools. These are pools created by a zone when it boots, for the exclusive use of that zone. This is done through the dedicated-cpu resource type for a zone. The dedicated-cpu resource type has two properties: ncpus, which indicates the number of CPUs to put into the created pool (or a range of CPUs if a dynamic pool is desired), and importance, which sets the pool.importance property in the pool for use as a tie-breaker by poold.
# zonecfg -z test0z1
zonecfg:test0z1> add dedicated-cpu
zonecfg:test0z1:dedicated-cpu> set ncpus=1-3
zonecfg:test0z1:dedicated-cpu> set importance=10
zonecfg:test0z1:dedicated-cpu> end
zonecfg:test0z1> commit
zonecfg:test0z1> exit
Whenever the zone boots, the zoneadmd daemon will create a pool and assign the zone to the pool.
Note that the dedicated-cpu resource on a zone means that the pool cannot be shared between multiple zones.
When zones share a pool and arbitrate CPU with cpu-shares, the zone's scheduling class can also be set to FSS so that the shares take effect:
# zonecfg -z test0z1
zonecfg:test0z1> set scheduling-class=FSS
zonecfg:test0z1> commit
zonecfg:test0z1> exit
Note: The pools system must already be configured on the system before the dedicated-cpu resource type is used by the zone. If the pools system is not configured, any attempt to boot the zone will result in an error from zoneadm.
If there are not enough resources to create the pool, an attempt to boot results in a fatal error, and the boot fails.
# zoneadm -z test0z1 boot
zoneadm: zone 'test0z1': libpool(3LIB) error: invalid configuration
zoneadm: zone 'test0z1': dedicated-cpu setting cannot be instatiated
zoneadm: zone 'test0z1': call to zoneadmd failed
When the zone is booted a temporary pool called SUNWtmp_zonename is created.
pool SUNWtmp_test0z1
int pool.sys_id 4
boolean pool.active true
boolean pool.default false
int pool.importance 10
string pool.comment
boolean pool.temporary true
pset SUNWtmp_test0z1
int pset.sys_id 1
boolean pset.default false
uint pset.min 1
uint pset.max 3
string pset.units population
uint pset.load 1991
uint pset.size 1
string pset.comment
boolean pset.temporary true
cpu
int cpu.sys_id 0
string cpu.comment
string cpu.status on-line
The dedicated-cpu resource type creates a pool for the exclusive use of this zone. The zone has exclusive access to the CPU's in the pool. For that reason the cpu-shares resource type in the zone has no meaning if a dedicated-cpu resource type is also defined. The zone will always have 100% of the shares in the processor set, and so will always have the entire processor set to itself irrespective of the number of shares.
Thursday, September 2, 2010
controlling cpu usage part 7: Pools
Dynamic pools were introduced in Solaris 10. A pool binds a resource, such as a processor set, into a persistent, named entity, and allows us to assign resource controls, such as the scheduler, to it on a persistent basis.
Pools can also be used for projects, via the project.pool attribute in /etc/project. By default, if the pools system is enabled using SMF, a default processor set is created which is attached to a default pool. This configuration can be viewed using the poolcfg -dc info command.
# poolcfg -dc info
poolcfg: cannot load configuration from /dev/poolctl: Facility is not active
# svcadm enable svc:/system/pools:default
# poolcfg -dc info
system default
string system.comment
int system.version 1
boolean system.bind-default true
string system.poold.objectives wt-load
pool pool_default
int pool.sys_id 0
boolean pool.active true
boolean pool.default true
int pool.importance 1
string pool.comment
pset pset_default
int pset.sys_id -1
boolean pset.default true
uint pset.min 1
uint pset.max 65536
string pset.units population
uint pset.load 481
uint pset.size 2
string pset.comment
cpu
int cpu.sys_id 1
string cpu.comment
string cpu.status on-line
cpu
int cpu.sys_id 0
string cpu.comment
string cpu.status on-line
To configure pools on a system you must create a configuration file. By default this file should be named /etc/pooladm.conf so that it is automatically loaded at boot time. The easiest way of creating the file is to configure the current system as desired and then perform a pooladm save.
# pooladm -s /etc/pooladm.conf
The following example saves the current kernel state as /etc/pooladm.conf, and then uses poolcfg to create a new pool called pool_web which contains one processor set pset_web which has one CPU.
# pooladm -s /etc/pooladm.conf
# poolcfg -c 'create pool pool_web'
# poolcfg -c 'create pset pset_web (uint pset.min = 1; uint pset.max = 4)'
# poolcfg -c 'associate pool pool_web (pset pset_web)'
# pooladm -c
# pooladm -s
We can then display the resultant configuration.
# poolcfg -dc info
system default
string system.comment
int system.version 1
boolean system.bind-default true
string system.poold.objectives wt-load
pool pool_web
int pool.sys_id 1
boolean pool.active true
boolean pool.default false
int pool.importance 1
string pool.comment
pset pset_web
pool pool_default
int pool.sys_id 0
boolean pool.active true
boolean pool.default true
int pool.importance 1
string pool.comment
pset pset_default
pset pset_web
int pset.sys_id 1
boolean pset.default false
uint pset.min 1
uint pset.max 4
string pset.units population
uint pset.load 0
uint pset.size 1
string pset.comment
cpu
int cpu.sys_id 0
string cpu.comment
string cpu.status on-line
pset pset_default
int pset.sys_id -1
boolean pset.default true
uint pset.min 1
uint pset.max 65536
string pset.units population
uint pset.load 0
uint pset.size 1
string pset.comment
cpu
int cpu.sys_id 1
string cpu.comment
string cpu.status on-line
Note that the default pool and the default pset have their default property set to true.
We can also define other resource properties for the pools. For example, we can set the pool.scheduler property on a pool.
The following example sets the FSS scheduler for the pool pool_web.
# poolcfg -c 'modify pool pool_web (string pool.scheduler="FSS")'
# poolcfg -dc info
system default
string system.comment
int system.version 1
boolean system.bind-default true
string system.poold.objectives wt-load
pool pool_web
int pool.sys_id 1
boolean pool.active true
boolean pool.default false
string pool.scheduler FSS
int pool.importance 1
string pool.comment
pset pset_web
...
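Once the pool is active, work can be bound to it. As a rough sketch, poolbind binds a running process by PID, and projmod ties a project to the pool via the project.pool attribute; the PID 1234 and the project name webproj are made up for illustration:
# poolbind -p pool_web 1234
# projmod -a -K "project.pool=pool_web" webproj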
As the load in one processor set increases the number of CPU's in that pool is increased by taking CPU's from other pools. The pset.min and pset.max properties of the processor set are used to constrain the minimum and maximum number of CPU's that can exist in a pool.
If there is a tie for resources, the pool.importance property is used as a tie-breaker.
To enable dynamic pools the svc:/system/pools/dynamic:default service must be enabled. This will start the poold daemon, which performs the dynamic modification of the processor sets on the system.
# ps -eaf|grep poold
root 20334 3948 0 12:23:51 pts/4 0:00 grep poold
# svcadm enable svc:/system/pools/dynamic:default
# ps -eaf|grep poold
root 20423 3948 0 12:24:55 pts/4 0:00 grep poold
root 20422 1 0 12:24:53 ? 0:00 /usr/lib/pool/poold