Tuesday, August 31, 2010
controlling cpu usage part 6: Creating and Using Processor Sets
Processor sets extend the idea of CPU bindings to a more general relationship. With processor sets, some number of CPUs are collected together into a set. These CPUs are effectively fenced off from the rest of the system: normal threads cannot use them. This is different from processor bindings, where the CPUs are still available for non-bound threads.
Processor sets should only be used on legacy systems that are currently using processor sets. All new installations should use pools, as they have greater flexibility.
The following example creates an empty processor set, assigns CPU id 0 to the newly created set, then binds the current shell to the newly created set.
We then query the processor set for details of the bound processes before removing the bindings.
Finally the processor set itself is deleted.
# psrset -c
created processor set 1
# psrset -a 1 0
processor 0: was not assigned , now 1
# psrset -b 1 $$
process id 18219: was not bound, now 1
# psrset
user processor set 1: processor 0
# psrset -Q 1
process id 18329: 1
process id 18219: 1
# psrset -U
# psrset -Q 1
# psrset
user processor set 1: processor 0
# psrset -d 1
removed processor set 1
# psrset
Labels:
Performance,
psrset,
Tuning
Friday, August 27, 2010
controlling cpu usage part 5: Binding a Process to a Processor
Processor binding is the forced locking of a process onto a particular CPU. The nominated process, or threads within a process, are only executed by the specified CPU. All process binding is performed through the pbind command. To bind all the threads in a process, call pbind with the -b option, the CPU to bind to, and the process ID.
# psrinfo
0 on-line since 06/11/2010 12:18:49
1 on-line since 06/11/2010 12:18:51
# echo $$
16587
# pbind -b 1 $$
process id 16587: was not bound, now 1
# sh
# echo $$
18219
# pbind -q
process id 18220: 1
process id 16587: 1
process id 18219: 1
All the threads of the specified process are bound. Also, processor bindings are inherited by any new threads or processes, so any child processes are likewise bound to the same CPU.
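To act on bound processes from a script, the pbind -q output can be filtered by CPU. The following is only a sketch: the bound_to_cpu function name is made up for illustration, and the sample lines (including the CPU 0 entry) are hypothetical stand-ins for live output in the "process id N: CPU" format shown above.

```shell
# List the PIDs bound to a given CPU, reading `pbind -q`-style lines on stdin.
# Expected line format, as in the transcript: "process id 18219: 1"
bound_to_cpu() {
    cpu="$1"
    awk -v cpu="$cpu" '
        $1 == "process" && $2 == "id" && $4 == cpu {
            sub(/:$/, "", $3)   # strip the trailing colon from the PID field
            print $3
        }'
}

# Hypothetical sample input; on a live system use: pbind -q | bound_to_cpu 1
sample='process id 18220: 1
process id 4242: 0
process id 18219: 1'

printf '%s\n' "$sample" | bound_to_cpu 1
```

With the sample above, only the two PIDs bound to CPU 1 are printed, one per line.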
To remove the bindings for a process, use the -u option to pbind; the -U option removes all bindings.
# pbind -u 18219
process id 18219: was 1, now not bound
# pbind -U
# pbind -q
Binding a process or a thread to a CPU does not prohibit that CPU from being used by other threads.
It does, however, limit the maximum amount of CPU that a process, or group of processes, can use to that of a single CPU.
Labels:
pbind,
Performance,
Tuning
Thursday, August 26, 2010
controlling cpu usage part 4: The Fair Share Scheduler
The Fair Share Scheduler (FSS) is an alternative scheduling class. It is not used by default and must be explicitly enabled. The FSS guarantees that a minimum proportion of the machine's CPU resources is made available to each holder of shares, in proportion to the number of shares held.
The absolute quantity of shares is not important. Any number that is in proportion with the desired CPU entitlement can be used.
To configure projects the /etc/project file needs to be modified to identify the number of shares to be granted to each project, and the /etc/user_attr file needs to be modified to assign each user to a project.
To define two users, u1 and u2, with u1 having twice the CPU resources of u2, the entries in /etc/user_attr and /etc/project would be similar to the following:
# egrep 'u[12]' /etc/passwd
u1:x:1000:1::/export/home/u1:/bin/sh
u2:x:1001:1::/export/home/u2:/bin/sh
# egrep 'u[12]' /etc/user_attr
u1::::type=normal;project=u1
u2::::type=normal;project=u2
# egrep 'u[12]' /etc/project
u1:1000:User 1:u1::project.cpu-shares=(privileged,20,none)
u2:1001:User 2:u2::project.cpu-shares=(privileged,10,none)
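Because only the ratio of shares matters, it can help to turn share counts into percentages. The sketch below assumes both projects are busy at the same time (shares only come into play when projects compete for CPU); share_pct is a hypothetical helper name, and the input is simply "project shares" pairs taken from the /etc/project entries above.

```shell
# Print each project's CPU entitlement as a percentage of all listed shares.
# Input lines on stdin: "<project> <shares>"
share_pct() {
    awk '{ name[NR] = $1; shares[NR] = $2; total += $2 }
         END {
             for (i = 1; i <= NR; i++)
                 printf "%s %.1f%%\n", name[i], 100 * shares[i] / total
         }'
}

# Shares from the example: u1 has 20, u2 has 10
printf 'u1 20\nu2 10\n' | share_pct
```

For the 20/10 split this prints 66.7% for u1 and 33.3% for u2; using 2 and 1 shares instead would give the same result.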
To determine the project of the current process the ps command may be used, and the prctl command will show the number of shares.
# ps -o project= -p $$
user.root
# su - u1
$ ps -o project= -p $$
u1
$ prctl -t privileged -n project.cpu-shares -i pid $$
process: 1444: -sh
NAME PRIVILEGED VALUE FLAG ACTION RECIPIENT
project.cpu-shares
privileged 20 None -
To change the scheduling class of a running process you can use the priocntl command.
# priocntl -s -c FSS -i pid <pid> # Change one process
# priocntl -s -c FSS -i class TS # Change everything currently in TS
# priocntl -s -c FSS -i zoneid 1 # Change all processes in zone ID 1
# priocntl -s -c FSS -i pid 1 # Change init (special case)
To examine the shares granted to a process (or zone) use the prctl command.
# prctl -t privileged -n zone.cpu-shares -i zoneid 1 # Shares for zone ID 1
To modify the number of shares granted to a zone, we can use the -r option to prctl. This change only lasts until the next reboot.
# prctl -r -v 10 -t privileged -n zone.cpu-shares -i zoneid 1 # Change number of shares to 10
To change the default scheduling class, so that on the next and subsequent reboots all processes use FSS by default, we can use the dispadmin command.
# dispadmin -d FSS
Labels:
dispadmin,
Performance,
Projects,
Tuning,
Zones
Wednesday, August 25, 2010
controlling cpu usage part 3: Manipulating the dispatch parameter tables
Each scheduling class maintains a set of tables in the kernel. These are used to control aspects of the scheduling class. These tables may be manipulated by the dispadmin command:
# dispadmin -l
CONFIGURED CLASSES
==================
SYS (System Class)
TS (Time Sharing)
FX (Fixed Priority)
RT (Real Time)
IA (Interactive)
Changing the Scheduler
Solaris comes with six defined scheduling classes. Four of these are provided for use by user threads: time sharing (TS), interactive (IA), fixed priority (FX) and fair share scheduling (FSS). The other two are system (SYS), for kernel threads, and real-time (RT).
If there are multiple processor sets in use, then each processor set can theoretically use a different scheduling class. This is only practical when using the pool subsystem, which allows the scheduling class to be specified per pool.
Time Sharing/Interactive Scheduling Classes
The time sharing and interactive classes use the same algorithm; the difference between them is that the interactive scheduling class attempts to provide a slight boost to the foreground process.
The two classes provide a table which has entries for:
- quantum - length of the time slice, in units of 1/RES seconds (milliseconds when RES=1000)
- tqexp - priority to change the thread to when its quantum has expired
- slpret - priority to change the thread to when returning from a sleep
- maxwait - maximum number of seconds to wait for CPU before changing priority
- lwait - priority to change the thread to when maxwait has expired
# dispadmin -g -c TS
# Time Sharing Dispatcher Configuration
RES=1000
# ts_quantum ts_tqexp ts_slpret ts_maxwait ts_lwait PRIORITY LEVEL
200 0 50 0 50 # 0
200 0 50 0 50 # 1
200 0 50 0 50 # 2
200 0 50 0 50 # 3
200 0 50 0 50 # 4
200 0 50 0 50 # 5
200 0 50 0 50 # 6
200 0 50 0 50 # 7
200 0 50 0 50 # 8
200 0 50 0 50 # 9
...
160 0 51 0 51 # 10
160 1 51 0 51 # 11
160 2 51 0 51 # 12
160 3 51 0 51 # 13
160 4 51 0 51 # 14
...
40 40 58 0 59 # 50
40 41 58 0 59 # 51
40 46 58 0 59 # 56
40 47 58 0 59 # 57
40 48 58 0 59 # 58
20 49 59 32000 59 # 59
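To read one row of this table, it can be convenient to pull out a single priority level and label its columns. The following is a sketch only: ts_row is a hypothetical helper, it relies on the trailing "# level" comment that appears on each row above, and the sample table is a shortened copy of rows from the listing.

```shell
# Look up one priority level in a `dispadmin -g -c TS`-style table on stdin.
# Rows are matched on the trailing "# <level>" comment shown in the listing.
ts_row() {
    level="$1"
    awk -v level="$level" '
        NF == 7 && $6 == "#" && $7 == level {
            printf "quantum=%s tqexp=%s slpret=%s maxwait=%s lwait=%s\n",
                   $1, $2, $3, $4, $5
        }'
}

# Shortened sample; on a live system use: dispadmin -g -c TS | ts_row 58
table='# Time Sharing Dispatcher Configuration
RES=1000
200 0 50 0 50 # 0
160 1 51 0 51 # 11
40 48 58 0 59 # 58
20 49 59 32000 59 # 59'

printf '%s\n' "$table" | ts_row 58
```

Read this way, a thread at priority 58 runs for a 40 ms quantum, drops to priority 48 if it uses the whole quantum, and returns from a sleep at priority 58.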
To change the dispatch parameter table for the TS and IA classes, save the current table to a file, edit it, and load the edited file into the running kernel:
# dispadmin -c TS -g > new_table
# ( edit new_table )
# dispadmin -c TS -s new_table
The new table takes effect immediately; no reboot is required. However, the change lasts only until the next reboot. To make it effective on subsequent boots, dispadmin -c TS -s new_table has to be run from an initialization script on each boot. It is recommended that this is placed after the single-user milestone is reached, so that the system can still be booted to single-user mode if the table turns out to be incorrect.
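A boot-time script for this could look something like the sketch below. The table location /etc/ts_dptbl.custom and the load_ts_table function name are assumptions made for illustration, as is installing the script as a legacy rc2.d script (which would run after the single-user milestone); adapt these to your own conventions.

```shell
#!/bin/sh
# Hypothetical boot script that reloads a custom TS dispatch table.
# TABLE is an assumed location; nothing in Solaris creates this file for you.
TABLE=${TABLE:-/etc/ts_dptbl.custom}

load_ts_table() {
    table="$1"
    # Keep the kernel defaults if the saved table is missing or unreadable.
    if [ ! -r "$table" ]; then
        echo "TS table $table not readable; keeping defaults" >&2
        return 1
    fi
    # Skip quietly on systems without dispadmin.
    if [ ! -x /usr/sbin/dispadmin ]; then
        echo "dispadmin not found; nothing to do" >&2
        return 1
    fi
    /usr/sbin/dispadmin -c TS -s "$table"
}

load_ts_table "$TABLE" || echo "custom TS table not loaded" >&2
```

If the table is broken, booting to single-user mode avoids running the script, and removing the table file restores the defaults on the next boot.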
Labels:
dispadmin,
Performance,
Tuning
Monday, August 23, 2010
controlling cpu usage part 2: CPU Usage Limit in the Shell
The shell ulimit command can be used to check or set the CPU limit for any subsequently created children, and their descendants. The -t option to ulimit sets the amount of CPU time, in seconds, that a process may use before it is sent a SIGXCPU signal by the kernel. The default is unlimited.
# su - user
user $ ulimit -t
unlimited
user $ sh
user $ ulimit -t
unlimited
user $ ulimit -t 10
user $ date; while : ; do : ; done; date
Friday, 5 September 2008 3:34:56 PM EST
Cpu Limit Exceeded (core dumped)
user $
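The same experiment can be scripted so that the limit applies only to a subshell and the result can be inspected afterwards. This is a sketch with a few deliberate differences from the transcript: it uses a 1-second limit, it sets only the soft limit with -S (on some systems, Linux for instance, reaching the hard limit kills the process outright instead of sending SIGXCPU), and it disables core dumps.

```shell
# Run a CPU-bound loop under a 1-second soft CPU limit in a subshell, so the
# limit does not affect the calling shell, then decode how the loop ended.
( ulimit -c 0; ulimit -S -t 1; while : ; do : ; done ) 2>/dev/null
status=$?

# By shell convention a status above 128 means "killed by signal (status - 128)";
# `kill -l` translates such a status into the signal name.
sig=$(kill -l "$status")
echo "loop ended with status $status (SIG$sig)"
```

The numeric status differs between platforms because SIGXCPU has different signal numbers, which is why the script reports the signal name rather than relying on the number.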
Labels:
Performance,
Tuning
Friday, August 20, 2010
controlling cpu usage part 1: Introduction
CPU usage can be controlled in a number of different ways. The possible choices as of Solaris 10 5/08 are:
- We can set a CPU usage limit in the shell
- We can manipulate the dispatch parameter kernel tables
- We can use different schedulers, such as the FSS (Fair Share Scheduler)
- We can bind a process to a CPU
- We can use processor sets
- We can create pools, which combine scheduler changes and processor sets
- We can set a capped-cpu resource control for Solaris Containers or zones
In the coming weeks I will discuss these options in a little detail, in the hope that you can improve performance or tune aspects of your environment better.
Labels:
Performance,
Tuning
Monday, August 16, 2010
solaris: Description of all services
A quick tip for all Solaris 10/OpenSolaris users… Some companies have a strict requirement to know exactly what each and every startup script does on their system. With Solaris 9 and earlier releases, one would check the rc scripts, which is time consuming and may not yield an accurate description or one-liner. Solaris 10/OpenSolaris makes things much easier with svcs -o FMRI,DESC. This will produce output similar to the following:
# svcs -o FMRI,DESC
FMRI DESC
lrc:/etc/rc2_d/S00set-tmp-permissions -
lrc:/etc/rc2_d/S07set-tmp-permissions -
lrc:/etc/rc2_d/S10lu -
lrc:/etc/rc2_d/S20sysetup -
lrc:/etc/rc2_d/S40llc2 -
lrc:/etc/rc2_d/S42ncakmod -
lrc:/etc/rc2_d/S70nddconfig -
lrc:/etc/rc2_d/S72autoinstall -
lrc:/etc/rc2_d/S73cachefs_daemon -
lrc:/etc/rc2_d/S81dodatadm_udaplt -
lrc:/etc/rc2_d/S89bdconfig -
lrc:/etc/rc2_d/S91afbinit -
lrc:/etc/rc2_d/S91gfbinit -
lrc:/etc/rc2_d/S91ifbinit -
lrc:/etc/rc2_d/S91jfbinit -
lrc:/etc/rc2_d/S91kfbinit -
lrc:/etc/rc2_d/S91zuluinit -
lrc:/etc/rc2_d/S94ncalogd -
lrc:/etc/rc2_d/S95lwact -
lrc:/etc/rc2_d/S95nbclient -
lrc:/etc/rc2_d/S98deallocate -
lrc:/etc/rc2_d/S99sneep -
lrc:/etc/rc3_d/S16boot_server -
lrc:/etc/rc3_d/S50apache -
lrc:/etc/rc3_d/S52imq -
lrc:/etc/rc3_d/S84appserv -
svc:/system/fpsd:default FP Scrubber - Online Floating Point Unit Test
svc:/system/svc/restarter:default master restarter
svc:/network/pfil:default packet filter
svc:/network/tnctl:default trusted networking templates
svc:/network/loopback:default loopback network interface
svc:/system/installupdates:default system update installer
svc:/system/filesystem/root:default root file system mount
svc:/system/scheduler:default default scheduling class configuration
svc:/system/boot-archive:default check boot archive content
svc:/network/physical:default physical network interfaces
svc:/system/identity:node system identity (nodename)
svc:/system/filesystem/usr:default read/write root file systems mounts
svc:/system/keymap:default keyboard defaults
svc:/network/ipfilter:default IP Filter
svc:/system/device/local:default Standard Solaris device configuration.
svc:/system/filesystem/minimal:default minimal file system mounts
svc:/system/rmtmpfiles:default remove temporary files
svc:/system/resource-mgmt:default Global zone resource management settings
svc:/system/coreadm:default system-wide core file configuration
svc:/system/name-service-cache:default name service cache
svc:/system/identity:domain system identity (domainname)
svc:/system/cryptosvc:default cryptographic services
svc:/system/sysevent:default system event notification
svc:/system/device/fc-fabric:default Solaris FC fabric device configuration.
svc:/network/ipsec/ipsecalgs:default IPsec algorithm initialization
svc:/milestone/devices:default device configuration milestone
svc:/system/picl:default platform information and control
svc:/network/ipsec/policy:default IPsec policy initialization
svc:/milestone/network:default Network milestone
svc:/system/pkgserv:default Flush package command database to disk
svc:/application/print/ppd-cache-update:default ppd cache update
svc:/network/initial:default initial network services
svc:/system/manifest-import:default service manifest import
svc:/network/service:default layered network services
svc:/system/patchchk:default Launcher for Automatic Patching services
svc:/network/dns/client:default DNS resolver
svc:/milestone/name-services:default name services milestone
svc:/network/iscsi/initiator:default -
svc:/milestone/single-user:default single-user milestone
svc:/platform/sun4v/efdaemon:default embedded FCode interpreter
svc:/system/filesystem/local:default local file system mounts
svc:/network/shares/group:default Share Group
svc:/system/cron:default clock daemon (cron)
svc:/network/shares/group:zfs Share Group
svc:/system/sysidtool:net sysidtool
svc:/system/boot-archive-update:default update boot archive if necessary
svc:/network/routing-setup:default Initial routing-related configuration.
svc:/network/ntp:default Network Time Protocol (NTP)
svc:/network/rpc/bind:default RPC bindings
svc:/application/psncollector:default Product Serial Number Collector
svc:/system/sysidtool:system sysidtool
svc:/milestone/sysconfig:default Basic system configuration milestone
svc:/system/sac:default SAF service access controller
svc:/system/postrun:default Postponed package postinstall command execution
svc:/network/inetd:default inetd
svc:/system/utmp:default utmpx monitoring
svc:/system/console-login:default Console login
svc:/system/dumpadm:default system crash dump configuration
svc:/network/ssh:default SSH server
svc:/system/system-log:default system log
svc:/application/management/seaport:default net-snmp SNMP daemon
svc:/network/smtp:sendmail sendmail SMTP mail transfer agent
svc:/network/sendmail-client:default sendmail SMTP client queue runner
svc:/system/fmd:default Solaris Fault Manager
svc:/network/rpc/rstat:default kernel statistics server
svc:/network/rpc/smserver:default removable media management
svc:/network/cde-spc:default CDE subprocess control
svc:/network/bpcd/tcp:default bpcd
svc:/network/vnetd/tcp:default vnetd
svc:/network/vopied/tcp:default vopied
svc:/network/bpjava-msvc/tcp:default bpjava-msvc
svc:/system/filesystem/volfs:default Volume Management filesystem
svc:/application/management/sma:default net-snmp SNMP daemon
svc:/milestone/multi-user:default multi-user milestone
svc:/milestone/multi-user-server:default multi-user plus exports milestone
svc:/application/stosreg:default Service Tag OS Registry Inserter
svc:/system/zones:default Zones autoboot and graceful shutdown
svc:/system/basicreg:default
svc:/application/sthwreg:default Hardware Service Tag Collector
svc:/application/print/ipp-listener:default Internet Print Protocol Listening Service
svc:/application/print/rfc1179:default BSD print protocol adapter
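For an audit, the entries that still need manual investigation are those with no description, which in the listing above means the legacy lrc scripts and a handful of services. A small filter can pick them out; this is only a sketch, undescribed is a made-up function name, and the sample is a few lines copied from the output above.

```shell
# Print the FMRIs whose DESC column is empty or "-", reading
# `svcs -o FMRI,DESC`-style output on stdin (the header line is skipped).
undescribed() {
    awk 'NR > 1 && (NF == 1 || (NF == 2 && $2 == "-")) { print $1 }'
}

# Sample lines from the listing; on a live system use:
#   svcs -o FMRI,DESC | undescribed
sample='FMRI DESC
lrc:/etc/rc3_d/S50apache -
svc:/system/cron:default clock daemon (cron)
svc:/network/iscsi/initiator:default -
svc:/system/basicreg:default'

printf '%s\n' "$sample" | undescribed
```

With the sample input, the cron service is skipped and the other three FMRIs are printed, one per line.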
Wednesday, August 11, 2010
veritas: Cannot unmount a Locked VxFS filesystem
Storage Foundation 5.0MP3 introduced a new feature called VxFS filesystem lock, which prevents accidental unmounts while the file system resource is online. The new umount option mntunlock is used to clear the lock and then unmount the filesystem. The offline script for the Mount resource uses this new option.
How to check if the filesystem is locked by VCS:
# mount -v | grep mntlock
Sometimes it may be necessary to unmount a mount-locked filesystem manually. This applies where VCS service groups have DiskGroup resources configured with the UnMountVolumes attribute set and the volumes are mounted outside of VCS control (this is not very common).
The filesystem lock is there to prevent accidental unmounts; if the attribute is set to 0, VCS will not lock the filesystem.
The following fix from Symantec for unmounting a locked VxFS filesystem may not work: a bug was found in 5.0 MP3, which may be fixed in future releases.
If the following command does not work, a bounce is required.
Solaris:
# /opt/VRTS/bin/umount -o mntunlock=VCS /mount-point
If you continue to experience issues such as:
UX:vxfs umount: ERROR: V-3-21705: mount-point cannot unmount : Device busy
In this case the umount command has already silently cleared the mount lock, but the lock is still shown in the "mount -v" output, and the file system still cannot be unmounted.
Run fuser on the mount point to check for any processes still using the file system:
# fuser -c /mount-point
If you continue to encounter issues and receive the following error:
UX:vxfs umount: ERROR: V-3-26365: Incorrect mntlock id (Invalid argument)
You can attempt to unmount it using the fsadm command:
# fsadm -o mntunlock=VCS /mount-point
Trying to clear the Mount Lock using fsadm could also fail.
UX:vxfs fsadm: ERROR: V-3-26348: file system not mount locked
The workaround is to lock the file system again using fsadm with the same lock name.
# fsadm -o mntlock=VCS /mount-point
Now the file system can be unmounted successfully with umount.
# umount -o mntunlock=VCS /mount-point
Please note that if the VxFS file system is disabled, fsadm will not be able to remove the lock. The only way to unmount the disabled file system is to reboot the system.
Apr 17 13:00:07 alaw2 vxfs: [ID 702911 kern.warning] WARNING: msgcnt 3 mesg 031: V-2-31: vx_disable - /mount-point file system disabled
# fsadm -o mntlock=VCS /mount-point
UX:vxfs fsadm: ERROR: V-3-20275: cannot open /mount-point
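The sequence above (unlock and unmount, and if that fails, relock under the same name and retry) can be sketched as a small script. This is an illustration rather than a Symantec-provided tool: vxfs_unlock_umount, run and the DRY_RUN switch are all hypothetical names, the commands are the ones shown in this post, and by default the script only prints what it would execute.

```shell
# Sketch of the unlock / relock / unmount sequence described above.
# DRY_RUN=1 (the default here) only prints the commands instead of running them.
DRY_RUN=${DRY_RUN:-1}
LOCK_ID=VCS    # lock name used by VCS in this post

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

vxfs_unlock_umount() {
    mnt="$1"
    # 1. Normal path: clear the lock and unmount in one step.
    run /opt/VRTS/bin/umount -o mntunlock="$LOCK_ID" "$mnt" && return 0
    # 2. Workaround: re-apply the lock under the same name, then retry.
    run fsadm -o mntlock="$LOCK_ID" "$mnt" || return 1
    run /opt/VRTS/bin/umount -o mntunlock="$LOCK_ID" "$mnt"
}

vxfs_unlock_umount /mount-point
```

In dry-run mode the first step is treated as successful, so only that command is printed; the relock and retry steps are reached only when a real umount fails. As noted above, none of this helps if the file system has been disabled.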
Tuesday, August 10, 2010
veritas: Multi-Link-based IPMP setup with VCS
With Solaris 10 came a nice feature – Multi-Link-based IP Multipathing (IPMP). It determines NIC availability solely on the NIC driver reporting the physical link status – UP or DOWN. Previous versions used “probe-based” IPMP, where connectivity is tested by pinging something on the network from each interface. While probe-based is actually a more thorough test (tests network layer 3 as well as 2), it is much more cumbersome to configure, and you need an extra IP address for each interface for “test” addresses. IMO Multi-Link-based IPMP is sufficient for most applications.
To achieve multi-link-based IPMP, here’s how I’ve configured my MultiNICB resource in this large 10 node clustered environment:
Multi-Link-based IPMP MultiNICB properties
These are the values you must change from the defaults:
UseMpathd: 1
Tells VCS to use mpathd for network link status
MpathCommand: /usr/sbin/in.mpathd
Be sure to create a symbolic link to /usr/lib/inet/in.mpathd -a if the above does not exist.
ConfigCheck: 1
If you leave this at 1, it will overwrite your /etc/hostname.xxx files with a probe-based IPMP configuration; if it is set to 0, the files will not be changed.
Device: (your IPMP interfaces here)
List of interfaces and their interface aliases.
Tick "per System" and add the device and interface alias entry for each IPMP grouped interface from each host in the cluster.
GroupName:
Do not use your IPMP group name here; it's not needed. VCS is not monitoring the group, mpathd is.
veritas: Veritas Volume Manager (VxVM) Commands
Here are some links to Basic and Advanced (VxVM) commands for your online storage management enterprise or Storage Area Network (SAN) environments.
Basic VxVM Commands:
Advanced VxVM Commands:
Friday, August 6, 2010
commands: eXtended System Control Facility (XSCF)
Useful XSCF console commands for the Sun SPARC Enterprise M4000/M5000/M8000/M9000 servers:
XSCF> console -d 0
XSCF> console -f -d 0
XSCF> showstatus
XSCF> showversion -c xcp -v [shows xcp firmware, version, openboot prom version]
XSCF> showenvironment
XSCF> showenvironment temp
XSCF> showenvironment volt
XSCF> showhardconf
XSCF> showdcl -va [check domain id...]
XSCF> showdomainstatus -a
XSCF> showboards -a
XSCF> poweron -a [powers up all domains]
XSCF> poweroff -a [powers off all domains]
XSCF> poweron -d 0 [powers on domain 0]
XSCF> poweroff -d 0 [powers off domain 0]
XSCF> poweroff -f -d 0 [forces a power off domain 0]
XSCF> reset -d 0 por [resets domain 0]
XSCF> reset -d 0 xir [resets domain 0 with XIR reset]
XSCF> sendbreak -d 0 [sends break command to domain 0]
XSCF> setautologout -s 60 [sets autologout to 60 minutes]
XSCF> showautologout
XSCF> shownetwork -a
XSCF> setnetwork xscf#0-lan#0 -m 255.255.255.0 10.10.10.5
XSCF> sethostname xscf#0 fire-xscf
XSCF> sethostname -h host.org
XSCF> setroute -h host.org
XSCF> setnameserver 10.10.10.2 10.10.10.3
XSCF> setroute -c add -n 10.10.10.1 -m 255.255.255.0 xscf#0-lan#
XSCF> snapshot -L F -t [username]@[hostname]:[directory_to_save_to]
#. to break from console