
2011

IBM Power Systems Technical University October 10-14 | Fontainebleau Miami Beach | Miami, FL

PE61 Part I: Updated Concepts and Tactics -- How to Monitor and Analyze the VMM and Storage I/O Statistics of a Power/AIX LPAR
Earl Jew (earlj@us.ibm.com) 310-251-2907 cell Senior IT Management Consultant - IBM Power Systems and IBM Systems Storage IBM Lab Services and Training - US Power Systems (group/dept) 400 North Brand Blvd., c/o IBM 8th floor, Glendale, CA 91203 [Extended: April 4th, 2013]

Copyright IBM Corporation 2012 Materials may not be reproduced in whole or in part without the prior written permission of IBM.


Part I: Updated Concepts and Tactics -- How to Monitor and Analyze the VMM and Storage I/O Statistics of a Power/AIX LPAR

ABSTRACT
This presentation updates AIX/VMM (Virtual Memory Management) and LVM/JFS2 storage IO performance concepts and tactics for the day-to-day Power/AIX system administrator. It explains the meaning of the numbers offered by AIX commands (vmstat, iostat, mpstat, sar, etc.) used to monitor and analyze the AIX VMM and storage IO performance and capacity of a given Power7/AIX LPAR. These tactics are further illustrated in Part II: Updated Real-world Case Histories -- How to Monitor and Analyze the VMM and Storage I/O Statistics of a Power/AIX LPAR.


Part II: Updated Real-world Case Histories -- How to Monitor and Analyze the VMM and Storage I/O Statistics of a Power/AIX LPAR

ABSTRACT
These updated case-histories further illustrate the content presented in Part I: Updated Concepts and Tactics -- How to Monitor and Analyze the VMM and Storage I/O Statistics of a Power/AIX LPAR. This presentation includes suggested ranges and ratios of AIX statistics to guide VMM and storage IO performance and capacity analysis. Each case is founded on a different real-world customer configuration and workload that manifests characteristically in the AIX performance statistics -- as performing: intensely in bursts, with hangs and releases, AIX:lrud constrained, AIX-buffer constrained, freely unconstrained, inode-lock contended, consistently light, atomic&synchronous, virtually nil IO workload, long avg-wait's, perfectly ideal, long avg-serv's, mostly rawIO, etc.


Strategic Perspective: What is Workload Characterization?

Power/AIX performance-tuning is based on continuous cycles of:
- workload characterization, i.e. monitoring for indicated issues
- implementing tactics to remedy indicated issues

Workload characterization is determining an infrastructure's resource capacities under load. In other words, workload characterization examines:
- the readiness of instructions&data residing in SAN storage, main memory, Power7 L3/L2/L1 cache
- the latency&throughput of instruction&data transfers between the above, i.e. multipathing, blocked IOs
- the processing of instructions&data, i.e. CPUs simultaneously executing prioritized processes/threads
- the dynamic balance and relative exhaustion/surplus of the above resources

Workload characterization accounts for an LPAR's technology, implementation, size/count/bandwidth:
- IBM Power CPU technology, i.e. Power5/5+, Power6/6+, Power7/7+
- Booted implementation, i.e. shared-pool vs dedicated CPU LPARs, SRAD affinity assignment
- Component implementation, i.e. dedicated IO adapters (traditional) vs. dual-VIOS (PowerVM), NPIV
- Size/count/bandwidth of component technologies to address the expected workload, i.e.:
  - Total LPAR gbRAM and the relative amounts of the four main sections of AIX VMM memory
  - count of vCPU/eCPU/logicalCPU/FC-HBAs/LAN adapters/PCIe Gen2 slots/etc and the bandwidth of each


Formulations of AIX Tactics for Empirical Performance Analysis

This presentation will:
- explain the numbers presented by mundane AIX commands (vmstat, mpstat, iostat, ps, etc.)
- formulate the recognition and severity of indicated AIX performance issues hidden in these numbers
- offer tactics to remedy any indicated AIX performance issues

Formulated indicators in mundane AIX command output can distinguish areas of resource exhaustion, limitation and over-commitment, as well as resource under-utilization, surplus and over-allocation.

Monitoring AIX: hardware -> implementation -> historical/accumulated stats -> real-time/dynamic stats
- Review component technology of the infrastructure, i.e. ensure proper tuning-by-hardware
- Review implemented AIX structures, i.e. shared vs dedicated CPUs, SRADs, VIOS, NPIV, LVM/JFS2 constructs
- Review historical/accumulated AIX events, usages, pendings, counts, blocks, exhaustion, etc.
- Monitor real-time/dynamic AIX command behaviors, i.e. ps, vmstat, mpstat, iostat, ipcs, etc.

Interpret all indicators relative to the in-place technology, implementation and count/size/bandwidth of resources:
- Historical/cumulative indicators are judged by counts-per-scale over days-uptime since boot
- Real-time/dynamic indicators are compared by ranges&ratios of system resources

Color-coded Severity-of-Indicators: blue/surplus, green/normal, orange/warning, red/critical


Considerations when Monitoring AIX Performance statistics

- Monitor dynamic AIX behaviors with 1 or 2 second sampling intervals (vs 10-600 secs.)
- Verify a stressful workload exists:
  - We can't tune what is not being taxed
- Discontinue active efforts when done:
  - If/when it runs fast enough, we're tuned
- Favor building track-able discrete structures:
  - We can't tune what can't be tracked
- Discern workload spikes, peaks, bursts and burns:
  - We tune the intensities, not the sleepy-times
- Establish dynamic baselines by monitoring real-time AIX behaviors with ranges&ratios
- Monitor AIX behaviors with the goal of characterizing the workload (vmstat -I 1)


Strategic Thoughts, Concepts, Considerations, and Tactics

Monitoring AIX: Usage, Meaning and Interpretation
- Review component technology of the infrastructure, i.e. proper tuning-by-hardware
- Review implemented AIX constructs, i.e. firm near-static structures and settings
- Review historical/accumulated AIX events, i.e. usages, pendings, counts, blocks, etc.
- Monitor dynamic AIX command behaviors, i.e. ps, vmstat, mpstat, iostat, etc.

Recognizing Common Performance-degrading Scenarios
- High Load Average relative to count-of-LCPUs, i.e. over-threadedness
- vmstat:memory:avm near-to or greater-than lruable-gbRAM, i.e. over-committed
- Continuous low vmstat:memory:fre with persistent lrud (fr:sr) activity
- Continuous high ratio of vmstat:kthr:b relative to vmstat:kthr:r
- Poor ratio of pages freed to pages examined (fr:sr ratio) in vmstat -s output


Note the size, scale, technology and implementation of the given LPAR
Note the LPAR's ratio-of-resources, i.e. CPU-to-RAM-to-SAN I/O

$ date ; uname -a ; id ; oslevel -s ; lparstat -i
Wed Sep 26 00:00:00 EDT 2012
AIX tsm03 1 6 00X555XX5X00
uid=0(root) gid=0(system) groups=2(bin),3(sys),7(security),8(cron),10(audit),11(lp)
6100-06-06-1140
Node Name                                  : tsm03
Partition Name                             : TSM03
Partition Number                           : 1
Type                                       : Shared-SMT-4
Mode                                       : Uncapped
Entitled Capacity                          : 6.00
Partition Group-ID                         : 32769
Shared Pool ID                             : 0
Online Virtual CPUs                        : 6
Maximum Virtual CPUs                       : 7
Minimum Virtual CPUs                       : 4
Online Memory                              : 24064 MB
Maximum Memory                             : 24064 MB
Minimum Memory                             : 24064 MB
Variable Capacity Weight                   : 128
Minimum Capacity                           : 4.00
Maximum Capacity                           : 7.00
Capacity Increment                         : 0.01
Maximum Physical CPUs in system            : 16
Active Physical CPUs in system             : 16
Active CPUs in Pool                        : 16
Shared Physical CPUs in system             : 16
Maximum Capacity of Pool                   : 1600
Entitled Capacity of Pool                  : 1600
Unallocated Capacity                       : 0.00
Physical CPU Percentage                    : 100.00%
Unallocated Weight                         : 0
Memory Mode                                : Dedicated
Total I/O Memory Entitlement               :
Variable Memory Capacity Weight            :
Memory Pool ID                             :
Physical Memory in the Pool                :
Hypervisor Page Size                       :
Unallocated Variable Memory Capacity Weight:
Unallocated I/O Memory entitlement         :
Memory Group ID of LPAR                    :
Desired Virtual CPUs                       : 6
Desired Memory                             : 24064 MB
Desired Variable Capacity Weight           : 128
Desired Capacity                           : 6.00
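As a rough illustration of noting an LPAR's ratio-of-resources, the sketch below parses a few `lparstat -i` fields and derives the logical CPU count and RAM per virtual CPU. The function names are hypothetical, and it assumes the SMT thread count can be read from a `Type` value of the form `Shared-SMT-4`; other `Type` strings fall back to one thread per virtual CPU, which may not match a real system.

```python
import re

def parse_lparstat(text):
    """Parse 'Key : Value' lines of `lparstat -i` output into a dict."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields

def resource_ratios(fields):
    """Derive simple size/count ratios for the LPAR (illustrative only)."""
    vcpus = int(fields["Online Virtual CPUs"])
    # Assumption: "Shared-SMT-4" means 4 hardware threads per virtual CPU.
    match = re.search(r"SMT-(\d+)", fields["Type"])
    smt = int(match.group(1)) if match else 1
    mem_mb = int(fields["Online Memory"].split()[0])
    return {
        "logical_cpus": vcpus * smt,
        "mb_ram_per_vcpu": mem_mb / vcpus,
        "entitlement_per_vcpu": float(fields["Entitled Capacity"]) / vcpus,
    }

# Values copied from the lparstat -i output of LPAR tsm03
SAMPLE = """\
Type                                       : Shared-SMT-4
Entitled Capacity                          : 6.00
Online Virtual CPUs                        : 6
Online Memory                              : 24064 MB
"""

if __name__ == "__main__":
    print(resource_ratios(parse_lparstat(SAMPLE)))
```

For this LPAR the sketch yields 24 logical CPUs (6 virtual CPUs at SMT-4), which is the count-of-LCPUs against which a load average would be judged.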

prtconf # note the component technology of the given LPAR

$ prtconf
System Model: IBM,8233-E8B
Machine Serial Number: 5555XXX
Processor Type: PowerPC_POWER7
Processor Implementation Mode: POWER 7
Processor Version: PV_7_Compat
Number Of Processors: 6
Processor Clock Speed: 3300 MHz
CPU Type: 64-bit
Kernel Type: 64-bit
LPAR Info: 1 TSM03
Memory Size: 24064 MB
Good Memory Size: 24064 MB
Platform Firmware level: AL710_065
Firmware Version: IBM,AL710_065
Console Login: enable
Auto Restart: true
Full Core: false

Network Information
        Host Name: tsm03
        IP Address: 111.222.33.44
        Sub Netmask: 255.255.255.128
        Gateway: 111.222.33.1
        Name Server: 111.222.166.17
        Domain Name: customer.com

Paging Space Information
        Total Paging Space: 60672MB
        Percent Used: 24%

Volume Groups Information
==============================================================================
Inactive VGs
==============================================================================
heartbeat_vg
==============================================================================
Active VGs
==============================================================================
tsm_vg:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdiskpower57      active            99          0           00..00..00..00..00
hdiskpower8       active            9           0           00..00..00..00..00


lscfg # note the placement of components in the implementation of the LPAR

$ lscfg
INSTALLED RESOURCE LIST

The following resources are installed on the machine.
+/- = Added or deleted from Resource List.
*   = Diagnostic support not available.

  Model Architecture: chrp
  Model Implementation: Multiple Processor, PCI bus

+ sys0                                                                      System Object
+ sysplanar0                                                                System Planar
* vio0                                                                      Virtual I/O Bus
* vscsi2     U8233.E8B.1009ADP-V1-C5-T1                                     Virtual SCSI Client Adapter
* vscsi1     U8233.E8B.1009ADP-V1-C3-T1                                     Virtual SCSI Client Adapter
* vscsi0     U8233.E8B.1009ADP-V1-C2-T1                                     Virtual SCSI Client Adapter
* hdisk3     U8233.E8B.1009ADP-V1-C2-T1-L8400000000000000                   Virtual SCSI Disk Drive
* hdisk2     U8233.E8B.1009ADP-V1-C2-T1-L8300000000000000                   Virtual SCSI Disk Drive
* hdisk1     U8233.E8B.1009ADP-V1-C2-T1-L8200000000000000                   Virtual SCSI Disk Drive
* hdisk0     U8233.E8B.1009ADP-V1-C2-T1-L8100000000000000                   Virtual SCSI Disk Drive
* vsa0       U8233.E8B.1009ADP-V1-C0                                        LPAR Virtual Serial Adapter
* vty0       U8233.E8B.1009ADP-V1-C0-L0                                     Asynchronous Terminal
* pci5       U5802.001.00H2615-P1                                           PCI Express Bus
+ ent0       U5802.001.00H2615-P1-C6-T1                                     2-Port 10/100/1000 Base-TX PCI-Express Ada
+ ent1       U5802.001.00H2615-P1-C6-T2                                     2-Port 10/100/1000 Base-TX PCI-Express Ada
* pci4       U5802.001.00H2615-P1                                           PCI Express Bus
+ fcs6       U5802.001.00H2615-P1-C5-T1                                     8Gb PCI Express Dual Port FC Adapter (df10
* fcnet6     U5802.001.00H2615-P1-C5-T1                                     Fibre Channel Network Protocol Device
+ fscsi6     U5802.001.00H2615-P1-C5-T1                                     FC SCSI I/O Controller Protocol Device
* sfwcomm6   U5802.001.00H2615-P1-C5-T1-W0-L0                               Fibre Channel Storage Framework Comm
* rmt156     U5802.001.00H2615-P1-C5-T1-W500308C0022DD803-L0                LTO Ultrium Tape Drive (FCP)
* rmt157     U5802.001.00H2615-P1-C5-T1-W500308C0022DD803-L1000000000000    LTO Ultrium Tape Drive (FCP)
+ fcs7       U5802.001.00H2615-P1-C5-T2                                     8Gb PCI Express Dual Port FC Adapter (df10
+ fscsi7     U5802.001.00H2615-P1-C5-T2                                     FC SCSI I/O Controller Protocol Device
* rmt74      U5802.001.00H2615-P1-C5-T2-W21000024FF31B5B1-L9000000000000    LTO Ultrium Tape Drive (FCP)
* rmt75      U5802.001.00H2615-P1-C5-T2-W21000024FF31B5B1-LA000000000000    LTO Ultrium Tape Drive (FCP)
* rmt76      U5802.001.00H2615-P1-C5-T2-W21000024FF31B5B1-LB000000000000    LTO Ultrium Tape Drive (FCP)
* rmt77      U5802.001.00H2615-P1-C5-T2-W21000024FF31B5B1-LC000000000000    LTO Ultrium Tape Drive (FCP)
* rmt78      U5802.001.00H2615-P1-C5-T2-W21000024FF31B5B1-LD000000000000    LTO Ultrium Tape Drive (FCP)
* rmt79      U5802.001.00H2615-P1-C5-T2-W21000024FF31B5B1-LE000000000000    LTO Ultrium Tape Drive (FCP)


lsdev # note the count&capacity of the component technology of the LPAR

$ lsdev
L2cache0   Available            L2 Cache
cd0        Defined              Virtual SCSI Optical Served by VIO Server
en0        Defined    05-00     Standard Ethernet Network Interface
en1        Defined    05-01     Standard Ethernet Network Interface
en2        Defined    00-00-00  Standard Ethernet Network Interface
en3        Available            Standard Ethernet Network Interface
en4        Defined    00-01     Standard Ethernet Network Interface
en5        Defined    07-00-00  Standard Ethernet Network Interface
ent0       Available  05-00     2-Port 10/100/1000 Base-TX PCI-Express Adapter (14104003)
ent1       Available  05-01     2-Port 10/100/1000 Base-TX PCI-Express Adapter (14104003)
ent2       Available  00-00-00  10 Gigabit Ethernet Adapter (ct3)
ent3       Available            EtherChannel / IEEE 802.3ad Link Aggregation
et0        Defined    05-00     IEEE 802.3 Ethernet Network Interface
et1        Defined    05-01     IEEE 802.3 Ethernet Network Interface
et2        Defined    00-00-00  IEEE 802.3 Ethernet Network Interface
et3        Defined              IEEE 802.3 Ethernet Network Interface
et4        Defined    00-01     IEEE 802.3 Ethernet Network Interface
et5        Defined    07-00-00  IEEE 802.3 Ethernet Network Interface
fcnet0     Defined    01-00-02  Fibre Channel Network Protocol Device
fcnet1     Defined    01-01-01  Fibre Channel Network Protocol Device
fcnet2     Defined    03-00-01  Fibre Channel Network Protocol Device
fcnet3     Defined    03-01-02  Fibre Channel Network Protocol Device
fcnet4     Defined    04-00-02  Fibre Channel Network Protocol Device
fcnet5     Defined    04-01-01  Fibre Channel Network Protocol Device
fcnet6     Defined    02-00-01  Fibre Channel Network Protocol Device
fcnet7     Defined    02-01-02  Fibre Channel Network Protocol Device
fcs0       Available  01-00     8Gb PCI Express Dual Port FC Adapter (df1000f114108a03)
fcs1       Available  01-01     8Gb PCI Express Dual Port FC Adapter (df1000f114108a03)
fcs2       Available  03-00     8Gb PCI Express Dual Port FC Adapter (df1000f114108a03)
fcs3       Available  03-01     8Gb PCI Express Dual Port FC Adapter (df1000f114108a03)
fcs4       Available  04-00     8Gb PCI Express Dual Port FC Adapter (df1000f114108a03)
fcs5       Available  04-01     8Gb PCI Express Dual Port FC Adapter (df1000f114108a03)
fcs6       Available  02-00     8Gb PCI Express Dual Port FC Adapter (df1000f114108a03)
fcs7       Available  02-01     8Gb PCI Express Dual Port FC Adapter (df1000f114108a03)
fscsi0     Available  01-00-01  FC SCSI I/O Controller Protocol Device
fscsi1     Available  01-01-02  FC SCSI I/O Controller Protocol Device
fscsi2     Available  03-00-02  FC SCSI I/O Controller Protocol Device
fscsi3     Available  03-01-01  FC SCSI I/O Controller Protocol Device
fscsi4     Available  04-00-01  FC SCSI I/O Controller Protocol Device
fscsi5     Available  04-01-02  FC SCSI I/O Controller Protocol Device
fscsi6     Available  02-00-02  FC SCSI I/O Controller Protocol Device
fscsi7     Available  02-01-01  FC SCSI I/O Controller Protocol Device
hba0       Available  00-00     10 Gigabit Ethernet-SR PCI-Express Host Bus Adapter (2514300014108c03)


lsps ; mount ; df -k # review the implemented construction of firm AIX structures

$ lsps -a ; lsps -s ; mount ; df -k
Page Space      Physical Volume   Volume Group    Size %Used Active  Auto  Type Chksum
paging02        hdisk2            paging_vg     9216MB    39   yes   yes    lv     0
paging01        hdisk2            paging_vg    24576MB    15   yes   yes    lv     0
paging00        hdisk2            paging_vg    16384MB    22   yes   yes    lv     0
hd6             hdisk0            rootvg       10496MB    35   yes   yes    lv     0
Total Paging Space   Percent Used
      60672MB               24%
  node       mounted            mounted over            vfs    date         options
-------- --------------------- ----------------------- ------ ------------ ---------------
         /dev/hd4              /                       jfs2   Sep 07 17:03 rw,log=/dev/hd8
         /dev/hd2              /usr                    jfs2   Sep 07 17:03 rw,log=/dev/hd8
         /dev/hd9var           /var                    jfs2   Sep 07 17:03 rw,log=/dev/hd8
         /dev/hd3              /tmp                    jfs2   Sep 07 17:03 rw,log=/dev/hd8
         /dev/hd1              /home                   jfs2   Sep 07 17:07 rw,log=/dev/hd8
         /dev/hd11admin        /admin                  jfs2   Sep 07 17:07 rw,log=/dev/hd8
         /proc                 /proc                   procfs Sep 07 17:07 rw
         /dev/hd10opt          /opt                    jfs2   Sep 07 17:07 rw,log=/dev/hd8
         /dev/livedump         /var/adm/ras/livedump   jfs2   Sep 07 17:07 rw,log=/dev/hd8
         /dev/install_sw_lv    /install_sw             jfs2   Sep 07 17:07 rw,log=/dev/hd8
         /dev/tsmlib1_lv       /tsm/db2lib1            jfs2   Sep 07 17:22 rw,log=INLINE
         /dev/tsm_db_lv        /tsm/tsm                jfs2   Sep 07 17:22 rw,log=INLINE
         /dev/tsm_arc_lv       /tsm/tsm/arch           jfs2   Sep 07 17:22 rw,log=INLINE
         /dev/tsm_dat01_lv     /tsm/tsm/data01         jfs2   Sep 07 17:22 rw,log=INLINE
         /dev/tsm_dat02_lv     /tsm/tsm/data02         jfs2   Sep 07 17:22 rw,log=INLINE
         /dev/tsm_dat03_lv     /tsm/tsm/data03         jfs2   Sep 07 17:22 rw,log=INLINE
         /dev/tsm_lg_lv        /tsm/tsm/log            jfs2   Sep 07 17:22 rw,log=INLINE
         /dev/lv01             /tsm/tsmb               jfs2   Sep 07 17:22 rw,log=INLINE
Filesystem          1024-blocks       Free %Used    Iused %Iused Mounted on
/dev/hd4                3145728    2605152   18%    31050     5% /
/dev/hd2                4390912     581548   87%    64251    29% /usr
/dev/hd9var             2097152      78452   97%     9844    24% /var
/dev/hd3                2097152    1035572   51%     2530     2% /tmp
/dev/hd1                1048576     250468   77%     1198     3% /home
/dev/hd11admin           131072     130692    1%        5     1% /admin
/proc                         -                                  /proc
/dev/hd10opt            5242880    1815992   66%    26774     6% /opt
/dev/livedump            262144     255344    3%       31     1% /var/adm/ras/livedump
/dev/install_sw_lv     20971520    7548932   65%     7944     1% /install_sw
/dev/tsmlib1_lv        51380224   21353496   59%     1818     1% /tsm/db2lib1
/dev/tsm_db_lv        513802240  209820276   60%     1695     1% /tsm/tsm
/dev/tsm_arc_lv       102760448   74128676   28%       73     1% /tsm/tsm/arch
/dev/tsm_dat01_lv     519045120    6434120   99%       25     1% /tsm/tsm/data01
/dev/tsm_dat02_lv     519045120   32034120   94%       23     1% /tsm/tsm/data02
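As a cross-check of the `lsps -a` and `lsps -s` figures, the sketch below recomputes the overall paging-space utilization as a size-weighted average of the per-device %Used values. The function name is hypothetical and the rows are copied from the `lsps -a` output for tsm03; because `lsps` rounds %Used to whole percents, the result is only approximate.

```python
def paging_space_used(rows):
    """rows: (name, size_mb, pct_used) tuples taken from `lsps -a`.

    Returns (used_mb, total_mb, pct_used_overall).
    """
    total_mb = sum(size for _, size, _ in rows)
    used_mb = sum(size * pct / 100 for _, size, pct in rows)
    return used_mb, total_mb, 100 * used_mb / total_mb

# Rows copied from the lsps -a output of LPAR tsm03
ROWS = [
    ("paging02", 9216, 39),
    ("paging01", 24576, 15),
    ("paging00", 16384, 22),
    ("hd6", 10496, 35),
]

if __name__ == "__main__":
    used, total, pct = paging_space_used(ROWS)
    print(f"{used:.0f} MB used of {total} MB ({pct:.1f}%)")
```

The weighted result agrees with the 60672MB total and the 24% reported by `lsps -s`, and it shows how unevenly the four paging spaces are filled.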


df -k # review the implemented construction of firm AIX structures; observe count-of-inodes per GBs(used) of each application's data filesystems

$ df -k
Filesystem          1024-blocks       Free %Used    Iused %Iused Mounted on
/dev/hd4                 262144     129016   51%     3777     3% /
/dev/hd2                3932160     544280   87%    42721     5% /usr
/dev/hd9var             1048576     334980   69%     4293     2% /var
/dev/hd3                1048576     731832   31%      519     1% /tmp
/dev/hd1                 262144      63632   76%     2622     5% /home
/proc                         -                                  /proc
/dev/hd10opt             262144     213832   19%      849     2% /opt
/dev/lvsapcds           2097152     456840   79%     1246     2% /sapcds
/dev/lvcnvbt           20480000   16993664   18%      715     1% /cnv
/dev/lvhrtmpbt           524288     506984    4%       30     1% /hrtmp
/dev/lvoraclebt          524288     436808   17%     2938     3% /oracle
/dev/lvorapr1bt         8978432    3838252   58%    21476     3% /oracle/PR1
/dev/lvmirrlogAp        3080192    2567348   17%        6     1% /oracle/PR1/mirrlogA
/dev/lvmirrlogBp        3080192    2567348   17%        6     1% /oracle/PR1/mirrlogB
/dev/lvoriglogAp        3080192    2567348   17%        6     1% /oracle/PR1/origlogA
/dev/lvoriglogBp        3080192    2567348   17%        6     1% /oracle/PR1/origlogB
/dev/lvsaparchbt       14680064   14296480    3%     7176     1% /oracle/PR1/saparch
/dev/lvsapdata1bt     268173312   73734764   73%      116     1% /oracle/PR1/sapdata1
/dev/lvsapdata18bt    268173312   73751196   73%      108     1% /oracle/PR1/sapdata10
/dev/lvsapdata11bt    268173312   77027948   72%      108     1% /oracle/PR1/sapdata11
/dev/lvsapdata24bt    268173312   75455208   72%      108     1% /oracle/PR1/sapdata12
/dev/lvsapdata2bt     268173312   76225148   72%      110     1% /oracle/PR1/sapdata2
/dev/lvsapdata3bt     268173312   75569716   72%      110     1% /oracle/PR1/sapdata3
/dev/lvsapdata14bt    268173312   74930816   73%      108     1% /oracle/PR1/sapdata4
/dev/lvsapdata23bt    268173312   77814376   71%      108     1% /oracle/PR1/sapdata5
/dev/lvsapdata16bt    268173312   79387368   71%      108     1% /oracle/PR1/sapdata6
/dev/lvsapdata7bt     268173312   74013420   73%      108     1% /oracle/PR1/sapdata7
/dev/lvsapdata8bt     268173312   75192876   72%      108     1% /oracle/PR1/sapdata8
/dev/lvsapdata19bt    268173312   74668728   73%      108     1% /oracle/PR1/sapdata9
/dev/lvsapreorgbt      25165824   19272876   24%     1153     1% /oracle/PR1/sapreorg
/dev/lvostage           2097152    1957092    7%      794     1% /oracle/stage
/dev/lvsapmntbt         2097152    1447736   31%      357     1% /sapmnt/PR1
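The count-of-inodes per GB(used) observation can be computed directly from `df -k` rows. The sketch below is one possible way to do it; the function names are hypothetical, and it assumes the seven-column layout shown in the output above, skipping the header and any row (such as /proc) without numeric block counts.

```python
def parse_df_k(text):
    """Return (mount, total_kb, free_kb, iused) for each numeric `df -k` row."""
    rows = []
    for line in text.splitlines():
        f = line.split()
        # Expect: Filesystem, 1024-blocks, Free, %Used, Iused, %Iused, Mounted on
        if len(f) != 7 or not f[1].isdigit():
            continue
        rows.append((f[6], int(f[1]), int(f[2]), int(f[4])))
    return rows

def inodes_per_gb_used(total_kb, free_kb, iused):
    """Inodes in use per GB of used space; None if nothing is used."""
    used_gb = (total_kb - free_kb) / (1024 * 1024)
    if used_gb <= 0:
        return None
    return iused / used_gb

# Two rows copied from the df -k output above
SAMPLE = """\
Filesystem 1024-blocks Free %Used Iused %Iused Mounted on
/dev/lvorapr1bt 8978432 3838252 58% 21476 3% /oracle/PR1
/dev/lvsapdata1bt 268173312 73734764 73% 116 1% /oracle/PR1/sapdata1
"""

if __name__ == "__main__":
    for mount, total_kb, free_kb, iused in parse_df_k(SAMPLE):
        ratio = inodes_per_gb_used(total_kb, free_kb, iused)
        print(f"{mount}: {ratio:.2f} inodes per GB used")
```

A filesystem holding a few very large database files (sapdata1) comes out at well under one inode per GB used, while a filesystem of many small files (/oracle/PR1) comes out in the thousands.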


ipcs -bm # review the implemented construction of firm AIX structures; computational memory includes allocated (vs authorized) shmemsegs

$ ipcs -bm
IPC status from /dev/mem as of Wed Sep 26 00:01:26 EDT 2012
T        ID     KEY        MODE        OWNER    GROUP          SEGSZ
Shared Memory:
m    1048576 0x78000166 --rw-rw-rw-     root   system       33554432
m    1048577 0x7800010b --rw-rw-rw-     root   system       33554432
m    1048578 0x21002002 --rw------- pconsole   system       10485760
m          3 0x6700b061 --rw-r--r--     root   system             12
m          4 0x6800b061 --rw-r--r--     root   system         377016
m          5 0x7000b061 --rw-------     root   system           3168
m   23068678 0xa7067574 --rw-rw-rw-  db2prd1 db2srvrs      140871904
m    9437191 0xffffffff --rw-------  db2lib1 db2srvrs      268435456
m   15728648 0xffffffff --rw-------  db2prd1 db2srvrs    16106127360
m   10485770 0xffffffff --rw-------  db2lib1 db2srvrs     3758096384
m   35651595 0xa7067561 --rw-------  db2prd1 db2srvrs       51511296
m   14680076 0xffffffff --rw-------  db2lib1 db2srvrs         131072
m    6291470 0x1b7fa074 --rw-rw-rw-  db2lib1 db2srvrs      140871904
m   12582927 0xffffffff --rw-------  db2lib1 db2srvrs      163905536
m    8388624 0xffffffff --rw-------  db2prd1 db2srvrs      268435456
m         17 0xa7067668 --rw-rw----  db2prd1 db2srvrs       50331648
m   73400338 0x1b7fa168 --rw-rw----  db2lib1 db2srvrs       50331648
m   20971539 0xffffffff --rw-------  db2prd1 db2srvrs      163905536
m    6291476 0xffffffff --rw-------  db2prd1 db2srvrs         131072
m   13631509 0x1b7fa061 --rw-------  db2lib1 db2srvrs       51511296
m   26214422 0xffffffff --rw-------  db2prd1 db2srvrs      268435456
m  111149079 0xffffffff --rw-------  db2prd1 db2srvrs      268435456
m   89128984 0xffffffff --rw-------  db2lib1 db2srvrs      268435456
m 1067450393 0xffffffff --rw-------  db2prd1 db2srvrs      268435456
m  115343386 0xffffffff --rw-------  db2prd1 db2srvrs      268435456
m  894435355 0xffffffff --rw-------  db2prd1 db2srvrs      268435456
m  311427100 0xffffffff --rw-------  db2lib1 db2srvrs      268435456
m  371195933 0xffffffff --rw-------  db2prd1 db2srvrs      268435456
m  547356703 0xffffffff --rw-------  db2prd1 db2srvrs         131072
m  569376800 0xffffffff --rw-------  db2prd1 db2srvrs         131072
m  576716833 0xffffffff --rw-------  db2lib1 db2srvrs      268435456
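To see how much shared memory each owner has allocated, the SEGSZ column of `ipcs -bm` can be summed per owner. The sketch below is a minimal illustration with a hypothetical function name; it assumes the seven-column row layout shown above (T, ID, KEY, MODE, OWNER, GROUP, SEGSZ) and uses only four of the rows as sample input.

```python
from collections import defaultdict

def shm_bytes_by_owner(text):
    """Sum SEGSZ (bytes) per OWNER for shared-memory rows of `ipcs -bm`."""
    totals = defaultdict(int)
    for line in text.splitlines():
        f = line.split()
        # Shared-memory rows start with "m" and end with SEGSZ in bytes
        if len(f) == 7 and f[0] == "m" and f[6].isdigit():
            totals[f[4]] += int(f[6])
    return dict(totals)

# Four rows copied from the ipcs -bm output above
SAMPLE = """\
T ID KEY MODE OWNER GROUP SEGSZ
Shared Memory:
m 1048576 0x78000166 --rw-rw-rw- root system 33554432
m 1048577 0x7800010b --rw-rw-rw- root system 33554432
m 1048578 0x21002002 --rw------- pconsole system 10485760
m 9437191 0xffffffff --rw------- db2lib1 db2srvrs 268435456
"""

if __name__ == "__main__":
    for owner, size in sorted(shm_bytes_by_owner(SAMPLE).items()):
        print(f"{owner}: {size / 2**20:.0f} MB")
```

Run against the full listing, the per-owner totals give a rough picture of how much computational memory the database instances hold as shared memory segments.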


vmo -L ; ioo -L # review the implemented construction of firm AIX structures

# vmo -L ; ioo -L
NAME                            CUR    DEF    BOOT   MIN    MAX    UNIT         TYPE
     DEPENDENCIES
--------------------------------------------------------------------------------
ame_cpus_per_pool               n/a    8      8      1      1K     processors      B
--------------------------------------------------------------------------------
ame_maxfree_mem                 n/a    24M    24M    320K   16G    bytes           D
     ame_minfree_mem
--------------------------------------------------------------------------------
ame_min_ucpool_size             n/a    0      0      5      95     % memory        D
--------------------------------------------------------------------------------
ame_minfree_mem                 n/a    8M     8M     64K    16383M bytes           D
     ame_maxfree_mem
--------------------------------------------------------------------------------
ams_loan_policy                 n/a    1      1      0      2      numeric         D
--------------------------------------------------------------------------------
enhanced_affinity_affin_time    1      1      1      0      100    numeric         D
--------------------------------------------------------------------------------
enhanced_affinity_vmpool_limit  10     10     10     -1     100    numeric         D
--------------------------------------------------------------------------------
esid_allocator                  0      0      0      0      1      boolean         D
--------------------------------------------------------------------------------
force_relalias_lite             0      0      0      0      1      boolean         D
--------------------------------------------------------------------------------
kernel_heap_psize               0      0      0      0      16M    bytes           B
--------------------------------------------------------------------------------
lgpg_regions                    0      0      0      0      8E-1                   D
     lgpg_size
--------------------------------------------------------------------------------
NAME                            CUR    DEF    BOOT   MIN    MAX    UNIT         TYPE
     DEPENDENCIES
--------------------------------------------------------------------------------
aio_active                      1             1                    boolean         S
--------------------------------------------------------------------------------
aio_maxreqs                     64K    64K    64K    4K     1M     numeric         D
--------------------------------------------------------------------------------
aio_maxservers                  30     30     30     1      20000  numeric         D
     aio_minservers
--------------------------------------------------------------------------------
aio_minservers                  3      3      3      0      20000  numeric         D
     aio_maxservers
--------------------------------------------------------------------------------


uptime ; vmstat -s # review accumulated count-of-events over days-uptime

  12:00AM   up 18 days,   6:53,  1 user,  load average: 12.99, 12.30, 12.13
   1356958409 total address trans. faults
    276638320 page ins
    260776199 page outs
      3259560 paging space page ins
      4195229 paging space page outs
            0 total reclaims
    442257234 zero filled pages faults
       849546 executable filled pages faults
    458258136 pages examined by clock
          214 revolutions of the clock hand
    277114986 pages freed by the clock
     16045503 backtracks
         2770 free frame waits
            0 extend XPT waits
     11026835 pending I/O waits
    536747261 start I/Os
     32579821 iodones
 138394979018 cpu context switches
  34131579015 device interrupts
  19730395799 software interrupts
   3300305278 decrementer interrupts
    910908738 mpc-sent interrupts
    910908138 mpc-receive interrupts
    429034782 phantom interrupts
            0 traps
2395294772518 syscalls
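Two of the indicators named earlier can be derived from this output: the ratio of pages examined to pages freed by the clock (the accumulated sr:fr ratio) and the load average relative to the count of logical CPUs. The sketch below is illustrative only; the function names are hypothetical, and the logical CPU count of 24 assumes 6 online virtual CPUs running SMT-4, as reported by `lparstat -i` for this LPAR.

```python
def parse_vmstat_s(text):
    """Map each `vmstat -s` label to its accumulated integer count."""
    stats = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].isdigit():
            stats[parts[1].strip()] = int(parts[0])
    return stats

def examined_per_freed(stats):
    """Pages examined by the clock per page freed (sr:fr); None if none freed."""
    freed = stats["pages freed by the clock"]
    if freed == 0:
        return None
    return stats["pages examined by clock"] / freed

def load_per_lcpu(load_average, logical_cpus):
    """Load average divided by the count of logical CPUs."""
    return load_average / logical_cpus

# Lines copied from the vmstat -s output above
SAMPLE = """\
    458258136 pages examined by clock
          214 revolutions of the clock hand
    277114986 pages freed by the clock
"""

if __name__ == "__main__":
    stats = parse_vmstat_s(SAMPLE)
    print(f"examined per freed: {examined_per_freed(stats):.2f}")
    # Assumption: 6 virtual CPUs x SMT-4 = 24 logical CPUs
    print(f"load per LCPU: {load_per_lcpu(12.99, 24):.2f}")
```

These are since-boot averages, so they smooth over bursts; the same ratios taken from 1 or 2 second `vmstat` samples show the intensities.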

uptime ; vmstat -s # review accumulated count-of-events over days-uptime

   1356958409 total address trans. faults

address translation faults Incremented for each occurrence of an address translation page fault. I/O may or may not be required to resolve the page fault. Storage protection page faults (lock misses) are not included in this count.

address translation faults occur when virtual-to-physical memory address translations are required, such as when:
- creating/initiating/forking/extending processes (that is, memory is needed to store a process's contents), i.e. zero filled pages faults and executable filled pages faults
- instructions or data are initially read or written to/from persistent storage, i.e. page ins and page outs
- memory is needed by AIX to manage other operations, i.e. network IO mbuf allocations, creating SHMSEGs, dynamic allocation of LVM/JFS2 fsbufs, etc.


uptime ; vmstat -s # review accumulated count-of-events over days-uptime

    276638320 page ins
    260776199 page outs

page ins
Incremented for each page read in by the virtual memory manager. The count is incremented for page ins from page space and file space. Along with the page out statistic, this represents the total amount of real I/O initiated by the virtual memory manager. [These are generally JFS/JFS2/NFS filesystem reads]

page outs
Incremented for each page written out by the virtual memory manager. The count is incremented for page outs to page space and for page outs to file space. Along with the page in statistic, this represents the total amount of real I/O initiated by the virtual memory manager. [These are generally JFS/JFS2/NFS filesystem writes]
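Since page ins plus page outs represent the real I/O initiated by the VMM, the accumulated counts can be scaled over days-uptime to get a counts-per-day figure. The sketch below shows one way to do this with hypothetical function names; converting pages to bytes assumes 4 KB pages, which is an assumption here because a system may also use larger page sizes.

```python
def uptime_days(days, hours, minutes):
    """Convert an uptime of 'D days, H:MM' into fractional days."""
    return days + hours / 24 + minutes / 1440

def vmm_io_per_day(page_ins, page_outs, days_up):
    """Total VMM-initiated page I/O and its average per day of uptime."""
    total_pages = page_ins + page_outs
    return total_pages, total_pages / days_up

if __name__ == "__main__":
    # Values copied from the uptime and vmstat -s output: up 18 days, 6:53
    days_up = uptime_days(18, 6, 53)
    total, per_day = vmm_io_per_day(276638320, 260776199, days_up)
    print(f"{total} pages over {days_up:.2f} days = {per_day:,.0f} pages/day")
    # Assumption: 4096-byte pages
    print(f"about {per_day * 4096 / 2**30:.1f} GB/day")
```

The per-day figure makes LPARs with different uptimes comparable, which is the point of judging historical indicators by counts-per-scale over days-uptime.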


uptime ; vmstat -s # review accumulated count-of-events over days-uptime

      3259560 paging space page ins
      4195229 paging space page outs

paging space page ins
Incremented for VMM initiated page ins from paging space only.

paging space page outs
Incremented for VMM initiated page outs to paging space only.

Only computational memory is ever written-to or read-from the paging space; the paging space extends computational memory. For any Days Uptime, acceptable tolerance is up to 5 digits of paging space page outs. For any Days Uptime, your concern for performance degradation should grow exponentially greater for each digit beyond 5 digits of paging space page outs.

But, of course, you might be asking: What is Computational Memory?
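The 5-digit rule of thumb above can be expressed as a count of digits beyond the tolerance. The sketch below is a minimal illustration; the function name and the `tolerance_digits` parameter are hypothetical, and it encodes only the digit-counting rule stated on this slide, not any further severity scale.

```python
def digits_beyond_tolerance(paging_space_page_outs, tolerance_digits=5):
    """Number of decimal digits by which the count exceeds the tolerance."""
    digits = len(str(abs(int(paging_space_page_outs))))
    return max(0, digits - tolerance_digits)

if __name__ == "__main__":
    # Value copied from the vmstat -s output above
    excess = digits_beyond_tolerance(4195229)
    print(f"paging space page outs: {excess} digit(s) beyond tolerance")
```

For the LPAR shown, 4195229 paging space page outs is a 7-digit count, i.e. 2 digits beyond the 5-digit tolerance.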


What is Computational memory? What is File memory (aka Non-Computational memory)?

Computational memory
Computational memory is used while your processes are actually working on computing information. These working segments are temporary (transitory) and only exist up until the time a process terminates or the page is stolen. They have no real permanent disk storage location. When a process terminates, both the physical and paging spaces are released in many cases. When there is a large spike in available pages, you can actually see this happening while monitoring your system. When free physical memory starts getting low, programs that have not been used recently are moved from RAM to paging space to help release physical memory for more real work.

File memory (aka Non-Computational memory)


File memory (unlike computational memory) uses persistent segments and has a permanent storage location on the disk. Data files or executable programs are mapped to persistent segments rather than working segments. The data files can relate to filesystems, such as JFS, JFS2, or NFS. They remain in memory until the file is unmounted, a page is stolen, or a file is unlinked. After the data file is copied into RAM, VMM controls when these pages are overwritten or used to store other data. Given the alternative, most people would much rather have file memory paged to disk rather than computational memory. When a process references a page which is on disk, it must be paged, which could cause other pages to page out again. VMM is constantly lurking and working in the background trying to steal frames that have not been recently referenced, using the page replacement algorithm discussed earlier. It also helps detect thrashing, which can occur when memory is extremely low and pages are constantly being paged in and out to support processing. VMM actually has a memory load control algorithm, which can detect if the system is thrashing and actually tries to remedy the situation. Unabashed thrashing can literally cause a system to come to a standstill, as the kernel becomes too concerned with making room for pages than actually doing anything productive.

Source verbatim: Ken Milberg/Martin Brown http://www.ibm.com/developerworks/aix/library/au-aix7memoryoptimize1/index.html



uptime; vmstat -s: Review accumulated count-of-events over days-uptime

12:00AM up 18 days, 6:53, 1 user, load average: 12.99, 12.30, 12.13
   1356958409 total address trans. faults
    276638320 page ins
    260776199 page outs
      3259560 paging space page ins
      4195229 paging space page outs
            0 total reclaims
    442257234 zero filled pages faults
       849546 executable filled pages faults

zero-filled page faults


Incremented if the page fault is to working storage and can be satisfied by assigning a frame and zero-filling it.

executable-filled page faults


Incremented for each instruction page fault.

Zero-filled page faults are used to allocate memory when creating, initializing, forking or extending AIX processes to be executed, such as when starting up a database instance or executing Java applications. They do not involve storage IO. They also load the TLB for fast next access. By definition, they are only computational memory.

Executable-filled page faults are used to allocate memory designated to house binary-executable instructions, and they do involve storage read IOs. They also load the TLB for fast next access. By definition, they are only computational memory.


uptime; vmstat -s: Review accumulated count-of-events over days-uptime

12:00AM up 18 days, 6:53, 1 user, load average: 12.99, 12.30, 12.13
   1356958409 total address trans. faults
    276638320 page ins
    260776199 page outs
      3259560 paging space page ins
      4195229 paging space page outs
            0 total reclaims
    442257234 zero filled pages faults
       849546 executable filled pages faults
    458258136 pages examined by clock
          214 revolutions of the clock hand
    277114986 pages freed by the clock
     16045503 backtracks
         2770 free frame waits
            0 extend XPT waits
     11026835 pending I/O waits
    536747261 start I/Os
     32579821 iodones
 138394979018 cpu context switches
  34131579015 device interrupts
  19730395799 software interrupts
   3300305278 decrementer interrupts
    910908738 mpc-sent interrupts
    910908138 mpc-receive interrupts
    429034782 phantom interrupts
            0 traps
2395294772518 syscalls
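To work with these counters in a script, the `vmstat -s` text can be parsed into a dictionary keyed by label. The following is a minimal Python sketch, assuming one counter per line in the form "count label"; `parse_vmstat_s` is a hypothetical helper name, not an AIX tool.

```python
import re

def parse_vmstat_s(text):
    """Parse `vmstat -s` style output into {label: count}.

    Lines that do not start with an integer followed by whitespace
    (for example the `uptime` banner) are skipped.
    """
    counters = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\d+)\s+(.+?)\s*$", line)
        if match:
            counters[match.group(2)] = int(match.group(1))
    return counters

# Sample lines taken from the 18-days-uptime listing in this deck.
sample = """\
12:00AM up 18 days, 6:53, 1 user, load average: 12.99, 12.30, 12.13
   276638320 page ins
   260776199 page outs
   536747261 start I/Os
    32579821 iodones
"""
stats = parse_vmstat_s(sample)
print(stats["page ins"] + stats["page outs"], stats["start I/Os"])
```

The resulting dictionary can feed the ratio checks discussed on the following slides.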

uptime; vmstat -s: Review accumulated count-of-events over days-uptime

    458258136 pages examined by clock
          214 revolutions of the clock hand
    277114986 pages freed by the clock

pages examined by the clock
VMM uses a clock-algorithm to implement a pseudo least recently used (lru) page replacement scheme. Pages are aged by being examined by the clock. This count is incremented for each page examined by the clock.

revolutions of the clock hand
Incremented for each VMM clock revolution (that is, after each complete scan of memory).

pages freed by the clock
Incremented for each page the clock algorithm selects to free from real memory.

Typically, [pages freed by the clock / pages examined by the clock] is comfortably greater than 0.40, i.e. 277114986 / 458258136 = 0.60471. If not greater than 0.40, then the lower this value reaches below 0.40, the more likely gbRAM needs to be added. This is a contributing or confirming factor suggesting more gbRAM may be needed; it is not a definitive indicator.

pages examined by the clock is the historical accumulation of AIX:vmstat:page:sr activity (aka lrud-scanrate).
pages freed by the clock is the historical accumulation of AIX:vmstat:page:fr activity (aka lrud-freerate).
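The freed-to-examined check can be sketched in a few lines of Python. This is only an illustration of the 0.40 rule of thumb stated on this slide; the function name and the verdict strings are hypothetical.

```python
def clock_free_ratio(pages_freed, pages_examined):
    """Return pages freed by the clock divided by pages examined by the clock."""
    return pages_freed / pages_examined

# Counters from the 18-days-uptime vmstat -s sample.
ratio = clock_free_ratio(277114986, 458258136)

# Below 0.40 is treated here as a contributing factor only, not a definitive indicator.
verdict = "typical" if ratio > 0.40 else "possible gbRAM shortage (contributing factor)"
print(f"{ratio:.5f} {verdict}")
```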


uptime; vmstat -s: Review accumulated count-of-events over days-uptime

     16045503 backtracks
         2770 free frame waits
            0 extend XPT waits
     11026835 pending I/O waits
    536747261 start I/Os
     32579821 iodones

backtracks
Incremented for each page fault that occurs while resolving a previous page fault. (The new page fault must be resolved first and then initial page faults can be backtracked.)

free frame waits
Incremented each time a process requests a page frame, the free list is empty, and the process is forced to wait while the free list is replenished.

The count of backtracks monitors the relative intensity or duration of coincident and near-coincident page faulting activity. It can generally distinguish a steady consistently-moderate workload pattern (low count) from a frenetically spiking, peaking, bursting or burning workload pattern (high count).

The count of free frame waits increases when free memory repeatedly reaches down to zero and slightly back up. High counts indicate a likely start/stop stuttering of user workload progress, as well as frustrating unfettered storage IO throughput; this is typically associated with harsh bursts and burns of AIX:lrud scanning and freeing, as well as higher CPU-kernel time (i.e. AIX:vmstat:cpu:sy >25%).


uptime; vmstat -s: Review accumulated count-of-events over days-uptime

     16045503 backtracks
         2770 free frame waits
            0 extend XPT waits
     11026835 pending I/O waits
    536747261 start I/Os
     32579821 iodones

pending I/O waits
Incremented each time VMM makes a process wait for a page-in I/O to complete.

start I/Os
Incremented for each read or write I/O request initiated by VMM.

iodones
Incremented at the completion of each VMM I/O request.

High counts of pending I/O waits could indicate long page-in I/O latencies, or processes awaiting page-in I/O that are repeatedly or too rapidly returned to the CPU before the page-in I/O completes, or both in varying degrees.

Acceptable tolerance is up to 80% of iodones; warning is 81%-100% of iodones; seek-resolution is beyond 100% of iodones, i.e. pending I/O waits / iodones => 11026835 / 32579821 = 33.84% = Acceptable.

start I/Os are generally the sum of page ins and page outs.

The ratio of start I/Os to iodones is a relative indicator of sequential I/O coalescence. Sequential read-aheads and sequential write-behinds of JFS2 default-mount I/O transactions are automatically coalesced to fewer larger I/O transactions. This is a quick&dirty method of distinguishing a generally random IO versus sequential IO workload, i.e. start I/Os / iodones => 536747261 / 32579821 = 16.47 is a moderately-low Sequential IO reduction ratio.
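Both ratios on this slide are simple divisions, so they are easy to script. The sketch below is a hedged Python illustration: the band names mirror the slide's wording, values between 80% and 81% are treated as "warning" by assumption, and the function name is hypothetical.

```python
def classify_pending_io_waits(pending_io_waits, iodones):
    """Return (percent of iodones, tolerance band) for pending I/O waits."""
    pct = 100.0 * pending_io_waits / iodones
    if pct <= 80:
        band = "acceptable"
    elif pct <= 100:
        band = "warning"          # assumption: anything above 80% up to 100%
    else:
        band = "seek-resolution"
    return pct, band

# Counters from the 18-days-uptime vmstat -s sample.
pct, band = classify_pending_io_waits(11026835, 32579821)

# start I/Os per iodone: a rough indicator of sequential I/O coalescence.
coalescence = 536747261 / 32579821

print(f"pending I/O waits: {pct:.2f}% of iodones ({band})")
print(f"start I/Os per iodone: {coalescence:.2f}")
```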

uptime; vmstat -s: Review accumulated count-of-events over days-uptime

 138394979018 cpu context switches
  34131579015 device interrupts
  19730395799 software interrupts
   3300305278 decrementer interrupts
2395294772518 syscalls

CPU context switches
Incremented for each processor context switch (dispatch of a new process).

device interrupts
Incremented on each hardware interrupt.

software interrupts
Incremented on each software interrupt. A software interrupt is a machine instruction similar to a hardware interrupt that saves some state and branches to a service routine. System calls are implemented with software interrupt instructions that branch to the system call handler routine.

decrementer interrupts
Incremented on each decrementer interrupt.

syscalls
Incremented for each system call.


uptime; vmstat -s: Review accumulated count-of-events over days-uptime

 138394979018 cpu context switches
  34131579015 device interrupts
  19730395799 software interrupts
   3300305278 decrementer interrupts
2395294772518 syscalls

Note the paired ratios of the above for a relative sense-of-proportion of system events.

What is useful about the ratio of cpu context switches : decrementer interrupts?
138394979018 / 3300305278 = an average of 42 cpu context switches per decrementer interrupt

What is useful about the ratio of device interrupts : decrementer interrupts?
34131579015 / 3300305278 = an average of 10 device interrupts per decrementer interrupt

What is useful about the ratio of syscalls : decrementer interrupts?
2395294772518 / 3300305278 = an average of 726 system calls per decrementer interrupt

What is useful about the ratio of device interrupts : syscalls : cpu context switches?
34131579015 : 2395294772518 : 138394979018 ~= 10:726:42 per decrementer interrupt
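The per-decrementer-interrupt proportions can be reproduced with a short Python sketch. The dictionary layout and variable names are illustrative only; the counter values are those of the 18-days-uptime sample.

```python
# Accumulated counters from the vmstat -s sample.
counters = {
    "cpu context switches": 138394979018,
    "device interrupts": 34131579015,
    "syscalls": 2395294772518,
}
decrementer_interrupts = 3300305278

# Average number of each event per decrementer interrupt, rounded to whole events.
per_tick = {
    name: round(count / decrementer_interrupts)
    for name, count in counters.items()
}
print(per_tick)
```

Normalizing by decrementer interrupts gives a uptime-independent sense of proportion, which makes samples from different LPARs easier to compare.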


Determine points of exhaustion, limitation, and over-commitment
Determine surplus resources: CPUcycles, RAM, SAN I/O thruput, etc.

uptime; vmstat -s
12:46AM up 139 days, 1:29, 0 users, load average: 9.24, 4.21, 2.99
  36674080366 total address trans. faults
 303594828999 page ins                        # filesystem reads from disk; vmstat:page:fi
  65127100071 page outs                       # filesystem writes to disk; vmstat:page:fo
           17 paging space page ins           # vmstat:page:pi
          166 paging space page outs          # vmstat:page:po
            0 total reclaims
  10153151099 zero filled pages faults
       379929 executable filled pages faults
 790677067990 pages examined by clock         # vmstat:page:sr
       102342 revolutions of the clock hand
 323578511315 pages freed by the clock        # vmstat:page:fr
    216779474 backtracks
    173781776 free frame waits                # waits when vmstat:memory:fre equals 0
            0 extend XPT waits
  13118848968 pending I/O waits
 369118024444 start I/Os
  21394237531 iodones
 115626032109 cpu context switches            # vmstat:faults:cs
  25244855068 device interrupts               # fc/ent/scsi interrupts; vmstat:faults:in
   3124067547 software interrupts             # software interrupts
  14571190906 decrementer interrupts          # lcpu decrementer clock interrupts
     56397341 mpc-sent interrupts
     56396919 mpc-receive interrupts
     32316580 phantom interrupts
            0 traps
 739431511068 syscalls                        # total system calls (akin to miles traveled)

uptime; vmstat -v: Review accumulated count-of-events over days-uptime

12:00AM up 18 days, 6:53, 1 user, load average: 12.99, 12.30, 12.13
  6160384 memory pages
  5954432 lruable pages
   156557 free pages
        3 memory pools
   883319 pinned pages
     80.0 maxpin percentage
      3.0 minperm percentage
     90.0 maxperm percentage
      1.2 numperm percentage
    77369 file pages
      0.0 compressed percentage
        0 compressed pages
      1.2 numclient percentage
     90.0 maxclient percentage
    77369 client pages
        0 remote pageouts scheduled
       19 pending disk I/Os blocked with no pbuf
  1019076 paging space I/Os blocked with no psbuf
     2359 filesystem I/Os blocked with no fsbuf
        0 client filesystem I/Os blocked with no fsbuf
   204910 external pager filesystem I/Os blocked with no fsbuf
     96.2 percentage of memory used for computational pages

uptime; vmstat -v: Review accumulated count-of-events over days-uptime

12:00AM up 18 days, 6:53, 1 user, load average: 12.99, 12.30, 12.13
  6160384 memory pages
  5954432 lruable pages
   156557 free pages          # real-time count of freemem pages is continuously changing
        3 memory pools
   883319 pinned pages

free pages
Number of free 4 KB pages.

The AIX VMM managed count of free pages is set by AIX:vmo:minfree and AIX:vmo:maxfree. Varying with AIX and the incidental count of memory pools, the count of free pages is typically maintained by default within a 4-digit range of 4KB pages, i.e. between 2880 and 3264 of 4 KB pages.

Meanwhile, enterprise-class infrastructures can sustain JFS2 filesystem I/O throughputs of 5-digits of 4KB reads and writes, but if-and-only-if a greater count of free pages is always available to accept the reads and writes.

Remember: The count of free frame waits increases when free memory repeatedly reaches down to zero and slightly back up. High counts indicate a likely start/stop stuttering of user workload progress, as well as frustrating unfettered storage IO throughput; this is typically associated with harsh bursts and burns of AIX:lrud scanning and freeing, as well as higher CPU-kernel time (i.e. AIX:vmstat:cpu:sy >25%).


Reduce free frame waits by raising minfree and maxfree higher than default

=== command: vmstat -sv
Note: low 4 digits of free frame waits with a nice 6 digits of free pages; while there's enough freemem, IO (i.e. fi,fo) continues unfettered

         2770 free frame waits
       156557 free pages

=== command: vmo -L
Note: maxfree=8704 (default=1088), minfree=8K (default=960); incidentally, this LPAR has 3 memory pools

--------------------------------------------------------------------------------
maxfree                   8704   1088   8704   16     4812K  4KB pages         D
     minfree
     memory_frames
--------------------------------------------------------------------------------
minfree                   8K     960    8K     8      4812K  4KB pages         D
     maxfree
     memory_frames
--------------------------------------------------------------------------------

=== command: vmstat -Iwt 1
Note: mempool_count*maxfree=3*8704=26112; mempool_count*minfree=3*8192=24576, (fre=24576 starts fr:sr lrud scanning&freeing)

System configuration: lcpu=24 mem=24064MB ent=6.00

  kthr       memory                 page                    faults                cpu             time
-------- -------------- ----------------------------- -------------------- --------------------- --------
 r  b  p     avm    fre fi    fo  pi   po    fr    sr    in      sy     cs us sy id wa   pc   ec hr mi se
11  1  0 7539030 117080  2   548 206    0     0     0 32730 2188826 175026 37 35  1 27 5.52 92.1 00:01:12
17  1  0 7540803 115005  0   132 146    0     0     0 32581 2178059 169341 39 33  0 28 5.51 91.9 00:01:13
12  0  2 7540365 114924  2   316  16    0     0     0 30417 2188135 171948 36 36  0 28 5.52 92.0 00:01:14
10  0  2 7540654 106310  0  5863  29    0     0     0 33112 2134251 177073 34 38  0 28 5.52 92.0 00:01:15
16  0  0 7540058  94698  8  8459  17    0     0     0 33147 2076084 173134 35 40  1 25 5.59 93.1 00:01:16
23  0  0 7544739  83097  0  4518  15    0     0     0 32137 2098672 170494 39 38  2 22 5.68 94.7 00:01:17
 0  0  0 7552531  70637 19  3518  44    0     0     0 27911 2207363 166832 43 33  7 18 5.65 94.1 00:01:18
11  2  0 7560676  61953 23 14471  38    0     0     0 24444 2196741 154363 43 30  9 18 5.55 92.6 00:01:19
17  0  0 7570158  50021 66 11393  41 1733  4661 13412 30670 2063644 166578 39 40  7 14 5.75 95.8 00:01:20
13  2  0 7570331  39515 17 24859  10 8366 24671 71521 32332 1830946 163441 37 46  5 12 5.81 96.9 00:01:21
17  3  0 7569607  42002 14  3569   6 4458  2643  4022 26678 2219614 165593 46 29  7 18 5.62 93.7 00:01:22
17  9  0 7569539  46795 22     1   4 2808     0     0 26107 2179201 164453 43 31  6 20 5.61 93.5 00:01:23
13 10  0 7569524  49434  7     1   3 2522     0     0 26521 2216482 166354 40 31  7 22 5.48 91.3 00:01:24
21  6  0 7569511  53096  0     1  10 3530     0     0 26437 2184553 164387 40 32  6 22 5.54 92.3 00:01:25

Universal Recommendation: If default maxfree & minfree, and 6+ digits of free frame waits per any 90 days uptime,
1) use vmo to tune minfree=(5*2048), maxfree=(6*2048);
2) use ioo to tune j2_maxPageReadAhead=2048.
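The arithmetic behind this slide can be sketched in Python. This is an illustration under stated assumptions: the accumulated free frame waits count is scaled linearly to a 90-day window (the slide only says "per any 90 days uptime"), "6+ digits" is read as 100000 or more, and all function names are hypothetical.

```python
def free_frame_waits_per_90_days(free_frame_waits, uptime_days):
    """Scale an accumulated free frame waits count to a 90-day window (assumed linear)."""
    return free_frame_waits * 90.0 / uptime_days

def lpar_free_thresholds(memory_pools, minfree, maxfree):
    """LPAR-wide free-page counts implied by the per-memory-pool minfree and maxfree."""
    return memory_pools * minfree, memory_pools * maxfree

# 18-days-uptime sample: 2770 free frame waits, already tuned to minfree=8K, maxfree=8704.
tuned_rate = free_frame_waits_per_90_days(2770, 18)
start_scan, stop_scan = lpar_free_thresholds(3, 8192, 8704)

# 139-days-uptime sample: 173781776 free frame waits.
busy_rate = free_frame_waits_per_90_days(173781776, 139)

# "6+ digits per 90 days" read as >= 100000; the slide applies this only with default settings.
needs_tuning = busy_rate >= 100_000
suggested = {"minfree": 5 * 2048, "maxfree": 6 * 2048}

print(start_scan, stop_scan, round(tuned_rate), round(busy_rate), needs_tuning, suggested)
```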

uptime; vmstat -v: Review accumulated count-of-events over days-uptime

12:00AM up 18 days, 6:53, 1 user, load average: 12.99, 12.30, 12.13
  6160384 memory pages
  5954432 lruable pages
   156557 free pages
        3 memory pools
   883319 pinned pages
     80.0 maxpin percentage
      3.0 minperm percentage
     90.0 maxperm percentage
      1.2 numperm percentage      # a real-time % indicator of disk IO cache
    77369 file pages
      0.0 compressed percentage
        0 compressed pages
      1.2 numclient percentage    # a real-time % indicator of disk IO cache
     90.0 maxclient percentage
    77369 client pages
        0 remote pageouts scheduled
       19 pending disk I/Os blocked with no pbuf
  1019076 paging space I/Os blocked with no psbuf
     2359 filesystem I/Os blocked with no fsbuf
        0 client filesystem I/Os blocked with no fsbuf
   204910 external pager filesystem I/Os blocked with no fsbuf
     96.2 percentage of memory used for computational pages


uptime; vmstat -v: Review accumulated count-of-events over days-uptime

      3.0 minperm percentage
     90.0 maxperm percentage
      1.2 numperm percentage      # Warning when less than/equal minperm%
      1.2 numclient percentage    # Warning when less than/equal minperm%
     90.0 maxclient percentage

minperm percentage
Tuning parameter (managed using vmo) in percentage of real memory. This specifies the point below which file pages are protected from the re-page algorithm.

maxperm percentage
Tuning parameter (managed using vmo) in percentage of real memory. This specifies the point above which the page stealing algorithm steals only file pages.

numperm percentage
Percentage of memory currently used by the file cache.

numclient percentage
Percentage of memory occupied by client pages.

maxclient percentage
Tuning parameter (managed using vmo) specifying the maximum percentage of memory which can be used for client pages.

uptime; vmstat -v: Review accumulated count-of-events over days-uptime

       19 pending disk I/Os blocked with no pbuf
  1019076 paging space I/Os blocked with no psbuf
     2359 filesystem I/Os blocked with no fsbuf
        0 client filesystem I/Os blocked with no fsbuf
   204910 external pager filesystem I/Os blocked with no fsbuf

pending disk I/Os blocked with no pbuf
Number of pending disk I/O requests blocked because no pbuf was available. Pbufs are pinned memory buffers used to hold I/O requests at the logical volume manager layer. Count is currently for the rootvg only.

paging space I/Os blocked with no psbuf
Number of paging space I/O requests blocked because no psbuf was available. Psbufs are pinned memory buffers used to hold I/O requests at the virtual memory manager layer.

filesystem I/Os blocked with no fsbuf
Number of filesystem I/O requests blocked because no fsbuf was available. Fsbufs are pinned memory buffers used to hold I/O requests in the filesystem layer.

client filesystem I/Os blocked with no fsbuf
Number of client filesystem I/O requests blocked because no fsbuf was available. NFS (Network File System) and VxFS (Veritas) are client filesystems. Fsbufs are pinned memory buffers used to hold I/O requests in the filesystem layer.

external pager filesystem I/Os blocked with no fsbuf
Number of external pager client filesystem I/O requests blocked because no fsbuf was available. JFS2 is an external pager client filesystem. Fsbufs are pinned memory buffers used to hold I/O requests in the filesystem layer.


uptime; vmstat -v: Review accumulated count-of-events over days-uptime

       19 pending disk I/Os blocked with no pbuf      # stat for rootvg only
  1019076 paging space I/Os blocked with no psbuf
     2359 filesystem I/Os blocked with no fsbuf
        0 client filesystem I/Os blocked with no fsbuf
   204910 external pager filesystem I/Os blocked with no fsbuf

Use AIX:lvmo to monitor the pervg_blocked_io_count of each active LVM volume group, i.e. lvmo -a -v rootvg ; echo ; lvmo -a -v datavg

vgname = rootvg
pv_pbuf_count = 512
total_vg_pbufs = 512
max_vg_pbufs = 16384
pervg_blocked_io_count = 19
pv_min_pbuf = 512
max_vg_pbuf_count = 0
global_blocked_io_count = 1566

vgname = datavg
pv_pbuf_count = 512
total_vg_pbufs = 2048
max_vg_pbufs = 65536
pervg_blocked_io_count = 475
pv_min_pbuf = 512
max_vg_pbuf_count = 0
global_blocked_io_count = 1566

Acceptable tolerance is 4-digits of pervg_blocked_io_count per LVM volume group for any 90 days uptime.


Determine points of exhaustion, limitation, and over-commitment
Determine surplus resources: CPUcycles, RAM, SAN I/O thruput, etc.

# lvmo -a -v rootvg                 # 270 days uptime for counters below
vgname = rootvg
pv_pbuf_count = 512
total_vg_pbufs = 1024               # total_vg_pbufs / pv_pbuf_count = 1024/512 = 2 LUNs
max_vg_pbuf_count = 16384
pervg_blocked_io_count = 90543
pv_min_pbuf = 512
global_blocked_io_count = 12018771

# lvmo -a -v apvg15
vgname = apvg15
pv_pbuf_count = 512
total_vg_pbufs = 15872              # total_vg_pbufs / pv_pbuf_count = 15872/512 = 31 LUNs
max_vg_pbuf_count = 524288
pervg_blocked_io_count = 517938
pv_min_pbuf = 512
global_blocked_io_count = 12018771

# lvmo -a -v pgvg01
vgname = pgvg01
pv_pbuf_count = 512
total_vg_pbufs = 1024               # total_vg_pbufs / pv_pbuf_count = 1024/512 = 2 LUNs
max_vg_pbuf_count = 16384
pervg_blocked_io_count = 8612687
pv_min_pbuf = 512
global_blocked_io_count = 12018771
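The `name = value` layout of `lvmo -a -v <vg>` output is straightforward to parse. The Python sketch below is a hedged illustration: `parse_lvmo` is a hypothetical helper, trailing `#` annotations are treated as slide comments, and the LUN count is inferred as total_vg_pbufs / pv_pbuf_count, which holds only if every physical volume was added with the same pv_pbuf_count.

```python
def parse_lvmo(text):
    """Parse `lvmo -a -v <vg>` style 'name = value' lines into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0]          # drop any trailing annotation
        if "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            settings[key] = int(value) if value.isdigit() else value
    return settings

# Sample taken from the pgvg01 listing (270 days uptime).
sample = """\
vgname = pgvg01
pv_pbuf_count = 512
total_vg_pbufs = 1024   # total_vg_pbufs / pv_pbuf_count = 1024/512 = 2 LUNs
pervg_blocked_io_count = 8612687
"""
vg = parse_lvmo(sample)

# Inferred LUN count, assuming a uniform pv_pbuf_count across the VG.
lun_count = vg["total_vg_pbufs"] // vg["pv_pbuf_count"]

# Blocked I/O count scaled to a 90-day window (assumed linear over 270 days).
blocked_per_90_days = vg["pervg_blocked_io_count"] * 90 // 270
print(vg["vgname"], lun_count, blocked_per_90_days)
```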


Increase total_vg_pbufs to resolve high pervg_blocked_io_count

       19 pending disk I/Os blocked with no pbuf      # stat for rootvg only

Four factors complicate how to resolve high counts of pervg_blocked_io_count:

* The number of pbufs per physical volume when it's added to the volume group, i.e. the value of AIX:lvmo:pv_pbuf_count
* The count and size of physical volumes (aka LUNs or hdisks) assigned to the LVM VG
* The count and size of the JFS2:LVM logical volumes created on the VG's physical volumes, i.e. a reasonable balance of JFS2 fsbufs-to-VG pbufs favors optimal performance.
* Having either too few or too many VG:pbufs can severely hamper performance and throughput.

As such, we should only add pbufs by-formula on a schedule of 90-day change&observe cycles.

Use AIX:lvmo to monitor the pervg_blocked_io_count of each active LVM volume group, i.e. lvmo -a -v rootvg ; echo ; lvmo -a -v datavg

Acceptable tolerance is 4-digits of pervg_blocked_io_count per LVM volume group for any 90 days uptime. Otherwise, for each LVM volume group, adjust the value of AIX:lvmo:pv_pbuf_count accordingly:

* If 5-digits of pervg_blocked_io_count, add ~2048 pbufs to total_vg_pbufs per 90-day cycle.
* If 6-digits of pervg_blocked_io_count, add ~[4*2048] pbufs to total_vg_pbufs per 90-day cycle.
* If 7-digits of pervg_blocked_io_count, add ~[8*2048] pbufs to total_vg_pbufs per 90-day cycle.
* If 8-digits of pervg_blocked_io_count, add ~[12*2048] pbufs to total_vg_pbufs per 90-day cycle.
* If 9-digits of pervg_blocked_io_count, add ~[16*2048] pbufs to total_vg_pbufs per 90-day cycle.

Use AIX:lvmo to confirm/verify the value of total_vg_pbufs for each VG.
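The digit-count schedule above can be written as a small lookup. This Python sketch rests on two assumptions that are not stated on the slide: the accumulated count is scaled linearly to a 90-day window before counting digits, and counts beyond 9 digits reuse the 9-digit step. The function and table names are hypothetical.

```python
# Approximate pbufs to add to total_vg_pbufs per 90-day cycle, keyed by digit count.
PBUF_STEP_BY_DIGITS = {
    5: 1 * 2048,
    6: 4 * 2048,
    7: 8 * 2048,
    8: 12 * 2048,
    9: 16 * 2048,
}

def pbufs_to_add(pervg_blocked_io_count, uptime_days):
    """Return the suggested pbuf increment for one 90-day change&observe cycle."""
    per_90_days = int(pervg_blocked_io_count * 90 / uptime_days)  # assumed linear scaling
    digits = len(str(per_90_days))
    if digits <= 4:
        return 0                      # within the 4-digit tolerance
    return PBUF_STEP_BY_DIGITS[min(digits, 9)]

# Volume groups from the 270-days-uptime lvmo listings.
for name, blocked in (("rootvg", 90543), ("apvg15", 517938), ("pgvg01", 8612687)):
    print(name, pbufs_to_add(blocked, 270))
```

Whatever increment is chosen, the slide's guidance is to apply it once per cycle and then observe for 90 days before adjusting again.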



uptime; vmstat -v: Review accumulated count-of-events over days-uptime

12:00AM up 18 days, 6:53, 1 user, load average: 12.99, 12.30, 12.13
   1356958409 total address trans. faults
    276638320 page ins
    260776199 page outs
      3259560 paging space page ins
      4195229 paging space page outs
            0 total reclaims

paging space page outs
Incremented for VMM initiated page outs to paging space only.

           19 pending disk I/Os blocked with no pbuf
      1019076 paging space I/Os blocked with no psbuf
         2359 filesystem I/Os blocked with no fsbuf
            0 client filesystem I/Os blocked with no fsbuf
       204910 external pager filesystem I/Os blocked with no fsbuf

paging space I/Os blocked with no psbuf
Number of paging space I/O requests blocked because no psbuf was available. Psbufs are pinned memory buffers used to hold I/O requests at the virtual memory manager layer.

The ratio of paging space I/Os blocked with no psbuf / paging space page outs is a direct measure of intensity, i.e. 1019076 / 4195229 = 24.2%. In this example, suffering 7-digits of paging space page outs in 18 Days-Uptime is bad enough, but when there are also paging space I/Os blocked with no psbuf, system performance and keyboard responsiveness can stop-and-start in seconds-long cycles. One might even believe AIX has crashed, when it hasn't.

Preclude paging space page outs by any means; add more gbRAM to the LPAR.
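The psbuf intensity ratio is a single division; the Python sketch below shows it with a guard for LPARs that have never paged out. The function name is hypothetical.

```python
def psbuf_block_ratio(psbuf_blocked, paging_space_page_outs):
    """Fraction of paging space page outs that were blocked for lack of a psbuf."""
    if paging_space_page_outs == 0:
        return 0.0                    # no paging space page outs, nothing to block
    return psbuf_blocked / paging_space_page_outs

# Counters from the 18-days-uptime vmstat -s and vmstat -v samples.
ratio = psbuf_block_ratio(1019076, 4195229)
print(f"{ratio:.1%} of paging space page outs were blocked with no psbuf")
```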

Criteria for Creating a Write-Expedient pagingspace_vg

The first priority should be to preclude any pagingspace-pageouts. Thus, a write-expedient pagingspace is only needed if you have any unavoidable pagingspace-pageout activity. Ultimately, if we must suffer any pagingspace-pageouts, we want them to write-out to the pagingspace as quickly as possible (thus my term: write-expedient). So, for the sake of prudence, we should always create a write-expedient pagingspace. The listed traits below are optimal for write-expediency; include as many as you can (but always apply the key tuning tactic below):
* Create a dedicated AIX:LVM:vg (VolumeGroup) called pagingspace_vg
* Create the pagingspace_vg using FC-SAN storage LUNs (ideally RAID5 LUNs on SSD, FC or SAS technology disk drives, and not on SATA disk drives (which are slower and employ RAID6), nor on any local/internal SAS disks)
* The total size of the pagingspace in pagingspace_vg should match the size of installed LPAR gbRAM
* Assign 3-to-8 LUN/hdisks to pagingspace_vg and size each LUN to be an even fraction of installed gbRAM. For instance, if the LPAR has 18gbRAM, then assign three 6gb LUN/hdisks to pagingspace_vg
* Configure one AIX:LVM:VG:lv (logical volume) for each LUN/hdisk in pagingspace_vg; do not deploy PP-striping (because it messes up discrete hdisk IO monitoring) - just map one hdisk to one lv
* The key tuning tactic: With root-user privileges, use AIX:lvmo to set pagingspace_vg:pv_pbuf_count=2048. This will ensure pagingspace_vg:total_vg_pbufs will equal [<VGLUNcount> * pv_pbuf_count]. To set the pv_pbuf_count value to 2048, type the following:
  lvmo -v pagingspace_vg -o pv_pbuf_count=2048
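The sizing rules above (3-to-8 LUNs, each an even fraction of installed gbRAM, pv_pbuf_count=2048) can be enumerated with a short Python sketch. It assumes "even fraction" means a whole number of GB per LUN; the function name and dictionary keys are hypothetical.

```python
def pagingspace_layouts(ram_gb, pv_pbuf_count=2048):
    """List candidate pagingspace_vg layouts for an LPAR with ram_gb of installed RAM."""
    layouts = []
    for luns in range(3, 9):                      # 3-to-8 LUN/hdisks
        if ram_gb % luns == 0:                    # assumption: whole-GB LUN sizes only
            layouts.append({
                "luns": luns,
                "lun_gb": ram_gb // luns,
                "total_vg_pbufs": luns * pv_pbuf_count,
            })
    return layouts

# The 18gbRAM example from the list above.
for layout in pagingspace_layouts(18):
    print(layout)
```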


uptime; vmstat -v -- Review accumulated count-of-events over days-uptime

                   19 pending disk I/Os blocked with no pbuf
              1019076 paging space I/Os blocked with no psbuf
                 2359 filesystem I/Os blocked with no fsbuf
                    0 client filesystem I/Os blocked with no fsbuf
               204910 external pager filesystem I/Os blocked with no fsbuf
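Because the tactics in this deck are keyed to the digit-count of each blocked-I/O counter, it can help to pull the five counters out of saved vmstat -v output. The following is a rough sketch; `parse_blocked_counters` is a hypothetical helper and the sample text is simply the five lines shown on this slide.

```python
import re

# The five "blocked with no ...buf" lines as shown on the slide.
SAMPLE = """\
       19 pending disk I/Os blocked with no pbuf
  1019076 paging space I/Os blocked with no psbuf
     2359 filesystem I/Os blocked with no fsbuf
        0 client filesystem I/Os blocked with no fsbuf
   204910 external pager filesystem I/Os blocked with no fsbuf
"""


def parse_blocked_counters(text: str) -> dict:
    """Map each 'I/Os blocked with no ...' label to its accumulated count."""
    counters = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\d+)\s+(.*I/Os blocked with no \w+)\s*$", line)
        if match:
            counters[match.group(2)] = int(match.group(1))
    return counters


counters = parse_blocked_counters(SAMPLE)
for label, count in counters.items():
    # Digit-count is the scale this deck uses to judge each counter.
    print(f"{len(str(count))} digits  {count:>8}  {label}")
```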

filesystem I/Os blocked with no fsbuf

Number of filesystem I/O requests blocked because no fsbuf was available. Fsbuf are pinned memory buffers used to hold I/O requests in the filesystem layer. This count refers to JFS fsbuf exhaustion (vs. JFS2) and is typically ignored today. Virtually all customers use JFS2.
client filesystem I/Os blocked with no fsbuf

Number of client filesystem I/O requests blocked because no fsbuf was available. NFS (Network File System) and VxFS (Veritas) are client filesystems. Fsbuf are pinned memory buffers used to hold I/O requests in the filesystem layer.
Starting with AIX 6.1 Technology Level 02, the following parameters are obsolete because the network file system (NFS) and the virtual memory manager (VMM) dynamically tune the number of buf structures and page device tables (PDTs) based on workload:
* nfs_v2_pdts
* nfs_v2_vm_bufs
* nfs_v3_pdts
* nfs_v3_vm_bufs
* nfs_v4_pdts
* nfs_v4_vm_bufs

Resolving high external pager filesystem I/Os blocked with no fsbuf

                   19 pending disk I/Os blocked with no pbuf
              1019076 paging space I/Os blocked with no psbuf
                 2359 filesystem I/Os blocked with no fsbuf
                    0 client filesystem I/Os blocked with no fsbuf
               204910 external pager filesystem I/Os blocked with no fsbuf

external pager filesystem I/Os blocked with no fsbuf


Number of external pager client filesystem I/O requests blocked because no fsbuf was available. JFS2 is an external pager client filesystem. Fsbuf are pinned memory buffers used to hold I/O requests in the filesystem layer.

Acceptable tolerance is 5 digits per 90 Days-Uptime.
First tactic to attempt: If 6 digits, set ioo -o j2_dynamicBufferPreallocation=128.
First tactic to attempt: If 7+ digits, set ioo -o j2_dynamicBufferPreallocation=256.

ioo -o j2_dynamicBufferPreallocation=value
The number of 16K slabs to preallocate when the filesystem is running low on bufstructs. A value of 16 represents 256K. The bufstructs for Enhanced JFS (aka JFS2) are now dynamic; the number of buffers that start on the JFS2 filesystem is controlled by j2_nBufferPerPagerDevice (now restricted), but buffers are allocated and destroyed dynamically past this initial value. If the number of external pager filesystem I/Os blocked with no fsbuf increases, the j2_dynamicBufferPreallocation should be increased for that file system, as the I/O load on a file system may be exceeding the speed of preallocation. A value of 0 will disable dynamic buffer allocation completely. Heavy IO workloads should have this value changed to 256. File systems do not need to be remounted to activate.
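The first tactic is a lookup on the digit-count of the external pager counter. The sketch below only encodes the slide's rule of thumb; `first_tactic_value` is a hypothetical name, and the tolerance is stated per 90 days of uptime, so the caller still has to weigh the count against the actual uptime.

```python
SLAB_KB = 16  # one preallocation slab is 16K, per the tunable description


def first_tactic_value(blocked_count: int):
    """Suggested j2_dynamicBufferPreallocation, or None if within tolerance."""
    digits = len(str(blocked_count))
    if digits <= 5:
        return None  # 5 digits or fewer is within the stated tolerance
    return 128 if digits == 6 else 256


blocked = 204910  # external pager filesystem I/Os blocked with no fsbuf
value = first_tactic_value(blocked)
if value is not None:
    # ioo -o <tunable>=<value> is the setting syntax; run as root on the LPAR.
    print(f"ioo -o j2_dynamicBufferPreallocation={value}"
          f"   # preallocates {value * SLAB_KB}K at a time")
```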

Resolving high external pager filesystem I/Os blocked with no fsbuf

                   19 pending disk I/Os blocked with no pbuf
              1019076 paging space I/Os blocked with no psbuf
                 2359 filesystem I/Os blocked with no fsbuf
                    0 client filesystem I/Os blocked with no fsbuf
               204910 external pager filesystem I/Os blocked with no fsbuf

external pager filesystem I/Os blocked with no fsbuf

Number of external pager client filesystem I/O requests blocked because no fsbuf was available. JFS2 is an external pager client filesystem. Fsbuf are pinned memory buffers used to hold I/O requests in the filesystem layer.

Acceptable tolerance is 5 digits per 90 Days-Uptime.
Second tactic to attempt (if the first tactic wasn't enough): If 6 digits, set ioo -o j2_nBufferPerPagerDevice=5120.
Second tactic to attempt (if the first tactic wasn't enough): If 7+ digits, set ioo -o j2_nBufferPerPagerDevice=10240.

ioo -o j2_nBufferPerPagerDevice=value [Restricted]
This tunable specifies the number of JFS2 bufstructs that start when the filesystem is mounted. Enhanced JFS will allocate more dynamically (see j2_dynamicBufferPreallocation). Ideally, this value should not be tuned; instead, j2_dynamicBufferPreallocation should be tuned. However, it may be appropriate to change this value if the number of external pager filesystem I/Os blocked with no fsbuf increases and continues increasing and j2_dynamicBufferPreallocation tuning has already been attempted. If the kernel must wait for a free bufstruct, it puts the process on a wait list before the start I/O is issued and will wake it up once a bufstruct has become available. It may be appropriate to increase this value if striped logical volumes or disk arrays are being used. Heavy IO workloads may require this value to be changed, and a good starting point would be 5120 or 10240. File system(s) must be remounted.
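The second tactic only applies once the first has been tried and the counter keeps climbing. A minimal sketch of that escalation follows; `second_tactic_value` and its `first_tactic_tried` flag are illustrative names, not part of any AIX interface.

```python
def second_tactic_value(blocked_count: int, first_tactic_tried: bool):
    """Suggested j2_nBufferPerPagerDevice, or None if escalation is not called for."""
    if not first_tactic_tried:
        return None  # tune j2_dynamicBufferPreallocation first
    digits = len(str(blocked_count))
    if digits <= 5:
        return None  # within the stated tolerance
    return 5120 if digits == 6 else 10240


value = second_tactic_value(204910, first_tactic_tried=True)
if value is not None:
    print(f"ioo -o j2_nBufferPerPagerDevice={value}")
    print("# restricted tunable: the file system(s) must be remounted to take effect")
```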


ps -ekf -- cumulative since last boot; compare CPU-time of key processes

$ uptime ; ps -ekf | grep -v grep | egrep "syncd|lrud|nfsd|biod|wait|getty"
  12:00AM up 18 days, 6:53, 1 user, load average: 12.99, 12.30, 12.13
    root  131076       0   0   Sep 07      -  553:26 wait
    root  262152       0   0   Sep 07      -   25:37 lrud
    root  917532       0   0   Sep 07      - 1942:36 wait
    root  983070       0   0   Sep 07      - 2026:38 wait
    root 1048608       0   0   Sep 07      - 2030:40 wait
    root 1114146       0   0   Sep 07      -  612:19 wait
    root 1179684       0   0   Sep 07      - 1825:26 wait
    root 1245222       0   0   Sep 07      - 1948:03 wait
    root 1310760       0   0   Sep 07      - 1949:43 wait
    root 1376298       0   0   Sep 07      -  585:41 wait
    root 1441836       0   0   Sep 07      - 1881:58 wait
    root 1507374       0   0   Sep 07      - 2005:49 wait
    root 1572912       0   0   Sep 07      - 2010:27 wait
    root 1638450       0   0   Sep 07      -  615:26 wait
    root 1703988       0   0   Sep 07      - 1712:18 wait
    root 1769526       0   0   Sep 07      - 1848:42 wait
    root 1835064       0   0   Sep 07      - 1853:13 wait
    root 1900602       0   0   Sep 07      -  528:33 wait
    root 1966140       0   0   Sep 07      - 1412:40 wait
    root 2031678       0   0   Sep 07      - 1552:47 wait
    root 2097216       0   0   Sep 07      - 1558:07 wait
    root 2162754       0   0   Sep 07      -  658:26 wait
    root 2228292       0   0   Sep 07      - 1200:31 wait
    root 2293830       0   0   Sep 07      - 1334:07 wait
    root 2359368       0   0   Sep 07      - 1228:54 wait
    root 3014734       1   0   Sep 07      -    0:00 kbiod
    root 5111986       1   0   Sep 07      -   14:16 /usr/sbin/syncd 60
    root 8847594       1   0   Sep 07      -    7:07 /usr/sbin/getty /dev/console
$ uptime ; ps -ekf | grep -v grep | egrep "syncd|lrud|nfsd|biod|wait|getty" | grep -c wait
  12:00AM up 18 days, 6:53, 1 user, load average: 12.99, 12.30, 12.13
24
$
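Comparing the TIME column is easier once it is converted to seconds. The sketch below assumes the minutes:seconds form seen in this listing (it does not handle other ps TIME layouts); `cpu_seconds` is a hypothetical helper and the sample values are copied from the listing above.

```python
def cpu_seconds(time_field: str) -> int:
    """Convert a ps TIME field such as '553:26' (minutes:seconds) to seconds."""
    minutes, seconds = time_field.split(":")
    return int(minutes) * 60 + int(seconds)


# TIME values copied from the ps -ekf listing on this slide.
samples = {"lrud": "25:37", "syncd": "14:16", "getty": "7:07", "kbiod": "0:00"}
uptime_days = 18  # from the uptime line; the extra hours are ignored here

ordered = sorted(samples.items(), key=lambda kv: cpu_seconds(kv[1]), reverse=True)
for name, field in ordered:
    secs = cpu_seconds(field)
    print(f"{name:6s} {secs:6d} s total  {secs / uptime_days:7.1f} s per day of uptime")
```

Normalizing by days of uptime makes the figures comparable between LPARs that were booted at different times.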

iostat -a -- cumulative since last boot; mapping & comparing hdisk stats can be useful in characterizing performance-related I/O patterns & trends

$ iostat -a
Disks:      % tm_act     Kbps      tps       Kb_read      Kb_wrtn
hdisk8           2.2    607.3      1.5    2164957533   1876147460
hdisk12          2.4    607.7      1.3    2065741964   1978282924
hdisk13          2.2    582.8      1.3    2002751079   1875515764
hdisk11          2.1    593.4      1.3    2073048903   1875758716
hdisk4           1.9    216.3     23.9     812230724    626802460
hdisk15          0.0      2.2      0.6         25584     14666516
hdisk16         11.5    178.7     23.6    1169343088     19983468
hdisk14          0.0      1.3      0.0       8828331            0
hdisk10          6.3    548.9      7.8    3617545529     35278292
hdisk17          0.0      0.0      0.0          8560            0
hdisk18          0.0      3.0      0.1       9741142     10386688
hdisk7           0.4     53.3      7.6     272419695     82268236
hdisk5           0.6     59.4      6.5     225752039    169601848
hdisk6           2.3    624.3      1.5    2175672098   1978387280
hdisk9           2.3    613.6      1.4    2104140790   1978677528
hdisk20          8.1    228.3     35.1    1511885833      7496668
hdisk21          1.6     99.7     24.8      16230194    647254280
hdisk22         15.9    845.2     58.4    5592808968     31384956
hdisk23          3.5    364.2     60.4    1627955714    795383552
hdisk25         20.3    740.1     36.5    4725304221    199399144
hdisk27         20.1   1015.2     45.8    6675326923     80385252
hdisk26         41.3   2934.5    118.0   18806493859    720917972
hdisk29         19.2    949.4     55.7    6113262738    204348212
hdisk24         25.8   1867.8     59.4   12330198268     98946776
hdisk30         16.9    515.9     38.4    3271643603    161247332
hdisk32          5.1    555.4     34.2     888509245   2807296084
hdisk31         11.7    483.8     71.1    3111959749    107262760
hdisk33         47.2   2760.0    153.7   18308894936     56985180
hdisk36          2.4    597.9      1.3    2103249842   1875221640
hdisk35          2.8    616.8      1.4    2126412342   1977828244
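One simple way to characterize an hdisk from these cumulative figures is its read share of all KB transferred since boot. This is an illustrative sketch rather than a prescribed method: `read_share` is a made-up helper, and the (Kb_read, Kb_wrtn) pairs are taken by position from the iostat -a listing on this slide.

```python
def read_share(kb_read: int, kb_wrtn: int) -> float:
    """Fraction of all KB transferred since boot that were reads."""
    total = kb_read + kb_wrtn
    return kb_read / total if total else 0.0


# (Kb_read, Kb_wrtn) for three hdisks from the iostat -a listing.
disks = {
    "hdisk8": (2164957533, 1876147460),
    "hdisk26": (18806493859, 720917972),
    "hdisk32": (888509245, 2807296084),
}

for name, (kb_read, kb_wrtn) in disks.items():
    share = read_share(kb_read, kb_wrtn)
    kind = "read-mostly" if share > 0.5 else "write-mostly"
    print(f"{name:8s} read share {share:6.1%}  ({kind})")
```

Disks with similar shares and volumes (hdisk8 and its neighbours, for example) are good candidates for being mapped to the same VG or workload.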


iostat -D -- cumulative since last boot; mapping & comparing hdisk stats is useful in characterizing performance-related I/O patterns & trends

$ iostat -D
System configuration: lcpu=24 drives=87 paths=172 vdisks=0

hdisk0    xfer:   %tm_act      bps      tps    bread    bwrtn
                      0.8    18.7K      2.3     7.0K    11.7K
          read:       rps  avgserv  minserv  maxserv  timeouts     fails
                      0.6      3.0      0.1    267.1         0         0
          write:      wps  avgserv  minserv  maxserv  timeouts     fails
                      1.7      5.5      0.3    320.5         0         0
          queue:  avgtime  mintime  maxtime  avgwqsz   avgsqsz    sqfull
                      8.8      0.0    291.3      0.0       0.0   6349911

hdisk1    xfer:   %tm_act      bps      tps    bread    bwrtn
                      0.6    12.9K      1.7     1.2K    11.7K
          read:       rps  avgserv  minserv  maxserv  timeouts     fails
                      0.0      4.8      0.1    301.8         0         0
          write:      wps  avgserv  minserv  maxserv  timeouts     fails
                      1.7      5.4      0.4    281.1         0         0
          queue:  avgtime  mintime  maxtime  avgwqsz   avgsqsz    sqfull
                     11.3      0.0    275.6      0.0       0.0   6102418

hdisk86   xfer:   %tm_act      bps      tps    bread    bwrtn
                     10.2   789.3K     33.1   753.9K    35.4K
          read:       rps  avgserv  minserv  maxserv  timeouts     fails
                     30.6      6.5      0.1     1.3S         0         0
          write:      wps  avgserv  minserv  maxserv  timeouts     fails
                      2.5      2.5      0.2    912.0         0         0
          queue:  avgtime  mintime  maxtime  avgwqsz   avgsqsz    sqfull
                      4.3      0.0     1.1S      0.0       0.0  73320194

hdisk87   xfer:   %tm_act      bps      tps    bread    bwrtn
                     10.1   801.6K     33.7   764.2K    37.4K
          read:       rps  avgserv  minserv  maxserv  timeouts     fails
                     31.2      6.3      0.1     1.2S         0         0
          write:      wps  avgserv  minserv  maxserv  timeouts     fails
                      2.5      2.5      0.2    913.1         0         0
          queue:  avgtime  mintime  maxtime  avgwqsz   avgsqsz    sqfull
                      4.3      0.0     1.2S      0.0       0.0  74160810


netstat -v -- high watermark is Max Packets on S/W Transmit Queue

$ netstat -v
-------------------------------------------------------------
ETHERNET STATISTICS (ent0) :
Device Type: 2-Port 10/100/1000 Base-TX PCI-Express Adapter (14104003)
Hardware Address: 00:14:5e:74:1b:8a
Elapsed Time: 270 days 21 hours 33 minutes 15 seconds

Transmit Statistics:
--------------------
Packets: 101419085701
Bytes: 402789006370762
Interrupts: 0
Transmit Errors: 0
Packets Dropped: 0
Max Packets on S/W Transmit Queue: 3109
S/W Transmit Queue Overflow: 0
Current S/W+H/W Transmit Queue Length: 1
Broadcast Packets: 24079
Multicast Packets: 0
No Carrier Sense: 0
DMA Underrun: 0
Lost CTS Errors: 0
Max Collision Errors: 0
Late Collision Errors: 0
Deferred: 0
SQE Test: 0
Timeout Errors: 0
Single Collision Count: 0
Multiple Collision Count: 0
Current HW Transmit Queue Length: 1

Receive Statistics:
-------------------
Packets: 417799880725
Bytes: 546174259053849
Interrupts: 67899053842
Receive Errors: 10965163
Packets Dropped: 30
Bad Packets: 0
Broadcast Packets: 1135765
Multicast Packets: 387934
CRC Errors: 0
DMA Overrun: 9219805
Alignment Errors: 0
No Resource Errors: 1745358
Receive Collision Errors: 0
Packet Too Short Errors: 0
Packet Too Long Errors: 0
Packets Discarded by Adapter: 0
Receiver Start Count: 0

General Statistics:
-------------------
No mbuf Errors: 30
Adapter Reset Count: 0
Adapter Data Rate: 2000
Driver Flags: Up Broadcast Running Simplex 64BitSupport ChecksumOffload PrivateSegment LargeSend DataRateSet


Strategic Thoughts, Concepts, Considerations, and Tactics

Monitoring AIX Usage, Meaning and Interpretation
* Review component technology of the infrastructure, i.e. proper tuning-by-hardware
* Review implemented AIX constructs, i.e. firm near-static structures and settings
* Review historical/accumulated AIX events, i.e. usages, pendings, counts, blocks, etc.
* Monitor dynamic AIX command behaviors, i.e. ps, vmstat, mpstat, iostat, etc.

Recognizing Common Performance-degrading Scenarios
* High Load Average relative to count-of-LCPUs, i.e. over-threadedness
* vmstat:memory:avm near-to or greater-than lruable-gbRAM, i.e. over-committed
* Continuous low vmstat:memory:fre with persistent lrud (fr:sr) activity
* Continuous high ratio of vmstat:kthr:b relative to vmstat:kthr:r
* Poor ratio of pages freed to pages examined (fr:sr ratio) in vmstat -s output
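Several of the scenarios above are plain ratios of numbers that uptime and vmstat already print. The sketch below computes them without judging them; no thresholds are implied. `scenario_ratios` is a hypothetical helper, and the sample inputs are picked from different vmstat samples in this deck purely to exercise the arithmetic.

```python
def scenario_ratios(load_avg, lcpu, kthr_r, kthr_b, fr, sr):
    """Ratios behind three of the scenarios listed above (no thresholds applied)."""
    return {
        # High Load Average relative to count-of-LCPUs
        "load_per_lcpu": load_avg / lcpu,
        # vmstat:kthr:b relative to vmstat:kthr:r
        "blocked_per_runnable": kthr_b / kthr_r if kthr_r else float("inf"),
        # pages freed per page examined; None when lrud examined nothing
        "freed_per_examined": fr / sr if sr else None,
    }


# Illustrative inputs only: load average and lcpu from one sample in the deck,
# r/b/fr/sr from a single row of another sample.
ratios = scenario_ratios(load_avg=19.45, lcpu=24, kthr_r=6, kthr_b=9, fr=2228, sr=5066)
for name, value in ratios.items():
    print(f"{name:22s} {value:.2f}")
```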


ps -kelmo THREAD -- demonstrates the reality of threadedness

$ ps -kelmo THREAD
USER      PID    PPID      TID ST CP PRI SC            WCHAN      F TT BND COMMAND
root        1       0        - A   0  60  1                -  200003  -   - /etc/init
   -        -       -    65539 S   0  60  1                -  410400  -   - -
root  3145888       1        - A   0  60  1 f1000000a05f9098  240001  -   - /usr/ccs/bin/shlap64
   -        -       -  4063447 S   0  60  1 f1000000a05f9098     400  -   - -
root  3539136       1        - A   0  60  1 f1000a0000154048   40401  -   - /usr/lib/errdemon
   -        -       - 15270101 S   0  60  1 f1000a0000154048   10400  -   - -
root  4915350       1        - A   0  60  1 f1000000c0386728   40401  -   - /usr/sbin/emcp_xcryptd -d
   -        -       - 14155953 S   0  60  1 f1000000c0386728  400000  -   - -
root  5111986       1        - A   0  60 17                *  240001  -   - /usr/sbin/syncd 60
   -        -       -  4915221 S   0  60  1 f1000a011d0ec6b0  410400  -   - -
   -        -       - 11272331 S   0  60  1 f1000a0122cb77b0  410400  -   - -
   -        -       - 13107365 S   0  60  1 f1000a0122cbd2b0  410400  -   - -
   -        -       - 14352589 S   0  60  1 f1000a013a0fe0b0  410400  -   - -
   -        -       - 14418107 S   0  60  1 f1000a013a0fedb0  410400  -   - -
   -        -       - 14483643 S   0  60  1 f1000a011fcf0ab0  410400  -   - -
   -        -       - 14549181 S   0  60  1 f1000a0122cb2fb0  410400  -   - -
   -        -       - 14614719 S   0  60  1 f1000a01200d8fb0  410400  -   - -
   -        -       - 14680257 S   0  60  1 f1000a0100a3fab0  410400  -   - -
   -        -       - 14745795 S   0  60  1 f1000a0122cba3b0  410400  -   - -
   -        -       - 14811333 S   0  60  1 f100010082565fb0  410400  -   - -
   -        -       - 14876871 S   0  60  1 f1000a0100a387b0  410400  -   - -
   -        -       - 14942409 S   0  60  1 f1000a013a0f08b0  410400  -   - -
   -        -       - 15007947 S   0  60  1 f1000a0122cb89b0  410400  -   - -
   -        -       - 15073485 S   0  60  1 f1000a013a0f83b0  410400  -   - -
   -        -       - 15139023 S   0  60  1 f1000a013a0fb2b0  410400  -   - -
   -        -       - 15204561 S   0  60  1 f1000a01200d7eb0  410400  -   - -
root  5373988 5701664        - A   0  60  1 f1000e00047f48c8  240001  -   - /opt/freeware/cimom/pegasus/bin/cimssys pla
   -        -       -  3997835 S   0  60  1 f1000e00047f48c8  410400  -   - -
root  5636338       1        - A   0  60  1 f1000a0100a3c6b0  240001  -   - /opt/ibm/director/cimom/bin/tier1slp
   -        -       - 16646167 S   0  60  1 f1000a0100a3c6b0  410400  -   - -
root  5701664       1        - A   0  60  1                -  240001  -   - /usr/sbin/srcmstr
   -        -       - 15794207 S   0  60  1                -   18400  -   - -


ksh commands illustrate the count of processes-and-comprising-threads of user and kernel processes for a comparative scale of threadedness

$ ps -el | wc -l
     101        # 101 user procs (one line of column header)
$ ps -elmo THREAD | wc -l
    3413        # 3413 - 101 = 3312 threads (user)
$ ps -kl | wc -l
     401        # 401 kernel procs (one line of column header)
$ ps -klmo THREAD | wc -l
     965        # 965 - 401 = 564 threads (kernel)
                # 3312 + 564 = 3876 total threads
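The subtraction works because the THREAD listing repeats the header line and one line per process, then adds one line per thread; removing the plain listing's line count leaves only the thread lines. A short sketch of that arithmetic, using the four wc -l results from this slide (`thread_count` is an illustrative name):

```python
def thread_count(thread_listing_lines: int, process_listing_lines: int) -> int:
    """Thread lines = THREAD listing lines minus (header + process lines)."""
    return thread_listing_lines - process_listing_lines


user_threads = thread_count(3413, 101)    # ps -elmo THREAD vs ps -el
kernel_threads = thread_count(965, 401)   # ps -klmo THREAD vs ps -kl
total_threads = user_threads + kernel_threads

print(f"user threads:   {user_threads}")
print(f"kernel threads: {kernel_threads}")
print(f"total threads:  {total_threads}")
```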


ps guww descending by %CPU, %MEM, SZ, RSS, STIME, TIME, full command-line syntax; a useful hunter-seeker of run-away processes (ordered by recent realtime CPU-intensity)

$ ps guww
USER root db2prd1 db2lib1 root root root root root root root root root root root root root root root root root db2lib1 root root root root root root root root root db2prd1 PID %CPU %MEM 33882274 16646192 29622334 7930108 1048608 983070 1572912 1507374 1310760 1245222 917532 1441836 1835064 1769526 1179684 1703988 2097216 2031678 1966140 2293830 26345652 2359368 2228292 2162754 1638450 1114146 1376298 131076 1900602 458766 41353288 1.8 1.4 1.2 1.2 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.1 0.1 0.1 0.1 0.1 0.1 0.0 0.0 SZ RSS TTY STAT - A - A - A - A - A - A - A - A - A - A - A - A - A - A - A - A - A - A - A - A - A - A - A - A - A - A - A - A - A - A STIME 00:12:28 TIME COMMAND 2:03 aex-pluginmanager -F -nm -nc db2prd1 22282314 3.2 16.0 4883268 3825088 - A 0.0 4812 4880 2.0 838412 483700 0.0 3508 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 448 448 448 448 448 448 448 448 448 448 448 448 448 448 448 448 448 448 448 448 448 448 448 448 136 384 384 384 384 384 384 384 384 384 384 384 384 384 384 384 384 384 384 384 384 384 384 384 384 512 6.0 1698612 1504896 - A Sep 07 20055:07 /opt/tivoli/tsm/server/bin/dsmserv -u db2prd1 -i /tsm/db2p1/config Sep 07 8785:29 db2sysc 0 Sep 07 7729:52 /opt/tivoli/tsm/server/bin/dsmserv -u db2lib1 -i /tsm/db2lib1/config Sep 07 7772:18 /usr/sbin/emcp_mond Sep 07 2031:40 wait Sep 07 2027:38 wait Sep 07 2011:26 wait Sep 07 2006:49 wait Sep 07 1950:43 wait Sep 07 1949:03 wait Sep 07 1943:26 wait Sep 07 1882:46 wait Sep 07 1854:11 wait Sep 07 1849:40 wait Sep 07 1826:15 wait Sep 07 1713:06 wait Sep 07 1559:05 wait Sep 07 1553:45 wait Sep 07 1413:30 wait Sep 07 1335:04 wait Sep 07 1260:41 db2sysc 0 Sep 07 1229:51 wait Sep 07 1201:19 wait Sep 07 659:09 wait Sep 07 616:09 wait Sep 07 612:59 wait Sep 07 586:16 wait Sep 07 553:58 wait Sep 07 529:08 wait Sep 07 265:03 vmmd 00:01:46 0:09 db2vend (db2med - 338866 (TSMDB1))

1.0 383772 214104

0.0 1216

0.0 5204 5228


ps gvww ascending by PID, TIME, PGIN, SIZE, RSS, %CPU, %MEM, full command-line syntax
PGIN (v flag) The number of disk I/Os resulting from references by the process to pages not loaded in core

# ps gvww
     PID TTY STAT    TIME  PGIN  SIZE   RSS    LIM  TSIZ   TRS %CPU %MEM COMMAND
       0   - A      13:55     7   384   320     xx     0     0  0.0  0.0 swapper
       1   - A       0:17  2125   712   180  32768    30    24  0.0  0.0 /etc/init
  131076   - A     553:58     0   448   384     xx     0     0  0.1  0.0 wait
  196614   - A       0:15     9   448   384     xx     0     0  0.0  0.0 sched
  262152   - A      25:39     0   704   512     xx     0     0  0.0  0.0 lrud

6357190 - A 0:00 168 4996 752 32768 14 24 0.0 0.0 ./slp_srvreg -D 6488276 - A 0:00 392 5496 768 32768 40 36 0.0 0.0 /usr/sbin/syslogd 6553668 - A 0:00 695 8684 348 32768 34 0 0.0 0.0 /usr/bin/cimlistener 6881508 - A 0:31 165 3556 304 32768 239 76 0.0 0.0 /usr/sbin/xntpd 6947028 - A 0:00 557 984 192 32768 41 24 0.0 0.0 /usr/sbin/portmap 7012606 - A 0:02 2936 30112 1320 32768 32 16 0.0 0.0 [cimserve] 7077976 - A 0:06 398 7232 1384 32768 4821 524 0.0 0.0 ./dsmc schedule 7143480 - A 2:58 22283 9712 852 xx 555 252 0.0 0.0 /usr/sbin/rsct/bin/rmcd -a IBM.LPCommands -r 7405714 - A 3:05 5646 43444 3488 32768 69 0 0.0 0.0 /usr/java5/bin/java -Xmx512m -Xms20m -Xscmx10m -Xshareclasses Dfile.encoding=UTF-8 -Xbootclasspath/a:/pconsole/lwi/runtime/core/eclipse/plugins/com.ibm.rcp.base_6.2.1.200911171800/rcpbootcp.jar:/pconsole/lwi/lib/ISCJaasModule.jar:/pconsole/lwi/lib/com.ibm.logging.icl_1.1.1.jar:/pconsole/lwi/lib/jaas2zos.jar:/pcon sole/lwi/lib/jaasmodule.jar:/pconsole/lwi/lib/lwinative.jar:/pconsole/lwi/lib/lwinl.jar:/pconsole/lwi/lib/lwirolemap.jar:/pconsole/lwi/lib/ lwisecurity.jar:/pconsole/lwi/lib/lwitools.jar:/pconsole/lwi/lib/passutils.jar -Xverify:none -cp eclipse/launch.jar:eclipse/startup.jar:/pconsole/lwi/runtime/core/eclipse/plugins/com.ibm.rcp.base_6.2.1.20091117-1800/launcher.jar com.ibm.lwi.LaunchLWI 7471340 - A 0:00 1 320 20 32768 15 0 0.0 0.0 /usr/sbin/writesrv 7536892 - A 0:00 489 3788 668 32768 507 396 0.0 0.0 /usr/sbin/sshd 7602248 - A 0:11 6073 15100 4376 32768 12 0 0.0 0.0 db2fmp (C) 0 7667954 - A 0:00 1 112 20 32768 5 0 0.0 0.0 /usr/sbin/uprintfd 7733494 - A 2:44 10320 14084 3992 32768 66 24 0.0 0.0 /opt/CMAgent/CFC/3.0/bin/CsiAgentListener -r /opt/CMAgent/ECMu/1.0 -u csi_acct 22216888 - A 0:00 0 448 448 32768 0 0 0.0 0.0 aioserver 22282314 - A 20055:07 1849352 4840612 3825088 32768 42657 8524 3.2 16.0 /opt/tivoli/tsm/server/bin/dsmserv -u ltsmprd1 -i /tsm/ltsmp1/config -q 22937720 - A 5:17 42156 22784 10860 32768 12 0 0.0 0.0 db2acd 0 23396486 - A 0:00 
1 448 384 32768 0 0 0.0 0.0 aioserver 23920790 - A 0:00 0 1516 368 32768 59 44 0.0 0.0 db2ckpwd 0


vmstat -Iwt 2 -- establish dynamic baselines of AIX behaviors

$ vmstat -Iwt 2
System configuration: lcpu=8 mem=15808MB

   kthr          memory                      page                     faults           cpu       time
----------- ----------------- ------------------------------------ ------------------ ----------- --------
 r  b  p      avm    fre    fi    fo   pi   po    fr     sr    in      sy     cs us sy id wa hr mi se
 6  9  0  3622390   1865  1171   637   57  680  2228   5066   896   68533   7908 89  7  1  3 09:17:40
 6  8  0  3627696   1764   847     3   57  808  3489   8207   812   48745  10969 73  6  6 15 09:17:42
 2 13  0  3631998   2875   231   343   52  798  2903   6644   777   14471   2850 64  4 14 19 09:17:44
10  9  0  3636139   2053   994   880  129  527  2904  16706  1013   55414   7332 91  6  1  3 09:17:46
10  5  0  3619116  18481  1381   629  145  510  1228  19741  1098   53549  10147 91  7  1  2 09:17:48
11  2  0  3609866  21981  2570   893  244    0     0      0  1537   53911  13926 91  8  0  1 09:17:50
 8  5  0  3610726  15307  2522   718  353    0     0      0  1522   49902   9863 89  7  1  3 09:17:52
 6  3  0  3595648  28527   588   474  316    0     0      0  1006  100617   5395 84  6  4  5 09:17:54
10  5  0  3595806  25633  1101   564  273    0     0      0  1113  109611   7128 88  8  1  2 09:17:56
 6 11  0  3601571  16601  1191   255  423    0     0      0  1216  140583   9472 87  8  1  4 09:17:58
 8 12  0  3604703  11245   661   247  427    0     0      0  1041  118076  10307 89  8  0  3 09:18:00
 6 14  0  3600579  12444  1035   293  424    0     0      0  1315   67304  18072 87  9  0  4 09:18:02
 7 15  0  3600064   9638  1008   268  395    0     0      0  1292   66735  15921 82  9  1  9 09:18:04
 6 12  0  3602133   4839   776   295  464    0     0      0  1050  103319   4426 80  6  2 12 09:18:06
 5 13  0  3605240   2170  1025   266  307  279  1556  13844  1042   66953   3916 65  5  8 22 09:18:08
 8  6  0  3606415   1945  1975   297  317    0  2752  37581  1440   70972   4870 87  7  1  5 09:18:10
10  4  0  3610938   2084  1366   164  234    0  3943  55882  1241   75037   8307 92  6  0  2 09:18:12
 9  2  0  3594623  19789  1321   512  246   41  2271  19132  1343   70210   8794 90  7  1  2 09:18:15
 8  4  0  3593551  18060  1188  1890  123    0     0      0  1491   58462   7443 76  8  6 10 09:18:17
 5  8  0  3598228   9838  1499  5765  226    0     0      0  1502   39586  10136 73 11  6 10 09:18:19
   kthr          memory                      page                     faults           cpu       time
----------- ----------------- ------------------------------------ ------------------ ----------- --------
 r  b  p      avm    fre    fi    fo   pi   po    fr     sr    in      sy     cs us sy id wa hr mi se
 7  2  0  3601032   4779   969  2464  103    0     0      0  1099   35643  12135 86  7  3  4 09:18:21
 8  5  0  3604401   2167  1282   328  293    0  2070  10143  1168   54493  17465 89 10  0  1 09:18:23
10  5  0  3611569   2022  1012    38  282    9  4822  19771  1112   47936  11135 89  8  1  1 09:18:25
 8  8  0  3614396   2000   741     7  339   56  2487  11217  1040   40025   8102 91  6  1  2 09:18:27
 6  6  0  3617421   2014   404     8  267  303  2191   7130   733   98557   7510 77  5  7 11 09:18:29
 8  8  0  3619413   2974   252    90  169  519  1899   5209   704   87714   3431 78  4  4 14 09:18:31
 6  7  0  3620661   1965   488   201  293  255   922   2729  1094   39403   4851 89  5  2  4 09:18:33
 7  7  0  3623343   2617   684    72  109  610  2468   8924   937   15606  16827 69  7  5 19 09:18:35
 6  8  0  3624146   2831   443     7  272  332  1228   4517   739   15674   3277 82  4  4 10 09:18:37
 6  7  0  3625514   1934   464     7  336  168  1048   3851   758   17909   4927 79  4  4 14 09:18:39
 4  8  0  3608643  19107   618    29  393  115  1171   3803   889   28193   5547 73  5  4 18 09:18:41
 6  8  0  3596033  29222   815     8  429    0     0      0   977   62182  10416 73  7  5 16 09:18:43
 6  5  0  3598914  22745  1233   246  369    0     0      0  1116   77652   5335 74  7  7 12 09:18:45
 5  3  0  3604955  12994  1151   372  375    0     0      0  1085   83391   6533 79  6  5 11 09:18:47


vmstat -I 2 -- Best 6-in-1 monitor; no-load leave-it-up all-day VMM monitor

$ vmstat -I 2
System configuration: lcpu=24 mem=73728MB ent=12.00

  kthr        memory                 page                  faults              cpu
-------- ----------------- ------------------------ ------------------ -----------------------
 r  b  p       avm    fre    fi    fo pi po    fr     sr    in     sy     cs us sy id wa   pc    ec
10  6  0  11507787  49543  3483   823  0  0  5306   5355  5662  45907  26258 33  4 53 10 4.71  39.3
10  7  0  11509796  47989  3824   594  0  0  4116  27771  6252  56592  45959 54  7 28 12 7.65  63.8
 9  7  0  11510010  47770  3955   622  0  0  3977   9647  5907  56222  46833 48  7 31 15 6.87  57.2
 8  6  0  11510560  50021  4164  2431  0  0  5564  40421  6607  51080  49691 41  7 39 13 6.08  50.7
 6  6  0  11512741  46710  4886  1443  0  0  4443   4608  6110  42400  30394 36  5 42 17 5.28  44.0
 5  7  0  11514081  48807  4675   227  0  0  6461   7028  4838  34521  11343 33  3 55  9 4.68  39.0
 8  7  0  11515469  48531  5679   482  0  0  6445   6593  5686  44979  13230 37  3 48 12 4.99  41.5
 9  7  0  11514065  49598  3858  1046  0  0  4128   4255  5807  51871  27521 44  6 32 19 6.27  52.3
 8  7  0  11517672  49905  4848   632  0  0  7173   7221  5679  44566  47102 48  6 32 14 6.84  57.0
10 13  0  11520210  50148  4669   692  0  0  6313   6491  6341  47122  45622 52  5 28 15 7.22  60.2
 7  9  0  11521192  48222  5087   814  0  0  5194   5790  6211  49553  44306 45  6 34 15 6.45  53.7
10  7  0  11521212  50922  3830   627  0  0  5330   5353  6248  48130  32364 47  4 37 12 6.42  53.5
 4  7  0  11521503  49362  3475   573  0  0  3075   3102  5717  47907  13356 42  3 41 14 5.69  47.4
 9  8  0  11523055  48731  3502   511  0  0  4143   4176  5884  44391  13427 46  2 41 11 6.01  50.1
10  8  0  11524140  50987  3483   761  0  0  5363   5683  5830  45416  15252 60  3 23 14 7.89  65.7
 8  7  0  11524407  45661  3871   351  0  0  1488   1621  5378  34403  13034 54  2 29 15 7.14  59.5
 6  7  0  11523652  50033  3325   355  0  0  5229   5448  5434  40780  14372 45  3 36 16 6.06  50.5
 8  9  0  11525268  48536  4209   272  0  0  4102   4337  4599  36202  10449 44  4 35 18 6.05  50.4
 9  9  0  11525476  48242  4322   521  0  0  4307   4634  5375  33863  13975 44  3 35 18 5.97  49.7
 8 11  0  11526444  49830  4988   699  0  0  6351   6828  6743  53110  45620 46  6 32 16 6.63  55.3


vmstat -Iwt 2 -- check for memory over-commitment; MAX(avm)*4096; load average=AVG(vmstat:kthr:r) over 60secs: current, 5mins, 15mins ago

$ uptime ; vmstat -Iwt 2 20
  10:51AM up 133 days, 13:43, 3 users, load average: 19.45, 19.53, 19.32

System configuration: lcpu=24 mem=73728MB ent=12.00

  kthr        memory              page              faults             cpu               time
-------- ----------------- -------------------- ----------------- ----------------------- --------
 r  b  p       avm    fre   fi   fo pi po    fr    sr   in      sy    cs us sy id wa    pc    ec hr mi se
20  0  0  10957853  48491   66   47  0  0     0     0  363  117648  2422 96  2  2  0 12.00 100.0 10:51:26
20  0  0  10957452  48808   16   81  0  0     0     0  418   94030  4048 97  2  2  0 12.00 100.0 10:51:28
18  0  0  10957456  48524  111  136  0  0     0     0  486    6692  2338 98  0  2  0 12.00 100.0 10:51:30
18  0  0  10957463  48455   25   34  0  0     0     0  152   11637  1348 97  1  2  0 12.00 100.0 10:51:32
18  0  0  10957464  48432    6   14  0  0     0     0   77    3019  1141 98  0  2  0 12.00 100.0 10:51:34
19  1  0  10957470  48298   65    7  0  0     0     0  197    4164  1330 97  0  2  0 11.95  99.6 10:51:36
18  0  0  10957472  48296    0    5  0  0     0     0   39    2842  1028 97  0  3  0 11.86  98.8 10:51:38
19  0  0  10957479  48236   23   13  0  0     0     0  234    5335  1448 98  0  2  0 12.00 100.0 10:51:40
19  1  0  10957487  47686  271    5  0  0     0     0  402   13439  1806 97  1  2  0 12.00 100.0 10:51:42
19  0  0  10957489  47684    0    9  0  0     0     0   37    7145   997 97  1  2  0 12.00 100.0 10:51:44
20  0  0  10957481  47610   39   56  0  0     0     0  167    2837  1061 98  0  2  0 12.00 100.0 10:51:46
19  0  0  10957483  47548   31    1  0  0     0     0   85    3075  1065 98  0  2  0 12.00 100.0 10:51:48
18  0  0  10957481  47500   26   13  0  0     7   135   75    2921  1032 98  0  2  0 12.00 100.0 10:51:50
19  0  0  10957889  49033   53    6  0  0  1025  1031  129   88541  1871 96  2  2  0 11.98  99.8 10:51:52
19  0  0  10957888  48954   40    0  0  0     0     0   89   94550  1869 96  2  2  0 12.00 100.0 10:51:54
20  1  0  10957882  48926   17    6  0  0     0     0   74  123666  2068 96  2  2  0 12.00 100.0 10:51:56
19  0  0  10957880  48916    5    8  0  0     0     0   47  120104  1913 94  3  4  0 11.80  98.4 10:51:58
20  0  0  10957666  49062   34    1  0  0     0     0   80  117384  1849 96  2  2  0 12.00 100.0 10:52:00
18  0  0  10957883  48841    1    7  0  0     0     0   59  130003  1924 95  3  2  0 12.00 100.0 10:52:02
20  0  0  10957889  48779   28    6  0  0     0     0  143  126580  2284 96  3  2  0 12.00 100.0 10:52:04

$ bc
10957889 * 4096        # MAX(avm)*4096 relative-to mem=73728MB (73728*1024*1024)
44883513344            # 44883513344 relative-to 77309411328
quit
$
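The bc arithmetic and the run-queue average can be reproduced in a few lines. This is a sketch under the slide's own assumptions (4096-byte pages, mem reported in MB); `avm_commitment` is a hypothetical helper, and `r_column` holds the twenty kthr:r values from the vmstat sample above.

```python
PAGE_BYTES = 4096  # avm is counted in 4096-byte pages, as in the bc example


def avm_commitment(max_avm_pages: int, mem_mb: int):
    """Return (avm bytes, installed bytes, avm as a fraction of installed memory)."""
    avm_bytes = max_avm_pages * PAGE_BYTES
    mem_bytes = mem_mb * 1024 * 1024
    return avm_bytes, mem_bytes, avm_bytes / mem_bytes


avm_bytes, mem_bytes, fraction = avm_commitment(max_avm_pages=10957889, mem_mb=73728)
print(f"MAX(avm) = {avm_bytes} bytes of {mem_bytes} installed ({fraction:.1%})")

# kthr:r for the 20 intervals shown in the sample.
r_column = [20, 20, 18, 18, 18, 19, 18, 19, 19, 19,
            20, 19, 18, 19, 19, 20, 19, 20, 18, 20]
avg_r = sum(r_column) / len(r_column)
print(f"AVG(kthr:r) over {len(r_column)} intervals = {avg_r:.2f}")
```

The averaged run queue lands close to the load average that uptime reported for the same moment, which is the relationship the slide title points to.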


mpstat -w 2 -- Observe the granularity and distribution of active threads across logical-CPUs. For SPLPARs, observe the folding/unfolding of virtual-CPUs by monitoring the int column.

$ uptime ; mpstat -w 2
  03:27PM up 45 days, 7:04, 6 users, load average: 4.22, 4.63, 4.80

System configuration: lcpu=20 ent=8.0 mode=Uncapped

cpu   min  maj  mpc    int     cs   ics  rq   mig    lpa   sysc    us    sy    wt    id    pc   %ec    lcs
  0   748   59    0   3928   5037  2148   0  1528  100.0  14044  40.2  56.8   2.9   0.1  0.56   6.9   2310
  1     0    0    0    600      0     0   0     0  100.0      0   0.0  47.9   0.0  52.1  0.01   0.2    591
  2   531   33    0   1068   4273  1781   0  1105  100.0  11986  56.8  40.3   2.8   0.1  0.39   4.9   1545
  3     0    0    0    458      2     1   0     0  100.0      5   1.1  51.6   0.0  47.3  0.01   0.1    454
  4   844   27    0    821   2952  1251   0   869  100.0   9545  45.6  52.7   1.7   0.1  0.48   5.9   1118
  5     0    0    0    456      0     0   0     0  100.0      5   0.6  55.2   0.0  44.2  0.01   0.1    450
  6   880   19    0    797   2256   943   0   679  100.0   6508  26.4  72.6   1.0   0.1  0.61   7.6    820
  7     0    0    0    445      0     0   0     0      -      0   0.0  55.3   0.0  44.7  0.01   0.1    444
  8   751   34    0    853   3290  1393   0   898  100.0   9742  51.3  47.0   1.6   0.1  0.54   6.7   1201
  9     0    0    0    457      0     0   0     0      -      0   0.0  53.7   0.0  46.3  0.01   0.1    455
 10   424   15    0    613   2367  1018   0   599  100.0   6781  51.4  46.0   2.3   0.3  0.27   3.4    939
 11     0    0    0    227      0     0   0     0      -      0   0.0  51.7   0.0  48.3  0.01   0.1    224
 12   128    0    0    186    112    52   0    19  100.0    451  17.8  80.1   0.4   1.8  0.07   0.8    157
 13     0    0    0    162      0     0   0     0      -      0   0.0  47.1   0.0  52.9  0.00   0.0    161
 14     0    0    0    100      0     0   0     0      -      0   0.0  33.8   0.0  66.2  0.00   0.0    100
 15     0    0    0     10      0     0   0     0      -      0   0.0  14.5   0.0  85.5  0.00   0.0     10
 16     0    0    0     10      0     0   0     0      -      0   0.0  30.3   0.0  69.7  0.00   0.0     10
 17     0    0    0     10      0     0   0     0      -      0   0.0  41.8   0.0  58.2  0.00   0.0     10
 18     0    0    0     10      0     0   0     0      -      0   0.0  28.7   0.0  71.3  0.00   0.0     10
 19     0    0    0     10      0     0   0     0      -      0   0.0  30.6   0.0  69.4  0.00   0.0     10
  U     -    -    -      -      -     -   -     -      -      -     -     -   9.7  53.3  5.04  63.0      -
ALL  4306  187    0  11221  20289  8587   0  5697  100.0  59067  15.6  20.3  10.4  53.7  2.97  37.1  11019
----------------------------------------------------------------------------------------------------------
  0  1461   64   57   4857   4713  1995   0  1441  100.0  17385  40.3  57.0   2.6   0.1  0.54   6.7   2102
  1   207   17    3    870   1987  1088   0   331  100.0  10099  67.9  27.4   3.2   1.5  0.18   2.3    979
  2  1688   23    3   1088   2914  1196   0   814  100.0  28374  43.4  55.2   1.2   0.1  0.53   6.6   1002
  3   230    3    3    645   1456   742   0   156  100.0   8530  66.6  29.0   2.8   1.7  0.15   1.9    862
  4  1934   14    3    945   2422   963   0   645  100.0  30907  65.7  33.3   0.8   0.2  0.56   7.0    812
  5   164    4    3    654   1279   581   0   158  100.0   3606  67.6  29.4   1.5   1.5  0.19   2.4    825
  6   883   20    3    907   3075  1215   0   899  100.0  15866  62.1  36.7   1.1   0.1  0.58   7.2    960
  7   265   29    3    655   1691   790   0   224  100.0   4038  39.8  55.6   3.2   1.3  0.15   1.9    958
  8   656   11    3    465   1375   615   0   287  100.0   4563  31.2  66.1   1.8   0.9  0.19   2.4    653
  9     0    0    3    284    382   189   0    13  100.0   1652  43.6  39.3   0.8  16.3  0.02   0.3    436
 10   300    3    3    328    618   289   0   119  100.0   7818  46.9  51.3   0.8   1.1  0.15   1.9    388
 11     1    0    3    162    190    93   0     9  100.0    396  20.6  56.4  13.8   9.2  0.01   0.1    222
 12   313    4    3    310    661   310   0   103  100.0   7726  54.2  42.8   1.6   1.4  0.11   1.4    426
 13    29    0    3    185    216   104   0    16  100.0    548   9.2  85.6   2.3   2.9  0.04   0.5    248


iostat -aT 2 -- Observe real-time trends in Disks, % tm_act, Kbps, tps, Kb_read, Kb_wrtn. Map high-traffic hdisks through the stack: hdisk -> LVM VG:LV:JFS2-mountpoint w/options

$ uptime ; iostat -aT 2 | grep -v "0.0 0.0 0.0 0"
  03:32PM up 45 days, 7:09, 6 users, load average: 6.10, 5.11, 4.93

System configuration: lcpu=20 drives=9 ent=8.00 paths=16 vdisks=2 tapes=0

tty:      tin      tout   avg-cpu: % user  % sys  % idle  % iowait  physc  % entc  time
          0.0   76656.5              21.6   19.9    45.9      12.6    3.4    42.9  15:32:04

Vadapter:        Kbps      tps   bkread   bkwrtn  partition-id  time
vscsi0        22406.0   3227.0   1845.0   1382.0             0  15:32:04

Disks:     % tm_act     Kbps      tps   Kb_read   Kb_wrtn  time
hdisk4         99.5   4310.0    549.5      7516      1104  15:32:04
hdisk1          0.0      0.0      0.0         0         0  15:32:04
hdisk3         99.0   1892.0    234.5      3584       200  15:32:04
hdisk0          0.0      0.0      0.0         0         0  15:32:04
hdisk5         70.5   2736.0    332.0      4808       664  15:32:04
hdisk6         99.0   7110.0   1359.0      5212      9008  15:32:04
hdisk2         85.5   1788.0    196.5      2968       608  15:32:04
hdisk7         94.0   4570.0    555.5      4448      4692  15:32:04

tty:      tin      tout   avg-cpu: % user  % sys  % idle  % iowait  physc  % entc  time
          0.0   41467.1              22.3   19.4    47.7      10.5    3.4    42.8  15:32:06

Vadapter:        Kbps      tps   bkread   bkwrtn  partition-id  time
vscsi0        25561.2   3398.5   1886.4   1512.1             0  15:32:06

Disks:     % tm_act     Kbps      tps   Kb_read   Kb_wrtn  time
hdisk4         99.6   5854.4    666.5      8252      3448  15:32:06
hdisk1          0.0      0.0      0.0         0         0  15:32:06
hdisk3         77.6   1313.0    163.6      2472       152  15:32:06
hdisk0          0.0      0.0      0.0         0         0  15:32:06
hdisk5         92.6   3224.4    403.3      5544       900  15:32:06
hdisk6         99.1   8884.7   1545.7      5388     12368  15:32:06
hdisk2         74.1   1333.0    165.1      2520       144  15:32:06
hdisk7         89.6   4951.7    454.3      4056      5840  15:32:06

tty:      tin      tout   avg-cpu: % user  % sys  % idle  % iowait  physc  % entc  time
          0.0   41436.5              16.9   19.0    55.0       9.0    3.0    37.1  15:32:08

Vadapter:        Kbps      tps   bkread   bkwrtn  partition-id  time
vscsi0        26822.0   3732.5   1529.0   2203.5             0  15:32:08
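To pick out the high-traffic hdisks worth mapping through the stack, one option is to rank a single interval by Kbps and express each disk as a share of the adapter total. A minimal sketch follows, using the per-disk Kbps values of the 15:32:04 interval on this slide; the variable names are illustrative only.

```python
# Kbps per hdisk for the 15:32:04 interval, plus the vscsi0 adapter total.
interval_kbps = {
    "hdisk4": 4310.0, "hdisk1": 0.0, "hdisk3": 1892.0, "hdisk0": 0.0,
    "hdisk5": 2736.0, "hdisk6": 7110.0, "hdisk2": 1788.0, "hdisk7": 4570.0,
}
vadapter_kbps = 22406.0

# Busiest disks first.
ranked = sorted(interval_kbps.items(), key=lambda kv: kv[1], reverse=True)
total_kbps = sum(interval_kbps.values())

for name, kbps in ranked:
    share = kbps / total_kbps if total_kbps else 0.0
    print(f"{name:7s} {kbps:8.1f} Kbps  {share:6.1%} of adapter traffic")

# The per-disk figures should add up to the Vadapter line for the same interval.
print("matches vscsi0 total:", total_kbps == vadapter_kbps)
```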


Strategic Thoughts, Concepts, Considerations, and Tactics

Monitoring AIX Usage, Meaning and Interpretation
* Review component technology of the infrastructure, i.e. proper tuning-by-hardware
* Review implemented AIX constructs, i.e. firm near-static structures and settings
* Review historical/accumulated AIX events, i.e. usages, pendings, counts, blocks, etc.
* Monitor dynamic AIX command behaviors, i.e. ps, vmstat, mpstat, iostat, etc.

Recognizing Common Performance-degrading Scenarios [Part II]
* High Load Average relative to count-of-LCPUs, i.e. over-threadedness
* vmstat:memory:avm near-to or greater-than lruable-gbRAM, i.e. over-committed
* Continuous low vmstat:memory:fre with persistent lrud (fr:sr) activity
* Continuous high ratio of vmstat:kthr:b relative to vmstat:kthr:r
* Poor ratio of pages freed to pages examined (fr:sr ratio) in vmstat -s output





Thank you


Trademarks
The following are trademarks of the International Business Machines Corporation in the United States, other countries, or both.
Not all common law marks used by IBM are listed on this page. Failure of a mark to appear does not mean that IBM does not use the mark nor does it mean that the product is not actively marketed or is not significant within its relevant market. Those trademarks followed by ® are registered trademarks of IBM in the United States; all others are trademarks or common law marks of IBM in the United States.

For a complete list of IBM Trademarks, see www.ibm.com/legal/copytrade.shtml:


*, AS/400, e business(logo), DBE, ESCO, eServer, FICON, IBM, IBM (logo), iSeries, MVS, OS/390, pSeries, RS/6000, S/30, VM/ESA, VSE/ESA, WebSphere, xSeries, z/OS, zSeries, z/VM, System i, System i5, System p, System p5, System x, System z, System z9, BladeCenter

The following are trademarks or registered trademarks of other companies.


Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries.
Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom.
Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.
Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office.
IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency, which is now part of the Office of Government Commerce.
* All other products may be trademarks or registered trademarks of their respective companies.

Notes:
Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here.
IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply.
All customer examples cited or described in this presentation are presented as illustrations of the manner in which some customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics will vary depending on individual customer configurations and conditions.
This publication was produced in the United States. IBM may not offer the products, services or features discussed in this document in other countries, and the information may be subject to change without notice. Consult your local IBM business contact for information on the product or services available in your area.
All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.
Information about non-IBM products is obtained from the manufacturers of those products or their published announcements. IBM has not tested those products and cannot confirm the performance, compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.
Prices subject to change without notice. Contact your IBM representative or Business Partner for the most current pricing in your geography.


Disclaimers
No part of this document may be reproduced or transmitted in any form without written permission from IBM Corporation. Product data has been reviewed for accuracy as of the date of initial publication. Product data is subject to change without notice. This information could include technical inaccuracies or typographical errors. IBM may make improvements and/or changes in the product(s) and/or program(s) at any time without notice. Any statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only. The performance data contained herein was obtained in a controlled, isolated environment. Actual results that may be obtained in other operating environments may vary significantly. While IBM has reviewed each item for accuracy in a specific situation, there is no guarantee that the same or similar results will be obtained elsewhere. Customer experiences described herein are based upon information and opinions provided by the customer. The same results may not be obtained by every user. Reference in this document to IBM products, programs, or services does not imply that IBM intends to make such products, programs or services available in all countries in which IBM operates or does business. Any reference to an IBM Program Product in this document is not intended to state or imply that only that program product may be used. Any functionally equivalent program, that does not infringe IBM's intellectual property rights, may be used instead. It is the user's responsibility to evaluate and verify the operation on any non-IBM product, program or service. THE INFORMATION PROVIDED IN THIS DOCUMENT IS DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY, EITHER EXPRESS OR IMPLIED. IBM EXPRESSLY DISCLAIMS ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR INFRINGEMENT. IBM shall have no responsibility to update this information. 
IBM products are warranted according to the terms and conditions of the agreements (e.g. IBM Customer Agreement, Statement of Limited Warranty, International Program License Agreement, etc.) under which they are provided. IBM is not responsible for the performance or interoperability of any non-IBM products discussed herein.


Disclaimers (Continued)
Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

The providing of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents or copyrights. Inquiries regarding patent or copyright licenses should be made, in writing, to:
  IBM Director of Licensing
  IBM Corporation
  North Castle Drive
  Armonk, NY 10504-1785 USA

IBM customers are responsible for ensuring their own compliance with legal requirements. It is the customer's sole responsibility to obtain advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer's business and any actions the customer may need to take to comply with such laws. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the customer is in compliance with any law.

The information contained in this documentation is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information provided, it is provided as is without warranty of any kind, express or implied. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this documentation or any other documentation. Nothing contained in this documentation is intended to, nor shall have the effect of, creating any warranties or representations from IBM (or its suppliers or licensors), or altering the terms and conditions of the applicable license agreement governing the use of IBM software.
