
1) File System Space Usage

Check for disk space problems.


# df -I
(Shows used and free space in 512-byte blocks, without the inode columns)
Filesystem    512-blocks      Used      Free %Used Mounted on
/dev/hd4        17301504   5926488  11375016   35% /
/dev/hd2        10485760   4583816   5901944   44% /usr

# df -k
(Checks for disk space usage in 1K blocks)
Filesystem    1024-blocks      Free %Used    Iused %Iused Mounted on
/dev/hd4          8650752   5687508   35%    39729     2% /
/dev/hd2          5242880   2950972   44%    35227     3% /usr
# df -g
(Checks for disk space usage in gigabyte blocks)
Filesystem    GB blocks      Free %Used    Iused %Iused Mounted on
/dev/hd4           8.25      5.42   35%    39729     2% /
/dev/hd2           5.00      2.81   44%    35227     3% /usr
# df -gP
(POSIX view with different heading names)
Filesystem    GB blocks      Used Available Capacity Mounted on
/dev/hd4           8.25      2.83      5.42      35% /
/dev/hd2           5.00      2.19      2.81      44% /usr

Note that df -k and df -g list the disk usage (%Used) as well as the
inode usage (%Iused).
Pay close attention and do not confuse the two when
checking file system space.
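
Keeping %Used and %Iused apart is easier when each column is read by name. The following is a minimal Python sketch, assuming the output of df -k has been captured as text in the seven-column layout shown in this section; the function names and the threshold values are illustrative and not part of AIX.

```python
SAMPLE_DF_K = """\
Filesystem    1024-blocks      Free %Used    Iused %Iused Mounted on
/dev/hd4          8650752   5687508   35%    39729     2% /
/dev/hd2          5242880   2950972   44%    35227     3% /usr
"""


def parse_df_k(text):
    """Return one dict per file system from captured `df -k` output."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header line
        fs, _blocks, _free, used, _iused, iused_pct, mount = line.split()
        rows.append({
            "filesystem": fs,
            "mount": mount,
            "pct_used": int(used.rstrip("%")),        # disk space
            "pct_iused": int(iused_pct.rstrip("%")),  # inodes
        })
    return rows


def over_threshold(rows, space_limit=90, inode_limit=90):
    """List (mount, kind, percent) for every value above its limit."""
    hits = []
    for row in rows:
        if row["pct_used"] > space_limit:
            hits.append((row["mount"], "space", row["pct_used"]))
        if row["pct_iused"] > inode_limit:
            hits.append((row["mount"], "inodes", row["pct_iused"]))
    return hits


if __name__ == "__main__":
    for hit in over_threshold(parse_df_k(SAMPLE_DF_K), space_limit=40):
        print(hit)
```

Reporting space and inodes as separate kinds keeps a full inode table from being mistaken for a full disk.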

Use lsps to check paging/swap space usage:


The lsps command displays the characteristics of paging spaces, such as paging space name,
physical volume name, volume group name, size, percentage of the paging space used, status of
space, and it shows if the paging space is set to automatic.
# lsps -a (Note that this system is paging quite a bit)
Page Space  Physical Volume  Volume Group     Size  %Used  Active  Auto  Type
paging00    hdisk0           rootvg        10752MB     45     yes   yes    lv
hd6         hdisk1           rootvg         2560MB     45     yes   yes    lv
hd6         hdisk2           rootvg         8192MB     45     yes   yes    lv
or
# swap -s
allocated = 5505024 blocks used = 2458677 blocks free = 3046347 blocks
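
The swap -s line can be turned into a percentage for comparison with the %Used column of lsps. This is a small Python sketch rather than an official tool; it assumes the one-line format shown above and a 4 KB block size (an assumption, though it matches the 21504 MB total reported by lsps). The function name is hypothetical.

```python
import re


def paging_usage(swap_s_line):
    """Parse a `swap -s` line into block counts, percent used, and MB allocated."""
    pairs = re.findall(r"(allocated|used|free)\s*=\s*(\d+)", swap_s_line)
    fields = {name: int(value) for name, value in pairs}
    fields["pct_used"] = round(100.0 * fields["used"] / fields["allocated"], 1)
    # Assumption: one block is 4 KB.
    fields["allocated_mb"] = fields["allocated"] * 4 // 1024
    return fields


if __name__ == "__main__":
    sample = ("allocated = 5505024 blocks used = 2458677 blocks "
              "free = 3046347 blocks")
    print(paging_usage(sample))
```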

2) Load Average
# uptime
11:14AM   up 10 days,  21:02,  2 users,  load average: 0.05, 0.05, 0.03

*Note: The load average numbers give the average number of jobs/processes in the run queue
over the last 1, 5, and 15 minutes. The lowest possible load average is zero. A load average of
one or two is typical. A load average of 3 or above could indicate a critical issue on the
system, although what counts as high also depends on the number of processors.
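
To compare the load average against the processor count, the three numbers can be pulled out of the uptime line. A minimal Python sketch follows, assuming the "load average: a, b, c" wording shown above; the function names are hypothetical, and the CPU count is passed in by the caller (for example the lcpu value printed by vmstat).

```python
import re


def load_averages(uptime_line):
    """Return the 1, 5, and 15 minute load averages as floats."""
    match = re.search(
        r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", uptime_line)
    if match is None:
        raise ValueError("no load average found in: %r" % uptime_line)
    return tuple(float(value) for value in match.groups())


def load_per_cpu(load, cpus):
    """Scale a load average by the number of logical CPUs."""
    return load / cpus


if __name__ == "__main__":
    line = "11:14AM up 10 days, 21:02, 2 users, load average: 0.05, 0.05, 0.03"
    one, five, fifteen = load_averages(line)
    print(one, five, fifteen, load_per_cpu(one, 8))
```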

B. SYSTEM PERFORMANCE
1) CPU and Memory Usage
The vmstat command reports statistics about kernel threads, virtual memory, disks, traps, and
CPU activity.

*us = user time, sy = system time, id = CPU idle time, wa = CPU idle time during which the
system had outstanding disk or NFS I/O requests.
# vmstat 5 5
System Configuration: lcpu=8 mem=16384MB
kthr      memory               page                    faults          cpu
----- --------------- -------------------------- ------------------ -----------
 r  b     avm   fre  re  pi  po   fr   sr  cy    in     sy   cs us sy id wa
 5  1 4818381 24300   0   2   2  636  859   0  2048 321280 9460 24 18 53  5
 6  1 4817085 25591   0   0   0    0    0   0  1838 593223 4798 53 21 23  4
 7  1 4811637 31031   0   0   0    0    0   0  1975 265643 4706 30 13 49  8
 2  1 4813001 29650   0   0   0    0    0   0  1814  95041 7491  8 10 76  6
 4  1 4818874 23769   0   0   0    0    0   0  1864  53014 4428     7 81  7

A new I/O oriented view using the -I option:

# vmstat -I 5 5
System Configuration: lcpu=8 mem=16384MB
kthr        memory                 page                    faults           cpu
-------- --------------- --------------------------- ------------------- -----------
 r  b  p     avm   fre   fi   fo  pi  po   fr   sr    in     sy    cs us sy id wa
 5  1  0 4809912 45680  574  203   2   2  636  860  2048 321270  9459 24 18 53  5
 1  0  0 4820163 35346   12  152   0   0    0    0  2034 410525  5435 10 20 67  2
 2  0  0 4816092 39388    4   57   0   0    0    0  1726 566771 62167 13 20 65  2
 2  1  0 4821609 33799   11  216   0   0    0    0  2024 529518 21680 13 27 56  4
 6  1  0 4815588 39806    1   43   0   0    0    0  1668 481025  4853 12 18 69  1

The iostat command reports CPU and I/O statistics.

# iostat (On large systems this output could be quite large)
System configuration: lcpu=2 disk=3
tty:      tin         tout    avg-cpu:  % user    % sys    % idle    % iowait
          0.0          0.5                0.3      0.2      99.3        0.2

Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn
hdisk1           0.6        8.2      1.2    2030462   5660599
hdisk0           0.7        7.2      1.1    1116762   5660603
cd0              0.0        0.0      0.0          0         0

Note: %user shows the percentage of CPU utilization at the user level and %sys shows the
percentage of the CPU utilization at the system level.

# sar 5 5
AIX jrspa22t 2 5 00283EDD4C00    07/26/06

System Configuration: lcpu=8

10:12:49    %usr    %sys    %wio   %idle
10:12:54      22      13       2      64
10:12:59      53       4       1      42
10:13:04      52       9       1      38
10:13:09      52       3       1      44
10:13:14      39       8       2      52

Average       44                      48

To monitor all CPU usage via sar:

# sar -P ALL 5 10
The topas command displays statistics of system activities and CPU usage. This output may be
viewed in intervals of seconds using the -i flag. To ensure output is in a readable format, set
your terminal emulation to vt220 before accessing the system as well as after logging on to
the system.
# topas -i5
The report from the topas command lists the CPU usage of the kernel, user, wait time, and
system idle time. Below, it also lists processes, along with the PID, CPU usage, and owner
that are currently running on the system.

To monitor the busiest processes on a system using topas:

# topas -Pi5 (checks at a 5 second interval)
Topas Monitor for host:
2006

USER
PID
COMMAND
root
258066
syncd
patrol 7778462
PatrolAg
lAg
root
8769704
bgsagent
root
172116
wlmsched
bsomqp022642060
java
root
7958554
topas
root
1417340
ncmsqp042674916
patrol 6553618
PatrolAg
t
ncmsqp023493972

jrspa22t

PPID PRI NI
1

60 20

DATA
RES
88

Interval:

TEXT PAGE
RES SPACE
1

Wed Jul 26 10:15:47

TIME CPU%

PGFAULTS
I/O OTH

160 1966:11

2.9

75 30 14933

674 17910 1423:17

1.1

027708

62 20

6313

835

9:29

1.1

400

16 41

17

17 1090:36

0.5

26:38

0.4

0:01

0.4

202

245
838 1349:10
17 40486
47:57
674 4157 242:53

0.4
0.3
0.2

0
0 seosd
0
0 java
0 1721

0.2

6313

2191530

60 20 15194

11 24748

2969668

58 41

19

1
2822388
1

1 41
400
60 20 25443
70 30 2754

3690670

60 20 17462

2790

2790

11 27864

29:37

25

0 java

Find the top 15 processes using memory on a system:

# svmon -Pt15 | perl -e 'while(<>){print if($.==2||$&&&!$s++);$.=0 if(/^-+$/)}'
-------------------------------------------------------------------------------
     Pid Command          Inuse      Pin     Pgsp  Virtual 64-bit Mthrd LPage
 1589482 oracle          247739     5402    55835   109827      Y     N     N
 2039974 oracle          221077     5402    56167   110311      Y     N     N
 2129990 oracle          220953     5402    56091   110111      Y     N     N
 1982638 oracle          220808     5402    55824   109858      Y     N     N
 1396820 oracle          219414     5402    55839   109946      Y     N     N
 2670812 oracle          219319     5402    55990   109938      Y     N     N
 6779124 oracle          219285     5402    56034   109932      Y     N     N
 2216084 oracle          219245     5402    55979   109899      Y     N     N
 2912464 oracle          219239     5402    55926   109873      Y     N     N
 2470110 oracle          219232     5402    55953   109874      Y     N     N
 2572518 oracle          219002     5402    56018   109846      Y     N     N
 2584744 oracle          218920     5402    56173   109915      Y     N     N
 2211846 oracle          218883     5402    56245   109948      Y     N     N
 6979770 oracle          200825     5402    56144   109830      Y     N     N
 1790028 java            187476     5727    57630   198578      N     Y     N

Finding the size of a PID using ps:

# ps v 3375240
    PID  TTY STAT  TIME  PGIN   SIZE    RSS  LIM TSIZ  TRS %CPU %MEM COMMAND
3375240    -    A 42:25 10859 157132 106180   xx   39   44  0.0  1.0 /pac/nc

2) Where to obtain PERFPMR to collect performance data
If a server has a performance problem, IBM may request that you
install perfpmr and collect performance data during a peak load
period. IBM normally provides instructions on how to install.
You can obtain a copy of the perfpmr scripts from the following
location:
ftp://ftp.software.ibm.com/aix/tools/perftools/perfpmr
You will need to get this while you are logged onto the server
with the problem.
The IBM performance team has suggested the following changes be
made to the script once it is downloaded and installed:
Please change the following lines in each of the stanzas in
perfpmr.cfg:
trace.sh:
logsize = 402653184
kbufsize = 201326592
filemon.sh:
filemon_kbufsize = 201326592
filemon_time_seconds = 60
space_required = 83886080

3) LAN Status
The netstat command shows network status for each protocol or routing table. The -i flag
may be used to determine collisions and I/O errors.
# netstat -i
Name  Mtu    Network   Address             Ipkts  Ierrs    Opkts  Oerrs  Coll
en0   1500   link#2    0.9.6b.3e.57.61    424536      0   239376      0     0
en0   1500   89.10.12  prl28284           424536      0   239376      0     0
en2   1500   link#3    0.9.6b.ce.54.cb   4297312      0   140332      2     0
en2   1500   55.10.32  breac01t-55       4297312      0   140332      2     0
lo0   16896  link#1                         5254      0     6076      0     0
lo0   16896  127       loopback             5254      0     6076      0     0
lo0   16896  ::1                            5254      0     6076      0     0

Check routing tables with network addresses

# netstat -rn
Routing tables
Destination

Gateway

Flags

Refs

Route Tree for Protocol Family 2 (Internet):


default
89.10.12.254
UGc
55.10.32.0
55.10.34.184
UHSb
55.10.32/22
55.10.34.184
U
55.10.34.184
127.0.0.1
UGHS
55.10.35.255
55.10.34.184
UHSb
89.10.5.135
89.10.12.254
UGHW

Use

If

PMTU Exp Groups

0
en0
0
en2
0
138677
en2
0
1
lo0
0
4
en2
1
2163
en0
0

=>

# ifconfig -a
en0: flags=5e080863,80<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD,PSEG,CHAIN>
        inet 89.10.12.31 netmask 0xffffff00 broadcast 89.10.12.255
en2: flags=5e080863,c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD,PSEG,CHAIN>
        inet 55.10.34.184 netmask 0xfffffc00 broadcast 55.10.35.255
lo0: flags=e08084b<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT>
        inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
        inet6 ::1/0
        tcp_sendspace 65536 tcp_recvspace 65536
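
ifconfig prints the netmask in hexadecimal, while the routing table uses prefix notation (for example 55.10.32/22). The Python sketch below converts one into the other with the standard ipaddress module; the function name is hypothetical, and the addresses are taken from the output above.

```python
import ipaddress


def describe_netmask(address, hex_mask):
    """Turn an ifconfig-style hex netmask into network/prefix and broadcast."""
    mask = ipaddress.IPv4Address(int(hex_mask, 16))  # 0xfffffc00 -> 255.255.252.0
    iface = ipaddress.IPv4Interface("%s/%s" % (address, mask))
    return "%s broadcast %s" % (iface.network,
                                iface.network.broadcast_address)


if __name__ == "__main__":
    print(describe_netmask("89.10.12.31", "0xffffff00"))
    print(describe_netmask("55.10.34.184", "0xfffffc00"))
```

The computed broadcast address can be checked against the broadcast value that ifconfig reports for the same interface.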

4) How to check interface card speed, auto negotiation info.
# entstat -d ent4|more

-------------------------------------------------------------
ETHERNET STATISTICS (ent4) :
Device Type: Gigabit Ethernet-SX PCI-X Adapter (14106802)
Hardware Address: 00:02:55:33:77:63
Elapsed Time: 11 days 9 hours 58 minutes 37 seconds
Transmit Statistics:
--------------------
Packets: 17299124
Bytes: 486040591195
Interrupts: 0
Transmit Errors: 0
Packets Dropped: 0

Receive Statistics:
-------------------
Packets: 166277808
Bytes: 38982878854
Interrupts: 153893117
Receive Errors: 0
Packets Dropped: 0
Bad Packets: 0

Max Packets on S/W Transmit Queue: 51


S/W Transmit Queue Overflow: 0
Current S/W+H/W Transmit Queue Length: 0
Broadcast Packets: 60
Multicast Packets: 1
No Carrier Sense: 0
DMA Underrun: 0

Broadcast Packets: 97825101


Multicast Packets: 95415
CRC Errors: 0
DMA Overrun: 0

Lost CTS Errors: 0


Max Collision Errors: 0
Late Collision Errors: 0
Deferred: 0
SQE Test: 0
Timeout Errors: 0
Single Collision Count: 0
Multiple Collision Count: 0
Current HW Transmit Queue Length: 0

Alignment Errors: 0
No Resource Errors: 0
Receive Collision Errors: 0
Packet Too Short Errors: 0
Packet Too Long Errors: 0
Packets Discarded by Adapter: 0
Receiver Start Count: 0

General Statistics:
-------------------
No mbuf Errors: 0
Adapter Reset Count: 0
Adapter Data Rate: 2000
Driver Flags: Up Broadcast Running
Simplex 64BitSupport ChecksumOffload
PrivateSegment LargeSend DataRateSet
Gigabit Ethernet-SX PCI-X Adapter (14106802) Specific Statistics:
--------------------------------------------------------------------

Link Status : Up
Media Speed Selected: Auto negotiation
Media Speed Running: 1000 Mbps Full Duplex

PCI Mode: PCI-X (100-133)


PCI Bus Width: 64-bit
Latency Timer: 144
Cache Line Size: 128
Jumbo Frames: Disabled
TCP Segmentation Offload: Enabled
TCP Segmentation Offload Packets Transmitted: 14265351
TCP Segmentation Offload Packet Errors: 0
Transmit and Receive Flow Control Status: Enabled
XON Flow Control Packets Transmitted: 0
XON Flow Control Packets Received: 0
XOFF Flow Control Packets Transmitted: 0
XOFF Flow Control Packets Received: 0
Transmit and Receive Flow Control Threshold (High): 45056
Transmit and Receive Flow Control Threshold (Low): 24576
Transmit and Receive Storage Allocation (TX/RX): 16/48

C. LAST REBOOT, RUN LEVEL, BOOT LOG, CONSOLE LOG
Check to see if the box has rebooted recently by running:
# who -b

A recent system reboot could explain alarms on the system. The reboot may have been scheduled
or may have been caused by a system panic, hardware failure, or power failure. Further
investigation should be done. Check the CAMCS logs to see if a system panic occurred or check
cron to see if a reboot script was executed.
Check the system's current run level. Please note that AIX normally operates at Run Level 2. Other
Run Levels are available, but are rarely used.
# who -r
To check for any configuration errors after a system reboot, run the following command to
see the boot log:
# alog -o -f /var/adm/ras/bootlog | more
The console log can be viewed using this command:
# alog -o -f /var/adm/ras/conslog | more

D. DISK DRIVE REPLACEMENT


Disk Drive Procedures
The following commands are used to display devices on the system and their characteristics.

1) Hardware Devices
lsdev displays information about devices in the device configuration database.
Flags: -C lists information about a device that is in the Customized Devices object class.
-c specifies a device class name.
-H displays headers above the column output.
To list the disks that are in the Available state in the Customized Devices object class:
# lsdev -CH -c disk
name    status     location      description
hdisk0  Available  1S-08-00-5,0  16 Bit LVD SCSI Disk Drive
hdisk1  Available  1S-08-00-8,0  16 Bit LVD SCSI Disk Drive

To list all devices:

# lsdev -C -H | pg
name      status     location  description
L2cache0  Available            L2 Cache
aio0      Defined              Asynchronous I/O (Legacy)
cd0       Available  1G-19-00  IDE DVD-ROM Drive
en0       Available  1L-08     Standard Ethernet Network Interface
en1       Defined    1c-08     Standard Ethernet Network Interface
en2       Available  1j-08     Standard Ethernet Network Interface
en3       Defined    1n-08     Standard Ethernet Network Interface
ent0      Available  1L-08     10/100 Mbps Ethernet PCI Adapter II (1410ff01)
ent1      Available  1c-08     10/100 Mbps Ethernet PCI Adapter II (1410ff01)

lspv lists the physical volumes known to the system, showing each physical
disk name, physical volume identifier (PVID), and volume group.
# lspv
hdisk0          000c8edc02dccea9    rootvg    active
hdisk1          000c8edc851ee972    rootvg    active

# lspv hdisk0
PHYSICAL VOLUME:    hdisk0                   VOLUME GROUP:     rootvg
PV IDENTIFIER:      000c8edc02dccea9 VG IDENTIFIER 000c8edc00004c00000000fc851ef361
PV STATE:           active
STALE PARTITIONS:   0                        ALLOCATABLE:      yes
PP SIZE:            64 megabyte(s)           LOGICAL VOLUMES:  7
TOTAL PPs:          542 (34688 megabytes)    VG DESCRIPTORS:   1
FREE PPs:           86 (5504 megabytes)      HOT SPARE:        no
USED PPs:           456 (29184 megabytes)
FREE DISTRIBUTION:  25..60..00..00..01
USED DISTRIBUTION:  84..48..108..108..108

The -p flag will list all physical partitions of physical volume hdisk0.
# lspv -p hdisk0
hdisk0:
PP RANGE  STATE  REGION        LV NAME  TYPE    MOUNT POINT
  1-4     used   outer edge    hd5      boot    N/A
  5-29    free   outer edge
 30-109   used   outer edge    hd9var   jfs     /var
110-141   used   outer middle  hd6      paging  N/A
142-201   free   outer middle
202-217   used   outer middle  hd3      jfs     /tmp
218-221   used   center        hd8      jfslog  N/A
222-325   used   center        hd4      jfs     /
326-381   used   inner middle  hd4      jfs     /
382-433   used   inner middle  hd2      jfs     /usr
434-541   used   inner edge    hd2      jfs     /usr
542-542   free   inner edge

Example of a problem on hdisk0.

# lspv hdisk0
PHYSICAL VOLUME:    hdisk0                   VOLUME GROUP:     rootvg
PV IDENTIFIER:      000c8edc001363a5 VG IDENTIFIER 000c8edc00004c00000000fc851ef361
PV STATE:           active
STALE PARTITIONS:   6                        ALLOCATABLE:      yes
PP SIZE:            64 megabyte(s)           LOGICAL VOLUMES:  7
TOTAL PPs:          542 (34688 megabytes)    VG DESCRIPTORS:   1
FREE PPs:           86 (5504 megabytes)      HOT SPARE:        no
USED PPs:           456 (29184 megabytes)
FREE DISTRIBUTION:  25..60..00..00..01
USED DISTRIBUTION:  84..48..108..108..108

Note the stale partitions: the disk is BAD.
USED PPs = 456 (84+48+108+108+108)
FREE PPs = 86 (25+60+1)
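
The FREE DISTRIBUTION and USED DISTRIBUTION strings give the partition counts for the five disk regions (outer edge, outer middle, center, inner middle, inner edge), so they can be cross-checked against the totals. A short Python sketch of that arithmetic, using the hdisk0 figures from this section; the function names are illustrative.

```python
REGIONS = ("outer edge", "outer middle", "center", "inner middle", "inner edge")


def parse_distribution(text):
    """Split a string such as '25..60..00..00..01' into per-region counts."""
    counts = [int(part) for part in text.split("..")]
    return dict(zip(REGIONS, counts))


def check_pv(total_pps, pp_size_mb, free_dist, used_dist):
    """Return (free PPs, used PPs, total MB) and verify they add up."""
    free = sum(parse_distribution(free_dist).values())
    used = sum(parse_distribution(used_dist).values())
    if free + used != total_pps:
        raise ValueError("free + used does not match TOTAL PPs")
    return free, used, total_pps * pp_size_mb


if __name__ == "__main__":
    print(check_pv(542, 64, "25..60..00..00..01", "84..48..108..108..108"))
```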

# lspv -p hdisk0
hdisk0:
PP RANGE  STATE   REGION        LV NAME  TYPE    MOUNT POINT
  1-4     used    outer edge    hd5      boot    N/A
  5-29    free    outer edge
 30-109   used    outer edge    hd9var   jfs     /var
110-141   used    outer middle  hd6      paging  N/A
142-201   free    outer middle
202-217   used    outer middle  hd3      jfs     /tmp
218-218   *stale  center        hd8      jfslog  N/A
219-221   used    center        hd8      jfslog  N/A
222-222   *stale  center        hd4      jfs     /
223-231   used    center        hd4      jfs     /
232-232   *stale  center        hd4      jfs     /
233-240   used    center        hd4      jfs     /
241-241   *stale  center        hd4      jfs     /
242-325   used    center        hd4      jfs     /
326-381   used    inner middle  hd4      jfs     /
382-382   *stale  inner middle  hd2      jfs     /usr
383-400   used    inner middle  hd2      jfs     /usr
401-401   *stale  inner middle  hd2      jfs     /usr
402-433   used    inner middle  hd2      jfs     /usr
434-541   used    inner edge    hd2      jfs     /usr
542-542   free    inner edge

2) Volume Groups
To list volume groups that are currently active on your system, type:
# lsvg -o
rootvg
List detailed information and status about the volume group.
# lsvg rootvg
VOLUME GROUP:   rootvg                   VG IDENTIFIER:  000c8edc00004c00000000fc851ef361
VG STATE:       active                   PP SIZE:        64 megabyte(s)
VG PERMISSION:  read/write               TOTAL PPs:      1084 (69376 megabytes)
MAX LVs:        256                      FREE PPs:       108 (6912 megabytes)
LVs:            9                        USED PPs:       976 (62464 megabytes)
OPEN LVs:       8                        QUORUM:         1
TOTAL PVs:      2                        VG DESCRIPTORS: 3
STALE PVs:      0                        STALE PPs:      0
ACTIVE PVs:     2                        AUTO ON:        yes
MAX PPs per PV: 1016                     MAX PVs:        32
LTG size:       128 kilobyte(s)          AUTO SYNC:      no
HOT SPARE:      no                       BB POLICY:      relocatable

List the logical volumes in a volume group.

# lsvg -l rootvg
rootvg:
LV NAME    TYPE     LPs  PPs  PVs  LV STATE      MOUNT POINT
hd5        boot       1    2    2  closed/syncd  N/A
hd6        paging    42   84    3  open/syncd    N/A
hd8        jfslog     1    2    2  open/syncd    N/A
hd4        jfs       33   66    2  open/syncd    /
hd2        jfs       20   40    2  open/syncd    /usr
hd9var     jfs       20   40       open/syncd    /var
hd3        jfs                     open/syncd    /tmp
pac_lv1    jfs                     open/syncd    /pac
lvbto      jfs       72  144       open/syncd    /bto/sys
hd7        sysdump   18   18    1  open/syncd    N/A
hd71       sysdump   18   18    1  open/syncd    N/A
paging00   paging    42   84    2  open/syncd    N/A

List the physical volume status within a volume group.

# lsvg -p rootvg
rootvg:
PV_NAME   PV STATE   TOTAL PPs   FREE PPs   FREE DISTRIBUTION
hdisk2    active     135         5          01..00..00..00..04
hdisk3    active     135         0          00..00..00..00..00
hdisk0    active     135         6          00..00..00..00..06
hdisk1    active     135         21         00..00..10..00..11

List attributes about a physical volume (disk):

# lsattr -El hdisk2
PCM             PCM/friend/scsiscsd              Path Control Module           False
algorithm       fail_over                        Algorithm                     True
dist_err_pcnt   0                                Distributed Error Percentage  True
dist_tw_width   50                               Distributed Error Sample Time True
hcheck_interval 0                                Health Check Interval         True
hcheck_mode     nonactive                        Health Check Mode             True
max_transfer    0x40000                          Maximum TRANSFER Size         True
pvid            00283edd26fdf5680000000000000000 Physical volume identifier    False
queue_depth     3                                Queue DEPTH                   False
reserve_policy  single_path                      Reserve Policy                True
size_in_mb      36400                            Size in Megabytes             False

E. Running SNAP
Note: You must have an open PMR with pSeries Support (IBM) before continuing. All
references to the PMR number below will be in the format xxxxx.YYY, where xxxxx
is the problem number and YYY is the branch number.

1) Call IBM

To find the 4-digit machine type:


# uname -M
IBM,7029-6C3
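
The 4-digit machine type is the part of the uname -M string between the comma and the hyphen. A tiny Python sketch of that split, assuming the vendor,type-model layout shown above (the function name is hypothetical):

```python
def split_model(uname_m):
    """Split 'IBM,7029-6C3' into (vendor, machine type, model)."""
    vendor, _, type_model = uname_m.strip().partition(",")
    machine_type, _, model = type_model.partition("-")
    return vendor, machine_type, model


if __name__ == "__main__":
    print(split_model("IBM,7029-6C3"))
```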

Search the report for General Info and view the HW_MODEL field.

====================================================================
GENERAL INFO: senthil : 0x590a0c1f : Fri 03-04-11 14:04:31 CST : 80.1
====================================================================
HOSTNAME: senthil
HOSTID: 0x590a0c1f
PRIM_IP_ADDRESS: x.x.x.x
HW_VENDOR: IBM
HW_MODEL: IBM,7029-6C3
OS_LEVEL: AIX 5.2
SYSTEM_MEMORY: 2048 Mb
DDSABLE: TRUE
DOMAIN: none

Follow the steps below to run snap and ftp the output to IBM:

2) How to run SNAP command:


Using the "snap" command to gather information:
This is a powerful command to gather lots of data on all types of machines. Following are some
caveats with this command:
-- The "-b" flag gathers SSA information
-- The "-t" flag gathers the TCP/IP information
-- The file created from the output is /tmp/ibmsupt/snap.pax.Z

To gather the basic information on a machine, such as error logs, configuration, and AIX driver
levels, run
# snap -r
(this removes any prior snap data)
# snap -gc
NOTE: Depending on the number of SSA drives, this could take anywhere from a few minutes to
2 hours, so be careful.
To gather the SSA info, use: # snap -gbc
To gather the SSA and TCP/IP info, use: # snap -gtbc
To gather all system configuration information: # snap -ac

Example of output:
bos62833[root]: snap -r
Nothing to clean up
bos62833[root]: snap -gbc
Checking space requirement for general
information...........................................................
......................................................................
......................................................................
......................................................................
......................................................................
....... done.
..Checking space requirement for ssa information.......... done.
Checking for enough free space in filesystem... done.
********Checking and initializing directory structure
Creating /tmp/ibmsupt directory tree... done.
Creating /tmp/ibmsupt/ssa directory tree... done.
Creating /tmp/ibmsupt/general directory tree... done.
Creating /tmp/ibmsupt/general/diagnostics directory tree... done.
Creating /tmp/ibmsupt/testcase directory tree... done.
Creating /tmp/ibmsupt/other directory tree... done.
********Finished setting up directory /tmp/ibmsupt
Gathering general system
information...........................................................
......................................................................
......................................................................
......................................................................
......................................................................
....... done.
Gathering scanout information..done.
Gathering ssa system information.......... done.
Creating compressed pax file...
Starting pax/compress process... Please wait... done.
-rw-------   1 0        834911 Feb  8 00:08 snap.pax.Z

Note: additional flags to be used for specific data.

IBM support may request additional options to be executed with the snap command. From
man snap, these are the different Flags:
-a Gathers all system configuration information. This option requires
approximately 8MB of temporary disk space.
-A Gathers asynchronous (TTY) information.
-b Gathers SSA information.

-c Creates a compressed pax image (snap.pax.Z file) of all files in the
/tmp/ibmsupt directory tree or other named output directory.
-D Gathers dump and /unix information. The primary dump device is used.
Notes:
* If bosboot -k was used to specify the running kernel to be other than /unix,
the incorrect kernel is gathered. Make sure that /unix is or is linked to, the
kernel in use when the dump was taken.
If the dump file is copied to the host machine, the snap command does not
collect the dump image in the /tmp/ibmsupt/dump directory. Instead, it creates
a link in the dump directory to the actual dump image.
-d Dir Identifies the optional snap command output directory (/tmp/ibmsupt is
the default).
-f Gathers file system information.
-g Gathers the output of the lslpp -hBc command, which is required to recreate
exact operating system environments. Writes output to the
/tmp/ibmsupt/general/lslpp.hBc file.
Also collects general system information and writes the output to the
/tmp/ibmsupt/general/general.snap file.
-G Includes predefined Object Data Manager (ODM) files in general information
collected with the -g flag.
-i Gathers installation debug vital product data (VPD) information.

-k Gathers kernel information


-l Gathers programming language information.
-L Gathers LVM information.
-n Gathers Network File System (NFS) information.
-N Suppresses the check for free space.
-o OutputDevice Copies the compressed image onto diskette or tape.
-p Gathers printer information.
-r Removes snap command output from the /tmp/ibmsupt directory.

-s Gathers Systems Network Architecture (SNA) information.


-S Includes security files in general information collected with the -g flag.

-t Gathers Transmission Control Protocol/Internet Protocol (TCP/IP)
information.
-T Gathers all the log files for a multicpu trace. Only the base file,
trcfile, is captured with the -g flag.
-v Component Displays the output of the commands executed by the snap command.
Use this flag to view the specified name or group of files.
Note: Press the Ctrl-C key sequence to interrupt the snap command. A prompt
will return with the following options: press the Enter key to return to
current operation; press the S key to stop the current operation; press the Q
key to quit the snap command completely.
-w Gathers WLM information

3) Check the current maintenance level of your system:


# oslevel
5.2.0.0
To determine the highest recommended maintenance level reached for the current version of AIX
on the system, type:
# oslevel -r
5200-03
Beginning in 2006, IBM AIX changed from Maintenance Level (ML) to Technology
Level (TL) and Service Pack (SP) terminology. The command below will provide
you with TL and SP information:
# oslevel -s
5200-08-01
This can be broken down as follows:
AIX Version:      5.2
Technology Level: 8
Service Pack:     1

For more detailed information on these topics, please refer
to the IBM AIX 5L Service Strategy and Best Practices
document.
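
The breakdown of an oslevel string can be automated. The Python sketch below assumes the hyphen-separated layout shown in this section (base level, then TL, then SP) and that the first two digits of the base level are the version and release; the function name and dictionary keys are hypothetical.

```python
def parse_oslevel(level):
    """Break '5200-08-01' into version, technology level, and service pack."""
    parts = level.strip().split("-")
    base = parts[0]  # e.g. "5200"
    # Assumption: first digit is the version, second digit the release.
    info = {"version": "%s.%s" % (base[0], base[1])}
    if len(parts) > 1:
        info["technology_level"] = int(parts[1])
    if len(parts) > 2:
        info["service_pack"] = int(parts[2])
    return info


if __name__ == "__main__":
    print(parse_oslevel("5200-08-01"))
    print(parse_oslevel("5200-03"))  # `oslevel -r` style, no service pack
```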

4) Check dump size

Identify the dump space settings. Note that the dump is written
to either the primary or the secondary device; it will not span
onto the secondary if the primary fills:
# sysdumpdev -l
primary              /dev/hd7
secondary            /dev/hd71
copy directory       /var/adm/ras
forced copy flag     TRUE
always allow dump    TRUE
dump compression     OFF
Display statistical info about the most recent dump:
# sysdumpdev -L
0453-039
Device name:         /dev/hd7
Major device number: 10
Minor device number: 8
Size:                23327232 bytes
Uncompressed Size:   191149876 bytes
Date/Time:           Fri Feb 11 10:50:40 CST 2005
Dump status:         0
dump completed successfully
Estimate the size of the dump (in bytes) for the currently running
system:
# sysdumpdev -e
0453-041 Estimated dump size in bytes: 4280287232

To identify how much space is allocated to the dump device:

# lslv hd7
LOGICAL VOLUME:     hd7                    VOLUME GROUP:   rootvg
LV IDENTIFIER:      00283edd00004c00000001024cb1a4c3.10 PERMISSION: read/write
VG STATE:           active/complete        LV STATE:       opened/syncd
TYPE:               sysdump                WRITE VERIFY:   off
MAX LPs:            512                    PP SIZE:        256 megabyte(s)
COPIES:             1                      SCHED POLICY:   parallel
LPs:                18                     PPs:            18
STALE PPs:          0                      BB POLICY:      relocatable
INTER-POLICY:       minimum                RELOCATABLE:    yes
INTRA-POLICY:       middle                 UPPER BOUND:    32
MOUNT POINT:        N/A                    LABEL:          None
MIRROR WRITE CONSISTENCY: on/ACTIVE
EACH LP COPY ON A SEPARATE PV ?: yes
Serialize IO ?:     NO

Dump Space Size (hd7) = PPs x PP SIZE
Dump Space Size (hd7) = 18 x 256 megabytes = 4608 megabytes
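
The dump space calculation can be compared directly with the estimate printed by sysdumpdev -e. A minimal Python sketch of that check, using the figures from this section (18 PPs of 256 MB each, and an estimated dump of 4280287232 bytes); the function names are illustrative.

```python
MB = 1024 * 1024


def dump_device_mb(pps, pp_size_mb):
    """Dump space size = PPs x PP SIZE, in megabytes."""
    return pps * pp_size_mb


def dump_fits(estimated_bytes, pps, pp_size_mb):
    """True when the estimated dump fits on the dump logical volume."""
    return estimated_bytes <= dump_device_mb(pps, pp_size_mb) * MB


if __name__ == "__main__":
    estimate = 4280287232
    print(dump_device_mb(18, 256), "MB available")
    print(estimate // MB, "MB estimated")
    print("fits:", dump_fits(estimate, 18, 256))
```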

5) Create a dump file

Look at the dump size and then execute df -I or df -k to find a file system with enough space to
proceed to packaging. Once a file system has been found, you may proceed with creating a dump file
to ftp to IBM.
# snap -gfkDN (This command can be run from any directory.)
# cd /tmp/ibmsupt/dump
# ls (Ensure that unix.Z, dump.snap, and dump.Z are present.)
# cd /tmp/ibmsupt
# snap -c

Ftp file to IBM.

If there is no room in /tmp, then run
# snap -gfkDNd /<file system>/ibmsupt
# cd /<file system>/ibmsupt/dump
# ls (Ensure that unix.Z, dump.snap, and dump.Z are present.)
# snap -cd /<file system>/ibmsupt

This will create a snap.pax.Z file in the /tmp/ibmsupt directory (or in the output directory
named with the -d flag). The file will need to be renamed to pmr#.branch#.snap.pax.Z.
# mv snap.pax.Z <pmr#.branch#.snap.pax.Z>

F. SHUTDOWN
The shutdown command halts the operating system. Only a user with root user authority can run
this command. Do not attempt to restart the system or turn off the system before the
shutdown completion message is displayed; otherwise, file system damage can result.
Make sure you are on the correct server prior to entering the shutdown command:
Enter: hostname
To shut down and restart the system:
# shutdown -Fr
Other flags that can be used with the shutdown command are:
-h Halts the operating system completely.
-m Brings the system down to maintenance (single user) mode.
-d Brings the system down from a distributed mode to a multiuser mode.
-i Interactive mode. Displays interactive messages to guide the user through the shutdown.
The last command can be used to help determine when the system was last shut down.
# last shutdown
shutdown  tty0   Feb 11 14:05
shutdown  tty0   Feb 10 20:23
shutdown  pts/1  Feb 04 07:08

G. HARDWARE ASSISTANCE
How to run Diagnostics
The diag command is menu driven and is used to run diagnostics for a suspected problem.
# diag
Press <Enter> to advance past the information screen.
Select Diagnostic Routines.
Select Problem Determination.
This instructs the diag command to test the system and analyze the error log.
You may run a diagnosis on a particular device by using the -d flag.

# diag -d (device name)

Display previous diagnostic results.
# cd /usr/lpp/diagnostics/bin
# ./diagrpt -o
Display all diagnostic result files logged since the date specified.
# /usr/lpp/diagnostics/bin/diagrpt -s 030705
This will list results logged since March 7, 2005 (the date is given as mmddyy).
Diagnostic result files are stored in /etc/lpp/diagnostics/data directory.
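
Because the mmddyy argument is easy to misread, it can help to convert it to a full date before running the report. A small Python sketch, assuming the six-digit mmddyy form used above (the function name is hypothetical; two-digit years follow Python's strptime rule, where 00 to 68 map to 2000 to 2068):

```python
from datetime import datetime


def parse_mmddyy(value):
    """Convert a six-digit mmddyy string into a date."""
    return datetime.strptime(value, "%m%d%y").date()


if __name__ == "__main__":
    print(parse_mmddyy("030705").isoformat())
```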

Finding system configuration information

Total physical memory in system
# bootinfo -r
Total number of processors in system
# lsdev -Cc processor (this will list each processor)
Display configuration, diagnostic, and vital product data about the system
# lscfg -vp | more

H. LOGS
The first place you should go when troubleshooting problems in AIX is the error report
(errpt).
First run errpt without any options to get an overview of current errors:
# errpt|more
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
B6048838   0725140606 P S SYSPROC        SOFTWARE PROGRAM ABNORMALLY TERMINATED
B6048838   0725133506 P S SYSPROC        SOFTWARE PROGRAM ABNORMALLY TERMINATED
B6048838   0725122506 P S SYSPROC        SOFTWARE PROGRAM ABNORMALLY TERMINATED
B6048838   0724140106 P S SYSPROC        SOFTWARE PROGRAM ABNORMALLY TERMINATED
B6048838   0721033906 P S SYSPROC        SOFTWARE PROGRAM ABNORMALLY TERMINATED
B6267342   0721032506 P H hdisk1356      DISK OPERATION ERROR
B6267342   0721032506 P H hdisk1356      DISK OPERATION ERROR
B6267342   0721032506 P H hdisk1355      DISK OPERATION ERROR
B6267342   0721032506 P H hdisk1355      DISK OPERATION ERROR
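
On a busy system the summary can run to many pages, so counting entries per identifier and resource gives a quicker overview. A minimal Python sketch, assuming the errpt summary has been captured as text with the six-column layout of IDENTIFIER, TIMESTAMP, T, C, RESOURCE_NAME, and DESCRIPTION; the function name is hypothetical.

```python
from collections import Counter

SAMPLE_ERRPT = """\
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
B6048838   0725140606 P S SYSPROC        SOFTWARE PROGRAM ABNORMALLY TERMINATED
B6048838   0725133506 P S SYSPROC        SOFTWARE PROGRAM ABNORMALLY TERMINATED
B6267342   0721032506 P H hdisk1356      DISK OPERATION ERROR
B6267342   0721032506 P H hdisk1355      DISK OPERATION ERROR
"""


def count_errors(errpt_text):
    """Count errpt summary lines per (identifier, resource name)."""
    counts = Counter()
    for line in errpt_text.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] == "IDENTIFIER":
            continue  # skip the header and any blank or short lines
        counts[(fields[0], fields[4])] += 1
    return counts


if __name__ == "__main__":
    for (identifier, resource), n in count_errors(SAMPLE_ERRPT).most_common():
        print(identifier, resource, n)
```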

To get the specifics associated with the IDENTIFIER:


# errpt -aj B6048838 | more
---------------------------------------------------------------------------
LABEL:           CORE_DUMP
IDENTIFIER:      B6048838

Date/Time:       Tue Jul 25 14:06:04 EDT
Sequence Number: 113629
Machine Id:      00283E9D4C00
Node Id:         jrspa13t
Class:           S
Type:            PERM
Resource Name:   SYSPROC

Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED
Probable Causes
SOFTWARE PROGRAM
User Causes
USER GENERATED SIGNAL
Recommended Actions
CORRECT THEN RETRY
Failure Causes
SOFTWARE PROGRAM
Recommended Actions
RERUN THE APPLICATION PROGRAM
IF PROBLEM PERSISTS THEN DO THE FOLLOWING
CONTACT APPROPRIATE SERVICE REPRESENTATIVE
Detail Data
SIGNAL NUMBER
6
USER'S PROCESS ID:
7540756
FILE SYSTEM SERIAL NUMBER

44
INODE NUMBER
1474687
PROCESSOR ID
16
CORE FILE NAME
/pac/brsmdp07/bea/app/user_projects/domains/collections/core
PROGRAM NAME
java
ADDITIONAL INFORMATION
abort E8
??
Symptom Data
REPORTABLE

You can display errors that were encountered during the last day
by specifying a date in your search.
# date
Wed Feb 23 14:57:39 CST 2005
# errpt -a -s 0222145605 |more
-a displays information in a detailed format
-s displays all records posted after the StartDate
Example: errpt -a -s (mmddhhmmyy), where mmddhhmmyy is the month, day, hour,
minute, and year of the current time minus 24 hours
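
The StartDate can be computed rather than worked out by hand. The Python sketch below subtracts 24 hours from a given time and formats the result as mmddhhmmyy; the function name and the hours_back parameter are illustrative.

```python
from datetime import datetime, timedelta


def errpt_start_date(now, hours_back=24):
    """Return the mmddhhmmyy string for `hours_back` hours before `now`."""
    start = now - timedelta(hours=hours_back)
    return start.strftime("%m%d%H%M%y")


if __name__ == "__main__":
    now = datetime(2005, 2, 23, 14, 57, 39)
    print("errpt -a -s", errpt_start_date(now))
```

Passing datetime.now() instead of a fixed time gives the value for the current day.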

I. Installed Software Installation Info

How to determine the maintenance level of software:
# lslpp -l | more (This will list every fileset on the system)
# lslpp -l <Fileset> (Lists the state of a fileset)
# lslpp -L | grep <Fileset> (Easy way to get basic version info)
# lslpp -h <Fileset> (Displays when a fileset was installed)