Training AN15 - AIX III - Advanced Administration and Problem Determination - Instructor Guide - 2012

V7.0.1
Power Systems for AIX III: Advanced Administration and Problem Determination
(Course code AN15)
Instructor Guide
ERC 2.2
Trademarks
The reader should recognize that the following terms, which appear in the content of this
training document, are official trademarks of IBM or other companies:
IBM® is a registered trademark of International Business Machines Corporation.
The following are trademarks of International Business Machines Corporation in the United
States, or other countries, or both:
AIX 5L™ AIX 6™ AIX®
AS/400® DB2® DS8000®
HACMP™ Initiate® MWAVE®
Power Systems™ Power® PowerVM®
POWER6® POWER7® pSeries®
Redbooks® RS/6000® System p®
Tivoli®
Intel is a trademark or registered trademark of Intel Corporation or its subsidiaries in the
United States and other countries.
Windows is a trademark of Microsoft Corporation in the United States, other countries, or
both.
UNIX is a registered trademark of The Open Group in the United States and other
countries.
VMware and the VMware "boxes" logo and design, Virtual SMP and VMotion are registered
trademarks or trademarks (the "Marks") of VMware, Inc. in the United States and/or other
jurisdictions.
Other product and service names might be trademarks of IBM or other companies.
November 2012 edition

The information contained in this document has not been submitted to any formal IBM test and is distributed on an “as is” basis without
any warranty either express or implied. The use of this information or the implementation of any of these techniques is a customer
responsibility and depends on the customer’s ability to evaluate and integrate them into the customer’s operational environment. While
each item may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will
result elsewhere. Customers attempting to adapt these techniques to their own environments do so at their own risk.
© Copyright International Business Machines Corporation 2009, 2012.

This document may not be reproduced in whole or in part without the prior written permission of IBM.
Note to U.S. Government Users — Documentation related to restricted rights — Use, duplication or disclosure is subject to restrictions
set forth in GSA ADP Schedule Contract with IBM Corp.
V7.0.1
Contents
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Instructor course overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
Course description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
Agenda . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xix
Unit 1. Advanced AIX administration overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-1

Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-2
Application outages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-4
Maintenance window tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-7
Effective problem management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-11
Before problems occur . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-14
Before problems occur: A few good commands . . . . . . . . . . . . . . . . . . . . . . . . . . 1-17
Steps in problem resolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-19
Progress and reference codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-23
Reference codes at the IBM Information Center . . . . . . . . . . . . . . . . . . . . . . . . . . 1-26
Working with AIX Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-28
AIX Support test case data (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-31
AIX Support test case data (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-34
AIX software update hierarchy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-36
Relevant documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-39
Checkpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-41
Exercise: Problem diagnostic information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-43
Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-45
Unit 2. The Object Data Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-1

Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-2
2.1. Introduction to the ODM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-5
What is the ODM? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-6
Data managed by the ODM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-8
ODM components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-11
ODM database files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-13
Device configuration summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-16
Configuration manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-18
Location and contents of ODM repositories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-20
How ODM classes act together . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-24
Data not managed by the ODM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-26
Let’s review: Device configuration and the ODM . . . . . . . . . . . . . . . . . . . . . . . . . 2-28
ODM commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-30
Changing attribute values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-33
Using odmchange to change attribute values . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-36
2.2. ODM database files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-39
Software vital product data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-40
Software states you should know about . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-43
Predefined devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-46
Predefined attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-51
Customized devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-54
Customized attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-58
Additional device object classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-60
ODM and high-level device commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-63
Checkpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-66
Exercise: The Object Data Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-68
Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-70
Unit 3. Error monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-1

Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-2
3.1. Working with the error log . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-5
Error logging components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-6
Generating an error report using SMIT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-9
The errpt command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-13
A summary report: errpt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-16
A detailed error report: errpt -a . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-18
Types of disk errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-22
LVM error log entries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-25
Maintaining the error log . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-28
Exercise: Error monitoring (part 1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-31
3.2. Error notification and syslogd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-33
Error notification methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-34
Self-made error notification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-37
ODM-based error notification: errnotify . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-40
syslogd daemon . . . . . . . . . . . . . . . . . . . . . . . . . . 3-44
syslogd configuration examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-47
Redirecting syslog messages to error log . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-51
Directing error log messages to syslogd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-53
System hang detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-55
Configuring shdaemon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-58
Checkpoint (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-61
Checkpoint (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-63
Exercise: Error monitoring (part 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-65
Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-67
Unit 4. Network Installation Manager basics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-1

Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-2
NIM overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-4
Machine roles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-7
Boot process for AIX installation: Tape or CD . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-10
Boot process for AIX installation with NIM (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . .4-13
Boot process for AIX installation with NIM (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . .4-15
NIM objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-17
Listing NIM objects and their attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-20
NIM configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-22
Resources objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-26
Resources objects: lpp_source . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-29
Resources objects: spot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-33
Resources objects: mksysb . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-37
Networks objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-40
Machines objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-43
Defining a machine object . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-46
Define a client using SMIT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-49
NIM operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-53
bos_inst operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-58
More information about NIM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-61
Additional topics in NIM course . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-64
Checkpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-66
Exercise: Basic NIM configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-68
Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-70
Unit 5. System initialization: Accessing a boot image . . . . . . . . . . . . . . . . . . . . . . . 5-1

Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-2
5.1. System startup process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-5
How does a Power server or LPAR boot? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-6
Loading of a boot image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-9
Contents of the boot logical volume (hd5) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-12
5.2. Unable to find boot image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-15
Working with bootlists . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-16
AIX 7: Bootlist pathid enhancements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-19
Starting System Management Services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-21
Working with bootlists in SMS (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-24
Working with bootlists in SMS (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-27
5.3. Corrupted boot logical volume. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-31
Boot device alternatives (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-32
Boot device alternatives (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-35
Accessing a system that will not boot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-37
Booting in maintenance mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-41
Working in maintenance mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-44
How to fix a corrupted BLV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-47
Checkpoint (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-50
Checkpoint (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-52
Exercise: System initialization: Accessing a boot image . . . . . . . . . . . . . . . . . . . . 5-54
Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-56
Unit 6. System initialization: rc.boot and inittab . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-1

Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-2
6.1. AIX initialization part 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-5
System software initialization overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-6
rc.boot 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-9
rc.boot 2 (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-12
rc.boot 2 (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-15
rc.boot 3 (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-18
rc.boot 3 (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . 6-21
rc.boot summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-24
Fixing corrupted file systems and logs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-26
Let’s review: rc.boot (1 of 3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-29
6.2. AIX initialization part 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-35
Configuration manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-36
Config_Rules object class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-39
cfgmgr output in the boot log using alog . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-42
/etc/inittab file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-45
Boot problem management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-49
Let's review: /etc/inittab file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-53
Checkpoint (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-58
Checkpoint (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-60
Exercise: System initialization: rc.boot and inittab . . . . . . . . . . . . . . . . . . . . . . . . .6-62
Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-64
Unit 7. LVM metadata and related problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-1

Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-2
7.1. LVM data representation: Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-5
Review: LVM terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-6
LVM identifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-9
LVM data on disk control blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-12
LVM data in the operating system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-15
LVM related ODM objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-17
7.2. Export and import . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-19
Exporting a volume group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-20
Importing a volume group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-23
importvg and duplicate names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-26
importvg and existing logical volumes . . . . . . . . . . . . . . 7-29
importvg and existing file systems (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-31
importvg and existing file systems (2 of 2) . . . . . . . . . . . 7-34
7.3. LVM Metadata details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-37
Contents of the VGDA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-38
VGDA example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-41
The logical volume control block . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-45
How LVM interacts with the ODM and the VGDA . . . . . . . . . . . . . . . . . . . . . . . . . .7-48
ODM entries for physical volumes (1 of 4) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-51
ODM entries for volume groups (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-62
ODM entries for volume groups (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-64
ODM entries for logical volumes (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-66
ODM entries for logical volumes (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-68
7.4. LVM metadata related problems. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-71
ODM-related LVM problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-72
Fixing ODM problems (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-75
Fixing ODM problems (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-78
Intermediate level ODM commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-82
Checkpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-85
Exercise: LVM metadata and related problems . . . . . . . . . . . . . . . . . . . . . . . . . . 7-87
Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-89
Unit 8. Disk management procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-1

Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-3
8.1. Failed disks: Mirroring and quorum issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-5
Mirroring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-6
Stale partitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-8
Mirroring rootvg . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-11
VGDA count . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-16
Quorum not available . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-18
Nonquorum volume groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-21
Forced vary on (varyonvg -f) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-24
Physical volume states . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-27
8.2. Disk replacement techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-31
Disk replacement: Starting point . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-32
Procedure 1 (1 of 4): Disk mirrored . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-35
Procedure 1 (2 of 4): Disk mirrored with replacepv . . . . . . . . . . . . . . . . . . . . . . . . 8-38
Procedure 1 (3 of 4): Disk mirrored without replacepv . . . . . . . . . . . . . . . . . . . . . 8-40
Procedure 1 (4 of 4): Special steps for rootvg . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-43
Procedure 2 (1 of 2): Disk still working . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-45
Procedure 2 (2 of 2): Special steps for rootvg . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-48
Procedure 3: Disk in missing or removed state . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-51
Procedure 4: Total rootvg failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-54
Procedure 5: Total non-rootvg failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-57
ODM errors from LVM commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-60
Removal of disk without reducevg (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-63
Removal of disk without reducevg (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-65
Checkpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-68
Exercise: Disk management procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-70
Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-72
Unit 9. Install and cloning techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-1

Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-3
9.1. Alternate disk installation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-5
Topic 1 objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-6
Alternate disk installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-8
Alternate mksysb disk installation (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-11
Alternate mksysb disk installation (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-14
Alternate disk rootvg cloning (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-16
Alternate disk rootvg cloning (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-18
Removing an alternate disk installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-20
NIM alternate disk migration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-23
Exercise: Alternate disk install . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-26
9.2. Using multibos . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-29
Topic 2 objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-30
multibos overview . . . . . . . . . . . . . . . . . . . . . . . . 9-32
Active and standby BOS logical volumes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-35
Setting up a standby BOS (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-37
Setting up a standby BOS (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-40
Other multibos operations (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-42
Other multibos operations (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-45
Checkpoint (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-48
Checkpoint (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-50
Exercise: multibos . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-52
Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-54
Unit 10. Advanced backup techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-1

Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-3
Backup data inconsistency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-5
Ensuring backup data consistency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-7
10.1. LVM mirror-based online backups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-9
Topic 1 objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-10
Online JFS backup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-12
Splitting the mirror . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-14
Reintegrate a mirror backup copy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-17
Snapshot volume groups (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-19
Snapshot volume groups (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-21
Snapshot volume group commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-23
Snapshot volume group example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-25
10.2. JFS2 snapshot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-27
JFS2 snapshot (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-30
JFS2 snapshot (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-32
JFS2 snapshot mechanism (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-35
JFS2 snapshot mechanism (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-37
JFS2 snapshot SMIT menu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-39
Creating snapshots: External . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-41
Creating snapshots: Internal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-45
Listing snapshots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-47
Using a JFS2 snapshot to recover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-50
Using a JFS2 external snapshot to back up . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-53
Using a JFS2 internal snapshot to back up . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-56
JFS2 snapshot space management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-59
10.3. SAN Copy issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-61
SAN Copy and file system cache . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-64
Use of JFS2 freeze and thaw . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-66
Consistency groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-69
Accessing SAN Copy data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-72
The recreatevg command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-75
viii AIX Advanced Administration © Copyright IBM Corp. 2009, 2012

V7.0.1
Checkpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-77
Exercises: Advanced backup techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-79
Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-81
Unit 11. Diagnostics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-1

Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-2
When do I need diagnostics? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-4
Where do I run diagnostics? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-7
The diag command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-9
Working with diag (1 of 3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-11
What happens if a device is busy? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-19
Diagnostic modes (1 of 3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-21
diag: Using task selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-29
Diagnostic log . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-32
Checkpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-34
Exercise: Diagnostics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-36
Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-38
Unit 12. The AIX system dump facility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-1

Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-2
System dumps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-4
Types of dumps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-6
How a system dump is invoked . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-9
Crash code: 888 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-12
When a dump occurs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-15
The sysdumpdev command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-17
Firmware assisted dump . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-22
Dedicated dump device (1 of 3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-25
Estimating dump size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-32
The dumpcheck utility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-35
Methods of starting a dump . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-38
Starting a dump from a TTY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-42
Generating dumps with SMIT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-45
Generating dumps with HMC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-48
Dump-related AIX progress codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-50
Copying a system dump . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-53
Automatically reboot after a crash . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-56
Sending a dump to IBM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-59
Using kdb to analyze a dump . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-63
Checkpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-67
Exercise: System dump . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-69
Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-71
Appendix A. Checkpoint solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-1
Appendix B. Command summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-1
Appendix C. AIX dump code and progress codes . . . . . . . . . . . . . . . . . . . . . . . . . . C-1
Instructor course overview

This is a five-day course for existing system administrators with at
least six months of experience in AIX. It is assumed that the students
have general administrative skills, such as installing the operating
system, configuring and managing devices, working with volume
groups (including logical volumes and file systems), adding and
administering user accounts, and general day-to-day housekeeping
skills.
The main goal of this course is to provide advanced AIX
administration skills, including various tools and techniques for
determining and solving problems, monitoring for problems, reducing
the maintenance window for system updates, and minimizing
downtime for system maintenance.
Course description

Power Systems for AIX III: Advanced Administration and Problem
Determination
Duration: 5 days
Purpose
This course provides advanced AIX system administrator skills with a
focus on availability and problem determination. It provides detailed
knowledge of the ODM database, where AIX maintains much of its
configuration information. It shows how to monitor for and deal with
AIX problems. There is special focus on dealing with Logical Volume
Manager problems, including procedures for replacing disks. Several
techniques for minimizing the system maintenance window are
covered. While the course includes some AIX 7.1 enhancements,
most of the material is applicable to prior releases of AIX.
Audience
This is an advanced course for AIX system administrators, system
support, and contract support individuals with at least six months of
experience in AIX.
Prerequisites
You should have basic AIX System Administration skills. These skills
include:
• Use of the Hardware Management Console (HMC) to activate a
logical partition running AIX and to access the AIX system console
• Install an AIX operating system from an already configured NIM
server
• Implementation of AIX backup and recovery
• Manage additional software and base operating system updates
• Familiarity with management tools such as SMIT
• Understand how to manage file systems, logical volumes, and
volume groups
• Mastery of the UNIX user interface including use of the vi editor,
command execution, input and output redirection, and the use of
utilities such as grep
These skills could be developed through experience or by formal
training. The recommended training course to obtain these
prerequisite skills is:
• Power Systems for AIX II: AIX Implementation and Administration
(AN12 or AX12) and its prerequisites
If the student has AIX system administration skills, but is not familiar
with the LPAR environment, those skills may be obtained by attending
the following course:
• AN11 or AX11 - Power Systems Administration I: LPAR
Configuration
Objectives
On completion of this course, students should be able to:
• Perform system problem determination and reporting procedures
including analyzing error logs, creating dumps of the system, and
providing needed data to the AIX Support personnel
• Examine and manipulate Object Data Manager databases
• Identify and resolve conflicts between the Logical Volume Manager
(LVM) disk structures and the Object Data Manager (ODM)
• Complete a very basic configuration of Network Installation
Manager to provide network boot support for either system
installation or booting to maintenance mode
• Identify various types of boot and disk failures and perform the
matching recovery procedures
• Implement advanced methods such as alternate disk install,
multibos, and JFS2 snapshots to use a smaller maintenance
window
Contents
• Advanced AIX administration overview
• The Object Data Manager
• Error monitoring
• Network Installation Manager basics
• System initialization: Accessing a boot image
• System initialization: rc.boot and inittab
• LVM metadata and related problems
• Disk management procedures
• Install and cloning techniques
• Advanced backup techniques
• Diagnostics
• The AIX system dump facility
Agenda
The estimated timings provided here are for content only. They assume that the remainder of
the day is consumed with hourly breaks and a one-hour lunch break. Most days are
timed to allow class dismissal between 4 p.m. and 4:30 p.m., assuming a 9 a.m. to 5
p.m. class day. If the class runs quicker than expected, some days have an optional lab
for the students to work on, which will help fill in the time.
Day 1 (est 5:15)

(00:20) Welcome
(00:55) Unit 1: Advanced AIX administration overview
(00:35) Exercise 1: Problem diagnostic information
(01:10) Unit 2: The Object Data Manager
(00:45) Exercise 2: The Object Data Manager
(01:10) Unit 3: Error monitoring
(00:20) (optional) Exercise 2: Object Data Manager, Part 3
Day 2 (est 5:30)

(00:45) Exercise 3: Error monitoring
(01:00) Unit 4: Network Installation Manager basics
(00:55) Exercise 4: Basic Network Installation Manager configuration
(01:30) Unit 5: System initialization: Accessing a boot image
(01:20) Exercise 5: System initialization: Accessing a boot image
Day 3 (est 5:35)

(01:25) Unit 6: System initialization: rc.boot and inittab
(00:40) Exercise 6: System initialization: rc.boot and inittab
(01:25) Unit 7: LVM metadata and related problems
(01:20) Exercise 7: LVM metadata and related problems
(00:25) Unit 8: Disk management procedures, Topic 1
(00:20) Exercise 8: Disk management procedures, Part 1
(00:20) (optional) Exercise 7: LVM metadata and related problems, Part 6
Day 4 (est 5:35)

(00:50) Unit 8: Disk management procedures, Topic 2
(00:30) Exercise 8: Disk management procedures, Parts 2 and 3
(00:25) Unit 9: Install and cloning techniques, Topic 1
(00:35) Exercise 9: Install and cloning techniques, Part 1
(00:24) Unit 9: Install and cloning techniques, Topic 2
(00:45) Exercise 9: Install and cloning techniques, Part 2
(00:20) Unit 10: Advanced backup techniques, Topic 1
(00:30) Exercise 10: Advanced backup techniques, Part 1
(00:50) Exercise 10: Advanced backup techniques, Parts 3 and 4
Day 5 (est 4:00)

(00:25) Unit 11: Diagnostics
(00:55) Exercise 11: Diagnostics
(01:25) Unit 12: The AIX system dump facility
(01:25) Exercise 12: The AIX system dump facility
(00:30) Wrap up / Evaluations
Unit 1. Advanced AIX administration overview
Estimated time
00:55
What this unit is about

This unit introduces various AIX administration issues related to
problem determination and handling system maintenance and backup
in an efficient manner.
What you should be able to do

After completing this unit you should be able to:
• List the steps of a basic methodology for problem determination
• List AIX features that assist in minimizing planned downtime or
shortening the maintenance window
• Explain how to find documentation and other key resources
needed for problem resolution
How you will check your progress

Accountability:
• Checkpoint questions
• Lab exercise
References
SG24-7910 IBM AIX Version 7.1 Differences Guide (Redbook)
SG24-5496 Problem Solving and Troubleshooting in AIX 5L (Redbook)
Unit objectives
IBM Power Systems
After completing this unit, you should be able to:

• List the steps of a basic methodology for problem determination
• List AIX features that assist in minimizing planned downtime
or shortening the maintenance window
• Explain how to find documentation and other key resources
© Copyright IBM Corporation 2012
Figure 1-1. Unit objectives AN152.2
Notes:
Instructor notes:

Purpose — List the objectives for this unit.
Details —
Additional information — The Problem Solving and Troubleshooting in AIX 5L Redbook
listed under the References heading was last updated in May 2002. However, it appears
that a more current Redbook dealing with this topic is not available.
Transition statement — Problem determination is an important part of system
administration.
Application outages
• Functional or performance
• Avoid unplanned outages with best practices
  - Change control
  - Data security
  - Capacity planning
  - High availability design
• Avoid planned outages
  - Fall-over to backup server
  - Relocate application (LPAR or WPAR mobility)
• Use maintenance windows
  - Application stopped versus slow activity
  - Plan enough time for back-out or recovery
  - Minimize time needed
• Effective problem determination and recovery
Figure 1-2. Application outages AN152.2
Notes:
Introduction
Providing system availability is a major responsibility of any system administrator. An
outage may be caused by a functional problem (such as an application or system crash)
or a server performance problem (business is seriously impacted due to poor response
times or late jobs). There are many approaches to dealing with this.
Unplanned outages
When most of us think of availability, we think of unplanned outages. Regular hardware
and software maintenance can often avoid these outages. Designing the computing
facility to have redundant components (power, network adapters, network switches,
storage, and more) can make the overall system resilient to the failure of individual
components. Performance problems are often the result of failing to do proper capacity
planning, resulting in not enough resources (memory, processors, network bandwidth,
or disk I/O bandwidth) to handle the increased workload. If there is no change control to
manage what work is placed on a system, capacity planning is even more challenging.
Furthermore, uncontrolled changes to a system result in uncontrolled exposure to
possible outages created by those changes, and thus to unplanned outages. Computer
viruses and other malicious attacks by computer hackers can also reduce system
availability (in addition to the exposure of losing proprietary information). Good data
security policies are essential.
Even when implementing good policies in these areas, some unplanned outages will
still happen. In these situations, the system administrator needs to have a plan for
minimizing the impact and recovering as quickly as possible. One common approach is
to have an alternate system that can take over the work of the failed system. High
Availability Cluster Multi-Processing (HACMP) provides a system for either concurrent
processing by multiple systems, or an automated fall-over to a backup system, thus
minimizing the impact of a server failure. Such server redundancy can be designed to
work within a single facility or be divided between different geographical locations.
Obviously, rapid notification of a problem, effective and prompt diagnosis of the cause,
and being able to quickly implement an effective solution will all contribute to a smaller
mean time to recovery.
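As a rough illustration (a standard reliability approximation, not taken from this course), steady-state availability A can be written in terms of the mean time between failures (MTBF) and the mean time to recovery (MTTR). The expression shows that shortening recovery improves availability just as avoiding failures does:

$$A = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}, \qquad \text{yearly downtime} \approx (1 - A) \times 8760\ \text{hours}$$

For example, a target of A = 0.999 on a system expected to run 24x7x365 (8760 hours per year) allows roughly 0.001 × 8760 = 8.76 hours of total outage per year.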
Planned outages
By using change control, the risk associated with certain categories of potential
unplanned outages can be managed by implementing the changes during planned
windows of time when the impact of any unexpected problem (resulting from the
change) is minimized. In addition, there are certain types of changes for which an
outage is unavoidable.
Some facilities will implement multiple types of maintenance windows. One type would
be frequent short maintenance windows for any administrative work that competes
with applications for resources (performance impact) or that has a small chance of causing
a functional disruption. Another type would be a less frequent window in which any
reboot of the system or any major change to the level of the operating system or major
subsystems, such as database software, would be allowed.
Sometimes, the amount of time in a maintenance window is relatively small and the
work has to be carefully planned. You also need to allow time to recover if anything
goes wrong due to the maintenance. Any needed resources that can be pre-staged will
help expedite the work. Any approach that can speed recovery after a problem occurs is
also useful.
For systems that need to be up 24 hours a day, seven days a week, and every day in
the year (24x7x365), even a short outage cannot be tolerated. In those situations, a
method to non-disruptively move the applications to another system can be invaluable.
If an HACMP cluster solution is already in place to handle unplanned outages, then it
can be used to manually fall over the services to another system while maintenance is
being done. Other solutions are to use Live Partition Mobility or Live Application
Mobility.
Instructor notes:
Purpose — Provide an overview of issues related to application availability.
Details —
Additional information —
Transition statement — Let’s look at some factors that affect the amount of time needed
in a maintenance window.

Maintenance window tasks

• Minimize time needed for tasks
  - Including time to recover from a failed task
• Operating system maintenance
  - Pre-staging of maintenance
  - Applying maintenance to alternate rootvg
  - Applying maintenance with alternate BLV
  - Reboot to use updated alternate
• System backups
  - Minimizing rootvg size
  - Snapshot techniques for user file systems
Figure 1-3. Maintenance window tasks AN152.2
Notes:
Expediting work in the maintenance window

The quicker maintenance can be completed, the sooner you can get the system back up
and head home (maintenance windows are often at night or on a weekend). More
importantly, completing the planned activities quickly leaves more time to handle any
problems that may arise.
Operating system maintenance

Ensure you have on hand whatever materials you will need for the job, such as the
installation media. Eliminating the need to handle that media can be important. This can
be done by pre-copying all of the needed filesets to disk storage. This could be on an
NFS or NIM server (provided you have sufficient network bandwidth) or it could be a
software repository on the system being updated. If using a software repository on the
system which is being updated, it is recommended that the filesets be in a file system
allocated out of a different volume group than the rootvg.
An important technique that we will cover is the use of alternate storage as the target
of the software update. That is, the updates are not made to the active rootvg, but
rather to a copy of the rootvg. This has two advantages. First, no change is made to the
active rootvg. For locations that distinguish between changing the level of the
operating system and simply doing work that has a performance impact, the
time-consuming update activity can be done in a more frequently available window.
Then, when a major maintenance window arrives, you only need to reboot to make the
update effective. The second advantage, and to some the more important one, is ease of
recovery. If you find serious problems when running the new level of code, you only
need to reboot back to the earlier code level, rather than recover from a mksysb or
reject the entire update. Of course, the downside is that you will need to reboot to make
the update effective, but a major maintenance window should already allow for a reboot.
There are two techniques that we will cover. One technique is creating an alternate set
of logical volumes that are copies of the rootvg BOS logical volumes. This is called
multibos. The other technique is creating an alternate volume group that is a clone of
the rootvg (alternate disk installation). In each case, you apply the maintenance to the
copy and then later reboot to make it effective.
System backups
Another common maintenance activity is backing up the system. Unless you have an
application that is designed to manage a recovery process using fuzzy backups, you
will need to quiesce the application activity long enough to be sure that there are no
inconsistencies in the backup. The term fuzzy backup refers to a backup taken while the
application was making changes. A given transaction makes multiple data changes.
Some of these transaction-related changes are made before the affected data is backed
up, while others are made after it is backed up. Thus the backup has one piece of data
that reflects the transaction and another piece of data that does not. The two pieces of
data are inconsistent, and such a backup is referred to as fuzzy.
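The following toy Python model illustrates the idea. The names are hypothetical and this is not an AIX interface: a transaction moves an amount between two records whose total must stay constant, and the backup copies the records one at a time. Copying while the transaction runs captures one record before the change and the other after it; copying while the application is quiesced does not.

```python
"""Toy model of a fuzzy backup versus a quiesced backup.

Hypothetical example: two records hold balances, and one transaction
moves money between them, so their total must never change.
"""


def transfer(data, amount):
    # One transaction = two related data changes.
    data["checking"] -= amount
    data["savings"] += amount


def fuzzy_backup():
    data = {"checking": 100, "savings": 0}
    backup = {}
    backup["checking"] = data["checking"]  # copied before the transaction
    transfer(data, 50)                     # application keeps running
    backup["savings"] = data["savings"]    # copied after the transaction
    return data, backup


def quiesced_backup():
    data = {"checking": 100, "savings": 0}
    backup = dict(data)  # application quiesced: both records copied together
    transfer(data, 50)   # activity resumes only after the copy is complete
    return data, backup


if __name__ == "__main__":
    for name, fn in (("fuzzy", fuzzy_backup), ("quiesced", quiesced_backup)):
        live, copy = fn()
        print(f"{name:8} live total={sum(live.values())} "
              f"backup total={sum(copy.values())}")
```

The fuzzy copy reports a total of 150 even though the live data never exceeded 100. That is the kind of inconsistency an application-managed recovery process would have to detect and repair.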
For the rootvg itself, the size of the rootvg should be minimized. It should contain only
what is needed for the OS. All user data and other non-essential files should be backed
up and restored separately. An example is the standard location of a software
repository: /usr/sys/inst.images. The software repository can be very large, and yet
this common path resides in the /usr file system, which is in the rootvg. Placing the
software repository in a separate file system with its own recovery plan (which could use
the original media as the backup) can help reduce backup and recovery time. Another
common example is the /home file system. If users have vast amounts of data stored
there, then over-mounting /home with a separate file system can again speed up working
with the rootvg. There are other file systems, such as /tmp, whose contents could be
eliminated from the system backup. The trick is that these would need to be excluded
from the backup during mksysb execution (either not mounted or identified in
/etc/exclude.rootvg), and then separately recovered from their own backup. Other user
data will be in separate user volume groups.
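The exclusion mechanism can be sketched in Python. The sketch assumes, as the mksysb documentation describes, that each line of /etc/exclude.rootvg is a grep-style pattern matched against file names listed relative to root with a leading "./", and that the file is honored only when mksysb is run with its -e flag (or the equivalent SMIT option). The pattern list is hypothetical; this models the filtering and is not how mksysb is implemented.

```python
import re

# Hypothetical /etc/exclude.rootvg contents. Assumption: each line is a
# grep-style regular expression matched against "./"-relative path names.
EXCLUDE_PATTERNS = [
    r"^./tmp/",                  # anchored, so ./var/tmp/ is NOT excluded
    r"^./usr/sys/inst.images/",  # local software repository
    r"^./home/",                 # user data backed up separately
]


def filter_backup_list(paths, patterns):
    """Return the paths that do not match any exclude pattern."""
    compiled = [re.compile(p) for p in patterns]
    return [p for p in paths if not any(rx.search(p) for rx in compiled)]


if __name__ == "__main__":
    files = [
        "./etc/inittab",
        "./tmp/scratch.dat",
        "./usr/bin/ls",
        "./usr/sys/inst.images/bos.rte",
        "./home/alice/data.db",
        "./var/tmp/session.log",
    ]
    for path in filter_backup_list(files, EXCLUDE_PATTERNS):
        print(path)
```

Only ./etc/inittab, ./usr/bin/ls, and ./var/tmp/session.log remain. The leading ^ matters: an unanchored pattern such as /tmp/ would also drop ./var/tmp/session.log.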
With the emphasis on separate backups for non-BOS data, there comes a need to
minimize how long the applications need to be quiesced while still having data
consistency. One technique that AIX provides is JFS2 snapshots, which allow us to
quiesce the application only very briefly and still have a consistent picture of the data
at a single point in time. We can then either use that snapshot of the data as its own
backup, or base an actual backup upon that snapshot (in order to have off-site storage
of the backup). There are other facilities for capturing snapshots of data. Some are part
of the storage subsystems and some are part of total storage solutions such as Tivoli
Storage Manager. Our focus will be on the facility that is provided with AIX: JFS2
snapshot.
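Snapshots of this kind generally rely on copy-on-write: creating the snapshot copies no data, and the original contents of a block are saved only the first time that block changes afterward. The Python toy model below uses hypothetical classes and is a simplification, not the JFS2 on-disk design. It shows why a snapshot gives a point-in-time view after only a brief quiesce, and why the snapshot's storage grows with the amount of data changed rather than with the size of the file system.

```python
class Snapshot:
    """Point-in-time view of a Volume (toy model)."""

    def __init__(self, volume):
        self.volume = volume
        self.saved = {}  # block index -> contents at snapshot time

    def read(self, index):
        # Changed blocks come from the saved copy; unchanged ones from the live volume.
        if index in self.saved:
            return self.saved[index]
        return self.volume.blocks[index]


class Volume:
    """Toy block device holding one value per block."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []

    def create_snapshot(self):
        # No data is copied when the snapshot is created.
        snap = Snapshot(self)
        self.snapshots.append(snap)
        return snap

    def write(self, index, value):
        # Copy-on-write: preserve the old contents in each snapshot the first
        # time this block changes after that snapshot was taken.
        for snap in self.snapshots:
            if index not in snap.saved:
                snap.saved[index] = self.blocks[index]
        self.blocks[index] = value


if __name__ == "__main__":
    vol = Volume(["A", "B", "C"])
    snap = vol.create_snapshot()  # application quiesced only for this step
    vol.write(1, "B2")
    vol.write(1, "B3")            # second write: nothing more is saved
    print("live:    ", vol.blocks)
    print("snapshot:", [snap.read(i) for i in range(len(vol.blocks))])
    print("saved blocks:", snap.saved)
```

After the two writes the live volume holds A, B3, C while the snapshot still reads A, B, C, and only one block's original contents had to be saved.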
Instructor notes:
Purpose — Cover approaches that reduce maintenance time.
Details —
Transition statement — Actual hardware or software problems are also a concern for
application availability. What do we need to do to better manage problems when they
occur?

Effective problem management

• Keep system documentation current
• Keep maintenance up to date
• Use a problem determination methodology
• If an AIX bug:
  - Collect problem information
  - Open problem report with AIX Support
  - Provide snap with information
Figure 1-4. Effective problem management AN152.2
Notes:
Obtaining and documenting information about your system

It is a good idea, whenever you approach a new system, to learn as much as you can
about that system. It is also critical to document not only the physical resources and the
devices, but also how the system has been configured (network, LVM, and more). Then
this information will be ready when needed.
Later in the course, we will suggest some ways to collect system information.
System maintenance
Sometimes code works well under normal testing or production circumstances, but a flaw
in its logic is discovered when it faces an unanticipated situation. Alternatively, the
defect could be in some non-central aspect of the code that normally goes unnoticed. The
number of facilities using this code is large enough that there is a good chance that one
of the facilities will detect and report the problem not long after release of the new
code level.
The fix for the code defect will usually come out in the next released fix pack. On the
other hand, many facilities may not be affected by or concerned about the code
defect for months, until the circumstances arise in which it becomes a
problem. By installing newer service packs, a facility can benefit from the experience of
others and avoid being impacted by known problems.
Of course, there is always the possibility that a new fix pack will introduce new
problems while solving many old ones.
This course will cover some techniques to use in applying fix packs.
Problem determination
Once you find yourself impacted by what you believe to be a product defect, you will
need to obtain prompt resolution. While there is no substitute for experience (the ability
to recognize a situation and remember the details of how you dealt with it the last time a
similar problem occurred), many problems will be most effectively solved by following a
well-developed problem determination methodology. This course will cover a basic
problem determination methodology.
Problem reporting
When you find yourself impacted by what you believe to be a product defect, you will
need to contact AIX Support. Before contacting AIX Support, you should write up a
description of the problem and the surrounding circumstances. When you open a new
Problem Management Report (PMR) with AIX Support, you will be expected to provide
them with a wealth of information to assist them in determining the cause of the
problem. The snap command is a common tool to assist in collecting a vast amount of
information about the environment surrounding the problem. The course materials will
cover these problem reporting procedures.

Instructor notes:
Purpose — Introduce problem management.
Details —
Transition statement — As just stated, keeping good documentation is important. Let’s
take a closer look at this.
Before problems occur

• Effective problem determination starts with a good understanding of the system and its
components
• The more information you have about the normal operation of a system, the better
  - System configuration
  - Operating system level
  - Applications installed
  - Baseline performance
  - Installation, configuration, and service manuals
[Graphic: System documentation]
Figure 1-5. Before problems occur AN152.2
Notes:
Obtaining and documenting information about your system

It is a good idea, whenever you approach a new system, to learn as much as you can
about that system.
It is also critical to document both logical and physical device information so that it is
available when troubleshooting is necessary.
Information that should be documented

It is a best practice to maintain what is commonly referred to as a control book. This is
a collection of information that describes various aspects of your system.
Examples of important items that should be determined and recorded include the
following:
- Machine architecture (model, CPU type)
- Physical volumes (type and size of disks)
- Volume groups (names, just a bunch of disks (JBOD) or redundant array of
independent disks (RAID))
- Logical volumes (mirrored or not, which volume group, type)
- File systems (which volume group, what applications)
- Memory (size) and paging spaces (how many, location)
Instructor notes:
Purpose — Provide guidance on what needs to be known before things go wrong.
Details — This visual introduces one of the most important points that you as an instructor
should make regarding “what it takes” to be a successful system administrator: In order to
be successful at determining what has gone wrong (and how to respond) when there are
system problems, the administrator must be extremely familiar with the characteristics of
his or her system when it is functioning normally. Be sure to make this point!
Use the student notes to guide the rest of your presentation.
Transition statement — Some of the commands you might want to start out with are
discussed next.

Before problems occur: A few good commands

• lspv      Lists physical volumes, PVID, VG membership
• lscfg     Provides information regarding system components
• prtconf   Displays system configuration information
• lsvg      Lists the volume groups
• lsps      Displays information about paging spaces
• lsfs      Gives file system information
• lsdev     Provides device information
• getconf   Displays values of system configuration variables
• bootinfo  Displays system configuration information (unsupported)
• snap      Collects system data
Figure 1-6. Before problems occur: A few good commands AN152.2
Notes:
A list of useful commands

It is important to maintain a control book of information about your system. This is
especially true when the problem prevents you from accessing the system, because you
then cannot run these commands to find out how the system was configured.
The list of commands on the visual provides a starting point for use in gathering key
information about your system.
Many other commands can also help you gather important system information.
Sources of additional information

Check the man pages or the AIX Commands Reference for the correct syntax and for the
option flags that make these commands report more specific information.
There is no man page or entry in the AIX Commands Reference for the bootinfo
command.
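To start a control book, the commands above can be run in one pass with their output saved to a dated file. The following is a minimal sketch, not an IBM-supplied tool: the function name capture_config and the default directory /var/adm/controlbook are hypothetical, and any command that is not installed is recorded as unavailable instead of stopping the run.

```sh
# capture_config (hypothetical helper): save the output of several
# configuration commands to one dated control-book file.
# Usage: capture_config [directory]  (default /var/adm/controlbook is an assumption)
capture_config() (
    dir=${1:-/var/adm/controlbook}
    mkdir -p "$dir" || exit 1
    out="$dir/config.$(uname -n).$(date +%Y%m%d)"
    : > "$out"                               # start with an empty file
    for cmd in "lspv" "lsvg" "lsps -a" "lsfs" "lsdev -C" "lscfg" "prtconf"; do
        name=${cmd%% *}                      # command name without its flags
        echo "===== $cmd =====" >> "$out"
        if command -v "$name" >/dev/null 2>&1; then
            $cmd >> "$out" 2>&1              # unquoted on purpose: splits name and flags
        else
            echo "($name is not available on this host)" >> "$out"
        fi
    done
    echo "$out"                              # report the file name to the caller
)
```

Running it again after each configuration change (for example, capture_config /home/admin/controlbook) gives a series of snapshots that can be compared with diff when something stops working.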
Instructor notes:
Purpose — Point out just a few of the commands that are helpful in learning about the
system and its configuration.
Details — Present the information in the student notes. Provide board work to show some
of the commands and the options that can be used with them.
Additional information — The bootinfo command is not officially supported, but in case
students ask, here is some information about its most commonly used flags:
-r Displays real memory in KB
-p Displays hardware platform (rs6k, rspc, chrp)
-y Displays 32 if hardware is 32-bit or 64 if hardware is 64-bit
-K Displays 32 if kernel is 32-bit or 64 if kernel is 64-bit
-z Displays processor type (0=uniprocessor, 1=multiprocessor)
-s Displays size of disk (provided as an argument)
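Because bootinfo is unsupported, it can help to show students a supported alternative. The sketch below queries getconf instead. The system-wide variable names REAL_MEMORY, HARDWARE_BITMODE, and KERNEL_BITMODE are AIX-specific and are an assumption to verify with getconf -a on the system at hand; where they are not defined, the helper (show_platform, a hypothetical name) prints "unavailable".

```sh
# show_platform (hypothetical helper): print values that bootinfo -r, -y,
# and -K also report, using getconf instead.
# Assumption: these getconf variable names exist on the AIX level in use.
show_platform() (
    for pair in "Real memory:REAL_MEMORY" \
                "Hardware bit mode:HARDWARE_BITMODE" \
                "Kernel bit mode:KERNEL_BITMODE"; do
        label=${pair%%:*}                    # text before the colon
        var=${pair##*:}                      # getconf variable after the colon
        val=$(getconf "$var" 2>/dev/null) || val=unavailable
        printf '%-18s %s\n' "$label" "$val"
    done
)
```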
Transition statement — Let’s talk about what to do when things go wrong.

Steps in problem resolution

1. Identify the problem
2. Talk to users to define the problem
3. Collect system data
4. Resolve the problem
Figure 1-7. Steps in problem resolution AN152.2
Notes:
The start-to-finish method

The start-to-finish method for resolving problems consists primarily of the following four
major components:
- Identify the problem
- Talk to users to define the problem
- Collect system data
- Resolve (fix) the problem
Step 1: Identify the problem

The first step in problem resolution is to find out what the problem is. It is important to
understand exactly what the users of the system perceive the problem to be.
A clear description of the problem typically gives clues as to the cause of the problem
and aids in the choice of troubleshooting methods to apply.
Step 2: Gathering additional detail

A problem might be identified by almost anyone who uses or needs to interact with the
system. If a problem is reported to you, you might need to get details from the user who
reported it and then question others on the system to obtain additional details and
develop a clear picture of what happened.
The users may be data entry staff, programmers, system administrators, technical
support personnel, management, application developers, operations staff, network
users, and so forth.
Suggested questions
- What is the problem?
- What is the system doing (or not doing)?
- How did you first notice the problem?
- When did it happen?
- Have any changes been made recently?
Keep them talking until the picture is clear. Ask as many questions as you need to get
the entire history of the problem.
Step 3: Collect system data

Some information about the system will have already been collected from the users
during the process of defining the problem.
By using various commands, such as lsdev, lspv, lsvg, lslpp, lsattr, and others,
you can gather further information about the system configuration.
You should also gather other relevant information by making use of available error
reporting facilities, determining the state of the operating system, checking for the
existence of a system dump, and inspecting the various available log files.
- How is the machine configured?
- What errors are being produced?
- What is the state of the OS?
- Is there a system dump?
- What log files exist?
SMIT logs
If SMIT has been used, there will be additional logs that could provide further
information. The SMIT log file is normally contained in the home directory of the root
user and is named smit.log, by default.
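The data-collection questions above map to a few AIX commands: errpt for the error log, sysdumpdev -L for information about the most recent system dump, and the SMIT log. Here is a minimal sketch that prints a first look at each; the function name first_look is hypothetical, and root's home directory is read from /etc/passwd rather than assumed.

```sh
# first_look (hypothetical helper): quick look at the error log, the last
# system dump, and root's SMIT log. Commands that are absent are skipped.
first_look() (
    if command -v errpt >/dev/null 2>&1; then
        echo "--- Most recent error log entries ---"
        errpt | head -n 11                   # header line plus ten entries
    fi
    if command -v sysdumpdev >/dev/null 2>&1; then
        echo "--- Most recent system dump ---"
        sysdumpdev -L
    fi
    home=$(awk -F: '$1 == "root" { print $6; exit }' /etc/passwd)
    log="${home%/}/smit.log"                 # a home of "/" gives /smit.log
    if [ -r "$log" ]; then
        echo "--- Last lines of $log ---"
        tail -n 20 "$log"
    else
        echo "No readable $log"
    fi
)
```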

Step 4: Resolve the problem

After all the information is gathered, determine the procedures necessary to solve the
problem. Keep a log of all actions you perform in trying to determine the cause of the
problem, and any actions you perform to correct the problem.
- Use the information gathered
- Keep a log of actions taken to correct the problem
- Use the tools available: command documentation, downloadable fixes, and updates
- Contact IBM Support, if necessary
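The advice to keep a log of actions is easier to follow when logging takes a single command. Here is a minimal sketch of such a helper; pdlog and PDLOG_FILE are hypothetical names, not AIX commands, and the default log location is an assumption.

```sh
# pdlog (hypothetical helper): append a timestamped entry to the action log.
# PDLOG_FILE is a hypothetical variable; its default location is an assumption.
PDLOG_FILE=${PDLOG_FILE:-/var/adm/pd_actions.log}

pdlog() {
    printf '%s %s: %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$(uname -n)" "$*" >> "$PDLOG_FILE"
}
```

For example, pdlog "Checked errpt; disk errors reported on hdisk1" records what was done, when, and on which host, which is exactly what AIX Support will ask about later.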
Resources for problem solving

A variety of resources, such as the documentation for individual commands, are
available to assist you in solving problems with AIX systems.
The IBM Systems Information Center is a Web site that serves as a focal point for all
information pertaining to POWER servers and AIX. A message database is available to
search on error numbers, error identifiers, and display codes (LED values). The Web
site also contains FAQs, how-to’s and more.
Information Center
The IBM Power Systems Information Center can be found at the following link:
http://publib.boulder.ibm.com/eserver
Instructor notes:
Purpose — Provide the big picture for problem resolution.
Details —
Transition statement — An important part of the problem description is the collection of
generated messages or codes. Let’s look at some of the different types of codes and where
we can look them up.

Progress and reference codes

• Progress codes
  - Checkpoint during a process such as boot, shutdown, or dump
• System reference codes (SRCs)
  - Error codes for problems in hardware, firmware, or operating system
• Service request numbers (SRNs)
  - Indicates the detecting component and error condition detected
• Obtained from:
  - Front panel of system enclosure
  - HMC or IVM (for logically partitioned systems)
  - Operator console message or diagnostics (diag utility)
Figure 1-8. Progress and reference codes AN152.2
Notes:
Introduction
AIX provides progress and error indicators (display codes) during the boot process.
These display codes can be very useful in resolving startup problems. Depending on
the hardware platform, the codes are displayed on the console and the operator panel.
Operator panel
For non-LPAR systems, the operator panel is an LED display on the front panel.
Beginning with the early POWER4 models, POWER-based systems can be divided into
multiple logical partitions (LPARs). On such systems, a system-wide LED display still
exists on the front panel, but the operator panel for each LPAR is displayed on the
screen of the Hardware Management Console (HMC). The HMC is a separate system used
to manage the LPARs; some systems use the Integrated Virtualization Manager (IVM)
instead. Regardless of where these codes are displayed, they are sometimes referred to
as LED display codes.
Progress codes and other reference codes

Reference codes can have various sources:
- Diagnostics:
Diagnostics or error log analysis can provide Service Request Numbers (SRNs)
which can be used to determine the source of a hardware or operating system
problem.
- Hardware initialization:
System firmware sends boot status codes (called firmware checkpoints) to the
operator panel. Once the console is initialized, the firmware can also send 8-digit
error codes to the console.
- AIX initialization:
The rc.boot script and the device configuration methods send progress and error
codes to the operator panel.
Codes from the hardware/firmware or from AIX initialization scripts fall into two
categories:
- Progress Codes: These are checkpoints indicating the stages in the initial program
load (IPL) or boot sequence. They do not necessarily indicate a problem unless the
sequence permanently stops on a single code or a rotating sequence of codes.
- System Reference Codes (SRC): These are error codes indicating that a problem
has originated in hardware, Licensed Internal Code (firmware), or in the operating
system.


Purpose — Introduce some places to find information to help diagnose boot problems.
Details —
Transition statement — Let us look at where these codes can be looked up.
Reference codes at the IBM Information Center

Select Hardware Information Center > Systems Hardware information
Figure 1-9. Reference codes at the IBM Information Center AN152.2
Notes:
Documentation
Note: all information on Web sites and their design is based upon what is available at
the time of this course revision. Web site URLs and the design of the related Web
pages often change.
Reference codes and their meanings can be found at:
http://publib.boulder.ibm.com/eserver under the particular server with which you are
working (though most codes are the same, regardless of the server model).


Purpose — Identify how to look up reference codes.
Details —
Transition statement — If you have a problem and you think it is a defect in the product -
what do you do? Call IBM.
Working with AIX Support

• Have the needed information ready:
  - Name, phone #, customer #
  - Machine type, model, and serial #
  - AIX version, release, technology level, and service pack
  - Problem description, including error codes
  - Severity level: critical, significant impact, some impact, minimal
• 1-800-IBM-SERV (1-800-426-7378)
  - Level 1 will collect information and assign a PMR number
  - Level 1 routes the call to the Level 2 team responsible for the product
• You may be asked to collect additional information to upload
• They may ask you to update to a specific TL or SP
  - An APAR for your problem may already be addressed at that level
  - They need a standard environment in which to investigate
Figure 1-10. Working with AIX Support AN152.2
Notes:
If you believe that your problem is the result of a system defect, you can call AIX
Support to request assistance. Before you call 1-800-IBM-SERV, it is a good idea to
have certain information ready. They will want to verify your name against a list of
names associated with your customer number, and validate that your customer number
has support for the product in question. They will also need to know some details about
the hardware and software environment in which the problem is occurring - such as
your MTMS (machine type, model, serial), your AIX OS level, and the level of any other
relevant software. Of course, you need to explain your problem, providing as much
detail as possible, especially any error messages or codes.
The Level 1 Support personnel will ask you for the priority of your problem.
- Severity level 1 (critical) indicates that the function does not work, your business is
severely impacted, there is no workaround, and that there needs to be an
immediate solution. Be aware that, for severity level 1, you will be expected to be
available 24x7 until the problem is resolved.

- Severity level 2 (significant impact) indicates that the function is usable but is limited
in a way that severely impacts your business.
- Severity level 3 (some impact) indicates that the program is usable with less
significant features (not critical to operations) unavailable.
- Severity level 4 (minimal impact) indicates that the problem causes little impact on
operations, or a reasonable circumvention to the problem has been implemented.
Level 1 Support will assign you a PMR number (actually a PMR and branch number
combination) for tracking purposes. In the future, each time you call about this problem,
you should have the PMR and branch numbers at hand.
Once the basic information has been collected, you are passed to Level 2 Support for
the product area for which you are having a problem. They will work with you in
investigating the nature and cause of your problem. They will search the support
database to see if it is a known problem that is either already being worked on or has a
solution already developed. In many cases, they will request that you update to a
specific technology level (TL) and service pack (SP) that already includes the fix.
If they do not have a fix, they may still ask you to update your system and determine if
the problem still exists. If the problem still exists, they now have a known software
environment to work with. At this point they will often ask for a complete set of
information from your system to be collected and uploaded to their server, to support
their investigation. The basic tool for collecting your system information is the snap
command.
Instructor notes:
Purpose — Introduce the procedure for working with AIX Support.
Details —
Transition statement — Let’s look at how we work with the snap command.

AIX Support test case data (1 of 2)

Run the following (or very similar) commands to gather snap information:
# snap -a
Copy any extra data to the /tmp/ibmsupt/testcase or the /tmp/ibmsupt/other directory.
# snap -c
This step will create /tmp/ibmsupt/snap.pax.Z.
# cd /tmp/ibmsupt
# mv snap.pax.Z PMR#.b<branch#>.c<country#>.snap.pax.Z
Figure 1-11. AIX Support test case data (1 of 2) AN152.2
Notes:
Overview of the snap command

The snap command is used to gather system configuration information useful in
identifying and resolving system problems.
The snap command can also be used to compress the snap information gathered into a
pax file. The file may then be written to a device such as tape or DVD, or transmitted to
a remote system.
Refer to the man page for snap or the corresponding entry in the AIX Commands
Reference manual for detailed information about the snap command and its various
flags.
Discussion of command sequence shown on the visual

As illustrated on the visual, the -a flag of the snap command should be used to gather
all system configuration information that can be gathered using snap. The output of this
command will be written to the /tmp/ibmsupt directory.
Next, you should place any additional testcase data that you feel may be helpful in
resolving the problem into either the /tmp/ibmsupt/other or /tmp/ibmsupt/testcase
directory. This additional information is then included (together with the information
gathered directly by snap) in the compressed pax file created in the next step in this
command sequence.
As shown, the -c flag of the snap command should then be used to create a
compressed pax file containing all files contained in the /tmp/ibmsupt directory. The
output file created by this command is /tmp/ibmsupt/snap.pax.Z.
Next, the /tmp/ibmsupt/snap.pax.Z output file should be renamed using the mv
command to indicate the PMR number, branch number, and country number associated
with the data in the file. For example, if the PMR number is 12345, the branch number is
567, and the country number is 890, the file should be renamed
12345.b567.c890.snap.pax.Z. (The country code for the United States is: 000).
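The renaming rule is mechanical, so it is easy to script and hard to mistype. Here is a minimal sketch; the helper names snap_name and rename_snap are hypothetical, and rename_snap assumes the default /tmp/ibmsupt location used above.

```sh
# snap_name (hypothetical helper): build the upload file name from the
# PMR, branch, and country numbers, following the pattern above.
snap_name() {
    printf '%s.b%s.c%s.snap.pax.Z\n' "$1" "$2" "$3"
}

# rename_snap (hypothetical helper): rename the archive created by snap -c.
rename_snap() (
    src=/tmp/ibmsupt/snap.pax.Z
    if [ ! -f "$src" ]; then
        echo "$src not found; run snap -c first" >&2
        exit 1
    fi
    mv "$src" "/tmp/ibmsupt/$(snap_name "$1" "$2" "$3")"
)
```

With the example numbers above, rename_snap 12345 567 890 leaves the archive as /tmp/ibmsupt/12345.b567.c890.snap.pax.Z.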


Purpose — Cover how to create a snap file.
Details —
Transition statement — Once you have created a compressed snap file, you will need to
upload it to AIX Support.
AIX Support test case data (2 of 2)

Upload the information you have captured:
# ftp testcase.software.ibm.com
User: anonymous
Password: <your email address>
ftp> cd /toibm/aix
ftp> bin
ftp> put PMR#.b<branch#>.c<country#>.snap.pax.Z
ftp> quit
Figure 1-12. AIX Support test case data (2 of 2) AN152.2
Notes:
Uploading data to AIX Support

AIX Support provides an anonymous FTP server for receiving your testcase data. The
host name for that server is: testcase.software.ibm.com.
Once you log in to the server, change directory to /toibm/aix.
Be sure to transfer the file in binary mode (the bin subcommand) so that FTP does not
attempt to convert the contents of the file.
Then, just put your file on the server and notify your support contact that the data is
there.
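The same FTP dialog can also be fed to ftp non-interactively. Here is a minimal sketch, assuming a standard ftp client in which -n suppresses automatic login so that the user subcommand supplies the anonymous login; the helper name ftp_script and the e-mail address are placeholders.

```sh
# ftp_script (hypothetical helper): print the FTP subcommands for the upload.
# $1 = file to upload, $2 = your e-mail address (the anonymous password)
ftp_script() {
    cat <<EOF
user anonymous $2
cd /toibm/aix
bin
put $1
quit
EOF
}
```

From /tmp/ibmsupt, the upload then becomes: ftp_script 12345.b567.c890.snap.pax.Z admin@example.com | ftp -n testcase.software.ibm.com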


Purpose — Cover how to upload the snap file.
Details —
Transition statement — AIX Support provides software fixes for the reported problems.
Your reported problem may already have an available fix. Let’s review the packaging and
levels of AIX operating system updates.
AIX software update hierarchy

• Version and release (oslevel)
  - Requires new license and migration install
• Fileset updates (lslpp -L will show mod and fix levels)
  - Collected changes to files in a fileset
  - Related to APARs and PTFs
  - Only need to apply the new fileset
• Fix bundles
  - Collections of fileset updates
  - Technology level, formerly maintenance level (oslevel -r):
    fix bundle of enhancements and fixes
  - Service packs (oslevel -s):
    fix bundle of important fixes
• Interim fixes
  - Special situation code replacements
  - Used when the delay for normal PTF packaging is too long
  - Managed with the interim fix manager (emgr)
Figure 1-13. AIX software update hierarchy AN152.2
Notes:
Version, release, mod, and fix

The oslevel command, by default, displays the version and release of the AIX
operating system. Changing this requires a new license and a disruption to the system
(such as rebooting into installation and maintenance mode to perform a migration install). The mod
and fix levels in the oslevel -s output are normally displayed as zeros. The mod
level displayed in the oslevel output should reflect the technology level.
The mod and fix levels are used to reflect changes to the many individual filesets which
make up the operating system. These are best seen by browsing through the output of
the lslpp -L command. These changes only require the administrator to install a
Program Temporary Fix (PTF) in the form of a fix fileset. A given fix fileset can resolve
one or more problems or APARs (Authorized Program Analysis Report).

Fix bundles

It is useful to collect many accumulated PTFs and test them together. The resulting
bundle can then be used as a baseline for a new cycle of enhancements and corrections.
Testing the fixes together often catches unexpected interactions between them.
There are two types of AIX fix bundles.
- One type of fix bundle is a Technology Level (TL) update (formerly known as
Maintenance Level or ML). This is a major fix bundle which not only includes many
fixes for code problems, but also includes minor functional enhancements. You can
identify the current AIX technology level by running the oslevel -r command.
- Another type of bundling is a Service Pack (SP). A Service Pack is released more
frequently than a Technology Level (between TL releases) and usually only contains
needed fixes. You can identify the current AIX technology level and service pack by
running the oslevel -s command.
For the oslevel command to reflect a new TL or SP, all related fileset updates must be
installed. If a single fileset update in the fix bundle is not installed, the TL or SP level will
not change.
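The oslevel -s output packs several levels into one string. Here is a minimal sketch that labels each field, assuming the usual VRMF-TL-SP-build layout (for example, 7100-01-03-1207 for AIX 7.1, TL 01, SP 03); parse_oslevel is a hypothetical name.

```sh
# parse_oslevel (hypothetical helper): split an "oslevel -s" string of the
# form VRMF-TL-SP-build (for example 7100-01-03-1207) into its parts.
parse_oslevel() (
    IFS='-'
    set -- $1                                # split on "-" into positional parameters
    vrmf=$1 tl=$2 sp=$3 build=$4
    ver=${vrmf%???}                          # first digit of VRMF: version
    rest=${vrmf#?}
    rel=${rest%??}                           # second digit of VRMF: release
    echo "AIX $ver.$rel TL$tl SP$sp build $build"
)
```

On a live system, call it as parse_oslevel "$(oslevel -s)". The mod and fix digits of VRMF are ignored because, as noted earlier, they are normally zeros in this output.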
Interim fixes
On rare occasions, a customer has an urgent situation which needs fixes for a problem
so quickly that they cannot wait for the formal PTF to be released. In those situations, a
developer may place one or more individual file replacements on an FTP server and
allow the system administrator to download and install them. Originally, this would
simply involve manually copying the new files over the old files. But this created
problems, especially in identifying the state of a system which later experienced other
(possibly related) problems or in backing out the changes.
Today, there is a better methodology for managing these interim fixes: the emgr (efix
manager) command. Security alerts will often provide interim fixes for the identified security
exposure. Depending upon your own risk analysis, you might immediately use the
interim fix, or wait for the next service pack (which will include these security fixes).
The syntax and use of the emgr command were covered in the prerequisite course.
Instructor notes:
Purpose — Explain standard terminology for software updates
Details —
Transition statement — Let’s look at how we obtain these updates.

Relevant documentation
• IBM Systems Product Information entry page, including links to:
  - IBM Systems Hardware Information Center
  - AIX Information Center
  - IBM i and System i Information Center
  - IBM Information Center for Linux
  - IBM Storage Information Center
  - IBM Systems Director Information Center
• IBM Redbooks home:
  http://www.redbooks.ibm.com
Figure 1-14. Relevant documentation AN152.2
Notes:
IBM Systems Hardware Information Center and AIX Information Center

Most software and hardware documentation for AIX and POWER-based systems can
be accessed online from the IBM Systems Product information entry page:
http://publib.boulder.ibm.com/eserver.
The two information centers most relevant to this course are:
- IBM Systems Hardware Information Center
- AIX Information Center
These centers provide information both online and as downloadable PDF files.
IBM Redbooks
Redbooks can be viewed, downloaded, or ordered from the IBM Redbooks Web site:
http://www.redbooks.ibm.com
Instructor notes:
Purpose — Identify URLs for hardware and software documentation.
Details — Let students know that hard copy versions of the manuals can be ordered from
their IBM marketing representative.
Transition statement — Let’s review what we have covered with some checkpoint
questions.

Checkpoint
1. What are the four major problem determination steps?
2. Who should provide information about system problems?
3. True or false: If there is a problem with the software, it is necessary to get the next
   release of the product to resolve the problem.
4. True or false: Documentation can be viewed or downloaded from the IBM Web site.
Figure 1-15. Checkpoint AN152.2
Notes:
Instructor notes:
Purpose — Discuss the first group of checkpoint questions.
Details — A checkpoint solution is provided below:
Checkpoint solutions
1. What are the four major problem determination steps?
The answer is identify the problem, talk to users (to further define the
problem), collect system data, and resolve the problem.
2. Who should provide information about system problems?
The answer is: always talk to the users about such problems to gather as much
information as possible.
3. True or false: If there is a problem with the software, it is necessary to get the next
release of the product to resolve the problem.
The answer is false: in most cases, it is only necessary to apply fixes or
upgrade microcode.
4. True or false: Documentation can be viewed or downloaded from the IBM Web site.
The answer is true.
Transition statement — Let’s take a look at what we have in the class lab environment.

Exercise: Problem diagnostic information

• Obtain configuration information about your system
• Navigate the information center to find reference code information
• Create, compress, and rename a snap file for upload to AIX Support
Figure 1-16. Exercise: Problem diagnostic information AN152.2
Notes:
Instructor notes:
Purpose — Introduce the exercise for this unit.
Details —
Transition statement — Let’s summarize what we have covered in this unit.

Unit summary
Having completed this unit, you should be able to:
• List the steps of a basic methodology for problem determination
• List AIX features that assist in minimizing planned downtime or shortening the
  maintenance window
• Explain how to find documentation and other key resources
Figure 1-17. Unit summary AN152.2
Notes:
Instructor notes:
Purpose — Remind the students of some of the key points in this unit.
Details — Before continuing to the next unit, stop and ask the students whether they
have any additional questions.
Transition statement — That is the end of this unit.

Unit 2. The Object Data Manager
Estimated time
01:10

This unit describes the structure of the Object Data Manager (ODM). It
shows the use of the ODM command line interface and explains the
role of the ODM in device configuration. Specific information regarding
the function and content of the most important ODM files is also
presented.

• Describe the structure of the ODM
• Use the ODM command line interface
• Explain the role of the ODM in device configuration
• Describe the function of the most important ODM files

Accountability:
• Lab exercise
References
Online AIX Version 7.1 Command Reference volumes 1-6
Online AIX Version 7.1 General Programming Concepts:
Writing and Debugging Programs
Online AIX Version 7.1 Technical Reference: Kernel and
Subsystems
Note: References listed as “online” above are available through the
IBM Systems Information Center at the following address:
© Copyright IBM Corp. 2009, 2012 Unit 2. The Object Data Manager 2-1
Unit objectives

Describe the structure of the ODM
Use the ODM command line interface
Explain the role of the ODM in device configuration
Describe the function of the most important ODM files
Notes:
Importance of this unit

The ODM is a very important component of AIX and is one major feature that
distinguishes AIX from other UNIX systems. This unit describes the structure of the
ODM and explains how you can work with ODM files using the ODM command line
interface.
It is also very important that you, as an AIX system administrator, understand the role of
the ODM during device configuration. Thus, explaining the role of the ODM in this
process is another major objective of this unit.


Purpose — Present the objectives of this unit.
Details — Explain that a good understanding of the ODM is very important and can help in
analyzing problems. Point out that the ODM is mainly used for device configuration and
that this is a major focus in this unit.
Additional information — None.
Transition statement — Let’s start with the basics, an introduction to the ODM.

2.1. Introduction to the ODM
Instructor topic introduction

What students will do — The students will learn the structure of the ODM and how they
can work with ODM files to query system data. Additionally, students will be able to explain
the role of the ODM in device configuration.
How students will do it — Through lecture and checkpoint questions.
What students will learn — Students will learn:
• How the ODM is used in AIX
• How the command line interface can be used to work with ODM in a safe way
• How devices are configured in AIX
How this will help students on their job — A good understanding of the ODM makes
solving many system problems much easier.
What is the ODM?

• The Object Data Manager (ODM) is a database intended for storing system information
• Physical and logical device information is stored and maintained through the use of
  objects with associated characteristics
Figure 2-2. What is the ODM? AN152.2
Notes:


Purpose — Introduce the Object Data Manager (ODM).
Details — This visual has been intentionally kept simple. The goal here is to introduce the
ODM; details will come later.
Transition statement — What kind of information is managed by the ODM?
Data managed by the ODM

[Figure: the ODM at the center, linked to the data it manages: devices, software, System resource controller, SMIT menus and panels, TCP/IP configuration, error log and dump, and NIM]
Figure 2-3. Data managed by the ODM AN152.2
Notes:
System data managed by ODM

The ODM manages the following system data:
- *Device configuration data
- *Software Vital Product Data (SWVPD)
- System Resource Controller (SRC) data
- TCP/IP configuration data
- Error log and dump information
- NIM (Network Installation Manager) information
- SMIT menus and commands

Emphasis in this unit

Our main emphasis in this unit is on the use of ODM to store and manage information
regarding devices and software products (software vital product data). During the
course, many other ODM classes are described.
Instructor notes:
Purpose — Provide an overview of what data is stored in the ODM.
Details — Go quickly through the list and mention that the main emphasis in this unit is on
devices and software vital product data. You might want to point out that the two “hands” on
the visual “point” to the types of data that will be emphasized. Later on, you supply the
corresponding ODM database files where the data is stored.
Additional information — You might mention that TCP/IP configuration can still be set up
without using ODM. In this case, traditional ASCII files are used for storing TCP/IP data. To
determine whether ODM is used for TCP/IP, use the following command:
# lsattr -El inet0
If the attribute bootup_option is set to no, ODM files are used. If it is set to yes, ODM will
not be used.
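The check described above can be scripted. Here is a minimal sketch, assuming the standard lsattr flags -a (select one attribute) and -F value (print only its value). The helper name tcpip_config_source is hypothetical, and the optional argument lets you pass a value directly, for example in a classroom without an AIX host.

```sh
# tcpip_config_source (hypothetical helper): interpret inet0's bootup_option.
# With no argument it queries the ODM via lsattr; with an argument it uses that value.
tcpip_config_source() (
    val=${1:-$(lsattr -El inet0 -a bootup_option -F value 2>/dev/null)}
    case $val in
        no)  echo "TCP/IP configuration is taken from the ODM" ;;
        yes) echo "TCP/IP configuration is taken from traditional ASCII files" ;;
        *)   echo "bootup_option could not be determined" >&2; exit 1 ;;
    esac
)
```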
Transition statement — Let’s define some key terminology we will need for our discussion
of the ODM.

ODM components
uniquetype          attribute    deflt     values
tape/scsi/scsd      block_size   none      0-2147483648,1
disk/scsi/osdisk    pvid         none
tty/rs232/tty       login        disable   enable, disable, ...
Figure 2-4. ODM components AN152.2
Notes:
Completing the drawing on the visual

The drawing on the visual above identifies the basic components of ODM, but some
terms have been intentionally omitted from the drawing. Your instructor will complete
this drawing during the lecture. Please complete your own copy of the drawing by
writing in the terms supplied by your instructor.
ODM data format

For security reasons, the ODM data is stored in binary format. To work with ODM files,
you must use the ODM command line interface. It is not possible to update ODM files
with an editor.
Instructor notes:
Purpose — Define the basic components of ODM.
Details — Complete the visual during the lesson. ODM components are:
• Object classes
The ODM consists of many database files, where each file is called an object class.
• Objects
Each object class consists of objects. Each object is one record in an object class.
• Descriptors
The descriptors describe the layout of the objects. They determine the name and
datatype of the fields that are part of the object class.
Additional information — This visual shows an extraction out of the ODM class PdAt. Do
not explain the meaning of PdAt or the different fields on this page. Concentrate on the
components of the ODM.
Transition statement — It is also important to understand how the terms predefined
device information and customized device information are used when discussing the ODM.

ODM database files

Predefined device information           PdDv, PdAt, PdCn
Customized device information           CuDv, CuAt, CuDep, CuDvDr, CuVPD, Config_Rules
Software vital product data             history, inventory, lpp, product
SMIT menus                              sm_menu_opt, sm_name_hdr, sm_cmd_hdr, sm_cmd_opt
Error log, alog, and dump information   SWservAt
System resource controller              SRCsubsys, SRCsubsvr, ...
Network Installation Manager (NIM)      nim_attr, nim_object, nim_pdattr
Figure 2-5. ODM database files AN152.2
Notes:
Major ODM files

The table in the visual summarizes the major ODM files in AIX. As you can see, the files
listed in this table are placed into several different categories.
Current focus
In this unit, we will concentrate on ODM classes that are used to store device
information and software product data. At this point, we will narrow our focus even
further and confine our discussion to ODM classes that store device information.
Predefined and customized device information

The first two rows in the table on the visual indicate that some ODM classes contain
predefined device information and that others contain customized device information.
What is the difference between these two types of information?
Predefined device information describes all supported devices. Customized device
information describes all devices that are defined on the system.
It is very important that you understand the difference between these two information
classifications.
The classes themselves are described in more detail in the next topic of this unit.
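To make the distinction concrete, here is a hypothetical Python sketch with made-up sample data: PdDv is reduced to a dictionary of supported types keyed by uniquetype, and CuDv to a list of defined devices that point back to PdDv. The define_device helper is invented for illustration; it only mirrors the rule that a device whose type has no PdDv entry cannot be defined.

```python
# PdDv: every supported device type, keyed by its uniquetype (sample data).
PDDV = {
    "tape/scsi/scsd": {"prefix": "rmt"},
    "adapter/pci/14106902": {"prefix": "ent"},
    "disk/scsi/osdisk": {"prefix": "hdisk"},
}

# CuDv: every device defined on this system; PdDvLn links back to PdDv.
CUDV = [
    {"name": "ent1", "PdDvLn": "adapter/pci/14106902"},
    {"name": "rmt0", "PdDvLn": "tape/scsi/scsd"},
]


def define_device(name, uniquetype):
    """Add a CuDv object, refusing any type that PdDv does not list."""
    if uniquetype not in PDDV:
        raise LookupError(f"{uniquetype} is not in PdDv, so {name} cannot be defined")
    CUDV.append({"name": name, "PdDvLn": uniquetype})


define_device("hdisk0", "disk/scsi/osdisk")
print(len(PDDV), "supported types,", len(CUDV), "defined devices")
```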


Purpose — Explain the difference between predefined and customized device information.
Details — Do not introduce the other ODM classes on this visual. At this point, just provide
the difference between Pd and Cu classes.
Note:
In the activity at the end of this topic, students have to answer the following questions:
What ODM class contains all supported devices on your system?
What ODM class contains all configured devices on your system?
Therefore, describe clearly the meaning of PdDv and CuDv at this point.
Transition statement — The next visual shows just the ODM object classes used during
the configuration of a device. It also introduces cfgmgr, the “configuration manager.”
Device configuration summary

Predefined databases: PdDv, PdCn, PdAt
Configuration Manager (cfgmgr), driven by Config_Rules
Customized databases: CuDep, CuDv, CuAt, CuDvDr, CuVPD
Figure 2-6. Device configuration summary AN152.2
Notes:
ODM classes used during device configuration

The visual above shows the ODM object classes used during the configuration of a
device.
Roles of cfgmgr and the Config_Rules ODM object class

When an AIX system boots, the Configuration Manager (cfgmgr) is responsible for
configuring devices. There is one ODM object class which the cfgmgr uses to
determine the correct sequence when configuring devices: Config_Rules. This ODM
object class also contains information about various method files used for device
management.


Purpose — Summarize the device configuration ODM object classes.
Details — Review the ODM object classes belonging to the predefined and customized
databases.
The role of the Config_Rules object class is covered here and on the next visual (and the
associated student notes). Each of the other object classes shown is described in more
detail later.
Transition statement — Let’s look at the device configuration process a little more closely.
Configuration manager
Predefined: PdDv, PdAt, PdCn
Config_Rules and cfgmgr ("Plug and Play")
Customized: CuDv, CuAt, CuDep, CuDvDr, CuVPD
Methods: Define, Configure, Change, Unconfigure, Undefine
Device Driver: Load, Unload
Figure 2-7. Configuration manager AN152.2
Notes:
Importance of Config_Rules object class

Although cfgmgr gets credit for managing devices (adding, deleting, changing, and so
forth), it is actually the programs, called methods, which are defined in the predefined
devices object class that do the actual work. The Config_Rules object class defines
the order in which cfgmgr examines various busses looking for attached devices to
configure.


Purpose — Describe the operation of cfgmgr and its interaction with the ODM.
Details — Explain how the “plug and play” gets added.
Additional information — As a preview (the odmget command has not been discussed yet),
you could enter the command odmget Config_Rules as a demo to show the content of this
object class. Point out the frequent references to the directories /etc/methods and
/usr/lib/methods.
Transition statement — The ODM object classes are stored in three repositories.
Location and contents of ODM repositories

/etc/objrepos: CuDv, CuAt, CuDep, CuDvDr, CuVPD, Config_Rules, history, inventory, lpp,
product, nim_*, SWservAt, SRC*
/usr/lib/objrepos (constant for machines of same architecture): PdDv, PdAt, PdCn, history,
inventory, lpp, product, sm_*
/usr/share/lib/objrepos (constant for all machines): history, inventory, lpp, product
Figure 2-8. Location and contents of ODM repositories AN152.2
Notes:
Introduction
Originally, the three parts of the ODM were designed to support diskless, dataless and
other workstations. The ODM object classes are held in three repositories. Each of
these repositories is described in the material that follows.
/etc/objrepos
The purpose of this location is to hold information that is expected to vary from machine
to machine and cannot be shared with other machines. It contains the part of the
product that cannot be shared among machines; each client must have its own copy.
Most of the software that requires a separate copy for each machine is associated with the
configuration of the machine or product.
One example is the customized device information. For example, the location of a
device or the overrides to the default attributes can be expected to vary.

This repository contains the customized devices object classes and the four object
classes used by the Software Vital Product Database (SWVPD) for the / (root) part of
the installable software product. The root part of the software contains files that must
be installed on the target system. For example, any configuration files used by the
programs would be in the root part.
To access information in the other directories, this directory contains symbolic links to
the predefined devices object classes. The links are needed because the ODMDIR
variable points to only /etc/objrepos.
/usr/lib/objrepos
This repository contains the predefined devices object classes, SMIT menu object
classes, and the four object classes used by the SWVPD for the /usr part of the
installable software product. The object classes in this repository can be shared across
the network by /usr clients, dataless and diskless workstations. Software installed in the
/usr part can be shared among several machines with compatible hardware
architectures.
/usr/share/lib/objrepos
Contains the four object classes used by the SWVPD for the /usr/share part of the
installable software product. The /usr/share part of a software product contains files
that are not hardware dependent. They can be shared among several machines, even if
the machines have a different hardware architecture. Terminfo files, which describe
terminal capabilities, are one example. As terminfo is used on many UNIX systems,
terminfo files are part of the /usr/share part of a system product.
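The placement of classes can be summarized as a lookup table. The Python sketch below transcribes the class lists from the repository visual. Patterns such as sm_* stand for families of classes, and the repositories_for helper is hypothetical. The sketch deliberately ignores the symbolic links in /etc/objrepos that point to the predefined device classes.

```python
from fnmatch import fnmatchcase

# Class lists as shown on the visual; "nim_*", "SRC*" and "sm_*" are families.
REPOSITORIES = {
    "/etc/objrepos": ["CuDv", "CuAt", "CuDep", "CuDvDr", "CuVPD", "Config_Rules",
                      "history", "inventory", "lpp", "product",
                      "nim_*", "SWservAt", "SRC*"],
    "/usr/lib/objrepos": ["PdDv", "PdAt", "PdCn",
                          "history", "inventory", "lpp", "product", "sm_*"],
    "/usr/share/lib/objrepos": ["history", "inventory", "lpp", "product"],
}


def repositories_for(object_class):
    """Return every repository that holds a class with this name (links ignored)."""
    return [repo for repo, classes in REPOSITORIES.items()
            if any(fnmatchcase(object_class, pattern) for pattern in classes)]


for cls in ("CuDv", "PdAt", "lpp", "sm_cmd_opt"):
    print(f"{cls:<11}", ", ".join(repositories_for(cls)))
```

The four SWVPD classes appear in all three repositories because each one tracks a different part (root, /usr, /usr/share) of the installed software.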
lslpp options
The lslpp command can list the software recorded in the ODM. When run with the -l
(lower case L) flag, it lists each of the locations (/, /usr/lib, /usr/share/lib) where it finds
the fileset recorded. This can be distracting if you are not concerned with these
distinctions. Alternatively, you can run lslpp -L, which reports each fileset only once,
without making distinctions between the root, usr, and share portions.
When should you be concerned about private versus shared ODM repositories?
Most of you do not deal with diskless and dataless servers, so the distinctions for the
device objects will generally not concern you; every machine will have all three
repositories local for those objects.
If you are working with workload partitions (WPARs), then the different software object
class repositories are a major concern. A workload partition would have only the private,
or root, portion of the software in its private file systems. The other object repositories
for the software would be maintained in global environment file systems, which would
be shared among all WPARs using read-only mounts. For details, attend the course
that teaches AIX Workload Partitions.


Purpose — Describe the different directories that hold ODM data.
Details — Describe what ODM files reside in /etc/objrepos, /usr/lib/objrepos and
/usr/share/lib/objrepos.
Explain the meaning of the root, /usr and /usr/share part of a software product and
identify that /usr/lib/objrepos and /usr/share/lib/objrepos can be shared in a network.
Transition statement — It is important to understand how ODM classes interact.
How ODM classes act together

# cfgmgr
PdDv:
type = "14106902"
class = "adapter"
subclass = "pci"
prefix = "ent"
...
DvDr = "pci/goentdd"
Define = "/usr/lib/methods/define_rspc"
Configure = "/usr/lib/methods/cfggoent"
...
uniquetype = "adapter/pci/14106902"
CuDv:
name = "ent1"
status = 1
chgstatus = 2
ddins = "pci/goentdd"
location = "02-08"
parent = "pci2"
connwhere = "8"
PdDvLn = "adapter/pci/14106902"
# chdev -l ent1 -a jumbo_frames=yes
PdAt:
uniquetype = "adapter/pci/14106902"
attribute = "jumbo_frames"
deflt = "no"
values = "yes,no"
CuAt:
name = "ent1"
attribute = "jumbo_frames"
value = "yes"
type = "R"
Figure 2-9. How ODM classes act together AN152.2
Notes:
Interaction of ODM classes

The visual above and the notes below summarize how ODM classes act together.
- In order for a particular device to be defined in AIX, the device type must be defined
in ODM class PdDv.
- A device can be defined by either the cfgmgr (if the device is detectable), or by the
mkdev command. Both commands use the define method to generate an instance in
ODM class CuDv. The configure method is used to load a specific device driver and
to generate an entry in the /dev directory.
Notice the link PdDvLn from CuDv back to PdDv.
- At this point you only have default attribute values in PdAt which, in our example of
a gigabit Ethernet adapter, means you could not use jumbo frames (default is no). If
you change the attributes, for example, jumbo_frames to yes, you get an object
describing the nondefault value in CuAt.
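The interplay of PdAt and CuAt can be sketched in Python using the objects from the visual. This is a simplified teaching model, not AIX code: change_attribute and effective_value are hypothetical helpers, the check against values only handles comma-separated lists, and removing the CuAt object when a value is set back to its default is an assumption of this sketch.

```python
# Objects from the visual: the jumbo_frames attribute of Gigabit Ethernet adapter ent1.
PDAT = [{"uniquetype": "adapter/pci/14106902", "attribute": "jumbo_frames",
         "deflt": "no", "values": "yes,no"}]
CUDV = {"ent1": {"PdDvLn": "adapter/pci/14106902"}}
CUAT = []  # stays empty while every attribute has its default value


def predefined(name, attribute):
    """Find the PdAt object for a device instance by following CuDv.PdDvLn."""
    uniquetype = CUDV[name]["PdDvLn"]
    return next(p for p in PDAT
                if p["uniquetype"] == uniquetype and p["attribute"] == attribute)


def change_attribute(name, attribute, value):
    """Store a nondefault value in CuAt (simplified model of a change)."""
    pd = predefined(name, attribute)
    if value not in pd["values"].split(","):  # only handles list-style values
        raise ValueError(f"{value!r} is not one of {pd['values']!r}")
    CUAT[:] = [c for c in CUAT if (c["name"], c["attribute"]) != (name, attribute)]
    if value != pd["deflt"]:  # assumption: default values are not stored in CuAt
        CUAT.append({"name": name, "attribute": attribute, "value": value})


def effective_value(name, attribute):
    """Return the CuAt value if there is one, otherwise the PdAt default."""
    for c in CUAT:
        if (c["name"], c["attribute"]) == (name, attribute):
            return c["value"]
    return predefined(name, attribute)["deflt"]


print(effective_value("ent1", "jumbo_frames"))  # no
change_attribute("ent1", "jumbo_frames", "yes")
print(effective_value("ent1", "jumbo_frames"), len(CUAT))  # yes 1
```

The lookup order (customized value first, predefined default second) is why CuAt only needs objects for attributes that differ from their defaults.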


Purpose — Summarize how the basic ODM classes interact.
Details — Explain the flow as described in student notes.
Transition statement — As you know, not all system data is managed by the ODM.
Data not managed by the ODM

File system information: ?
User/security information: ?
Queues and queue devices: ?
Figure 2-10. Data not managed by the ODM AN152.2
Notes:
Completion of this page

The visual above identifies some types of system information that are not managed by
the ODM, but the names of the files that store these types of information have been
intentionally omitted from the visual. Your instructor will complete this visual during the
lecture. Please complete your own copy of the visual by writing in the file names
supplied by your instructor.


Purpose — Review some files from the basic administration course.
Details — Ask the students the following questions:
1. Which file contains information about the file systems on your system?
/etc/filesystems
2. Which file contains most of the basic information (such as home directory and shell)
about the users on your system?
/etc/passwd
Which file contains user attributes like password rules?
/etc/security/user
3. Where is information about your queues and queue devices stored?
/etc/qconfig
Be sure to fill in the appropriate line on the visual as you give the answer to each question.
Additional information — Tell the students that this is only a subset of data that is not in
ODM.
Transition statement — Let’s review some of the points we have covered so far in this
unit.
Let’s review: Device configuration and the ODM

1.
_______
Undefined Defined Available
2. 3.
AIX kernel Applications
D____ D____ 4. /____/_____ 5.
Figure 2-11. Let’s review: Device configuration and the ODM AN152.2
Notes:
Instructions
Please answer the following questions by writing your answers on the picture above. If you
are unsure about a question, leave it out.
1. Which command configures devices in an AIX system? Note: This is not an ODM
command.
2. Which ODM class contains all devices that your system supports?
3. Which ODM class contains all devices that are configured in your system?
4. Which programs are loaded into the AIX kernel to control access to the devices?
5. If you have a configured tape drive rmt1, which special file do applications access to
work with this device?


Purpose — Provide information about what happens when a device is configured in AIX.
Details — Give the students five minutes to answer the questions. Then, provide the
following answers:
1. cfgmgr
2. PdDv
3. CuDv
4. Device driver
5. /dev/rmt1
Additional information — Summarize the picture after the discussion:
If a device is to be configured, it must have an entry in the PdDv object class. It is not
possible to configure a device that does not have an entry in PdDv.
If a device is in the defined state, you definitely have an object in ODM class CuDv. The
difference between the defined state and the available state is that, in the defined state, no
device driver has been loaded into the AIX kernel. In other words, the program that controls
the device does not exist in the defined state.
When a device is made available, the device driver is loaded into the kernel. Additionally, a
special file is created in the /dev directory that applications need to access the device.
All this is done dynamically, without a need to recompile the AIX kernel, as historically had
to be done on other UNIX systems. This has long been one big advantage of AIX over
other UNIX systems.
Transition statement — Now, let’s look at some commands used to work with the ODM.
ODM commands
Object class: odmcreate, odmdrop
Descriptors: odmshow
uniquetype        attribute   deflt    values
tape/scsi/scsd    block_size  none     0-2147483648,1
disk/scsi/osdisk  pvid        none
tty/rs232/tty     login       disable  enable, disable, ...
Objects: odmadd, odmchange, odmdelete, odmget

Figure 2-12. ODM commands AN152.2
Notes:
Introduction
Different commands are available for working with each of the ODM components:
object classes, descriptors, and objects.
Commands for working with ODM object classes

- Creating object classes
You can create ODM classes using the odmcreate command. This command
has the following syntax:
odmcreate descriptor_file.cre
The file descriptor_file.cre contains the class definition for the corresponding
ODM class. Usually, these files have the suffix .cre. The exercise for this unit
contains an optional part that shows how to create self-defined ODM object
classes.

- Deleting object classes

To delete an entire ODM class, use the odmdrop command. The odmdrop
command has the following syntax:
odmdrop -o object_class_name
The name object_class_name is the name of the ODM class you want to
remove. Be very careful with this command. It removes the complete class
immediately.
Command to view ODM descriptors

To view the underlying structure of an object class, use the odmshow command. The
odmshow command has the following syntax:
odmshow object_class_name
The visual shows an extraction from ODM class PdAt, where four descriptors are
shown (uniquetype, attribute, deflt, and values).
Commands for working with ODM objects

Usually, system administrators work with ODM objects. The odmget command retrieves
object information from an existing object class. To add new objects, use the odmadd
command. To delete objects, use the odmdelete command. To change objects, use the
odmchange command. Working on the object level is explained in more detail on the
following pages.
The ODMDIR environment variable

All ODM commands use the ODMDIR environment variable, which is set in the file
/etc/environment. The default value of ODMDIR is /etc/objrepos.
Instructor notes:
Purpose — Introduce the ODM command line interface.
Details — Explain briefly the different ODM commands. Introduce the ODMDIR variable that
is used for all ODM commands.
Additional information — Tell the students that for system developers, an ODM API is
available.
Transition statement — The commands for working with ODM objects are the commands
system administrators use most often, so let’s spend a little more time talking about how
these commands work.

Changing attribute values

# odmget -q "uniquetype=tape/scsi/scsd and attribute=block_size" PdAt > file
# vi file
PdAt:
uniquetype = "tape/scsi/scsd"
attribute = "block_size"
deflt = "512"          (Modify deflt to 512)
values = "0-2147483648,1"
width = ""
type = "R"
generic = "DU"
rep = "nr"
nls_index = 6
# odmdelete -o PdAt -q "uniquetype=tape/scsi/scsd and attribute=block_size"
# odmadd file
Figure 2-13. Changing attribute values AN152.2
Notes:
Discussion of command sequence on the visual

The odmget command in the example on the visual will retrieve all the records from the
PdAt object class, where uniquetype is equal to tape/scsi/scsd and attribute is
equal to block_size. In this instance, only one record should be matched. The
information is redirected into a file which can be changed using an editor.
In this example, the default value for the attribute block_size is changed to 512.
Note: Before the new value of 512 can be added into the ODM, the old object (which
had block_size set to a null value) must be deleted; otherwise, you would end up
with two objects describing the same attribute in the database. The first object found
would be used, and the results could be quite confusing.
The final operation is to add the object defined in the file into the ODM.
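Why the delete step matters can be shown with a toy model in Python. Assumptions: the object class is a plain list searched from front to back (mirroring the note that the first object found is used), the original deflt is an empty string standing in for the null value, and the helper names are hypothetical.

```python
# Toy PdAt: a list searched front to back, so the first matching object wins.
pdat = [{"uniquetype": "tape/scsi/scsd", "attribute": "block_size", "deflt": ""}]


def matches(obj):
    """The search criteria used on the visual."""
    return obj["uniquetype"] == "tape/scsi/scsd" and obj["attribute"] == "block_size"


def first_match(objects):
    """Return the first object that satisfies the criteria."""
    return next(o for o in objects if matches(o))


edited = dict(first_match(pdat), deflt="512")  # the record after editing the file

# Adding without deleting: two objects now match, and the old one is still found first.
without_delete = pdat + [edited]
print(sum(map(matches, without_delete)), repr(first_match(without_delete)["deflt"]))

# Deleting every match first (as odmdelete does), then adding the edited record.
with_delete = [o for o in pdat if not matches(o)] + [edited]
print(sum(map(matches, with_delete)), repr(first_match(with_delete)["deflt"]))
```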
Need to use ODM commands

The ODM objects are stored in a binary format; that means you need to work with the
ODM commands to query or change any objects.
Possible queries
As with any database, you can perform queries for records matching certain criteria.
The tests are on the values of the descriptors of the objects. A number of tests can be
performed:
= Equal
!= Not equal
> Greater than
>= Greater than or equal to
< Less than
<= Less than or equal to
like Similar to; finds patterns in character string data
For example, to search for records where the value of the lpp_name attribute begins
with bosext1., you would use the syntax lpp_name like bosext1.*
Tests can be combined with the and operator, as shown in the
following example:
uniquetype=tape/scsi/scsd and attribute=block_size
In addition to the * wildcard, a ? can be used as a wildcard character.
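The matching rules can be sketched in Python. This is a hypothetical model, not the ODM library: the query and clause_matches functions, the rule of comparing numerically when both sides look like numbers, and the use of Python's fnmatch for like patterns are all assumptions made for illustration. Only the and connector from the example above is handled.

```python
import re
from fnmatch import fnmatchcase

# descriptor, operator, value; two-character operators come first so ">=" is not
# read as ">" followed by "=".
CLAUSE = re.compile(r"^\s*(\w+)\s*(!=|>=|<=|=|>|<|like)\s*'?(.*?)'?\s*$")

COMPARE = {
    "=": lambda a, b: a == b,
    "!=": lambda a, b: a != b,
    ">": lambda a, b: a > b,
    ">=": lambda a, b: a >= b,
    "<": lambda a, b: a < b,
    "<=": lambda a, b: a <= b,
}


def as_number(text):
    """Return a float for numeric text, or None for character data."""
    try:
        return float(text)
    except ValueError:
        return None


def clause_matches(obj, clause):
    """Evaluate one test such as 'attribute=block_size' against one object."""
    match = CLAUSE.match(clause)
    if match is None:
        raise ValueError(f"cannot parse test: {clause!r}")
    descriptor, op, value = match.groups()
    if descriptor not in obj:
        return False
    actual = str(obj[descriptor])
    if op == "like":
        return fnmatchcase(actual, value)  # * = any string, ? = any one character
    a, b = as_number(actual), as_number(value)
    if a is None or b is None:  # fall back to comparing character data
        a, b = actual, value
    return COMPARE[op](a, b)


def query(objects, criteria):
    """Return the objects for which every 'and'-joined test is true."""
    tests = re.split(r"\s+and\s+", criteria.strip(), flags=re.IGNORECASE)
    return [obj for obj in objects if all(clause_matches(obj, t) for t in tests)]


SAMPLE = [  # hypothetical sample records
    {"uniquetype": "tape/scsi/scsd", "attribute": "block_size"},
    {"uniquetype": "tape/scsi/scsd", "attribute": "density_set_1"},
    {"lpp_name": "bos.rte.printers", "lpp_id": 38},
]
print(query(SAMPLE, "uniquetype=tape/scsi/scsd and attribute=block_size"))
```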


Purpose — Describe some ODM commands used to manage objects.
Details — Go through the example to show how the commands are used. Ensure that the
students understand the purpose of each command. For example, state that the odmget
command is used to retrieve a specific object from an object class to either view it or
change it in some way.
The following example can be used to illustrate the use of the odmget, odmadd and
odmdelete commands:
Assume that you are manipulating the PdAt object class, which has an entry for the 8 mm
tape drive with attribute block_size set to 1024. Assume that you wish to modify the
default block_size value to 512.
The odmget command extracts the block_size record into a file.
Note that, if there is more than one entry matching the pattern, then information regarding
each will be retrieved.
Having obtained the record, use vi (or your favorite editor) to edit that record and overtype
the new number.
Ask the students what potential problem they would encounter if they issued the odmadd
command at this point. They would have duplicate instances of the block_size attribute,
as the original record would not be overwritten. To overcome this problem, issue the
odmdelete command before the odmadd command. If there are duplicate objects, only the
first is recognized.
If there are multiple records matching the search pattern, the odmdelete command will
delete all of them. So, be specific with your search.
Now you can issue the odmadd command.
Notice that with this command, all you specify is the temporary file where the new
information is held. You do not specify the object class name where the record is to go. Ask
the students how the odmadd command knows which object class this entry is to go to. The
answer is that the file itself records the name of the object class from which the record was
obtained: the stanza label in the input file (in this case, PdAt) identifies the class.
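Because the stanza label travels with the data, the target class can be read straight from the file. The Python parser below is a hypothetical sketch of reading the stanza format that odmget writes and odmadd reads: a line such as PdAt: starts a new object of that class, and each indented descriptor = value line sets one field. Treating quoted values as character data and unquoted values as numbers is an assumption based on the examples in this unit.

```python
import re

HEADER = re.compile(r"^(\w+):\s*$")               # "PdAt:" starts a new object
FIELD = re.compile(r"^\s+(\w+)\s*=\s*(.*?)\s*$")  # indented "descriptor = value"


def parse_stanzas(text):
    """Return (object_class, descriptors) pairs, one per stanza."""
    stanzas = []
    for line in text.splitlines():
        if not line.strip():
            continue
        header = HEADER.match(line)
        if header:
            stanzas.append((header.group(1), {}))
            continue
        field = FIELD.match(line)
        if field is None or not stanzas:
            raise ValueError(f"unexpected line: {line!r}")
        name, raw = field.groups()
        if len(raw) >= 2 and raw[0] == raw[-1] == '"':
            value = raw[1:-1]  # quoted: character data
        else:
            try:
                value = int(raw)  # unquoted: numeric data
            except ValueError:
                value = raw
        stanzas[-1][1][name] = value
    return stanzas


EDITED_FILE = """PdAt:
\tuniquetype = "tape/scsi/scsd"
\tattribute = "block_size"
\tdeflt = "512"
\tvalues = "0-2147483648,1"
\twidth = ""
\ttype = "R"
\tgeneric = "DU"
\trep = "nr"
\tnls_index = 6
"""

for object_class, fields in parse_stanzas(EDITED_FILE):
    print(object_class, fields["attribute"], fields["deflt"])
```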
Additional information — None
Transition statement — Let’s look at another way of carrying out the above set of steps.
Using odmchange to change attribute values

# odmget -q "uniquetype=tape/scsi/scsd and attribute=block_size" PdAt > file
# vi file
PdAt:
deflt = "512"          (Modify deflt to 512)
values = "0-2147483648,1"
width = ""
type = "R"
generic = "DU"
rep = "nr"
nls_index = 6
# odmchange -o PdAt -q "uniquetype=tape/scsi/scsd and attribute=block_size" file
Figure 2-14. Using odmchange to change attribute values AN152.2
Notes:
Another way of changing attribute values

The series of steps shown on this visual shows how the odmchange command can be
used instead of the odmadd and odmdelete steps shown in the previous example to
modify attribute values.


Purpose — Define how the odmchange command can be used instead of the odmadd and
odmdelete commands.
Details — Novice users should be encouraged to use odmdelete and odmadd commands
rather than the odmchange command, which does the delete and the add operations all in
one step. This is because with the odmchange command, you have to be very careful about
the possibility of additional entries with the same field as the one you are using for
searching, as you might end up changing more than you anticipated.
Transition statement — Now, let’s look at some of the key ODM classes in more detail.
2.2. ODM database files

What students will do — Students will learn details regarding the function and layout of
those ODM classes that were introduced in topic 1. By examining these classes, we:
• Review the role of the ODM in device configuration.
• Introduce the software vital product database and explain what state information
students should know about.
How students will do it — Through lecture, lab exercise, and checkpoint questions
What students will learn — Students will be able to:
• Discuss the function and layout of those ODM classes which are part of the Software
Vital Product Database and those which are used during the configuration of a device.
• Explain how ODM classes can be used to analyze system problems.
How this will help students on their job — Many ODM-related problems are much easier
to fix if one knows the key ODM classes and descriptors.
Software vital product data

product:
lpp_name = "bos.rte.printers"
comp_id = "5765-G6200"
update = 0
cp_flag = 2359571
fesn = "0000"
name = "bos"
state = 5
ver = 7
rel = 1
mod = 0
fix = 0
ptf = ""
media = 0
sceded_by = ""
fixinfo = ""
prereq = "*coreq bos.rte 7.1.0.0"
description = ""
supersedes = ""

inventory:
lpp_id = 38
private = 0
file_type = 0
format = 1
loc0 = "/etc/qconfig"
loc1 = ""
loc2 = ""
size = 0
checksum = 0
...

lpp:
name = "bos.rte.printers"
size = 0
state = 5
cp_flag = 2359571
group = ""
magic_letter = "I"
ver = 7
rel = 1
mod = 0
fix = 0
description = "Front End Printer Support"
lpp_id = 38

history:
lpp_id = 38
event = 2
ver = 7
rel = 1
mod = 0
fix = 0
ptf = ""
corr_svn = ""
cp_mod = ""
cp_fix = ""
login_name = "root"
state = 1
time = 1310159341
comment = ""
Figure 2-15. Software vital product data AN152.2
Notes:
Role of the installp command

Whenever installing a product or update in AIX, the installp command uses the ODM
to maintain the Software Vital Product Database (SWVPD).

Contents of SWVPD

The following information is part of the SWVPD:
• The name of the software product (for example, bos.rte.printers)
• The version, release, modification, and fix level of the software product (for example,
6.1.5.2 or 7.1.0.0)
• The fix level, which contains a summary of fixes implemented in a product
• Any program temporary fix (PTF) that has been installed on the system
• The state of the software product:
- Available (state = 1)
- Applying (state = 2)
- Applied (state = 3)
- Committing (state = 4)
- Committed (state = 5)
- Rejecting (state = 6)
- Broken (state = 7)
SWVPD classes
The Software Vital Product Data is stored in the following ODM classes:
lpp The lpp object class contains information about the installed
software products, including the current software product state
and description.
inventory The inventory object class contains information about the files
associated with a software product.
product The product object class contains product information about
the installation and updates of software products and their
prerequisites.
history The history object class contains historical information about
the installation and updates of software products.
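The lpp_id descriptor is what ties these classes together. The Python sketch below joins lpp and inventory on lpp_id, using only the objects shown on the visual, to answer the same questions that lslpp -f and lslpp -w answer. It is a hypothetical model of a single repository; the /usr and /usr/share parts of a fileset are recorded in the other repositories and would need their own lookups.

```python
# Objects taken from the visual (bos.rte.printers, root part in /etc/objrepos).
LPP = [{"name": "bos.rte.printers", "lpp_id": 38, "state": 5,
        "description": "Front End Printer Support"}]
INVENTORY = [{"lpp_id": 38, "loc0": "/etc/qconfig"}]


def files_of(fileset):
    """Files recorded for a fileset (compare lslpp -f)."""
    ids = {lpp["lpp_id"] for lpp in LPP if lpp["name"] == fileset}
    return sorted(inv["loc0"] for inv in INVENTORY if inv["lpp_id"] in ids)


def owner_of(path):
    """Filesets that recorded a given file (compare lslpp -w)."""
    ids = {inv["lpp_id"] for inv in INVENTORY if inv["loc0"] == path}
    return sorted(lpp["name"] for lpp in LPP if lpp["lpp_id"] in ids)


print(files_of("bos.rte.printers"))  # ['/etc/qconfig']
print(owner_of("/etc/qconfig"))      # ['bos.rte.printers']
```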
Instructor notes:
Purpose — Introduce the software vital product database.
Details — Explain what kind of data is stored in the ODM classes (version, release, and so
forth) and the meaning of the shown ODM classes. Identify how the classes are linked
together by the lpp_id descriptor. Note that the list of descriptors is not complete and that
the slide only lists selected descriptors for teaching purposes.
Additional information — At this point, you might introduce the lslpp command, which
has options like -l, -h, -f and -w. This command queries the software vital product
database. We can see most of this information with the high-level lslpp command. The
flags (and the related object classes) are:
-L : Lists the filesets (lpp object class)
-d : Lists the fileset dependencies (product object class)
-p : Lists the fileset prerequisites (product object class)
-w : Lists the fileset for a given file (inventory object class)
-f : Lists the files for a given fileset (inventory object class)
-h : Lists the maintenance history for a fileset (history object class)
The commands used to produce the output on the visual are:
• lpp:
# odmget -q name=bos.rte.printers lpp
• product:
# odmget -q lpp_name=bos.rte.printers product
• inventory:
# odmget -q lpp_id=38 inventory | pg
Since there are a number of files in the root file system for this fileset, there are a
number of objects that match this query (hence the pg command). Note that there are
also files in this fileset in the /usr file system.
To display these, set ODMDIR to /usr/lib/objrepos and then rerun the last odmget command.
(Note: ODMDIR defaults to /etc/objrepos.)
• history:
# odmget -q lpp_id=38 history
Transition statement — Let’s introduce the most important software states.

Software states you should know about

Applied (only possible for PTFs or updates): previous version stored in /usr/lpp/Package_Name
  Rejecting update recovers to saved version
  Committing update deletes previous version
Committed: removing committed software is possible; no return to previous version
Applying, Committing, Rejecting, Deinstalling: if installation was not successful:
  a) installp -C
  b) smit maintain_software
Broken (cleanup failed): remove software and reinstall
Figure 2-16. Software states you should know about AN152.2
Notes:
Introduction
The AIX software vital product database uses software states that describe the status of
an install or update package.
The applied and committed states

When installing a program temporary fix (PTF) or update package, you can install the
software into an applied state. Software in an applied state contains the newly installed
version (which is active) and a backup of the old version (which is inactive). This gives
you the opportunity to test the new software. If it works as expected, you can commit
the software, which will remove the old version. If it does not work as planned, you can
reject the software, which will remove the new software and reactivate the old version.
Install packages cannot be applied. These will always be committed.
Once a product is committed, if you would like to return to the old version, you must
remove the current version and reinstall the old version.
States indicating installation problems

If an installation does not complete successfully, for example, if the power fails during
the install, you may find software states like applying, committing, rejecting, or
deinstalling. To recover from this failure, execute the command installp -C or use the
SMIT fastpath smit maintain_software. Select Clean Up After Failed or
Interrupted Installation when working in SMIT.
The broken state

After a cleanup of a failed installation, you might detect a broken software status. In this
case, the only way to recover from the failure is to remove and reinstall the software
package.
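The state codes and recovery rules above can be collected into a small lookup. This Python sketch is illustrative only: the recovery_action helper is hypothetical, and it assumes the numeric codes listed earlier for the SWVPD are the values found in the state descriptor. Deinstalling has no code in that list, so it is accepted by name.

```python
# Numeric state codes listed for the SWVPD.
STATES = {1: "Available", 2: "Applying", 3: "Applied", 4: "Committing",
          5: "Committed", 6: "Rejecting", 7: "Broken"}

# States that indicate an interrupted installation or update.
INTERRUPTED = {"Applying", "Committing", "Rejecting", "Deinstalling"}


def recovery_action(state):
    """Suggest the recovery step from the notes for a state code or state name."""
    name = STATES.get(state, state)
    if name in INTERRUPTED:
        return ("installp -C, or smit maintain_software and select "
                "Clean Up After Failed or Interrupted Installation")
    if name == "Broken":
        return "remove the software package and reinstall it"
    return "no cleanup needed"


for code in (2, 5, 7):
    print(code, STATES[code], "->", recovery_action(code))
```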


Purpose — Introduce the most important software states.
Details — Explain the states using the information given in the student notes.
Transition statement — Let’s explain ODM class PdDv.
Predefined devices
PdDv:
type = "scsd"
class = "tape"
subclass = "scsi"
prefix = "rmt"
...
base = 0
...
detectable = 1
...
led = 2418
setno = 54
msgno = 0
catalog = "devices.cat"
DvDr = "tape"
Define = "/etc/methods/define"
Configure = "/etc/methods/cfgsctape"
Change = "/etc/methods/chggen"
Unconfigure = "/etc/methods/ucfgdevice"
Undefine = "/etc/methods/undefine"
Start = ""
Stop = ""
...
Figure 2-17. Predefined devices AN152.2
Notes:
The predefined devices (PdDv) object class

The Predefined Devices (PdDv) object class contains entries for all devices supported
by the system. A device that is not part of this ODM class cannot be configured on an
AIX system. Key attributes of objects in this class are described in the following
paragraphs.
type
This specifies the product name or model number, for example, 8 mm (tape).
class
Specifies the functional class name. A functional class is a group of device instances
sharing the same high-level function. For example, tape is a functional class name
representing all tape devices.

subclass
Device classes are grouped into subclasses. The subclass scsi specifies all tape
devices that may be attached to a SCSI interface.
prefix
This specifies the Assigned Prefix in the customized database, which is used to derive
the device instance name and /dev name. For example, rmt is the prefix name
assigned to tape devices. Names of tape devices would then look like rmt0, rmt1, or
rmt2.
base
This descriptor specifies whether a device is a base device or not. A base device is any
device that forms part of a minimal base system. During system boot, a minimal base
system is configured to permit access to the root volume group (rootvg) and hence to
the root file system. This minimal base system can include, for example, the standard
I/O diskette adapter and a SCSI hard drive. The device shown on the visual is not a
base device.
This flag is also used by the bosboot and savebase commands, which are introduced
later in this course.
detectable
This specifies whether the device instance is detectable or undetectable. A device
whose presence and type can be determined by the cfgmgr, once it is actually powered
on and attached to the system, is said to be detectable. A value of 1 means that the
device is detectable, and a value of 0 that it is not (for example, a printer or tty).
led
This indicates the value displayed on the LEDs when the configure method begins to
run. The value stored is decimal, but the value shown on the LEDs is hexadecimal
(2418 is 972 in hex).
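The relationship between the stored decimal value and the code shown on the LEDs is plain base conversion. The short Python sketch below checks it for the tape example; the function names are hypothetical helpers for illustration, not an AIX interface.

```python
# Sketch: convert between the decimal "led" value stored in PdDv and the
# hexadecimal code shown on the LEDs. Function names are hypothetical.

def led_display_code(led_decimal: int) -> str:
    """Return the uppercase hexadecimal form of a decimal led value."""
    return format(led_decimal, "X")

def led_decimal_value(display_code: str) -> int:
    """Inverse: convert a displayed hexadecimal LED code to the stored decimal."""
    return int(display_code, 16)

print(led_display_code(2418))    # 972
print(led_decimal_value("972"))  # 2418
```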
setno, msgno
Each device has a specific description (for example, SCSI Tape Drive) that is shown
when the device attributes are listed by the lsdev command. These two descriptors are
used to look up the description in a message catalog.
catalog
This identifies the filename of the national language support (NLS) catalog. The LANG
variable on a system controls which catalog file is used to show a message. For
example, if LANG is set to en_US, the catalog file /usr/lib/nls/msg/en_US/devices.cat is
used. If LANG is de_DE, catalog /usr/lib/nls/msg/de_DE/devices.cat is used.
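A simplified Python sketch of the catalog choice described above follows. It assumes the catalog is always taken from /usr/lib/nls/msg/<LANG>/; real message catalog lookup is also governed by the NLSPATH environment variable and fallback rules, which this sketch deliberately ignores. The function name is hypothetical.

```python
# Simplified sketch: the LANG value selects the per-language directory
# under /usr/lib/nls/msg. NLSPATH handling and fallbacks are ignored.
import posixpath

NLS_MSG_ROOT = "/usr/lib/nls/msg"  # directory named in the text above

def catalog_path(lang: str, catalog: str = "devices.cat") -> str:
    """Build the catalog path for a given LANG value (simplified model)."""
    return posixpath.join(NLS_MSG_ROOT, lang, catalog)

print(catalog_path("en_US"))  # /usr/lib/nls/msg/en_US/devices.cat
print(catalog_path("de_DE"))  # /usr/lib/nls/msg/de_DE/devices.cat
```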
DvDr
This identifies the name of the device driver associated with the device (for example,
tape). Usually, device drivers are stored in directory /usr/lib/drivers. Device drivers are
loaded into the AIX kernel when a device is made available.
Define
This names the define method associated with the device type. This program is called
when a device is brought into the defined state.
Configure
This names the configure method associated with the device type. This program is
called when a device is brought into the available state.
Change
This names the change method associated with the device type. This program is called
when a device attribute is changed through the chdev command.
Unconfigure
This names the unconfigure method associated with the device type. This program is
called when a device is unconfigured by rmdev -l.
Undefine
This names the undefine method associated with the device type. This program is
called when a device is undefined by rmdev -l -d.
Start, stop
Few devices support a stopped state (only logical devices). A stopped state means that
the device driver is loaded, but no application can access the device. These two
attributes name the methods to start or stop a device.

uniquetype
This is a key that is referenced by other object classes. Objects use this descriptor as a
pointer back to the device description in PdDv. The key is a concatenation of the class,
subclass, and type values.
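The uniquetype values that appear in these visuals (for example, disk/scsi/osdisk in PdAt and the PdDvLn value disk/scsi/scsd in CuDv) join the three descriptors with a slash. A minimal Python sketch of building and splitting such a key, using hypothetical helper names:

```python
# Sketch: form a uniquetype key from the PdDv class, subclass, and type
# descriptors, and split one (for example, a CuDv PdDvLn value) back apart.

def uniquetype(dev_class: str, subclass: str, dev_type: str) -> str:
    return "/".join((dev_class, subclass, dev_type))

def split_uniquetype(key: str) -> tuple:
    dev_class, subclass, dev_type = key.split("/")
    return dev_class, subclass, dev_type

print(uniquetype("tape", "scsi", "scsd"))        # tape/scsi/scsd
print(split_uniquetype("adapter/pci/14106902"))  # ('adapter', 'pci', '14106902')
```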
Instructor notes:
Purpose — Introduce object class PdDv.
Details — Explain the different descriptors.
Additional information — If you want, you can mention there is an additional method for
starting and stopping a device. To stop a device, issue the following command:
# rmdev -l <device_name> -S
Devices that support the stopped state are rare; remember that physical
devices do not support a stopped state.
You can list the devices in the Predefined Devices object class using the following
command:
# lsdev -P
Transition statement — Next class is PdAt.

Predefined attributes
IBM Power Systems
PdAt:
deflt = ""
values = "0-2147483648,1"
...
PdAt:
uniquetype = "disk/scsi/osdisk"
attribute = "pvid"
deflt = "none"
values = ""
...
PdAt:
uniquetype = "tty/rs232/tty"
attribute = "term"
deflt = "dumb"
values = ""
...
Figure 2-18. Predefined attributes AN152.2
Notes:
The predefined attribute (PdAt) object class

The Predefined Attribute (PdAt) object class contains an entry for each existing
attribute for each device represented in the PdDv object class. An attribute is any
device-dependent information, such as interrupt levels, bus I/O address ranges, baud
rates, parity settings, or block sizes.
The extract out of PdAt that is given on the visual shows three attributes (block_size,
pvid (physical volume identifier), and term (terminal name)) and their default values.
The meanings of the key fields shown on the visual are described in the paragraphs
that follow.
uniquetype
This descriptor is used as a pointer back to the device defined in the PdDv object class.
attribute
This identifies the name of the attribute. This is the name that can be passed to the
mkdev or chdev command. For example, to change the term attribute of tty0 from its default
value of dumb to ibm3151, you can issue the following command:
# chdev -l tty0 -a term=ibm3151
deflt
This identifies the default value for an attribute. Nondefault values are stored in CuAt.
values
This identifies the possible values that can be associated with the attribute name. For
example, allowed values for the block_size attribute range from 0 to 2147483648, with
an increment of 1.
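The range notation can be read as "low-high,increment". The Python sketch below validates a value against such a string, the kind of check the text later describes chdev making before it accepts a new attribute value. It assumes the single-range form with non-negative bounds, as in the block_size example; other PdAt value formats (for example, lists of discrete choices) are not handled, and the function names are hypothetical.

```python
# Sketch: validate a value against a PdAt-style range string.
# Assumption: the form is "low-high,increment" with non-negative bounds.

def parse_range(values: str) -> tuple:
    bounds, increment = values.split(",")
    low, high = bounds.split("-")
    return int(low), int(high), int(increment)

def is_allowed(value: int, values: str) -> bool:
    low, high, increment = parse_range(values)
    return low <= value <= high and (value - low) % increment == 0

BLOCK_SIZE_VALUES = "0-2147483648,1"
print(parse_range(BLOCK_SIZE_VALUES))             # (0, 2147483648, 1)
print(is_allowed(512, BLOCK_SIZE_VALUES))         # True
print(is_allowed(2147483649, BLOCK_SIZE_VALUES))  # False
print(is_allowed(25, "0-100,10"))                 # False: not a multiple of 10
```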


Purpose — Introduce ODM class PdAt.
Details — Describe the four major fields of PdAt that are shown on the visual.
Additional information — Describe the pvid attribute for disks. The default physical
volume ID for a disk is none. For each disk, a physical volume ID must be generated when
the disk is configured for the first time.
To list the default attributes of a customized device, the high-level command is:
# lsattr -D -l <logical device name>
To list the range of supported values for an attribute, the high-level command is:
# lsattr -R -l <logical device name> -a <attr_name>
Transition statement — The next ODM class is CuDv.
Customized devices
IBM Power Systems
CuDv:
name = "ent1"
status = 1
chgstatus = 2
ddins = "pci/goentdd"
location = "02-08"
parent = "pci2"
connwhere = "8"
PdDvLn = "adapter/pci/14106902"
CuDv:
name = "hdisk2"
status = 1
chgstatus = 2
ddins = "scdisk"
location = "01-08-01-8,0"
parent = "scsi1"
connwhere = "8,0"
PdDvLn = "disk/scsi/scsd"
Figure 2-19. Customized devices AN152.2
Notes:
The customized devices (CuDv) object class

The Customized Devices (CuDv) object class contains entries for all device instances
defined in the system. As the name implies, a defined device object is an object that a
define method has created in the CuDv object class. A defined device object may or
may not have a corresponding actual device attached to the system.
The CuDv object class contains objects that provide device and connection information
for each device. Each device is distinguished by a unique logical name. The customized
database is updated at two points, during system boot and at run time, to define new
devices, remove undefined devices, and update the information for a device that has
changed.
The key descriptors in CuDv are described in the next few paragraphs.

name
A customized device object for a device instance is assigned a unique logical name to
distinguish the device from other devices. The visual shows two devices, an Ethernet
adapter ent1 and a disk drive hdisk2.
status
This identifies the current status of the device instance. Possible values are:
- status = 0 - Defined
- status = 1 - Available
- status = 2 - Stopped
chgstatus
This flag tells whether the device instance has been altered since the last system boot.
The diagnostics facility uses this flag to validate system configuration. The flag can take
these values:
- chgstatus = 0 - New device
- chgstatus = 1 - Don't care
- chgstatus = 2 - Same
- chgstatus = 3 - Device is missing
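The following Python sketch decodes the status values and flags devices whose chgstatus is 3 (device is missing). The records are hand-written sample data modeled on the CuDv visual, with a hypothetical missing tape drive rmt0 added; the code does not read a live ODM, and the helper names are hypothetical.

```python
# Sketch: decode CuDv status codes and find devices with chgstatus 3.
# Sample data only; not read from the ODM.

STATUS = {0: "Defined", 1: "Available", 2: "Stopped"}
CHGSTATUS_MISSING = 3  # "Device is missing"

def describe(dev: dict) -> str:
    return f"{dev['name']}: {STATUS.get(dev['status'], 'unknown')}"

def missing_devices(devices: list) -> list:
    return [d["name"] for d in devices if d["chgstatus"] == CHGSTATUS_MISSING]

sample = [
    {"name": "ent1", "status": 1, "chgstatus": 2},
    {"name": "hdisk2", "status": 1, "chgstatus": 2},
    {"name": "rmt0", "status": 0, "chgstatus": 3},  # hypothetical missing device
]
print([describe(d) for d in sample])
print(missing_devices(sample))  # ['rmt0']
```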
ddins
This descriptor typically contains the same value as the Device Driver Name descriptor
in the Predefined Devices (PdDv) object class. It specifies the name of the device
driver that is loaded into the AIX kernel.
location
Identifies the AIX location of a device. The location code is a path from the system unit
through the adapter to the device. In case of a hardware problem, the location code is
used by technical support to identify a failing device.
parent
Identifies the logical name of the parent device. For example, the parent device of
hdisk2 is scsi1.
connwhere
Identifies the specific location on the parent device where the device is connected. For
example, the device hdisk2 uses the SCSI address 8,0.
PdDvLn
Provides a link to the device instance's predefined information through the uniquetype
descriptor in the PdDv object class.
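To see how the descriptors fit together, the stanza format used on the visual (the format printed by odmget) can be parsed into one record per object. This is a minimal Python sketch under stated assumptions: one "descriptor = value" pair per line, quoted strings or bare integers as values, and no multi-line values such as the CuVPD vpd descriptor. The function name is hypothetical.

```python
# Minimal parser for the stanza format shown on the visual.
# Assumptions: one "descriptor = value" pair per line; string values are
# double-quoted, numeric values are bare integers; no multi-line values.

SAMPLE = """
CuDv:
        name = "ent1"
        status = 1
        chgstatus = 2
        ddins = "pci/goentdd"
        location = "02-08"
        parent = "pci2"
        connwhere = "8"
        PdDvLn = "adapter/pci/14106902"

CuDv:
        name = "hdisk2"
        status = 1
        chgstatus = 2
        ddins = "scdisk"
        location = "01-08-01-8,0"
        parent = "scsi1"
        connwhere = "8,0"
        PdDvLn = "disk/scsi/scsd"
"""

def parse_stanzas(text):
    """Return a list of dicts, one per stanza; '_class' holds the class name."""
    objects = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and "=" not in line:
            # A line such as "CuDv:" starts a new object.
            current = {"_class": line[:-1]}
            objects.append(current)
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            current[key] = value[1:-1]  # quoted string descriptor
        else:
            current[key] = int(value)   # numeric descriptor
    return objects

for obj in parse_stanzas(SAMPLE):
    print(obj["name"], obj["parent"], obj["PdDvLn"])
```

With the sample data, this prints ent1 with parent pci2 and hdisk2 with parent scsi1, together with the PdDvLn link of each device.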


Purpose — Introduce ODM class CuDv.
Details — Do not explain all shown descriptors from the visual. Concentrate on explaining
the ones which are important (status and chgstatus).
Discuss the objects shown in bold on the visual. Point out that the student notes have a
legend of the code values. Students are asked to explain the translation of this code in the lab
exercise. The value chgstatus=2 means that the state of hdisk2 has not changed since
last boot. The value chgstatus=1 would mean that the state of this device could not be
determined by cfgmgr (for example, when dealing with a device that is attached using a
serial or parallel port).
Ask students if anybody has seen the following message during system boot: A
previously defined device could not be detected. Explain that this message is
caused by a device that is defined in CuDv but is not physically present. For this device,
the value of chgstatus is 3.
To list the devices in the Customized Devices object class, the high-level command is:
# lsdev -C
Transition statement — The next class is CuAt.
Customized attributes
IBM Power Systems
CuAt:
name = "ent1"
attribute = "jumbo_frames"
value = "yes"
...
CuAt:
name = "hdisk2"
attribute = "pvid"
value = "00c35ba0816eafe50000000000000000"
...
Figure 2-20. Customized attributes AN152.2
Notes:
The customized attribute (CuAt) object class

The Customized Attribute (CuAt) object class contains customized device-specific
attribute information.
Devices represented in the Customized Devices (CuDv) object class have attributes
found in the Predefined Attribute (PdAt) object class and the CuAt object class. There
is an entry in the CuAt object class for attributes that take customized values. Attributes
taking the default value are found in the PdAt object class. Each entry describes the
current value of the attribute.
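The lookup rule (a CuAt entry wins; otherwise the PdAt default applies) can be modeled in a few lines of Python. This is a sketch only: the dictionaries stand in for ODM queries, tty0 is a hypothetical terminal with no CuAt entry, and the function name is hypothetical.

```python
# Sketch of effective-attribute resolution: a CuAt value for the device
# overrides the PdAt default for its uniquetype. Dicts stand in for the ODM.

PDAT_DEFAULTS = {  # (uniquetype, attribute) -> deflt, from the PdAt visual
    ("disk/scsi/osdisk", "pvid"): "none",
    ("tty/rs232/tty", "term"): "dumb",
}
CUAT_VALUES = {    # (device name, attribute) -> value, from the CuAt visual
    ("ent1", "jumbo_frames"): "yes",
    ("hdisk2", "pvid"): "00c35ba0816eafe50000000000000000",
}

def effective_value(name, uniquetype, attribute, cuat, pdat):
    """Return the customized value if one exists, else the predefined default."""
    if (name, attribute) in cuat:
        return cuat[(name, attribute)]
    return pdat[(uniquetype, attribute)]

# No CuAt entry for tty0, so the PdAt default applies.
print(effective_value("tty0", "tty/rs232/tty", "term", CUAT_VALUES, PDAT_DEFAULTS))  # dumb

# Model the effect of "chdev -l tty0 -a term=ibm3151": a CuAt entry now exists.
changed = dict(CUAT_VALUES)
changed[("tty0", "term")] = "ibm3151"
print(effective_value("tty0", "tty/rs232/tty", "term", changed, PDAT_DEFAULTS))  # ibm3151
```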
Discussion of examples on visual

The sample CuAt entries on the visual show two attributes that have customized
values. The attribute jumbo_frames of Ethernet adapter ent1 has been changed to yes. The
attribute pvid shows the physical volume identifier that has been assigned to disk hdisk2.


Purpose — Introduce the CuAt ODM class.
Details — Explain that CuAt contains customized values. The default values are stored in
PdAt.
Additional information — Mention the 16 zeros that are part of the pvid value. They are
not shown with the lsdev command. The value of the pvid for disks is not set until the disk
becomes part of a volume group.
To list the effective attributes values for a customized device, the high-level command is:
# lsattr -E -l <logical device name>
To set an effective attribute value for a device, the high-level command is:
# chdev -l <logical device name> -a <attribute_name>=<value>
Transition statement — Let’s look at a few more ODM object classes.
Additional device object classes

IBM Power Systems
PdCn:
uniquetype = "adapter/pci/sym875"
connkey = "scsi"
connwhere = "1,0"
PdCn:
uniquetype = "adapter/pci/sym875"
connkey = "scsi"
connwhere = "2,0"
CuDep:
name = "rootvg"
dependency = "hd6"
CuDep:
name = "datavg"
dependency = "lv01"
CuDvDr:
resource = "devno"
value1 = "36"
value2 = "0"
value3 = "hdisk3"
CuDvDr:
resource = "devno"
value1 = "36"
value2 = "1"
value3 = "hdisk2"
CuVPD:
name = "hdisk2"
vpd_type = 0
vpd = "*MFIBM *TM\n\
HUS151473VL3800 *F03N5280
*RL53343341*SN009DAFDF*ECH17923D
*P26K5531 *Z0\n\
000004029F00013A*ZVMPSS43A
*Z20068*Z307220"
Figure 2-21. Additional device object classes AN152.2
Notes:
PdCn
The Predefined Connection (PdCn) object class contains connection information for
adapters (sometimes called intermediate devices). This object class also includes
predefined dependency information. For each connection location, there are one or
more objects describing the subclasses of devices that can be connected.
The sample PdCn objects on the visual indicate that, at the given locations, any device
belonging to the scsi subclass could be attached.
CuDep
The Customized Dependency (CuDep) object class describes device instances that
depend on other device instances. This object class describes the dependence links
between logical devices, exclusively. Physical dependencies of one device on another
device are recorded in the Customized Devices (CuDv) object class.
The sample CuDep objects on the visual show the dependencies between logical
volumes and the volume groups they belong to.
CuDvDr
The Customized Device Driver (CuDvDr) object class is used to create the entries in
the /dev directory. These special files are used by applications to access a device
driver that is part of the AIX kernel. The attribute value1 is called the major number and
is a unique key for a device driver. The attribute value2 is the minor number; it specifies a
particular device instance or operating mode of a device driver.
The sample CuDvDr objects on the visual reflect the device driver for disk drives
hdisk2 and hdisk3. The major number 36 specifies the driver in the kernel. In our
example, the minor numbers 0 and 1 specify two different disk drive instances, both
using the same device driver. For other devices, the minor number may represent
different modes in which the device can be used. For example, if we were looking at a
tape drive, the operating mode 0 would specify a rewind on close for the tape drive, the
operating mode 1 would specify no rewind on close for a tape drive.
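Grouping the CuDvDr records by major number (value1) shows which device instances share one driver. The Python sketch below uses the two records from the visual and only considers devno records; the function name is hypothetical.

```python
# Sketch: group CuDvDr-style devno records by major number (value1) to show
# which device instances (value3) share a device driver.
from collections import defaultdict

CUDVDR = [
    {"resource": "devno", "value1": "36", "value2": "0", "value3": "hdisk3"},
    {"resource": "devno", "value1": "36", "value2": "1", "value3": "hdisk2"},
]

def by_major(records):
    groups = defaultdict(list)
    for rec in records:
        if rec["resource"] == "devno":  # this sketch only handles devno records
            groups[int(rec["value1"])].append((rec["value3"], int(rec["value2"])))
    return dict(groups)

for major, devices in by_major(CUDVDR).items():
    for name, minor in devices:
        print(f"/dev/{name}: major {major}, minor {minor}")
```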
CuVPD
The Customized Vital Product Data (CuVPD) object class contains vital product data
(manufacturer of device, engineering level, part number, and so forth) that is useful for
technical support. When an error occurs with a specific device, the vital product data is
shown in the error log.
Instructor notes:
Purpose — Explain briefly the function of some additional ODM classes.
Details — Describe the ODM classes shown using the explanations in the student notes.
Avoid going into too much detail; these are mostly used, under the covers, by the operating
system. Try to summarize the object classes in simple to understand terms. For example:
• PdCn - Identifies the family of possible connections. If a SCSI adapter only supported 8
possible SCSI addresses, there would be 8 PdCn objects, one for each possible
address. Remember that only a few of these addresses might actually be in use.
• CuDep - Identifies dependencies. In the example shown, the system would use this
dependency to prevent removal of the datavg volume group until the logical
volume lv01 was removed.
• CuDvDr - Identifies device driver for each device using major and minor numbers. This
is the same information you see when you execute a long listing of the files in the /dev
directory.
• CuVPD - Vital product information. Basically manufacturer and product information.
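The CuDep idea from the list above can be illustrated with a tiny Python model: a volume group can be removed only when no logical volume still depends on it. The (name, dependency) pairs are copied from the visual; the helper names are hypothetical, and this is not how AIX implements the check internally.

```python
# Sketch: use CuDep-style (name, dependency) pairs to decide whether an
# object can be removed. Data copied from the visual.

CUDEP = [("rootvg", "hd6"), ("datavg", "lv01")]

def dependents(name, cudep):
    return [dep for owner, dep in cudep if owner == name]

def can_remove(name, cudep):
    return not dependents(name, cudep)

print(dependents("datavg", CUDEP))  # ['lv01']
print(can_remove("datavg", CUDEP))  # False
# Model the state after lv01 has been removed.
without_lv01 = [(owner, dep) for owner, dep in CUDEP if dep != "lv01"]
print(can_remove("datavg", without_lv01))  # True
```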
Transition statement — Let us look at how these device object classes relate to the high
level commands that we will more often use to examine and change this information.

ODM and high-level device commands

IBM Power Systems
Listing objects in the Predefined and Customized classes:

List the PdDv object class:
# lsdev -P [-c <class>] [-s <subclass>] [-t <type>]
List the CuDv object class:
# lsdev -C [-l <device name>] [-c <class>] [-s <subclass>] [-t <type>]
Listing default and effective attributes:

List default attributes from PdAt object class:
# lsattr -D -c <class> -s <subclass> -t <type> [-a <attribute>]
# lsattr -D -l <device name> [-a <attribute>]
List an enumeration or range of acceptable attribute values:
# lsattr -R -l <device name> -a <attribute name>
List effective attributes (PdAt and overrides in CuAt):
# lsattr -E -l <device name> [-a <attribute>]
Figure 2-22. ODM and high-level device commands AN152.2
Notes:
Most of the time the information in the ODM device database is accessed and managed
using high-level commands. Understanding the object classes and their roles helps in
using these commands.
The lsdev command has options which control which ODM object class you list.
To see the objects in the Predefined Device (PdDv) object class, use the -P flag. If you
want to control the output, you can optionally qualify the command with any
combination of the three key descriptors: class, subclass, and type.
To see objects in the Customized Device (CuDv) object class, use the -C flag. To
control the output, you can either specify a particular device (using its logical device
name) or you can use any combination of the PdDv object class key descriptors.
Here is an example of specifying a particular device:
# lsdev -l hdisk0
The most common PdDv descriptor qualification is the class. Thus, it is common to
enter commands such as:
# lsdev -Cc disk
# lsdev -Cc adapter
The lsattr command also has options that control which ODM object classes it
uses.
To see the default attribute values, which are stored in the Predefined Attributes (PdAt)
object class, use the -D flag. You must uniquely identify the object by either:
• Specifying the class, subclass, and type for the object
• Specifying the logical device name of a customized device which is related to the
PdAt object
To see the effective attribute values, use the -E flag. The effective attributes are either
the attributes in the Customized Attributes (CuAt) object class for the specified device or
(if there is no value specified in CuAt) the default attribute value from the related PdAt
object. You must specify a particular device by providing the logical device name of that
device.
When using the chdev command to modify an attribute value, the command logic will
not allow you to enter what it considers unacceptable values. It knows what is allowed
by examining the values descriptor for the attribute in the PdAt object class. If you get an
exception message when attempting to set an attribute value, it is useful to know what is
acceptable. This information is displayed by the lsattr command when using the -R
(range) flag. The -R option requires that the attribute name be identified in addition to
the logical name of the device for which you are attempting to modify that attribute.


Purpose — Relate the high level device commands to the device related ODM objects.
Details —
Transition statement — We have reached a checkpoint.
Checkpoint
IBM Power Systems
1. In which ODM class do you find the physical volume IDs of your
disks?
2. What is the difference between the states: defined and available?
Notes:


Purpose —
Details —
IBM Power Systems
1. In which ODM class do you find the physical volume IDs of your disks?
The answer is CuAt.
2. What is the difference between the states: defined and available?
When a device is defined, there is an entry in ODM class CuDv. When a device is
available, the device driver has also been loaded, and the device driver can be accessed
through the entries in the /dev directory.
Transition statement — Let’s look at reinforcing what we have covered by playing with the
ODM in the lab.
Exercise: The Object Data Manager

IBM Power Systems
Review the device configuration ODM classes
Modify a device attribute's default value
Create self-defined ODM classes (optional)
Figure 2-24. Exercise: The Object Data Manager AN152.2
Notes:


Purpose — Introduce the exercise.
Details —
Transition statement —
Unit summary
IBM Power Systems

Describe the structure of the ODM
Use the ODM command line interface
Explain the role of the ODM in device configuration
Describe the function of the most important ODM files
Notes:
The ODM is made from object classes, which are broken into individual objects and
descriptors.
AIX offers a command line interface to work with the ODM files.
The device information is held in the customized and the predefined databases
(Cu*, Pd*).


Purpose — Review some of the key points covered in the unit.
Details — Present the highlights from the unit. Ask if there are any questions about the
material. Provide time for the students to think and formulate any questions they may have.
Transition statement — Let’s continue with the next unit.

Unit 3. Error monitoring
Estimated time
01:10

This unit covers techniques for monitoring for problems and for automating responses to
those problems. Topics include an overview of the AIX Error Log facility (and how it can
interact with the syslogd daemon) and the system hang (shdaemon) monitoring facility.

• Analyze error log entries
• Identify and maintain the error logging components
• Describe different error notification methods
• Log system messages using the syslogd daemon
• Monitor and take actions for hang conditions using shdaemon

Accountability:
• Lab exercise
References
Online AIX Version 7.1 General Programming Concepts:
Writing and Debugging Programs (Chapter 5.
Error-Logging Overview)
© Copyright IBM Corp. 2009, 2012 Unit 3. Error monitoring 3-1

Unit objectives
IBM Power Systems

Analyze error log entries
Identify and maintain the error logging components
Describe different error notification methods
Log system messages using the syslogd daemon
Monitor and take actions for hang conditions using shdaemon
Notes:


Purpose — Introduce the topics to be covered in this unit.
Details — Use the student material to guide your presentation.
Transition statement — Let’s discuss error logging first.
For an ILO (Instructor-Led Online) class: You should play file AN152U03F02 in the
multimedia library of Elluminate in place of the next visual. You can then continue your
lecture normally to reinforce the topics if desired.
a. To access the multimedia library click on the CD button along the toolbar in
Elluminate.
b. Once the multimedia library window is open, select AN152U03F02 and click on play.
c. Ask the students to indicate with a green check mark when they have finished the file.
For an FTF (face-to-face) class: You should play file AN152U03F02 on your instructor PC
in place of the next visual.
Note that you can also use this activity to review the information covered in the
visual.

3.1. Working with the error log

What students will do — Identify the components of the error logging facility and create
error reports.
How students will do it — Through lecture, lab exercise, and checkpoint questions.
What students will learn — How to create and read an error report, and when and how to
maintain the error log.
How this will help students on their job — Being able to identify possible software and
hardware errors and solutions will enhance the students' job performance and productivity.

Error logging components

IBM Power Systems
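[Diagram: a kernel module reports errors through errsave() and an application through
errlog(); the entries pass through /dev/error (timestamp) to the error daemon
(/usr/lib/errdemon), which writes /var/adm/ras/errlog and uses the errnotify ODM class
to trigger error notify methods (for example, e-mail, console, or diagnostics). errpt and
SMIT produce formatted output; errclear, errstop, and errlogger are the related commands.]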
Figure 3-2. Error logging components AN152.2
Notes:
Detection of an error
The error logging process begins when an operating system module detects an error.
The error detecting segment of code then sends error information to either the
errsave() kernel service or the errlog() application subroutine, where the information
is in turn written to the /dev/error special file. This process then adds a timestamp to
the collected data. The errdemon daemon constantly checks the /dev/error file for new
entries, and when new data is written, the daemon conducts a series of operations.
Creation of error log entries

The errdemon daemon collects additional data from other parts of the system before
writing the information to the error log. There is an Error Record Template
(/var/adm/ras/errtmplt) which identifies what information is needed. For example, if the
error signifies a hardware-related problem and hardware vital product data (VPD)
exists, the daemon retrieves the VPD from the ODM.

When you access the error log with the errpt command (from the command line or by
way of a SMIT panel), the error log is formatted according to the error record template
and presented as a summary, intermediate, or detailed report. Most entries in the error
log are attributable to hardware and software problems, but informational messages can
also be logged, for example, by the system administrator using the errlogger command.
The errlogger command

The errlogger command allows the system administrator to record messages of up to
1024 bytes in the error log. Whenever you perform a maintenance activity, such as
clearing entries from the error log, replacing hardware, or applying a software fix, it is a
good idea to record this activity in the system error log.
The following example illustrates use of the errlogger command:
# errlogger system hard disk '(hdisk0)' replaced.
This message will be listed as part of the error log.
The errclear command

The errclear command allows you to selectively delete records from the log. The
selection criteria are the same as for selectively reporting entries with errpt.
The errnotify methods

Later in this unit, you will see how to define an errnotify method to be executed any time
certain specified error records are processed by errdemon. The method program or
script could take actions such as sending e-mail, writing to the console, or triggering
diagnostics.

Instructor notes:
Purpose — Define the components of the error logging facility.
Details — Cover the diagram on the visual starting from the bottom, with the error being
detected by errlog() or errsave() and an entry being made in /dev/error, up to the point
where a user can look at the records of the error log either by going through SMIT or by
executing the errpt command. An optional flow is shown in the upper left of the visual.
Briefly mention that a given error record could be defined to trigger an automatic action
(called a method). This error notification mechanism will be covered in more detail later in the
unit.
Additional information —
The following is a list of terms that you may refer to:
error ID This is a 32-bit hexadecimal code used to identify a particular
failure. Each error record template has a unique error ID.
error label This is the mnemonic name for an error ID.
error log This is the file that stores instances of errors and failures
encountered by the system.
error log entry A record in the system error log that describes a failure.
Contains captured failure data.
error record template A description of what will be displayed when the error log is
formatted for a report, including information on the type and
class of error, probable causes and recommended actions.
Collectively, the templates comprise the Error Record Template
Repository.
The errpt command can be run from the shell or SMIT to format records in the errlog into
readable reports. The ODM classes CuDv, CuAt, and CuVPD provide information for the
detailed error reporting.
Error log hardening

Under very rare circumstances, such as powering off the system exactly while the
errdemon is writing into the error log, the error log may become corrupted.
When the errdemon starts, it checks for error log consistency. First, it makes a backup
copy of the existing error log file to /tmp/errlog.save, and then it corrects the error log
file, while preserving consistent error log entries.
See the AIX 5L Differences Guide Version 5.3 Edition Redbook (SG24-7463-00) for
more information about error log hardening (also referred to as error log RAS).
Transition statement — SMIT can be used to generate an error report.

Generating an error report using SMIT

IBM Power Systems
# smit errpt
Generate an Error Report
...
CONCURRENT error reporting? no
Type of Report summary +
Error CLASSES (default is all) [] +
Error TYPES (default is all) [] +
Error LABELS (default is all) [] +
Error ID's (default is all) [] +
Resource CLASSES (default is all) []
Resource TYPES (default is all) []
Resource NAMES (default is all) []
SEQUENCE numbers (default is all) []
STARTING time interval []
ENDING time interval []
Show only Duplicated Errors [no]
Consolidate Duplicated Errors [no]
LOGFILE [/var/adm/ras/errlog]
TEMPLATE file [/var/adm/ras/errtmplt]
MESSAGE file []
FILENAME to send report to (default is stdout)[]
...
Figure 3-3. Generating an error report using SMIT AN152.2
Notes:
Overview
The SMIT fastpath smit errpt takes you to the screen used to generate an error
report. Any user can use this screen. As shown on the visual, the screen includes a
number of fields that can be used for report specifications. Some of these fields are
described in more detail below.
CONCURRENT error reporting?

Yes means you want errors displayed or printed as the errors are entered into the error
log (a sort of tail -f).

Type of report
Summary, intermediate, and detailed reports are available. Detailed reports give
comprehensive information. Intermediate reports display most of the error information.
Summary reports contain concise descriptions of errors.
Error classes
Values are H (hardware), S (software), and O (operator messages created with
errlogger). You can specify more than one error class.
Error types
Valid error types include the following:
- PEND - The loss of availability of a device or component is imminent.
- PERF - The performance of the device or component has degraded to below an
acceptable level.
- TEMP - Recovered from condition after several attempts.
- PERM - Unable to recover from error condition. Error types with this value are usually
the most severe errors and imply that you have a hardware or software defect. Error
types other than PERM usually do not indicate a defect, but they are recorded so that
they can be analyzed by the diagnostic programs.
- UNKN - Severity of the error cannot be determined.
- INFO - The error type is used to record informational entries.
Error labels
An error label is the mnemonic name used for an error ID.
Error IDs
An error ID is a 32-bit hexadecimal code used to identify a particular failure.
Resource classes
Means device class for hardware errors (for example, disk).
Resource types
Indicates device type for hardware (for example, 355 MB).

Resource names

Provides common device name (for example hdisk0).
Starting and ending time interval

The format mmddhhmmyy can be used to select only errors from the log that are time
stamped between the two values.
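The mmddhhmmyy format packs month, day, hour, minute, and a two-digit year with no separators. The Python sketch below parses such values and checks whether an entry falls inside a starting/ending interval. The timestamps are hypothetical examples, the helper names are hypothetical, and Python's two-digit-year rule (%y) may not match how errpt itself interprets the year.

```python
# Sketch: parse mmddhhmmyy time stamps and test an interval, mirroring the
# STARTING/ENDING time interval fields. Example values are hypothetical.
from datetime import datetime

def parse_errpt_time(stamp):
    """Parse an mmddhhmmyy string (month, day, hour, minute, 2-digit year)."""
    if len(stamp) != 10 or not stamp.isdigit():
        raise ValueError(f"expected 10 digits in mmddhhmmyy form: {stamp!r}")
    return datetime.strptime(stamp, "%m%d%H%M%y")

def in_interval(stamp, start, end):
    return parse_errpt_time(start) <= parse_errpt_time(stamp) <= parse_errpt_time(end)

print(parse_errpt_time("1101080012"))                         # 2012-11-01 08:00:00
print(in_interval("1101093012", "1101080012", "1101170012"))  # True
print(in_interval("1102093012", "1101080012", "1101170012"))  # False
```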
Show only duplicated errors

Yes will report only those errors that are exact duplicates of previous errors generated
during the interval of time specified. The default time interval is 100 milliseconds. This
value can be changed with the errdemon -t command. The default for the Show only
Duplicated Errors option is no.
Consolidate duplicated errors

Yes will report only the number of duplicate errors and timestamps of the first and last
occurrence of that error. The default for the Consolidate Duplicated Errors option is
no.
File name to send reports to

The report can be sent to a file. The default is to send the report to stdout.
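On the command line, the same time interval can be selected with the errpt -s (start) and -e (end) flags, which take timestamps in the same mmddhhmmyy layout. The following is a minimal POSIX sh sketch for building such timestamps; the helper name to_errpt_time is hypothetical, and the year is simply reduced to its last two digits.

```shell
# to_errpt_time YEAR MONTH DAY HOUR MINUTE
# Hypothetical helper: prints a timestamp in the mmddhhmmyy layout used by
# the error report time-interval fields. awk is used so that values such
# as 08 and 09 are read as decimal numbers, not as octal.
to_errpt_time() {
    awk -v y="$1" -v mo="$2" -v d="$3" -v h="$4" -v mi="$5" \
        'BEGIN { printf "%02d%02d%02d%02d%02d\n", mo, d, h, mi, y % 100 }'
}

start=$(to_errpt_time 2007 10 10 9 0)    # 1010090007
end=$(to_errpt_time 2007 10 10 13 59)    # 1010135907
# On AIX, the corresponding report would be requested with:
echo "errpt -s $start -e $end"
```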

Instructor notes:
Purpose — Explain how an error report can be generated through SMIT.
Details — Explain the options in generating an error report. The main option is the type of
report: summary versus detailed versus intermediate. Also explain the concurrent option.
Point out that the rest of the fields are for identifying criteria for selective reporting. Ask
the students whether they have previously worked with the AIX error log and, if so, what
their experiences are regarding these options.
Additional information — This option will allow you to produce a detailed or summary
report. Examples of both will be given.
Mention all the different fields that can be used to generate specific searches and reports.
Note that the report can be sent to a file, which is specified in the last field.
The Show only Duplicated Errors option in the Generate an Error Report screen was
introduced in AIX 5L V5.1. Examples of duplicate errors might include floppy drive not
ready, external drive not ready, or Ethernet card unplugged.
Transition statement — Instead of using SMIT, you can also generate a report from the
command line. Let's see how this can be done.

The errpt command
Summary report:
# errpt
Intermediate report:
# errpt -A
Detailed report:
# errpt -a
Summary report of all hardware errors:
# errpt -d H
Detailed report of all software errors:
# errpt -a -d S
Concurrent error logging ("Real-time" error logging):
# errpt -c > /dev/console
Figure 3-4. The errpt command AN152.2
Notes:
Types of reports available

The errpt command generates a report of logged errors. Three different layouts can be
produced, depending on the option that is used:
- A summary report gives an overview (default).
- An intermediate report only displays the values for the LABEL, Date/Time, Type,
Resource Name, Description and Detailed Data fields. Use the option -A to
specify an intermediate report.
- A detailed report shows a detailed description of all the error entries. Use the option
-a to specify a detailed report.

The -d option
The -d option (flag) can be used to limit the report to a particular class of errors. Two
examples illustrating use of this flag are shown on the visual:
- The command errpt -d H specifies a summary report of all hardware (-d H) errors.
- The command errpt -a -d S specifies a detailed report (-a) of all software (-d S)
errors.
Input file used

The errpt command queries the error log file /var/adm/ras/errlog to produce the error
report.
The -c option
If you want to display the error entries concurrently, that is, at the time they are logged,
you must execute errpt -c. In the example on the visual, we direct the output to the
system console.
The -D flag
Duplicate errors can be consolidated using errpt -D. When used with the -a option,
errpt -D reports only the number of duplicate errors and the timestamp for the first and
last occurrence of the identical error.
The -P flag
Shows only errors which are duplicates of the previous error. The -P flag applies only to
duplicate errors generated by the error log device driver.
Additional information
The errpt command has many options. Refer to your AIX Commands Reference (or
the man page for errpt) for a complete description.


Purpose — Introduce the errpt command.
Details — Describe the command using the information in the student notes.
Transition statement — Now that we know how we can formulate a report, let’s look at
examples of summary and detailed reports. Let’s start with the summary report.

A summary report: errpt
# errpt
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
192AC071 1010130907 T O errdemon ERROR LOGGING TURNED OFF
C6ACA566 1010130807 U S syslog MESSAGE REDIRECTED FROM SYSLOG
A6DF45AA 1010130707 I O RMCdaemon The daemon is started.
2BFA76F6 1010130707 T S SYSPROC SYSTEM SHUTDOWN BY USER
9DBCFDEE 1010130707 T O errdemon ERROR LOGGING TURNED ON
192AC071 1010123907 T O errdemon ERROR LOGGING TURNED OFF
AA8AB241 1010120407 T O OPERATOR OPERATOR NOTIFICATION
2BFA76F6 1010094907 T S SYSPROC SYSTEM SHUTDOWN BY USER
EAA3D429 1010094207 U S LVDD PHYSICAL PARTITION MARKED STALE
EAA3D429 1010094207 U S LVDD PHYSICAL PARTITION MARKED STALE
F7DDA124 1010094207 U H LVDD PHYSICAL VOLUME DECLARED MISSING
Error Type: P: Permanent, Performance, or Pending; T: Temporary; I: Informational; U: Unknown
Error Class: H: Hardware; S: Software; O: Operator; U: Undetermined
Figure 3-5. A summary report: errpt AN152.2
Notes:
Content of summary report

By default, the errpt command creates a summary report which gives an overview of
the different error entries. One line per error is fine to get a feel for what is there, but you
need more details to understand problems.
Need for detailed report

The example shows different hardware and software errors that occurred. To get more
information about these errors, you must create a detailed report.


Purpose — Discuss the summary error report.
Details — Use the information in the student notes and the information given under
“Additional Information” below to guide your explanation.
Additional information — The first field is the error identifier. It is not unique to each
entry (that is, to each occurrence of an error); it identifies a kind of error.
The next field is the time stamp, in the format mmddhhmmyy (month, day, hour, minute,
and year, as previously discussed).
The third field specifies the type of error; possible values are defined at the bottom of the
visual. There is a problem with this field because three possible values (PERM, PERF, and
PEND) begin with the letter P. As this field is a one-letter field, you cannot tell exactly
what type of error you are dealing with until you view the detailed report.
The next field defines the class; again the possible values are given at the bottom of the
visual.
The last two fields give the resource name of the component that is causing the problem
and also a description of the error.
Transition statement — Let’s look at a detailed report.
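The field layout described above can be made explicit with a short awk filter. This is a sketch only: it assumes summary output in the column order shown on the visual and that two-digit years belong to 2000 to 2099; the function name decode_errpt_summary is hypothetical.

```shell
# decode_errpt_summary: hypothetical helper that reads "errpt" summary
# output on stdin, skips the header line, and labels each field.
decode_errpt_summary() {
    awk '$1 != "IDENTIFIER" && NF >= 6 {
        ts = $2                                   # mmddhhmmyy
        desc = $6                                 # description may span fields
        for (i = 7; i <= NF; i++) desc = desc " " $i
        printf "%s 20%s-%s-%s %s:%s type=%s class=%s resource=%s desc=%s\n",
            $1, substr(ts, 9, 2), substr(ts, 1, 2), substr(ts, 3, 2),
            substr(ts, 5, 2), substr(ts, 7, 2), $3, $4, $5, desc
    }'
}

# Example with two lines taken from the summary report on the visual.
# Prints: F7DDA124 2007-10-10 09:42 type=U class=H resource=LVDD desc=PHYSICAL VOLUME DECLARED MISSING
printf '%s\n' \
    'IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION' \
    'F7DDA124 1010094207 U H LVDD PHYSICAL VOLUME DECLARED MISSING' |
    decode_errpt_summary
```

On AIX it would be used as errpt | decode_errpt_summary. The one-letter type is passed through unchanged, so the P ambiguity mentioned above still requires the detailed report.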

A detailed error report: errpt -a
LABEL: LVM_SA_PVMISS
IDENTIFIER: F7DDA124
Date/Time: Wed Oct 10 09:42:20 CDT 2007
Sequence Number: 113
Machine Id: 00C35BA04C00
Node Id: rt1s3vlp2
Class: H
Type: UNKN
WPAR: Global
Resource Name: LVDD
Resource Class: NONE
Resource Type: NONE
Location:
Description
PHYSICAL VOLUME DECLARED MISSING
Probable Causes
POWER, DRIVE, ADAPTER, OR CABLE FAILURE
Detail Data
MAJOR/MINOR DEVICE NUMBER
8000 0011 0000 0001
SENSE DATA
00C3 5BA0 0000 4C00 0000 0115 7F54 BF78 00C3 5BA0 7FCF 6B93 0000 0000 0000 0000
Figure 3-6. A detailed error report: errpt -a AN152.2
Notes:
Content of detailed error report

As previously mentioned, detailed error reports are generated by issuing the errpt -a
command. The first half of the information displayed is obtained from the ODM (CuDv,
CuAt, CuVPD) and is very useful because it shows clearly which component caused the error
entry. The next few fields explain probable causes of the problem and actions that
you can take to correct the problem.
The last field, SENSE DATA, is a detailed report about which part of the device is failing.
For example, with disks, it could tell you which sector on the disk is failing. This
information can be used by IBM support to analyze the problem.
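When a detailed report contains many entries, it can help to condense each one to its key fields. The following is a minimal sketch, assuming the field names appear at the start of lines exactly as in the example (LABEL:, Class:, Type:, Resource Name:); the function name detail_summary is hypothetical.

```shell
# detail_summary: hypothetical helper that reads "errpt -a" output on stdin
# and prints LABEL, Class, Type and Resource Name on one line per entry.
# It relies on "Resource Name:" appearing after Class: and Type: in each entry.
detail_summary() {
    awk '
        /^LABEL:/         { label = $2 }
        /^Class:/         { class = $2 }
        /^Type:/          { type = $2 }
        /^Resource Name:/ { print label, class, type, $3 }
    '
}

# On AIX, for example:
#   errpt -a -d H | detail_summary
```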

Interpreting error classes and types

The values shown for error class and error type provide information that is useful in
understanding a particular problem:
1. The combination of an error class value of H and an error type value of PERM
indicates that the system encountered a problem with a piece of hardware and could
not recover from it.
2. The combination of an error class value of H and an error type value of PEND
indicates that a piece of hardware may become unavailable soon due to the
numerous errors detected by the system.
3. The combination of an error class value of S and an error type of PERM indicates that
the system encountered a problem with software and could not recover from it.
4. The combination of an error class value of S and an error type of TEMP indicates that
the system encountered a problem with software. After several attempts, the system
was able to recover from the problem.
5. An error class value of O indicates that an informational message has been logged.
6. An error class value of U indicates that an error class could not be determined.
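The combinations listed above can be captured in a small lookup. This is a sketch only: the function name interpret and its wording are hypothetical, and it expects the full type names shown in the detailed report (PERM, PEND, TEMP), not the one-letter codes of the summary report.

```shell
# interpret CLASS TYPE
# Hypothetical helper: prints a one-line reading of an error class/type
# pair, following the numbered list above.
interpret() {
    case "$1:$2" in
        H:PERM) echo "hardware problem; the system could not recover" ;;
        H:PEND) echo "hardware may become unavailable soon" ;;
        S:PERM) echo "software problem; the system could not recover" ;;
        S:TEMP) echo "software problem; recovered after several attempts" ;;
        O:*)    echo "informational (operator) message" ;;
        U:*)    echo "error class could not be determined" ;;
        *)      echo "not covered by the list; see the detailed report" ;;
    esac
}

interpret H PERM
```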
Link between error log and diagnostics

In AIX 5L V5.1 and later, there is a link between the error log and diagnostics. Error
reports include the diagnostic analysis for errors that have been analyzed. Diagnostics,
and the diagnostic tool diag, will be covered in a later unit.


Purpose — Explain the information that is obtained from a detailed report.
Details — Explain using the information in the student notes.
Transition statement — Disk errors are frequently seen in the error log. There are many
different types of disk errors. Let’s identify the different types and find out the severity of
each.

Types of disk errors
Error Label           Error Type  Recommendations
DISK_ERR1             P           Failure of physical volume media
                                  Action: Replace device as soon as possible
DISK_ERR2, DISK_ERR3  P           Device does not respond
                                  Action: Check power supply
DISK_ERR4             T           Error caused by bad block or occurrence of a
                                  recovered error
                                  Rule of thumb: If disk produces more than one
                                  DISK_ERR4 per week, replace the disk
SCSI_ERR*             P           SCSI communication problem
(SCSI_ERR10)                      Action: Check cable, SCSI addresses, terminator
Error types: P = Permanent, T = Temporary
Figure 3-7. Types of disk errors AN152.2
Notes:
Common disk errors

The following list explains the most common disk errors you should know about:
- DISK_ERR1 is caused by wear and tear of the disk. Remove the disk as soon as
possible from the system and replace it with a new one. Follow the procedures that
you have learned earlier in this course.
- DISK_ERR2 and DISK_ERR3 error entries are mostly caused by a loss of electrical
power.
- DISK_ERR4 is the most interesting one, and the one that you should watch out for, as
this indicates bad blocks on the disk. Do not panic if you find a few entries of this type
in the log. What you should be aware of is the number of DISK_ERR4 errors and their
frequency. The more you get, the closer you are getting to a disk failure. You want to
replace the disk before it fails, so monitor the error log closely.

- Sometimes SCSI errors are logged, mostly with the LABEL SCSI_ERR10. They
indicate that the SCSI controller is not able to communicate with an attached device.
In this case, check the cable (and the cable length), the SCSI addresses, and the
terminator.
DISK_ERR5 errors
A very infrequent error is DISK_ERR5. It is the catch-all (that is, the problem does not
match any of the above DISK_ERRx symptoms). You need to investigate further by
running the diagnostic programs which can detect and produce more information about
the problem.
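The DISK_ERR4 rule of thumb from the visual (more than one entry per week means the disk should be replaced) can be checked with a short filter. This is a sketch, assuming its input is an errpt summary already restricted to one disk and one week; the errpt flags in the comment (-J for error labels, -N for resource names, -s for a start time) should be confirmed in the errpt man page for your release, and the function name err4_verdict and the variable week_ago are hypothetical.

```shell
# err4_verdict [LIMIT]
# Hypothetical helper: counts the entries in an errpt summary read from
# stdin (header line ignored) and reports REPLACE when there are more than
# LIMIT (default 1) entries, following the rule of thumb on the visual.
err4_verdict() {
    awk -v limit="${1:-1}" '
        $1 != "IDENTIFIER" && NF > 0 { n++ }
        END {
            if (n + 0 > limit + 0) printf "REPLACE (%d DISK_ERR4 entries)\n", n
            else                   printf "OK (%d DISK_ERR4 entries)\n", n
        }'
}

# On AIX, with week_ago holding an mmddhhmmyy time one week back:
#   errpt -J DISK_ERR4 -N hdisk1 -s "$week_ago" | err4_verdict
```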

Instructor notes:
Purpose — Define the different types of disk errors.
Additional information — Explain each type of error in turn:
DISK_ERR1, DISK_ERR2, and DISK_ERR4 return sense data that can be analyzed by the diagnostic
programs to provide extra information regarding the nature of the error and its severity.
DISK_ERR4 is by far the most common error generated, and it is the least severe. It
indicates that a bad block has been detected during a read or write request to the disk
drive.
Bad block relocation and mirroring
When a disk drive is formatted for the first time, a portion of the drive (about 5% in the case
of IBM drives) is set aside for bad block relocation. The format itself also masks and
readdresses existing bad blocks so that the medium is clean and ready for use. During use,
however, any disk drive can develop bad blocks that can be attributed to deterioration
caused by the setting and resetting of magnetic charges on the medium.
Bad blocks may be discovered during any read or write operation, triggering DISK_ERR4 entries,
but they can actually be relocated only during a write operation.
At the software level, if your hardware does not support bad block relocation, you can set
logical volume bad block relocation. If a bad block is detected during a read or write
operation, its physical location is recorded in the logical volume device driver (LVDD)
defects directory. This directory is reviewed during each read or write request. Most
hardware does support bad block relocation and so the logical volume attribute is
irrelevant.
Bad blocks are never a problem when mirrored logical volumes are used. Either a read or
write request is completed on the mirror copy that is undamaged, and the damaged block is
always relocated. When a read requests a damaged block, the logical volume manager
converts the request to a write request and relocates the block with values derived from the
good copy. All this occurs without intervention or special configuration.
Transition statement — Let’s show the most important error entries the logical volume
manager creates.

LVM error log entries
Error Label         Class, Type  Recommendations
LVM_BBEPOOL,        S,P          No more bad block relocation
LVM_BBERELMAX,                   Action: Replace disk as soon as possible
LVM_HWFAIL
LVM_SA_STALEPP      S,P          Stale physical partition
                                 Action: Check disk, synchronize data (syncvg)
LVM_SA_QUORCLOSE    H,P          Quorum lost, volume group closing
                                 Action: Check disk, consider working without quorum
Error classes: H = Hardware, S = Software
Error types: P = Permanent, T = Temporary
Figure 3-8. LVM error log entries AN152.2
Notes:
Important LVM error codes

The visual shows some very important LVM error codes you should know. All of these
errors are permanent errors that cannot be recovered. Very often these errors are
accompanied by hardware errors such as those shown on the previous visual.
Immediate response to errors

Errors, such as those shown on the visual, require your immediate intervention.
Categories of LVM error labels:

LVM_BBEPOOL, LVM_BBERELMAX, LVM_HWFAIL:
No more bad block relocation
Action: Replace disk as soon as possible.

LVM_SA_STALEPP
Stale physical partition
Action: Check disk, synchronize data (syncvg).
LVM_SA_QUORCLOSE
Quorum lost, volume group closing
Action: Check disk, consider working without quorum.
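For LVM_SA_STALEPP, the logical volumes that need synchronizing can be picked out of lsvg -l output, where the LV STATE column shows values such as open/syncd or open/stale. The following is a dry-run sketch that only prints syncvg commands for review; the function name stale_lvs is hypothetical, and it assumes LV STATE is the sixth whitespace-separated field.

```shell
# stale_lvs: hypothetical helper that reads "lsvg -l VGNAME" output on
# stdin and prints a syncvg command for each logical volume whose LV STATE
# column (the sixth field) contains "stale". Nothing is executed.
stale_lvs() {
    awk '$6 ~ /stale/ { print "syncvg -l " $1 }'
}

# On AIX, for example:
#   lsvg -l datavg | stale_lvs
```

Printing instead of executing keeps the administrator in control: check the disk first, as the recommendation says, and only then run the generated syncvg commands.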


Purpose — Introduce some important LVM errors.
Details — Review the different terms and the errors that are produced by LVM.
Transition statement — Let’s see how to maintain the error log.

Maintaining the error log
# smit errdemon
Change / Show Characteristics of the Error Log
Type or select values in entry fields.

Press Enter AFTER making all desired changes.
*Maximum LOGSIZE [1048576] #
Memory Buffer Size [32768] #
...
# smit errclear
Clean the Error Log

Remove entries older than this number of days [30] #

Error CLASSES [ ] +
Error TYPES [ ] +
...
Resource CLASSES [ ] +
...
==> Use the errlogger command as a reminder <==
Figure 3-9. Maintaining the error log AN152.2
Notes:
Changing error log attributes

To change error log attributes like the error log filename, the internal memory buffer
size, and the error log file size, use the SMIT fastpath smit errdemon. The error log file
is implemented as a ring. When the file reaches its limit, the oldest entry is removed to
allow adding a new one. The command that SMIT executes is the errdemon command.
See your AIX Commands Reference for a listing of the different options.
Cleaning up error log entries

To clean up error log entries, use the SMIT fastpath smit errclear. For example, after
removing a bad disk that caused error log entries, you should remove the
corresponding error log entries regarding the bad disk. The errclear command is part
of the fileset bos.sysmgt.serv_aid.

Entries in /var/spool/cron/crontabs/root use errclear to remove software and
hardware errors. Software and operator errors are purged after 30 days, and hardware
errors are purged after 90 days.
Using errlogger to create reminders

Follow the suggestion at the bottom of the visual. Whenever an important system event
takes place, for example, the replacement of a disk, log this event using the errlogger
command.
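A small wrapper can make such reminders consistent by adding the date and the user name. This is a sketch, assuming errlogger takes the message as its argument; on AIX the entry is logged with class O, like the OPERATOR NOTIFICATION entry in the summary report shown earlier. The function name log_reminder is hypothetical, and the fallback branch exists only so the sketch can be tried on systems without errlogger.

```shell
# log_reminder MESSAGE...
# Hypothetical wrapper: records an administrative event in the error log
# with errlogger, prefixed with the date and the user name. Where errlogger
# is not available, it only prints what would have been logged.
log_reminder() {
    msg="$(date '+%Y-%m-%d') $(id -un): $*"
    if command -v errlogger >/dev/null 2>&1; then
        errlogger "$msg"
    else
        echo "errlogger not found; would log: $msg"
    fi
}

log_reminder "hdisk1 replaced, datavg resynchronized"
```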
Full list of characteristics of the error log

The listing shown in the visual is not the complete SMIT dialog screen. The complete
set of dialog fields is:
* Maximum LOGSIZE [1048576] #
Memory BUFFER SIZE [32768] #
Duplicate Error Detection [true] +
Duplicate Time Interval in milliseconds [10000] #
Duplicate error maximum [1000] #

Instructor notes:
Purpose — Introduce the errdemon and errclear commands.
Additional information — The Change / Show Characteristics of the Error Log
screen also contains duplicate error options. If Duplicate Error Detection is set to true,
Duplicate Time Interval in milliseconds is used to set a threshold during which
identical error log entries are removed. The Duplicate error maximum sets the point at
which an additional identical error will be considered a new error. For more information, see
the AIX Commands Reference entry for errdemon.
Transition statement — Let’s switch over to an exercise. This exercise has three parts,
but you should only do the first part now. There will be time to do the other parts of the
exercise later.

Exercise: Error monitoring (part 1)

Part 1: Work with the error log
Figure 3-10. Exercise: Error monitoring (part 1) AN152.2
Notes:
Goals for this part of the exercise

The first part of this exercise allows you to work with the AIX error logging facility.
After completing this part of the exercise, you should be able to:
- Determine what errors are logged on your machine
- Generate different error reports
- Start concurrent error notification

Instructor notes:
Purpose — Introduce the next exercise.
Details — Be sure to mention that students should only do “Part 1” of the exercise at this
time. They will do the rest of the exercise later. Provide the goals of this part of the exercise
as given in the student notes.
Transition statement — Let’s switch over to the next topic, “Error Notification and
syslogd.” We will start by discussing the different ways that error notification can be
implemented.

3.2. Error notification and syslogd

What students will do — Learn different ways to implement error notification. Learn how
to create and maintain the /etc/syslog.conf file and how to start and stop the syslogd
daemon.
How students will do it — Lecture, lab exercise, and checkpoint questions.
What students will learn — The students will be able to describe different ways to
implement error notification. Additionally, they will be able to interpret and create entries in
the /etc/syslog.conf file.
How this will help students on their job — By using error notification and the syslogd
daemon, students will be able to capture errors and correct system problems faster.

Error notification methods

Error notification:
Concurrent error logging: errpt -c > /dev/console
Self-made error notification
ODM-based: /etc/objrepos/errnotify
Figure 3-11. Error notification methods AN152.2
Notes:
What is error notification?

Implementing error notification means taking steps that cause the system to inform you
whenever an error is posted to the error log.
Ways to implement error notification

There are different ways to implement error notification:
1. Concurrent error logging: This is the easiest way to implement error notification. If
you execute errpt -c, each error is reported when it occurs. By redirecting the
output to the console, an operator is informed about each new error entry.
2. Self-made error notification: Another easy way to implement error notification is to
write a shell procedure that regularly checks the error log. This is illustrated on the
next visual.

3. ODM-based error notification: The errdemon program uses the ODM class errnotify
for error notification. How to work with errnotify is discussed later in this topic.

Instructor notes:
Purpose — Provide different ways to implement error notification.
Details — Explain using the information in the notes. The self-made and ODM-based methods
are covered in the visuals that follow, so there is no need to “pre-teach” them in detail now.
Additional information — Earlier versions of the course discussed concurrent error
logging (errpt -c). Periodic Diagnostics using diagela are not used on p5 and p6
platforms. By default, periodic diagnostics sends mail notifications. It can be customized
to take other actions, such as interfacing to other applications. To specify a customized
action, one would create a PDiagAtt ODM class object with a value descriptor set to the
full path to a script. To see more details about this, refer to the document AIX 5L
Version 5.3 Understanding the Diagnostic Subsystem for AIX (SC23-4919).
Periodic Diagnostics: The diagnostics package (diag command) contains a periodic
diagnostic procedure (diagela). Whenever a hardware error is posted to the log, all
members of the system group get a mail message. Additionally, a message is sent to the
system console. The diagela program has disadvantages:
• Since it executes many times a day, the program might slow down your system.
• Only hardware errors are analyzed.
• Since AIX 5.2, diagela has only supported analyzing processor errors and no other
hardware.
• In POWER5 and POWER6 hardware, diagela does not even support processor
diagnostics. Instead, the platform firmware (service processor) handles this and reports
hardware errors to the managing HMC.
Transition statement — Let’s provide an example to show how you might implement
self-made error notification.

Self-made error notification
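#!/usr/bin/ksh
errpt > /tmp/errlog.1
while true
do
    sleep 60                   # Let's sleep one minute
    errpt > /tmp/errlog.2      # Take a new snapshot of the error log
    # Compare the two files.
    # If no difference, let's sleep again
    cmp -s /tmp/errlog.1 /tmp/errlog.2 && continue
    # Files are different: Let's inform the operator:
    print "Operator: Check error log " > /dev/console
    # Use the new snapshot as the baseline so each change is reported once
    mv /tmp/errlog.2 /tmp/errlog.1
done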
Figure 3-12. Self-made error notification AN152.2
Notes:
Implementing self-made error notification

It is very easy to implement self-made error notification by using the errpt command.
The sample shell script on the visual shows how this can be done.
Discussion of example on visual

The procedure on the visual shows a very easy but effective way of implementing error
notification. Let's analyze this procedure:
- The first errpt command generates a file /tmp/errlog.1.
- The construct while true implements an infinite loop that never terminates.
- In the loop, the first action is to sleep one minute.
- The second errpt command generates a second file /tmp/errlog.2.

- The two files are compared using the command cmp -s (silent compare, which means
no output is reported). If the files are identical, we jump back to the
beginning of the loop (continue), and the process sleeps again.
- If there is a difference, a new error entry has been posted to the error log. In this
case, we inform the operator that a new entry is in the error log. Instead of print
you could use the mail command to inform another person.
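If the notification should contain the new entries themselves rather than a generic message, the lines that appear only in the newer snapshot can be extracted. This is a sketch, assuming both snapshots are plain errpt summary output; the function name new_entries is hypothetical. Because it compares whole lines, an entry that is identical to an existing line (such as the two EAA3D429 lines in the earlier summary report) is not reported again, so treat it as a convenience rather than an exact count.

```shell
# new_entries OLD NEW
# Hypothetical helper: prints the lines of snapshot NEW that do not occur
# anywhere in snapshot OLD (fixed-string, whole-line comparison).
new_entries() {
    grep -v -x -F -f "$1" "$2"
}

# Possible use inside the loop, after cmp has detected a difference:
#   new_entries /tmp/errlog.1 /tmp/errlog.2 |
#       mail -s "New error log entries" root
```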


Purpose — Provide one way to implement self-made error notification.
Details — Explain using the material in the student notes.
Transition statement — Let’s look at how the errnotify ODM class can be used.

ODM-based error notification: errnotify
errnotify:
en_pid = 0
en_name = "sample"
en_persistenceflg = 1
en_label = ""
en_crcid = 0
en_class = "H"
en_type = "PERM"
en_alertflg = ""
en_resource = ""
en_rtype = ""
en_rclass = "disk"
en_method = "errpt -a -l $1 | mail -s DiskError root"
Figure 3-13. ODM-based error notification: errnotify AN152.2
Notes:
The error notification object class

The Error Notification object class specifies the conditions and actions to be taken when
errors are recorded in the system error log. The user specifies these conditions and
actions in an Error Notification object.
Each time an error is logged, the error notification daemon determines if the error log
entry matches the selection criteria of any of the Error Notification objects. If matches
exist, the daemon runs the programmed action, also called a notify method, for each
matched object.
The Error Notification object class is located in the /etc/objrepos/errnotify file. Error
Notification objects are added to the object class by using ODM commands.
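A common way to do this is to write the object as a stanza file and load it with odmadd. The following sketch generates a stanza like the one on the visual; the function name make_errnotify is hypothetical, and descriptors that it does not write are assumed to take their empty or zero defaults (compare with odmget output on your system). Because the method text is passed in single quotes and inserted by parameter expansion, its $1 stays literal for the error notification daemon to expand.

```shell
# make_errnotify NAME CLASS TYPE RCLASS METHOD
# Hypothetical helper: prints an errnotify stanza in odmadd input format,
# setting only a subset of the descriptors.
make_errnotify() {
    cat <<EOF
errnotify:
        en_pid = 0
        en_name = "$1"
        en_persistenceflg = 1
        en_class = "$2"
        en_type = "$3"
        en_rclass = "$4"
        en_method = "$5"
EOF
}

make_errnotify sample H PERM disk 'errpt -a -l $1 | mail -s DiskError root'
# On AIX (as root), after saving the output to /tmp/sample.add:
#   odmadd /tmp/sample.add
#   odmget -q "en_name=sample" errnotify        # verify the object
#   odmdelete -o errnotify -q "en_name=sample"  # remove it again
```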

Example on visual

The example on the visual shows an object that creates a mail message to root
whenever a disk error is posted to the log.
List of descriptors
Here is a list of all descriptors for the errnotify object class:
en_alertflg Identifies whether the error is alertable. This descriptor is
provided for use by alert agents with network management
applications. The values are TRUE (alertable) or FALSE (not
alertable).
en_class Identifies the class of error log entries to match. Valid values are
H (hardware errors), S (software errors), O (operator messages),
and U (undetermined).
en_crcid Specifies the error identifier associated with a particular error.
en_label Specifies the label associated with a particular error identifier as
defined in the output of errpt -t (show templates).
en_method Specifies a user-programmable action, such as a shell script or a
command string, to be run when an error matching the selection
criteria of this Error Notification object is logged. The error
notification daemon uses the sh -c command to execute the
notify method.
The following keywords are passed to the method as arguments:
$1 Sequence number from the error log entry
$2 Error ID from the error log entry
$3 Class from the error log entry
$4 Type from the error log entry
$5 Alert flags from the error log entry
$6 Resource name from the error log entry
$7 Resource type from the error log entry
$8 Resource class from the error log entry
$9 Error label from the error log entry
en_name Uniquely identifies the object
en_persistenceflg Designates whether the Error Notification object should be
removed when the system is restarted. 0 means removed at boot
time; 1 means persists through boot.

en_pid Specifies a process ID for use in identifying the Error Notification
object. Objects that have a PID specified should have the
en_persistenceflg descriptor set to 0.
en_rclass Identifies the class of the failing resource. For hardware errors,
the resource class is the device class (see PdDv). Not used for
software errors.
en_resource Identifies the name of the failing resource. For hardware errors,
the resource name is the device name. Not used for software
errors.
en_rtype Identifies the type of the failing resource. For hardware errors,
the resource type is the device type (see PdDv). Not used for
software errors.
en_symptom Enables notification of an error accompanied by a symptom
string when set to TRUE.
en_type Identifies the severity of error log entries to match. Valid values
are:
INFO: Informational
PEND: Impending loss of availability
PERM: Permanent
PERF: Unacceptable performance degradation
TEMP: Temporary
UNKN: Unknown
en_err64 Identifies the environment of the error. TRUE indicates that the
error is from a 64-bit environment.
en_dup Identifies whether the kernel identified the error as a duplicate.
TRUE indicates that it is a duplicate error.
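The positional keywords described under en_method can be forwarded to a script of your own. The following is a sketch of such a notify method; the script path /usr/local/bin/errnotify_log, the log file path, and the function name format_notification are hypothetical. It assumes the method string itself lists the keywords, so that the error notification daemon expands them before running the command with sh -c.

```shell
# Hypothetical notify method, registered for example with:
#   en_method = "/usr/local/bin/errnotify_log $1 $2 $3 $4 $5 $6 $7 $8 $9"
# $1..$9 are: sequence number, error ID, class, type, alert flags,
# resource name, resource type, resource class, error label.
format_notification() {
    printf 'seq=%s id=%s class=%s type=%s resource=%s label=%s\n' \
        "$1" "$2" "$3" "$4" "$6" "$9"
}

# As a standalone script, the method could append one line per event:
#   format_notification "$@" >> /var/adm/errnotify.log
```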


Purpose — Describe the errnotify ODM class.
Details — Explain the example shown, not all possible descriptors.
Additional information — Use odmadd and odmdelete to add and delete objects to and
from errnotify.
Transition statement — Let’s move on to a description of syslogd.

syslogd daemon
/etc/syslog.conf:
daemon.debug /tmp/syslog.debug
/tmp/syslog.debug (written by syslogd):
inetd[16634]: A connection requires tn service
inetd[16634]: Child process 17212 has ended
Restart inetd to provide debug information:
# stopsrc -s inetd
# startsrc -s inetd -a "-d"
Figure 3-14. syslogd daemon AN152.2
Notes:
Function of syslogd
The syslogd daemon logs system messages from different software components
(kernel, daemon processes, system applications).
The /etc/syslog.conf configuration file

When started, syslogd reads the configuration file /etc/syslog.conf. Whenever you
change this configuration file, you need to refresh the syslogd subsystem:
# refresh -s syslogd

Discussion of example on visual

The visual shows a configuration that is often used when a daemon process causes a
problem. The following line is placed in /etc/syslog.conf and indicates that messages
from the daemon facility should be captured:
daemon.debug /tmp/syslog.debug
The line shown also specifies that all messages with the priority level debug and higher
should be written to the file /tmp/syslog.debug. Note that this file must already exist.
The daemon process that causes problems (in our example the inetd) is started with
option -d to provide debug information. This debug information is collected by the
syslogd daemon, which writes the information to the log file /tmp/syslog.debug.
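The preparation steps can be scripted so that they are safe to repeat. This is a sketch, assuming the configuration line format shown above; the function name enable_debug_logging is hypothetical. It adds the selector line only if an identical line is not already present, and it creates the log file, which must exist as noted above.

```shell
# enable_debug_logging CONF LOGFILE [FACILITY]
# Hypothetical helper: appends "FACILITY.debug LOGFILE" to the syslog
# configuration file CONF unless that exact line is already there, then
# creates LOGFILE. FACILITY defaults to daemon.
enable_debug_logging() {
    conf=$1 log=$2 fac=${3:-daemon}
    line="$fac.debug $log"
    grep -qxF "$line" "$conf" 2>/dev/null || echo "$line" >> "$conf"
    touch "$log"
}

# On AIX (as root), the full sequence from the visual would then be:
#   enable_debug_logging /etc/syslog.conf /tmp/syslog.debug
#   refresh -s syslogd
#   stopsrc -s inetd
#   startsrc -s inetd -a "-d"
```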

Instructor notes:
Purpose — Describe how the syslogd daemon works.
Transition statement — Let’s provide some other syslogd configuration examples.

syslogd configuration examples
/etc/syslog.conf:
auth.debug /dev/console          All security messages to the system console
mail.debug /tmp/mail.debug       Collect all mail messages in /tmp/mail.debug
daemon.debug /tmp/daemon.debug   Collect all daemon messages in /tmp/daemon.debug
*.debug;mail.none @server        Send all messages, except mail messages, to host server
After changing /etc/syslog.conf:
# refresh -s syslogd
Figure 3-15. syslogd configuration examples AN152.2
Notes:
Discussion of examples on visual

The visual shows some examples of syslogd configuration entries that might be placed
in /etc/syslog.conf:
- The following line specifies that all security messages are to be directed to the
system console:
auth.debug /dev/console
- The following line specifies that all mail messages are to be collected in the file
/tmp/mail.debug:
mail.debug /tmp/mail.debug
- The following line specifies that all messages produced from daemon processes are
to be collected in the file /tmp/daemon.debug:
daemon.debug /tmp/daemon.debug

- The following line specifies that all messages, except messages from the mail
subsystem, are to be sent to the syslogd daemon on the host server:
*.debug;mail.none @server
Note that, if this example and the preceding example appear in the same
/etc/syslog.conf file, messages sent to /tmp/daemon.debug will also be sent to
the host server.
General format of /etc/syslog.conf entries

The general format for entries in /etc/syslog.conf is:
selector action
The selector and action fields are separated by white space. The selector field names a
facility and a priority level. Separate facility names with a comma (,). Separate the
facility and priority level portions of the selector field with a period (.). Separate
multiple entries in the same selector field with a semicolon (;). To select all facilities,
use an asterisk (*).
The action field identifies a destination (file, host, or user) to receive the messages. If
the messages are routed to a remote host (@host), the remote system handles them as
indicated in its own configuration file. To display messages on a user's terminal, the
action field must contain the name of a valid, logged-in system user. If you specify an
asterisk (*) in the action field, a message is sent to all logged-in users.
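To make the selector/action split concrete, the following Python sketch splits an entry at the first run of white space and labels the destination named in the action field. It is not part of AIX, and the function names are hypothetical. It assumes a simplified line format with no continuation lines and no extra options after the action, and it treats a comma-separated action as a list of user names, which is an assumption based on common syslog.conf usage.

```python
def parse_syslog_conf_line(line):
    """Split an /etc/syslog.conf entry into (selector, action); None for blanks/comments."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    # Selector and action are separated by white space, so the selector
    # itself cannot contain spaces.
    parts = stripped.split(None, 1)
    if len(parts) != 2:
        raise ValueError(f"missing action field: {line!r}")
    return parts[0], parts[1].strip()


def classify_action(action):
    """Return a label for the destination named in the action field (simplified)."""
    if action.startswith("/"):
        return "file"
    if action.startswith("@"):
        return "remote host " + action[1:]
    if action == "*":
        return "all logged-in users"
    if action == "errlog":
        return "AIX error log"
    # Assumption: anything else is one or more comma-separated user names.
    return "users " + ", ".join(action.split(","))
```

For example, `parse_syslog_conf_line("daemon.debug /tmp/daemon.debug")` yields the pair `("daemon.debug", "/tmp/daemon.debug")`, and `classify_action("@server")` labels the destination as a remote host.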
Facilities
Use the following system facility names in the selector field:
kern Kernel
user User level
mail Mail subsystem
daemon System daemons
auth Security or authorization
syslog syslogd messages
lpr Line-printer subsystem
news News subsystem
uucp uucp subsystem
* All facilities
Priority levels
Use the following levels in the selector field. Messages of the specified level and all
levels above it are sent as directed.

emerg Specifies emergency messages. These messages are not distributed to all
users.
alert Specifies important messages such as serious hardware errors. These
messages are distributed to all users.
crit Specifies critical messages, not classified as errors, such as improper login
attempts. These messages are sent to the system console.
err Specifies messages that represent error conditions.
warning Specifies messages for abnormal, but recoverable conditions.
notice Specifies important informational messages.
info Specifies information messages that are useful in analyzing the system.
debug Specifies debugging messages. If you are interested in all messages of a
certain facility, use this level.
none Excludes the selected facility.
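The rule that a level selects itself and every more severe level, together with the none keyword and semicolon-separated entries, can be modeled as follows. This is a hedged Python sketch with hypothetical names, not the syslogd source. It assumes that a later entry in a selector overrides an earlier one for the facilities it names, as in the `*.debug;mail.none` example.

```python
# Priority levels from most to least severe, as listed above.
LEVELS = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def selector_matches(selector, facility, level):
    """Return True if a message (facility, level) is selected by a syslog.conf selector."""
    matched = False
    for entry in selector.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        facilities, sel_level = entry.split(".", 1)
        names = facilities.split(",")
        if "*" not in names and facility not in names:
            continue  # this entry says nothing about the facility
        if sel_level == "none":
            matched = False  # explicitly exclude the facility
        else:
            # The named level and all more severe levels are selected.
            matched = LEVELS.index(level) <= LEVELS.index(sel_level)
    return matched
```

For instance, a daemon.info message is selected both by `daemon.debug` and by `*.debug;mail.none`, so when both lines are present it is written to both destinations, while a mail message is rejected by `*.debug;mail.none`.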
Refreshing the syslogd subsystem

As previously mentioned, after changing /etc/syslog.conf, you must refresh the
syslogd subsystem for the change to take effect. Use the following command:
# refresh -s syslogd

Instructor notes:
Purpose — Provide some syslogd configuration examples.
Additional information — Do not explain all facilities and levels. Just explain the
examples.
Transition statement — Let’s explain how to redirect syslogd messages to the error log.

Redirecting syslog messages to error log

/etc/syslog.conf:
*.debug errlog     Redirect all syslog messages to error log
# errpt
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
...
...
Figure 3-16. Redirecting syslog messages to error log AN152.2
Notes:
Consolidating error messages

Some applications use syslogd for logging errors and events. Some administrators find
it desirable to list all errors in one report.
Redirecting messages from syslogd to the error log

The visual shows how to redirect messages from syslogd to the error log.
By setting the action field to errlog, all messages that match the selector are redirected
to the AIX error log.

Instructor notes:
Purpose — Explain how to redirect syslog messages to the AIX error log.
Transition statement — What about the other way around?

Directing error log messages to syslogd

errnotify:
  en_name = "syslog1"
  en_persistenceflg = 1
  en_method = "logger Error Log: `errpt -l $1 | grep -v TIMESTAMP`"

errnotify:
  en_name = "syslog1"
  en_method = "logger Error Log: $(errpt -l $1 | grep -v TIMESTAMP)"

Direct the last error entry (-l $1) to the syslogd.
Do not show the error log header (grep -v) or (tail -1).

errnotify:
  en_name = "syslog1"
  en_method = "errpt -l $1 | tail -1 | logger -t errpt -p daemon.notice"
Figure 3-17. Directing error log messages to syslogd AN152.2
Notes:
Using the logger command

You can direct error log events to syslogd by combining the logger command with an
errnotify ODM object. With objects such as those shown on the visual, each time an
entry is posted to the error log, the notification method passes that entry, identified by
its sequence number ($1), to the logger command.
Command substitution
To hand the errpt output to logger, use either command substitution or a pipe. The
first two examples on the visual illustrate the two ways to do command substitution in
a Korn shell environment:
- Using the `UNIX command` syntax (with backquotes), shown in the first example on
the visual
- Using the newer $(UNIX command) syntax, shown in the second example on the
visual
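The following Python sketch emulates the two filters used in the en_method examples, `grep -v TIMESTAMP` and `tail -1`, on a sample `errpt -l` report, and roughly builds the text that the first example hands to logger. The sample report is made up for illustration, and the helper names are hypothetical. The sketch assumes the report is one header line followed by one entry line, which is why both filters keep the same line.

```python
def drop_header(report):
    """Emulate `grep -v TIMESTAMP`: keep every line that lacks the header keyword."""
    return [line for line in report.splitlines() if "TIMESTAMP" not in line]


def last_line(report):
    """Emulate `tail -1`: keep only the final line of the report."""
    return report.splitlines()[-1:]


def logger_message(report, keep=drop_header):
    """Approximate the logger text: an unquoted $(...) is word-split and
    logger joins its arguments with single spaces."""
    words = [word for line in keep(report) for word in line.split()]
    return "Error Log: " + " ".join(words)


# Hypothetical `errpt -l <sequence_number>` output: one header line, one entry.
SAMPLE = (
    "IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION\n"
    "AA8AB241   1105120512 T O OPERATOR       OPERATOR NOTIFICATION\n"
)
```

Both `drop_header(SAMPLE)` and `last_line(SAMPLE)` return only the entry line, so either filter produces the same syslog message.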

Instructor notes:
Purpose — Provide information on how to direct error log entries to the syslogd.
Point out that the visual shows three ways to accomplish the same thing. The first two
examples use two different formats for command substitution, which places the report
text on the logger command line before logger runs. The last example feeds the report
text through a pipe to the logger command.
Transition statement — Describe the basic features of system hang detection.

System hang detection

System hangs:
- High priority process
- Other
What does shdaemon do?
- Monitors the system's ability to run processes
- Takes specified action if threshold is crossed
Actions:
- Logs error in the error log
- Displays a warning message on the console
- Launches recovery login on a console
- Launches a command
- Automatically reboots the system
Figure 3-18. System hang detection AN152.2
Notes:
Types of system hangs

shdaemon can help recover from certain types of system hangs. For our purposes, we
will divide system hangs into two types:
- High priority process
The system may appear to be hung if some applications have adjusted their process
or thread priorities so high that regular processes are not scheduled. In this case,
work is still being done, but only by the high priority processes. As currently
implemented, shdaemon specifically addresses this type of hang.
- Other
Other types of hangs may be caused by a variety of problems, such as system
thrashing, kernel deadlock, or the kernel running in a tight loop. In these cases, little or
no meaningful work gets done. shdaemon may help with some of these
problems.

What does shdaemon do?

If enabled, shdaemon checks whether any process with a process priority number
higher than a set threshold has run during a set time-out period.
Remember that a higher process priority number indicates a lower priority on the
system. In effect, shdaemon checks whether lower priority processes are being
scheduled.
shdaemon runs at the highest priority (priority number = 0), so that it will always be able
to get CPU time, even if a process is running at very high priority.
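The core shdaemon test described above can be expressed in a few lines of Python. This is a hypothetical model, not shdaemon itself: it assumes you already have the priority numbers of the processes that ran during one time-out period and only checks whether any of them is above the threshold.

```python
def low_priority_work_ran(priorities_run, threshold):
    """True if any process with a priority *number* above threshold ran.

    priorities_run holds the priority numbers of processes scheduled during
    the time-out period; a larger number means a lower priority.
    """
    return any(p > threshold for p in priorities_run)


def should_act(priorities_run, threshold):
    """shdaemon-style decision: act when no lower priority process got CPU time."""
    return not low_priority_work_ran(priorities_run, threshold)
```

Because each action has its own threshold, the same sample can trigger one action and not another: `should_act([70], 60)` is False, but `should_act([70], 100)` is True.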
Actions
If lower priority processes are not being scheduled, shdaemon will perform the specified
action. Each action can be individually enabled and has its own configurable priority
and time-out values. There are five actions available:
- Log error in the error log
- Display a warning message on a console
- Launch a recovery login on a console
- Launch a command
- Automatically REBOOT the system

Instructor notes:
Purpose — Describe the basic features of system hang detection.
Details — If configured, shdaemon runs at priority 0 and checks whether any process with
a priority number above the threshold (that is, a lower priority process) has run during the
last time-out period. If none has, shdaemon takes the configured action. Actions can
include making an entry in the error log file, starting a high priority shell on a tty, alerting a
system administrator, killing processes, or even rebooting
the system.
shdaemon is specifically targeted for situations where processes are running at such high
priority that normal shells cannot get any CPU time. shdaemon may help with other types of
hangs. If the system is hung in the kernel, shdaemon may be able to help if the kernel is still
responding to clock interrupts to run the scheduler.
How the shdaemon actions are configured makes a big difference in how effectively
shdaemon resolves a hang. There are many variables involved. It is the system
administrator's responsibility to develop a set of actions that appropriately addresses the
issues for a particular system.
Transition statement — Let’s take a look at how shdaemon is configured.

Configuring shdaemon
# shconf -E -l prio
sh_pp       disable        Enable Process Priority Problem
pp_errlog   disable        Log Error in the Error Logging
pp_eto      2              Detection Time-out
pp_eprio    60             Process Priority
pp_warning  enable         Display a warning message on a console
pp_wto      2              Detection Time-out
pp_wprio    60             Process Priority
pp_wterm    /dev/console   Terminal Device
pp_login    enable         Launch a recovering login on a console
pp_lto      2              Detection Time-out
pp_lprio    100            Process Priority
pp_lterm    /dev/console   Terminal Device
pp_cmd      disable        Launch a command
pp_cto      2              Detection Time-out
pp_cprio    60             Process Priority
pp_cpath    /home/unhang   Script
pp_reboot   disable        Automatically REBOOT system
pp_rto      5              Detection Time-out
pp_rprio    39             Process Priority
Figure 3-19. Configuring shdaemon AN152.2
Notes:
Introduction
shdaemon configuration information is stored as attributes in the SWservAt ODM object
class. Configuration changes take effect immediately and survive across reboots.
Use shconf (or smit shd) to configure or display the current configuration of shdaemon.
The values shown in the visual are the default values.
Enabling shdaemon
At least two parameters must be modified to enable shdaemon:
- Enable priority monitoring (sh_pp)
- Enable one or more actions (pp_errlog, pp_warning, and so forth)

When enabling shdaemon, shconf performs the following steps:

- Modifies the SWservAt parameters
- Starts shdaemon
- Modifies /etc/inittab so that shdaemon will be started on each system boot
Action attributes
Each action has its own attributes, which set the priority and time-out thresholds and
define the action to be taken. Time-out values are specified in minutes.
Example
By changing the shconf attributes, we can enable, disable, and modify the behavior of
the facility. In the following example, shdaemon is enabled to monitor process priority
(sh_pp=enable), and the following actions are enabled:
- Enable process priority monitoring:
# shconf -l prio -a sh_pp=enable
- Log an error in the error log:
# shconf -l prio -a pp_errlog=enable
Every two minutes (pp_eto=2), shdaemon checks whether any process has
run with a process priority number greater than 60 (pp_eprio=60). If not, shdaemon
logs an error in the error log.
- Display a warning message on a console (enabled by default):
# shconf -l prio -a pp_warning=enable
Every two minutes (pp_wto=2), shdaemon checks whether any process has
run with a process priority number greater than 60 (pp_wprio=60). If not, shdaemon
sends a warning message to the console specified by pp_wterm.
- Launch a command:
# shconf -l prio -a pp_cmd=enable -a pp_cto=5
Every five minutes (pp_cto=5), shdaemon checks whether any process has run
with a process priority number greater than 60 (pp_cprio=60). If not, shdaemon runs the
command specified by pp_cpath (in this case, /home/unhang).
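When checking a configuration, it can help to pull the enabled actions out of the `shconf -E -l prio` listing. The Python sketch below assumes the three-column layout shown on the visual (attribute, value, description) and the action attribute names listed there; the helper names are hypothetical. It also encodes the rule that no action runs unless sh_pp itself is enabled.

```python
# Action attributes as they appear in the shconf listing on the visual.
ACTIONS = ["pp_errlog", "pp_warning", "pp_login", "pp_cmd", "pp_reboot"]


def parse_shconf(output):
    """Map attribute name -> value from `shconf -E -l prio`-style output."""
    attrs = {}
    for line in output.splitlines():
        if line.startswith("#"):
            continue  # skip an echoed command prompt line
        fields = line.split(None, 2)  # name, value, description (may contain spaces)
        if len(fields) >= 2:
            attrs[fields[0]] = fields[1]
    return attrs


def enabled_actions(attrs):
    """List the actions that would run; none run unless sh_pp is enabled."""
    if attrs.get("sh_pp") != "enable":
        return []
    return [a for a in ACTIONS if attrs.get(a) == "enable"]
```

With the default values shown on the visual, `enabled_actions` returns an empty list because sh_pp is disabled.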

Instructor notes:
Purpose — Describe how shdaemon is configured.
Details —
Additional information — shdaemon also supports lost I/O detection.
Transition statement — For an ILO (Instructor-Led Online) class: In place of this
checkpoint visual, play file AN152U03F21.
- a. To access the multimedia library, click the CD button on the toolbar in
Elluminate.
- b. Once the multimedia library window is open, select AN152U03F21 and click
Play.
- c. Ask the students to indicate with a green check mark when they have finished the file.
For an FTF (Face to Face) class: Play file AN152U03F21 and ask the students to call out
answers to the questions on the screen.

Checkpoint (1 of 2)
1. Which command generates error reports?
2. Which flag of this command is used to generate a detailed error report?
3. Which type of disk error indicates bad blocks?
4. What does the errclear command do?
Figure 3-20. Checkpoint (1 of 2) AN152.2
Notes:

Instructor notes:
Purpose — Present the checkpoint questions.
Details — A “Checkpoint Solution” is given below:
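Checkpoint solutions (1 of 2)
1. Which command generates error reports?
The answer is the errpt command.
2. Which flag of this command is used to generate a detailed error report?
The answer is -a (errpt -a generates a detailed report).
3. Which type of disk error indicates bad blocks?
The answer is DISK_ERR4.
4. What does the errclear command do?
The answer is it clears entries from the error log.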

Checkpoint (2 of 2)
5. What does the errlogger command do?
6. What does the following line in /etc/syslog.conf indicate?
*.debug errlog
7. What does the descriptor en_method in errnotify indicate?
Notes:

Instructor notes:
Purpose —
Details —
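Checkpoint solutions (2 of 2)
5. What does the errlogger command do?
The answer is it is used by root to add entries into the error log.
6. What does the following line in /etc/syslog.conf indicate?
*.debug errlog
The answer is all syslogd entries are directed to the error log.
7. What does the descriptor en_method in errnotify indicate?
The answer is it specifies a program or command to be run when
an error matching the selection criteria is logged.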

Transition statement — Let’s do an exercise to reinforce what we have discussed.

Exercise: Error monitoring (part 2)

Part 2, section 1: Work with syslogd
Part 2, section 2: Perform error notification with errnotify
Figure 3-22. Exercise: Error monitoring (part 2) AN152.2
Notes:

Instructor notes:
Purpose —
Details —

Unit summary
Analyze error log entries
Identify and maintain the error logging components
Describe different error notification methods
Log system messages using the syslogd daemon
Monitor and take actions for hang conditions using shdaemon
Notes:

Instructor notes:
Purpose — Summarize the unit.
Details — Present the highlights from the unit.
Before continuing to the next unit, stop and ask the students whether they have any
additional questions.

Unit 4. Network Installation Manager basics
Estimated time
01:00

This unit provides an introduction to using the Network Installation
Manager (NIM) to network boot an AIX client system. It covers the
basic installation and configuration of NIM for supporting client
installation or booting to maintenance mode.

• Configure an AIX partition for use as a NIM master
• Set up NIM to support the installation of AIX onto a client

Accountability:
• Checkpoint
• Machine exercises
References
SC23-6616 AIX Version 7.1 Installation and migration
SG24-7296 NIM from A to Z in AIX 5L (Redbook)
http://www.redbooks.ibm.com IBM Redbooks
© Copyright IBM Corp. 2009, 2012 Unit 4. Network Installation Manager basics 4-1
Unit objectives
Configure an AIX partition for use as a NIM master
Set up NIM to support the installation of AIX onto a client
Notes:

Instructor notes:
Purpose — Cover unit objectives.
Details —
Transition statement — Let’s start with an overview of NIM function.
NIM overview
AIX software administration over the network:
- Install
- Update
- Maintain
Eliminate tape or CD at each system
Distribute installation load
Support for push or pull installations
NIM administrative tools:
- Command line interface
- SMIT
[Diagram: a NIM master and a NIM server serving clients. PUSH installation: initiated by master. PULL installation: requested by client.]
Figure 4-2. NIM overview AN152.2
Notes:
Purpose of NIM
NIM provides centralized AIX software administration for multiple machines over the
network. NIM supports full AIX operating system installation as well as installing or
updating individual packages and performing software maintenance.
Advantages
NIM provides several advantages:
- Provides one central point for AIX software administration for all the NIM clients
- Eliminates the need to carry a CD-ROM or tape to each system, and the need for a
tape drive or CD-ROM drive on every system
- Installations can be initiated from the master machine (push) or from the client (pull)

- The installation load can be distributed. Most simply, the NIM master machine is
configured as the server for all the filesets to be installed. However, you can also
configure one or more client machines to act as servers to distribute the load if you
have many clients.
NIM administrative tools

There are different ways you can manage your NIM environment:
- Command line
The command line gives you complete control, but the number of options needed
can be somewhat daunting. Still, if you want to script NIM operations, you must use
the command line. The basic NIM commands are:
• nimconfig - Configure NIM master
• nim - Perform NIM operations from the master
• nimclient - Perform NIM operations from a client
• niminit - Configure NIM client
• lsnim - List information about NIM objects
- SMIT
There are basically two paths into SMIT’s NIM interface:
• smit nim - Configure master and client machines and perform all NIM
operations.
• smit eznim - This provides a simplified environment to configure machines and
perform some basic NIM operations. This may be a good starting point for a new
NIM system administrator.
As you become familiar with the NIM environment, you may find that you use a
combination of methods. For example, you may use the command line to list NIM status
and perform simple NIM operations, while using SMIT for more complex operations or
for operations that you do not perform frequently.
Instructor notes:
Purpose — Provide an overview of NIM function.
Details —
Transition statement — Let’s take a closer look at the three roles illustrated in the
overview.

Machine roles
Master
- File sets: bos.sysmgt.nim.master, bos.sysmgt.nim.client
- Stores NIM database
- NIM administration
- Can initiate push installations to NIM clients
- AIX version >= all other NIM machines
Client
- File sets: bos.sysmgt.nim.client
- Can initiate pull installations from a server
Server
- Any machine, master or client
- Serves NIM resources to clients, thus requires adequate disk space and
throughput
Figure 4-3. Machine roles AN152.2
Notes:
There are three basic roles that a machine can assume in the NIM environment: master,
client, and resource server. There can only be one master machine in a NIM
environment; all other machines are clients. Any machine, master or client, can be a
resource server.
NIM software
All machines in the NIM environment must install bos.sysmgt.nim.client. The master
machine must also install bos.sysmgt.nim.master and bos.sysmgt.nim.spot.
Master
The NIM master manages all other machines that participate in the NIM environment.
The NIM database is stored on the NIM master. The NIM master is fundamental for all
of the operations in the NIM environment and must be set up and operational before
performing any NIM operations. The master can initiate a software installation to a
client, which is called a push installation.
Also, the NIM master is the only machine that has the permission and ability to run
NIM operations on other machines within the NIM environment. The master uses rsh
or nimsh (the NIM service handler) to run commands remotely on clients, which allows
it to install a number of clients with one NIM operation.
The master requires the bos.sysmgt.nim.master and bos.sysmgt.nim.client filesets.
Its AIX operating system level must also be equal to or higher than that of any client
it serves.
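The master-level requirement can be checked mechanically. This Python sketch assumes each machine's level is recorded as an `oslevel -s` style string (for example 7100-01-04-1216: version, technology level, service pack, and build date); the helper names and sample levels are hypothetical.

```python
def parse_level(level):
    """Turn an `oslevel -s`-style string such as '7100-01-04-1216' into a comparable tuple."""
    return tuple(int(part) for part in level.split("-"))


def clients_above_master(master, clients):
    """Return the client levels that are higher than the master's (should be empty)."""
    m = parse_level(master)
    return [c for c in clients if parse_level(c) > m]
```

A non-empty result means the master would need to be updated before it can properly serve those clients.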
Client
All other machines in a NIM environment are clients. Clients can request a software
installation from a server machine (pull installation). The client requires the
bos.sysmgt.nim.client fileset.
Server
Any machine, the master or a client, can be configured by the master as a server for a
particular software resource. Most often, the master is also the server. However, if your
environment has many nodes or consists of a complex network environment, you may
want to configure some nodes to act as servers to improve installation performance.
Servers must have adequate disk space for the resources they will be providing. They
also need network connections to the client machines they serve and sufficient
bandwidth to respond to the expected volume.

Instructor notes:
Purpose — Explain machine roles in more detail.
Details —
Transition statement — To better understand how NIM manages a network installation, it
is useful to first review the components of a regular installation from tape or optical media.
Boot process for AIX installation: Tape or CD

1 Load boot image (boot image is on removable media)
2 Execute boot image
3 Configure devices (using programs on removable media)
4 Install system files (backup archive is on removable media)
Figure 4-4. Boot process for AIX installation: Tape or CD AN152.2
Notes:
To understand how NIM works, we need to understand what happens when we install
AIX on a system. We start by reviewing what happens when we boot from CD or tape to
install AIX. Note that all of the programs and information are obtained from the
removable media.
Power on or partition activation

If you are using a POWER server with a single operating system, the machine must be
booted or reset in order to install the AIX Base Operating System (BOS). If you are using
a logically partitioned server, you must activate the AIX partition from the HMC in order
to install the AIX BOS.

Load boot image into memory

The machine's Initial Program Load (IPL) Read Only Memory (ROM) locates a boot
image and loads the image into memory. The boot image contains a miniature runtime
environment (the kernel and a file system containing libraries and key programs).
Where is the boot image?

When booting from a hard disk, the boot image is retrieved from the system's hard disk.
When a machine is being installed for the first time, it obviously cannot retrieve a boot
image from the hard disk. Traditionally, the boot image must be available
on the tape or CD.
Transfer control to mini-runtime environment

Control is passed to the kernel, and the file system in the boot image is mounted from
memory.
Invoke boot script and configure devices needed for installation

The kernel initializes and eventually runs the boot script (rc.boot), which configures
devices that are needed for the installation such as keyboards, displays, and disks.
Configuring devices
In order to keep the boot image small, not all of the software needed to configure
devices is included in the boot image. These additional files are contained in a small
usr directory tree called a Shared Product Object Tree or SPOT. The boot script mounts
this usr directory tree on /SPOT in the memory file system. The SPOT is mounted
directly from the CDROM.
Note: Since tape devices do not support file system operations, the SPOT files are
included in the boot image in the case of booting from a tape drive.
Install script
Once the devices have been configured, rc.boot invokes the BOS installation program
(bi_main), which installs AIX from the installation images on the tape or CD.
Instructor notes:
Purpose — Review the flow and components of an AIX installation from tape or optical
media.
Details —
Transition statement — If we next look at how a network install is handled, we will see
that there are many similarities with a regular installation, of course with some significant
variations.

Boot process for AIX installation with NIM (1 of 2)

1 Load boot image (boot image from NIM server)
client en0 -> NIM server bootpd: bootp request
NIM server bootpd (/etc/bootptab) -> client: boot file name
client -> NIM server: tftp boot file
NIM server -> client: boot image file
2 Execute boot image
Figure 4-5. Boot process for AIX installation with NIM (1 of 2) AN152.2
Notes:
Booting over the network, using NIM, is essentially the same as booting from CD or
tape, except that the boot file (SPOT file) and installation images come from the server
system over the network.
Load boot image into memory

If the client system is booting from the network, the IPL ROM sends a bootp request to
the NIM server asking for the name of a boot file. The NIM server's bootpd daemon then
uses the /etc/bootptab file to determine the boot file name and returns that name to the
client system. Finally, the client system uses TFTP to download the boot file from the
NIM server over the network.
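The bootp lookup step can be sketched in Python. The sketch parses /etc/bootptab-style lines of the form `name:tag=value:...:` and returns the boot file (the `bf` tag) for a client IP address. The sample entry, the helper names, and the simplification of matching only on the `ip` tag are assumptions for illustration; real bootptab files may also use continuation lines, which this sketch does not handle.

```python
def parse_bootptab(text):
    """Return {ip: {tag: value}} from bootptab-style 'host:tag=value:...:' lines."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        host, *fields = line.strip(":").split(":")
        tags = {"host": host}
        for field in fields:
            if "=" in field:
                key, value = field.split("=", 1)
                tags[key] = value
        if "ip" in tags:
            entries[tags["ip"]] = tags
    return entries


def boot_file_for(entries, client_ip):
    """Answer a simplified bootp request: the boot file name for client_ip, or None."""
    entry = entries.get(client_ip)
    return entry.get("bf") if entry else None


# Hypothetical entry of the kind NIM writes for a client being installed.
SAMPLE = "client1:bf=/tftpboot/client1:ip=10.31.192.20:ht=ethernet:sa=10.31.192.5:sm=255.255.240.0:"
```

Here `boot_file_for(parse_bootptab(SAMPLE), "10.31.192.20")` returns `/tftpboot/client1`, the file the client would then fetch with TFTP.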
Instructor notes:
Purpose — Provide a description of the components and flow of a network installation
using NIM.
Details —
Transition statement — Let us continue with our comparison of using removable media
versus using a NIM server.

Boot process for AIX installation with NIM (2 of 2)

3 Configure devices (using programs on NIM server)
client en0 -> NIM server: NFS mount of SPOT (spot: ./usr directory tree), access programs
4 Install system files (backup archive is on NIM server)
client en0 -> NIM server: mount of lppsource (lppsource: filesets), access backup archives
Figure 4-6. Boot process for AIX installation with NIM (2 of 2) AN152.2
Notes:
Invoke the boot script and configure devices needed for installation
When booting over the network, the SPOT is mounted from the NIM server using the
Network File System (NFS).
Invoke install script

When booting over the network, the install script installs AIX using installation images
which are NFS mounted from the NIM server.
Instructor notes:
Purpose — Continue description of installing AIX from a NIM server.
Details —
Transition statement — In order for NIM to manage this install process, it needs to have
objects that describe the machines and resources involved. Let’s take a high level look at
what these are.

NIM objects
NIM objects stored in ODM
Object classes:
- Networks
- Machines
- Resources
Group objects:
- mac_group
- res_group
Figure 4-7. NIM objects AN152.2
Notes:
NIM is made up of various components, called objects. There are three classes of
objects: machines, networks, and resources.
All information about the NIM environment is stored in Object Data Manager (ODM)
databases on the NIM master system.
Network objects
Network objects are objects in the NIM database that represent information about each
Local Area Network (LAN) that is part of the NIM environment. These objects and some
of their attributes reflect the physical characteristics of the network. NIM network objects
are not used to perform management tasks in the overall network environment; they are
only used to represent the physical network topology of the NIM environment. In other
words, if something changes in the physical network environment, you must remember
to make the change in the NIM database as well.
There are five types of networks supported by NIM: Token-Ring, Ethernet, ATM, FDDI,
and generic. These network types are represented as network objects in the NIM
environment.
Machine objects
Machines in the NIM environment are simply the machines that will be managed by
NIM.
Resource objects
All operations on clients in the NIM environment require one or more NIM resources.
NIM resource objects represent the files, directories, and devices that are used in order
to support each type of NIM operation. Some resources are AIX filesets (or devices
which contain filesets) that can be installed on a client machine. Other resources are
scripts or configuration files that are used in the installation process.
The location and other attributes for these resources are stored as resource objects in
the NIM database.
Group objects
NIM supports two types of group objects:
- mac_group
A machine group is a group of machine objects. You can use a machine group to
simplify performing a NIM operation on multiple machines.
- res_group
A resource group is a group of resource objects. If you have a set of resources that
you typically want to use at the same time, you can create a resource group to
simplify allocating those resources.

Instructor notes:
Purpose — Describe the NIM objects.
Details —
Transition statement — It is useful to be able to list the existing defined objects and their
attributes. Let’s look at the lsnim command that provides this information. Then we will
explain the meaning and use of the displayed attributes for each type of object. Later, we
will cover how to create these objects.
Listing NIM objects and their attributes

To list all defined NIM objects:
# lsnim
master       machines    master
boot         resources   boot
nim_script   resources   nim_script
ent0         networks    ent
...
To list attributes of a NIM object:
# lsnim -l <object_name>
# lsnim -l ent0
ent0:
   class      = networks
   type       = ent
   Nstate     = ready for use
   prev_state = information is missing from this object's definition
   net_addr   = 10.31.192.0
   snm        = 255.255.240.0
   routing1   = default 10.31.192.1
Figure 4-8. Listing NIM objects and their attributes AN152.2
Notes:
The lsnim command is used to list various types of NIM information. You have the
opportunity to experiment with lsnim in the exercise.
Listing objects and attributes

When used without any argument, lsnim displays all the currently defined NIM objects.
Using the -l flag, you can get a long listing of an individual object.
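If you want to process lsnim output in a script, both formats shown on the visual are easy to parse. The following Python sketch uses hypothetical helper names and assumes that the default listing has exactly three whitespace-separated columns (name, class, type) and that `lsnim -l` prints the object name followed by a colon, then indented `attribute = value` lines.

```python
def parse_lsnim(output):
    """Parse default `lsnim` output into a list of (name, class, type) tuples."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 3:  # skips '...' and other non-data lines
            rows.append(tuple(fields))
    return rows


def parse_lsnim_long(output):
    """Parse `lsnim -l <object>` output into (object_name, {attribute: value})."""
    name, attrs = None, {}
    for line in output.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace() and line.rstrip().endswith(":"):
            name = line.strip()[:-1]  # unindented 'ent0:' header line
        elif "=" in line:
            key, value = line.split("=", 1)
            attrs[key.strip()] = value.strip()
    return name, attrs
```

Applied to the `lsnim -l ent0` listing above, `parse_lsnim_long` returns the name `ent0` and a dictionary in which, for example, `routing1` maps to `default 10.31.192.1`.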

Instructor notes:
Purpose — Explain how to use the lsnim command to display objects and their attributes.
Details — Keep the focus on these uses of lsnim. The listing of all NIM objects and the
listing of attributes for a particular object are the two most common uses of lsnim. The
other lsnim options are better left to the NIM course.
Transition statement — We will now discuss the various NIM objects in the context of
configuring NIM. Let’s start with a summary of the basic NIM configuration procedure.
NIM configuration
Configure master
- Install master NIM file sets
- Run nimconfig
Define resources
- Create real resource with full path
- Create resource object to represent
Define networks
- How do clients on networks access the master
Define clients
- Able to relate network address of the client with object name
Allocate resources to clients
- Different operations need different resources
NIM operations on clients
- Setting up for operation
- Initiating operation
Figure 4-9. NIM configuration AN152.2
Notes:
Installing NIM
The NIM filesets that need to be installed on a machine designated to act as NIM
master are:
- bos.sysmgt.nim.client
- bos.sysmgt.nim.master
- bos.sysmgt.nim.spot
Configure master
Configuring the master machine consists of installing the master filesets and running
nimconfig. You must specify the primary network interface and a NIM network name
for the network which is attached to the primary interface. There are several optional
attributes which can be specified.

nimconfig creates the NIM database and the /etc/niminfo configuration file. It also
starts the NIM daemon (nimesis) and creates an entry in /etc/inittab so that nimesis is
started on every boot of the master machine.
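A minimal sketch of the master setup described above, using a hypothetical dry-run helper (each command is printed, and executed only with DRY_RUN=0). The installation device /dev/cd0, the NIM network name network1, and the interface en0 are assumptions for illustration; nimconfig accepts further optional attributes that are not shown here.

```sh
#!/bin/sh
# Hypothetical dry-run helper: prints each command; runs it only when DRY_RUN=0.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-1}" = 1 ] || "$@"
}

# 1. Install the NIM filesets (the device /dev/cd0 is an assumption).
run installp -agXd /dev/cd0 bos.sysmgt.nim.master bos.sysmgt.nim.spot bos.sysmgt.nim.client

# 2. Configure the master: NIM network name (network1, hypothetical)
#    for the network attached to the primary interface (en0, hypothetical).
run nimconfig -a netname=network1 -a pif_name=en0

# 3. Confirm that the nimesis daemon started by nimconfig is active.
run lssrc -s nimesis
```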
Create NIM objects

Next you need to create the NIM objects:
- resources
Specify the directories and files needed by NIM.
- networks
You already defined the master's primary network when you ran nimconfig. If some of
your clients are connected to separate networks or subnets, you need to define these
networks, along with the routes the master needs to communicate with all the clients
and the routes any other servers need to communicate with their clients.
- clients
Specify the client machines you are installing using NIM.
Allocate resources
Once the resource and machine objects are defined, you need to decide what operation
you want to perform on your client machine. Each operation needs a different set of
resources.
Next, you need to allocate the resources to your client. This identifies which resource
objects will be used to implement the client operation. There are two ways to do this:
- Use the nim -o allocate operation (or equivalent SMIT dialog) to relate the
resource to the machine
- Use a SMIT dialog which prompts for the resources to allocate as part of the
machine operation definition
Perform the operation on the client

There are many different operations that you might perform on a client. You might install
an operating system, install maintenance, provide support for a maintenance boot or a
diagnostic boot, and more.
There are usually two phases related to an operation:
- The NIM setup in which the NIM server is configured to support the task you want to
perform on the client
- The initiation of that task
The task can be initiated from the client; or, provided that the client machine has already
been configured as a NIM client, the NIM master can initiate the task.

Purpose — Provide a high level look at the NIM configuration steps.
Details — This visual is mainly a menu of topics covered on the following slides, except
for fileset installation and nimconfig execution. Cover those two with this visual, and
cover the rest only at a high level to provide an understanding of the sequence.
Additional information — For resources, note that there are special location
requirements when installing High Availability Management Server (HA MS), an optional
feature of Cluster System Management (CSM).
Transition statement — As you can see, once we have configured the NIM master, much
of the work with NIM is the definition of the machines and resources. Let’s take a closer
look at what needs to be defined and how that is done, beginning with the NIM resources.
Resources objects
Object types
  boot           Represents the network boot image resource
  nim_script     Directory for customization scripts created by NIM
  spot           Shared Product Object Tree - equivalent to /usr file system
  lpp_source     Source device for software product images
  bosinst_data   Config file used during base system installation
  image_data     Config file used during base system installation
  mksysb         A mksysb image
  script         A user-created script that is executed on a client to perform customization
  resolv_conf    Configuration file for name-server information
  ...            (additional resource types)
Attributes
  location             Directory path
  server               Machine which serves this resource
  Rstate, prev_state   Status attributes
  ...                  (additional attributes)
Figure 4-10. Resources objects AN152.2
Notes:
Resources are the files and directories that NIM uses to install software on the clients.
Resource types
Resource types identify the different types of files used by NIM. For example:
- An lpp_source resource is a directory containing product images to be installed
- A spot resource contains the files used during the boot operation
- A script resource is a user definable script which can be used to perform
customization on a newly installed client
- A mksysb resource is a backup image that can be used to install a client

Resource attributes

Attributes for resources identify where the resource can be found, its status, and so
forth:
- location defines the directory path to the resource
- server identifies which machine serves the resource
- Rstate indicates whether a resource is available for clients to use
- prev_state indicates the previous value of Rstate
Additional resource types and attributes

There are a number of different resource types, each having its own set of attributes.
lsnim is probably the easiest way to get information about NIM attributes.
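Because lsnim -l prints one attribute = value pair per line, a short filter is enough to pull out a single attribute such as Rstate. The sketch below is hypothetical (nim_attr is not a NIM command) and assumes that attribute = value layout; the sample text fed to it is illustrative, not captured output.

```sh
#!/bin/sh
# Hypothetical helper: print the value(s) of one attribute from
# "lsnim -l <object>" style text read on standard input.
# Assumes each attribute line looks like "   name   = value".
nim_attr() {
    awk -v want="$1" '
    {
        line = $0
        sub(/^[ \t]+/, "", line)          # drop leading indentation
        eq = index(line, "=")
        if (eq == 0) next                 # skip the "object:" header line
        key = substr(line, 1, eq - 1)
        sub(/[ \t]+$/, "", key)           # drop padding before "="
        if (key == want) {
            val = substr(line, eq + 1)
            sub(/^[ \t]+/, "", val)
            print val                     # repeated attributes print one per line
        }
    }'
}

# Usage on a NIM master (lpp61 is a hypothetical object name):
#   lsnim -l lpp61 | nim_attr Rstate
printf 'lpp61:\n   class       = resources\n   Rstate      = ready for use\n' | nim_attr Rstate
```

Repeated attributes such as serves are printed one value per line, so the filter also works for machines that serve several resources.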
Instructor notes:
Purpose — Cover resources objects and their attributes.
Details — Note the variety of resources and that the attributes basically map between the
resource name and the location of the file or directory that contains that resource. Be
careful not to pre-teach the details on resources covered on later visuals, such as
lpp_source, spot, or mksysb. These are covered after the discussion of operations, so
they can be discussed in the context of those operations (in particular, the bos_inst
operation).
Transition statement — Let’s take a closer look at the resource types that we will need to
define to support a NIM installation of an AIX operating system, starting with the
lpp_source.
Resources objects: lpp_source

lpp_source
Directory containing software product images
Supports NIM install operations (bos_inst and cust)
Also used for creation of spot resource
Defining an lpp_source:
# nim -o define -t lpp_source \
      -a server=<machine> \
      -a location=<directory> \
      [ optional attributes ] \
      <lppsource_name>
# smit nim_mkres
(Diagram: an lppsource directory holding subdirectories such as aix61-00-00 and aix61-01-00, each containing bos filesets)
Figure 4-11. Resources objects: lpp_source AN152.2
Notes:
lpp_source
When a resource of this type is defined, it represents a directory in which software
product images are stored. lpp_source resources are used to support NIM install
operations. An lpp_source can also be used as the source for the creation of a SPOT.
When you perform a NIM install operation with an lpp_source resource allocated to the
client, NIM NFS-mounts the lpp_source directory on the client and then invokes the
installp command on the client to install from that directory. When installp finishes,
NIM automatically unmounts the resource.
simages attribute
This attribute indicates that an lpp_source resource contains the set of installable
images that NIM requires access to in order to perform its basic functions. This basic
set of images is referred to as support images, or simages. NIM automatically manages
this attribute as part of the management of an lpp_source.
NIM adds this attribute to the definition of an lpp_source when it provides the required
simages, and NIM removes this attribute from the object's definition if a required image
becomes unavailable.
Some NIM operations require access to an lpp_source that has this attribute as part of
its definition, so having this attribute can be important. Perform the check operation on
the lpp_source to have NIM determine whether the simages requirement is met. If it is,
NIM adds this attribute to the lpp_source definition.
Defining an lpp_source resource

You can use the command line or SMIT to define an lpp_source.
The visual shows how the required attributes would be specified on the command line.
Required attributes are:
- server=<machine>
NIM name for the machine which serves this resource
- location=<directory>
Directory where the lpp_source files are located
There are a number of optional attributes, including:
- source=<directory>
If you already have a directory that contains the software images, the source
attribute is not required. If you want NIM to create a directory and populate it for you,
the source attribute specifies the directory or device which contains the software
images to be copied into the lpp_source directory.
- packages=<package_list>
Use the packages attribute if you only want NIM to copy specific packages from the
source.
The final argument is the name of the NIM object:
- <lppsource_name>
The last argument on the nim command line is the name of the object you are
operating on, in this case, the name of the lpp_source resource we are creating.
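As a sketch, the command below asks NIM to create and populate an lpp_source from installation media. The run helper is hypothetical and only prints the command unless DRY_RUN=0; the location, the /dev/cd0 device, and the object name lpp61 are assumptions. If the directory already holds the images, the source attribute would simply be omitted.

```sh
#!/bin/sh
# Hypothetical dry-run helper: prints each command; runs it only when DRY_RUN=0.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-1}" = 1 ] || "$@"
}

# Required: server (NIM name of the serving machine) and location.
# Optional: source, so that NIM copies the images into the directory.
# The location, device, and object name lpp61 are hypothetical.
run nim -o define -t lpp_source \
    -a server=master \
    -a location=/export/lpp_source/lpp61 \
    -a source=/dev/cd0 \
    lpp61
```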
Additional lpp_source information:
- If you add or remove an installable image from the lpp_source, perform the check
operation on that object so that NIM rebuilds the .toc (table of contents) file, which
resides in the lpp_source directory. This is important, as the installp command
uses the .toc to determine which images are available.

- Starting with AIX 5L Version 5.3, an update operation lets you update an
lpp_source resource by adding and removing packages. Before that, you could copy
packages into, or remove packages from, an lpp_source directory and then run
nim -o check to update the lpp_source attributes. SMIT also let you add packages to
an lpp_source through the smit nim_bffcreate fast path. However, that SMIT function
does not check whether the lpp_source is allocated or locked, nor does it update the
simages attribute when finished. The update operation was created to address this
situation.
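The two ways of changing an lpp_source's contents can be sketched as follows, again with a hypothetical print-only run helper and the assumed names lpp61 and /dev/cd0.

```sh
#!/bin/sh
# Hypothetical dry-run helper: prints each command; runs it only when DRY_RUN=0.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-1}" = 1 ] || "$@"
}

# AIX 5L V5.3 and later: add all packages from a device and let NIM
# maintain the lpp_source attributes (names are hypothetical).
run nim -o update -a packages=all -a source=/dev/cd0 lpp61

# Older approach: after copying images into (or deleting them from) the
# directory by hand, rebuild the .toc and re-evaluate simages.
run nim -o check lpp61

# Look for the simages attribute in the long listing.
run lsnim -l lpp61
```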
Instructor notes:
Purpose — Cover the definition of the lpp_source.
Details —
Transition statement — Once we have an lpp_source, we next need to use the
lpp_source to generate a matching SPOT. Let’s look at how that is done.
For Elluminate deliveries: once the multimedia library window is open, select
AN152U04F17 and click Play. Ask the students to indicate with a green check mark
when they have finished the file.

Resources objects: spot

spot
/usr directory tree used during network boot
Matching network boot images generated:
- /tftpboot/<spot_name>.<Platform>.<Kernel>.<Network>
Defining a SPOT
# nim -o define -t spot \
      -a server=<machine> \
      -a location=<directory> \
      -a source=<lpp_source_name> \
      [ optional attributes ] \
      <spot_name>
# smit nim_mkres
(Diagram: SPOTs such as spot61-00-00 and spot61-01-00 created from an lppsource, each containing a usr tree with bin, include, lib, and etc)
Figure 4-12. Resources objects: spot AN152.2
Notes:
Components
• A /usr file system
A Shared Product Object Tree (SPOT) is a directory containing AIX code that is
equivalent in content to the code that resides in a /usr file system on a system running
AIX. The NIM SPOT creation process restores files from AIX filesets into the directory in
which the SPOT resides.
The SPOT is NFS-mounted on a booting client to provide necessary device support for
the boot process.
• Boot image
As part of the creation of a SPOT resource, NIM also creates network boot images. The
network boot images are constructed in /tftpboot on the same machine on which the
SPOT is created, using code from the newly created SPOT. The boot images are also
sometimes called spot files. The client uses BOOTP to locate its boot server and boot
image, and the boot image file itself is transferred to the client using TFTP.
Since one SPOT can potentially support several types of machines, several boot image
files may be created. The naming convention identifies each boot image as:
<spot_name>.<Platform>.<Kernel>.<Network>, where:
- <Platform> identifies which architecture this boot image supports: chrp, rspc, and
so forth
- <Kernel> specifies whether this boot image contains a multi-processor (mp) or
uni-processor (up) kernel.
- <Network> identifies the network type: ent, tok, and so forth
These days, the only combination most of us work with is chrp.mp.ent.
During a network boot, the boot image is transferred over the network and loaded into
the client’s memory.
• /tftpboot
It is good practice to make /tftpboot a separate file system. This removes the risk of
filling the root file system. If you are supporting multiple AIX versions on multiple
machine types or multiple network types, this directory can get quite large.
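The boot image naming convention can be captured in a small function. boot_image_name is a hypothetical helper (NIM builds these files itself); it only restates the <spot_name>.<Platform>.<Kernel>.<Network> pattern under /tftpboot, with spot61 as an assumed SPOT name.

```sh
#!/bin/sh
# Hypothetical helper: compose a network boot image path following
# <spot_name>.<Platform>.<Kernel>.<Network> under /tftpboot.
#   $1 = SPOT name, $2 = platform (chrp, rspc, ...),
#   $3 = kernel (up or mp), $4 = network type (ent, tok, ...)
boot_image_name() {
    case $3 in
        up|mp) ;;
        *) echo "boot_image_name: kernel must be up or mp" >&2; return 1 ;;
    esac
    echo "/tftpboot/$1.$2.$3.$4"
}

# One SPOT can yield several images, for example one per kernel type:
for kernel in up mp; do
    boot_image_name spot61 chrp "$kernel" ent
done
```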
Defining a SPOT resource on the command line

The visual shows the nim syntax to define a SPOT. The -t flag identifies the type of
object you wish to define. In addition, you must specify the following required attributes:
- server=<machine>
NIM name for the machine which serves this resource.
- location=<directory>
Directory (on the server) where the SPOT files are located.
- source=<lpp_source_name>
This attribute points to the location of the files used to create the SPOT resource.
This can be an existing lpp_source resource, a device name (for example:
/dev/cd0) or a directory which contains the source filesets used to create the SPOT.
Most commonly, the lpp_source resource is created first and then the spot is
created from the lpp_source.
- <spot_name>
The last argument on the nim command line is the name of the object you are
operating on, in this case, the name of the SPOT resource we are creating.

Optional attributes

There are a number of optional attributes, including:
- installp_flags=<flags>
NIM calls installp to create the SPOT. By default, NIM uses the -agX flags when
calling installp. You can use installp_flags to specify the options you require.
- auto_expand={yes|no}
Indicates that file systems should be automatically expanded if additional space is
needed.
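Putting the required and optional attributes together, a SPOT built from an lpp_source might be created as sketched below. The run helper is hypothetical and only prints unless DRY_RUN=0; the location /export/spot and the names lpp61 and spot61 are assumptions, and installp_flags is left at its -agX default.

```sh
#!/bin/sh
# Hypothetical dry-run helper: prints each command; runs it only when DRY_RUN=0.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-1}" = 1 ] || "$@"
}

# Build the SPOT spot61 from the lpp_source lpp61 (all names hypothetical),
# letting NIM expand file systems if the SPOT needs more space.
run nim -o define -t spot \
    -a server=master \
    -a location=/export/spot \
    -a source=lpp61 \
    -a auto_expand=yes \
    spot61
```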
Defining a SPOT using SMIT

The visual shows the SMIT fast path for defining resource objects. SMIT opens with a
window that allows you to select which type of resource you want to define. Once you
select a resource type, SMIT opens a window with the necessary fields to specify the
resources and attributes for that type of object, in this case, a SPOT.
Instructor notes:
Purpose — Cover how to define a SPOT.
Details —
Transition statement — While we can use an lpp_source and matching SPOT to install a
new operating system, quite often the network installs are actually recoveries of mksysb
images. This is either to recover a lost rootvg or to clone an AIX image to other machines
or LPARs. Let’s see how we define a mksysb resource.

Resources objects: mksysb
mksysb
Identifies a mksysb system backup image file
Used for bos_inst operations
Defining a mksysb
# nim -o define -t mksysb \
      -a server=<machine> \
      -a location=<mksysb_path> \
      [ optional attributes ] \
      <mksysb_name>
# smit nim_mkres
Figure 4-13. Resources objects: mksysb AN152.2
Notes:
mksysb
A mksysb resource represents a system backup image file created using the mksysb
command. A mksysb resource can be used as the source of the BOS run-time files
when a bos_inst is performed.
Defining a mksysb resource

You can use the command line or SMIT to define a mksysb. You can use an existing
mksysb image, or you can have nim create one for you. (nim calls mksysb to create the
new backup.)
Required attributes are:
- server=<machine>
NIM name for the machine which serves this resource
- location=<mksysb_path>
If the system backup image already exists, enter the name of the file where the
image resides. If you are creating the system backup image as part of this operation,
enter the name of the file where you want the image placed after it is created.
There are a number of optional attributes, including:
- mk_image={yes|no}
If the backup file already exists, specify no (the default). If you want nim to create a
new backup file, specify yes.
- source=<machine_name>
If you want nim to create a backup image for you, specify the NIM name of the
machine you want to back up.
- mksysb_flags=<value>
You can use this attribute to specify optional flags for the mksysb command, if
needed.
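The two ways of defining a mksysb resource can be sketched as follows. The paths and the object names gold_mksysb, lpar1, and lpar1_mksysb are hypothetical, and the run helper only prints the commands unless DRY_RUN=0.

```sh
#!/bin/sh
# Hypothetical dry-run helper: prints each command; runs it only when DRY_RUN=0.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-1}" = 1 ] || "$@"
}

# Register an existing backup image file (mk_image defaults to no).
run nim -o define -t mksysb \
    -a server=master \
    -a location=/export/mksysb/gold.mksysb \
    gold_mksysb

# Have NIM run mksysb on client lpar1 and register the new image.
run nim -o define -t mksysb \
    -a server=master \
    -a location=/export/mksysb/lpar1.mksysb \
    -a mk_image=yes \
    -a source=lpar1 \
    lpar1_mksysb
```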

Purpose — Cover the definition of a mksysb resource.
Details —
Transition statement — Once we have our resources defined, we next need to define the
machines we want to manage and the networks to which they are connected. Let’s start
with the networks object.
Networks objects
Object types
  ent          Ethernet network
  fddi         FDDI network
  tok          Token ring network
  atm          ATM network (no network boot capability)
  generic      Generic network (no network boot capability)
Attributes
  net_addr             Network address for a network
  snm                  Subnet mask for a network
  routing<X>           Routing information for a network
  Nstate, prev_state   Status attributes
  ...                  (additional attributes)
(Diagram: master and client on separate networks connected through a router)

Figure 4-14. Networks objects AN152.2
Notes:
In order to perform certain NIM operations, the NIM master must be able to supply
information necessary to configure client network interfaces. The NIM master must also
be able to verify that client machines can access all the resources provided by the NIM
server. To avoid the overhead of repeatedly specifying network information for each
individual client, NIM network objects are used to represent the networks in a NIM
environment.
Network types
NIM supports the four network types shown in the visual, plus a generic type. Network
boot support is provided for Ethernet, Token-Ring, and FDDI. Network boot operations
are not supported on ATM or generic networks. NIM supports both standard Ethernet
and IEEE 802.3 Ethernet networks.

Network attributes

Network attributes include the network address, subnet mask, routes, and status. The
Nstate attribute indicates whether the object definition of the network is complete. NIM
requires that all networks be able to communicate with the NIM master, either by the
master being directly connected to them or by having a NIM route to a network to which
the master connects.
Routing
NIM routing information represents standard TCP/IP routing information for the
networks that are part of a NIM environment. This information defines the gateways that
are used to establish communication between the master machine and the clients.
The routing<X> attribute defines a route and includes:
- A destination (default or a NIM network name)
- A gateway address
If needed, multiple routes can be created and are numbered routing1, routing2, and
so forth.
Additional attributes
There are a number of other attributes for each network object. lsnim is probably the
easiest way to get information about NIM attributes.
Other network information

The ring_speed (for token-ring) and cable_type (for Ethernet) are not attributes of the
network objects; they are attributes of the machine objects.
Instructor notes:
Purpose — Cover networks objects and their attributes.
Details — Point out that we do not usually define a network object directly. Instead, the
information provided when defining a machine is used to either match to an existing
network object or to create a new network object. The most important point to make is that
the networking information is from the perspective of the machine being defined. In the
network diagram shown in the visual, when defining the client, it is the router interface
which is in the network to the right that needs to be defined as the gateway. The network
definition describes the path the client uses when it network boots and sends a bootp
request to the NIM server.
Additional information — Unlike other network adapters, ATM adapters cannot be used
to boot a machine. This means that installing a machine over an ATM network requires
special processing (refer to the AIX Installation Guide and Reference, Chapter 20. Basic
NIM Operations and Configuration for instructions). The generic network type is used to
represent all other network types where network boot support is not available. For clients
on generic networks, NIM operations that require a network boot, such as bos_inst and
diag, are not supported. However, non-booting operations, such as cust and maint, are
allowed.
Transition statement — Next, let’s look at the machines object.

Machines objects
Object types
  master
  standalone
  diskless
  dataless
Attributes
  platform                     Architecture
  netboot_kernel               up or mp
  if<X>                        Network interface information
  serves                       Resource served by this machine
  Cstate, prev_state, Mstate   Status attributes
  ...                          (additional attributes)
(Diagram: Master, Standalone, Diskless, and Dataless machines)
Figure 4-15. Machines objects AN152.2
Notes:
NIM supports four types of machines: the master type and three types of clients:
standalone, diskless, and dataless.
Master
The master machine is defined by installing the master fileset, and then performing
some quick configuration. There can only be one master in the NIM environment. Once
a machine is defined as the master, it can participate in NIM operations.
Standalone clients
Standalone clients have local disk resources. They are installed from the NIM server,
but once installed, they boot and operate from their local disks.
Diskless clients
Diskless clients have no disks of their own. They run entirely using resources from the
NIM server.
Dataless clients
Dataless machines can only use a local disk for paging space and the /tmp and /home
file systems. All of the other storage is provided over the network by the NIM server.
Machine attributes
Each machine object is one of these four machine types. Additionally, machine objects
store other attributes about the machine. The visual shows a few of them:
- The platform attribute describes the machine architecture (chrp, rspc, and so
forth).
- netboot_kernel indicates which type of kernel is required, uni-processor (up) or
multi-processor (mp).
- if<X> is used to provide information about a machine’s network interfaces. If there
are multiple interfaces, they are numbered: if1, if2, and so forth. This attribute
includes the NIM network this interface connects to, the host name, the MAC
address, and the network type.
- The serves attribute identifies resources that are served by this machine. If the
machine serves several resources, there will be a serves attribute for each
resource.
- Cstate indicates the NIM operation that is currently being performed on a machine
or that no NIM operations are currently being performed.
- prev_state shows the previous Cstate.
- Mstate shows the execution state for a machine.
Note: NIM attempts to keep the value of this attribute synchronized with the
machine's execution state, but NIM does not guarantee its accuracy. Perform the
check operation on the machine for NIM to attempt to determine the machine's
execution state.
Additional attributes
There are a number of other attributes for each machine object. lsnim is probably the
easiest way to get information about NIM attributes.

Purpose — Explain machine objects and their attributes.
Details — Keep the focus on standalone and master. Diskless and dataless clients are
rarely used these days; give only the briefest definition of them and move on.
Transition statement — Let’s discuss how we would define a machine object, first from
the command line, and then using SMIT.
Defining a machine object
# nim -o define -t standalone -a platform=<PlatformType> \
      -a netboot_kernel=<NetbootKernelType> \
      -a if1=<InterfaceDescription> \
      -a net_definition=<DefinitionName> \
      -a cable_type1=<TypeValue> \
      <MachineName>
Examples:
# nim -o define -t standalone -a if1="network1 lpar1 0 ent0" \
      -a cable_type1="N/A" -a connect=nimsh \
      -a platform=chrp -a netboot_kernel=mp lpar1
# smit nim
  Perform NIM Administrative Tasks
    Manage Machines
      Define a Machine
        <provide hostname of client>
Figure 4-16. Defining a machine object AN152.2
Notes:
Follow these steps to add a client with the network information using SMIT:
1. On the NIM master, add a standalone client to the NIM environment by using
SMIT (nim_mkmac is the fast path).
2. Specify the host name of the client.
This is the host name that resolves to the IP address of the install adapter of this
machine. By default, this also becomes the hostname of this client when the
client is installed. If using DNS, enter the long host name here. For example,
lpar1.my.company.com.
3. The next SMIT screen displayed depends on whether NIM already has
information about the client's network. Supply the values for the required fields or
accept the defaults. Use the help information and the LIST option to help you
specify the correct values to add the client machine.

For example, using nim, the command line might look like:
# nim -o define -t standalone -a if1="net1 lpar1 0 ent0" lpar1
The quoted if1 value in the example has multiple space-delimited fields, as follows:
• net1 is the network object name
• lpar1 is the hostname
• 0 is the placeholder for the MAC address
• ent0 is the physical adapter used by the client to reach the master
The final lpar1 argument is the NIM name of the machine object being defined.
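To make the field layout concrete, the sketch below splits an if1 value into the four fields just listed. parse_if is a hypothetical helper, not a NIM command, and it assumes exactly the four-field layout used in this example.

```sh
#!/bin/sh
# Hypothetical helper: split an if<X> value such as "net1 lpar1 0 ent0"
# into the four space-delimited fields described above.
parse_if() {
    set -f              # no filename globbing while splitting
    set -- $1           # split the quoted value on whitespace
    set +f
    if [ $# -ne 4 ]; then
        echo "parse_if: expected 4 fields, got $#" >&2
        return 1
    fi
    echo "network=$1"   # NIM network object name
    echo "hostname=$2"  # host name of the install interface
    echo "mac=$3"       # MAC address (0 = placeholder)
    echo "adapter=$4"   # physical adapter used to reach the master
}

parse_if "net1 lpar1 0 ent0"
```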
If using SMIT, the sequence of menu items to the matching dialog panel would be:
# smit nim
Perform NIM Administrative Tasks
Manage Machines
Define a Machine
<provide hostname of client>
The resulting dialogue panel is shown in the next visual.
Instructor notes:
Purpose — Cover how to define standalone machines.
Details —
Transition statement — An easy way to define a machine is to use SMIT. The visual
shows the SMIT menu path to use, but let’s look at the resulting dialog panel.
For Elluminate deliveries, access the multimedia library by clicking the CD button on the
toolbar.

Define a client using SMIT

Define a Machine
* NIM Machine Name [lpar1]

* Machine Type [standalone] +
* Hardware Platform Type [chrp] +
Kernel to use for Network Boot [mp] +
Communication Protocol used by client [nimsh] +
Primary Network Install Interface
* Cable Type N/A +
Network Speed Setting [] +
Network Duplex Setting [] +
* NIM Network network1
* Host Name lpar1
Network Adapter Hardware Address [0]
Network Adapter Logical Device Name [ent0]
IPL ROM Emulation Device [] +/
CPU Id []
Machine Group [] +
Comments []
Figure 4-17. Define a client using SMIT AN152.2
Notes:
NIM machine name/hostname

There are two names given to your client: a NIM name and a hostname. The NIM name
is what is used when performing operations on this client. The hostname becomes the
system-wide hostname of this client and is also the name associated with the client's
adapter that NIM uses to do the client install. In our case, we used a short name on the
prior panel. Hence, the NIM name and hostname are identical. If we had used a long
name on the prior panel, then we would see the long name for the hostname and the
short name for the NIM Name. For example, if we put lpar1.my.company.com on the
prior panel, then the hostname would be lpar1.my.company.com and the NIM name
would be lpar1.
Machine type
Only one client machine type is still commonly used: standalone.
Hardware platform type

You can choose between chrp, rspc, or the very old classic rs6k. Because the chrp
architecture was introduced in the mid-1990s, most systems in use today are chrp. To
double-check which architecture your client uses, run the
getconf -a | grep MACHINE_ARCHITECTURE command.
On older AIX release levels, try the bootinfo -p command.
Kernel type
If a client machine is running the 64-bit kernel, then mp should be chosen. However, if
the client is running the 32-bit kernel, either the up or mp kernel may be chosen. To
determine which kernel the client is currently running, run the ls -l /usr/lib/boot/unix
command and note whether it is linked to the 64-bit, up, or mp kernel in that same
directory. The getconf -a command can also be run to determine whether the machine is
capable of running an mp kernel; an MP_CAPABLE setting of 1 means yes. On older
releases, run the bootinfo -z command to find out whether the machine can handle mp;
again, a setting of 1 means yes. Starting with version 6.1, AIX uses only a 64-bit kernel,
so mp is the choice for those clients.
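The selection rules above can be restated as a small decision function. choose_netboot_kernel is hypothetical; where the text allows either up or mp (a 32-bit kernel on MP-capable hardware), this sketch picks mp as a policy choice, and it assumes up for hardware that is not MP-capable. The getconf queries in the comments are suggestions for gathering the inputs on an AIX client.

```sh
#!/bin/sh
# Hypothetical helper encoding the kernel-selection rules above.
#   $1 = kernel bit mode of the client (32 or 64)
#   $2 = MP_CAPABLE value reported by getconf -a (1 = yes)
choose_netboot_kernel() {
    if [ "$1" = 64 ]; then
        echo mp         # 64-bit kernel: mp must be chosen
    elif [ "$2" = 1 ]; then
        echo mp         # 32-bit kernel, MP-capable: mp (up is also allowed)
    else
        echo up         # 32-bit kernel, not MP-capable
    fi
}

# On an AIX client the inputs might come from, for example:
#   getconf KERNEL_BITMODE
#   getconf -a | grep MP_CAPABLE
choose_netboot_kernel 64 1
```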
Communication protocol
Either the less secure remote shell protocol (rsh) or the newer nimsh protocol
(available in AIX 5L V5.3 and later versions of AIX) may be used.
Note: Each client can have a different setting.
Cable type
Most configurations today are set to N/A (not applicable), as modern adapters either
autosense the connection type or support only a single type (such as twisted pair or
fiber). You can double-check this by running the lsattr -El entX command and noticing
whether a cable_type attribute is shown. If it is not, setting N/A should work. If you are
running twisted-pair cable, setting tp should work.
Network speed/duplex
These settings are only used when performing a push boot operation on the client. If not
set, the current SMS speed/duplex settings for your install adapter are used.
NIM network
This is the NIM network to which the client is assigned.

Hardware address

This is the MAC address of the client. It is only needed for BOOTP broadcast
operations. This MAC address, if ever needed, can be retrieved by looking at your
client's Remote IPL SMS menus.
Logical device name

This is the name of the physical network adapter (NIC) over which you plan to install;
for example, ent0 or ent1. When the client is installed, this adapter receives the
hostname you set in the Host Name field on this screen.
IPL ROM emulation

This is only set for machines that do not support network boot. Please see online
documentation for details.
CPU_ID
This is the machine ID retrieved by running the uname -m command on the client. It is
used to uniquely identify this client in the future. You do not have to set this; NIM
configures it.
Machine group
You can assign a client to a machine group.
Command line
The equivalent NIM command for the above operation is:
# nim -o define -t standalone -a platform=chrp -a netboot_kernel=mp \
      -a if1="network1 lpar1 0 ent0" -a cable_type1=N/A \
      -a connect=nimsh lpar1
Use the lsnim -q define -t standalone command for more information, or see the nim
man page.
Instructor notes:
Purpose — Cover how to use the SMIT dialog panel to define a machine.
Details —
Transition statement — Now that we have all of our objects defined, we only need to
relate which resources are used with a machine, and then set up NIM support for a
particular operation. Let’s look at the various NIM operations and how they relate to the
resource allocations.

NIM operations
Operations on clients
  bos_inst
    • rte
    • mksysb
  cust
  maint
  diag
  maint_boot
Procedure
  Allocate resources to clients (for intended operation)
  Perform operation
  Unallocate resources
Other NIM object operations
  define, change, remove, allocate, deallocate, maint,
  lslpp, lppchk, check, and so forth
Figure 4-18. NIM operations AN152.2
Notes:
Operations on clients
NIM supports several different types of operations to install and manage software on
NIM clients. In addition, there are operations to manage the NIM objects themselves.
For the purposes of this class, we are primarily interested in three client operations:
- bos_inst
Allows you to install AIX on a client
- cust and maint
Allows you to update and maintain AIX software
- maint_boot
Allows you to boot a client to maintenance mode over the network
bos_inst
A bos_inst operation is used to perform a Basic Operating System (BOS) installation
on a client. There are two types of bos_inst operations: rte and mksysb.
bos_inst: rte installations

An rte install instructs the BOS installation process to install AIX from the images in the
lpp_source resource specified for the operation.
The default bos_inst operation is rte (runtime environment).
bos_inst: mksysb installations

A mksysb bos_inst operation installs the client from a mksysb resource. A mksysb
resource is a system backup image created using the mksysb command (or the SMIT or
WebSM interfaces to the mksysb command).
Installing a system from backup reduces, and often eliminates, repetitive installation
and configuration tasks. For example, a backup installation can copy optional software
installed on the source system, in addition to the Base Operating System. The backup
image also transfers many user configuration settings.
If you have many clients with the same software configuration, you could use one
mksysb image as the source to install all of them.
bos_inst customization
The NIM installation process provides the ability to invoke a customization script after
AIX is installed on the system. This is done by allocating a script resource to the client
before performing the bos_inst. That script could be used to perform such
customization as setting passwords, changing network addresses, and so forth.
cust
This NIM operation performs software customization on a running NIM client. You can
use the cust operation to:
- Update existing software
- Install additional software
- Run a customization script
maint
This NIM operation performs software maintenance operations on clients, such as
committing applied software, removing software, and so forth.

diag
This NIM operation enables the client to boot to diagnostics over the network.
maint_boot
This operation enables the client to boot to maintenance mode over the network.
Procedure for operations

To perform a NIM operation on a client machine, complete the following steps:
1. Allocate the required resources to the client machine.
- This makes the resources available to the client. You can explicitly allocate
the resources before you perform the NIM operation, or you can allocate the
resources at the same time you perform the operation.
- Allocation usually involves NFS exporting the resource’s directory so the
client can NFS mount it over the network.
- The initial boot image is actually transferred using tftp. To provide this
network boot image, an entry is created in the /etc/bootptab file and files are
created in the /tftpboot directory.
2. Perform the operation.
3. Unallocate resources.
- While a resource is allocated to a client, the resource is locked to block any
changes. After the operation completes, the resources should be deallocated
from the machine so they can be freed again for updates or changes.
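The three-step procedure above can be sketched as a small shell script run on the NIM master. This is a minimal sketch, not an IBM-supplied tool: it uses a maint_boot as the example operation, the client name lpar5 and SPOT name spot71 used below are hypothetical, and the reset step assumes the client's NIM state has to be cleared before its resources can be deallocated. By default (DRY_RUN=1) the script only prints the nim commands so they can be reviewed; set DRY_RUN=0 to run them.

```shell
#!/bin/sh
# Sketch of the allocate / perform / deallocate cycle for a maint_boot.
# Client and SPOT names are hypothetical. With DRY_RUN=1 (the default)
# the nim commands are only printed; set DRY_RUN=0 on a NIM master to run them.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"     # dry run: show the command
    else
        "$@"          # real run: execute the command
    fi
}

# Steps 1 and 2: allocate the SPOT, then enable the maint_boot so the
# master is ready to answer the client's network boot request.
setup_maint_boot() {
    client=$1
    spot=$2
    run nim -o allocate -a spot="$spot" "$client"
    run nim -o maint_boot "$client"
}

# Step 3: once the client has finished, reset its NIM state (assumed
# necessary here) and free every resource still allocated to it.
cleanup_client() {
    client=$1
    run nim -o reset "$client"
    run nim -o deallocate -a subclass=all "$client"
}
```

For example, setup_maint_boot lpar5 spot71 would be run before the client is network booted, and cleanup_client lpar5 only after the maintenance work on the client is finished.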
Other NIM object operations

In addition to operations which directly affect NIM clients, there are a number of NIM
operations used for managing NIM objects. Besides the obvious ones (define, change,
remove, allocate, and deallocate), you can also:
- Update or add software to a spot or lpp_source resource.
(cust operation)
- Perform software maintenance on a spot or lpp_source resource.
(maint operation)
- List LPP information in a resource.
(lslpp operation)
- Verify software packages in a SPOT or lpp_source resource.
(lppchk operation)
- Check the status of a NIM object.
(check operation) The actual tasks performed by the check operation differ
depending on which type of object you are operating on.


Purpose — Cover the various NIM operations that can be performed on machines.
Details —
Transition statement — Let’s take a closer look at the most common NIM operation -
setting up for installation of an operating system.
bos_inst operation
Command line
# nim -o bos_inst
-a lpp_source=<lpp_res_name>
-a spot=<spot_name>
-a source={rte|mksysb}
-a mksysb=<mksysb_name>
-a boot_client={yes|no}
[optional attributes]
<client_name>
• # smit nim_bosinst
Figure 4-19. bos_inst operation AN152.2
Notes:
bos_inst
Configuring NIM to perform a bos_inst can be done from the command line or through
SMIT. There are two steps: allocating resources to the client and enabling the
bos_inst. It is also possible to combine these steps into one command:
# nim -o bos_inst -a lpp_source=<lpp_res_name> -a spot=<spot_name>
[additional resources] [-a source={rte|mksysb}] [additional attributes]
<client_name>
If you use SMIT to enable a bos_inst, SMIT opens a series of windows to prompt you
for the required information and then displays a window where you can set additional
optional attributes.

Required information

The required information for a bos_inst operation is:
- <client_name>
As always, the last argument specifies the NIM object you want to operate on. In this
case, this is the target client machine that you wish to install.
- spot=<spot_name>
Specifies the SPOT resource you wish to use.
- lpp_source=<lpp_res_name>
This is the name of the lpp_source resource you wish to use for the installation. In
AIX 5L V5.3 and later, this attribute is not required for a mksysb install (see note
below).
Optional information
Optional attributes include:
- source={rte|mksysb}
mksysb=<mksysb_name>
If you do not specify the source attribute, nim performs an rte bos_inst. If you set
source=mksysb, then you must use the mksysb attribute to specify the name of the
mksysb resource you wish to use.
Note: In most cases, you must still include an lpp_source resource, even if you are
doing a mksysb install. With AIX 5L and later, if you have created a mksysb that
includes all devices, you do not need to specify an lpp_source.
- boot_client={yes|no}
When set to yes, the master attempts to reboot the client machine automatically for
reinstallation. For this option to succeed, the client must be running and initialized as
a NIM client or have rhosts permissions granted to the master. If set to no, the
master only prepares to support the network boot; the client's network boot must
be initiated later.
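The attribute rules above can be captured in a small helper that assembles a bos_inst command line and rejects incomplete combinations before anything is sent to NIM. This is a hypothetical sketch: the function name build_bos_inst and its positional-argument order are illustrative only, the function prints the command rather than running it, and it treats lpp_source as optional for a mksysb install, following the AIX 5L V5.3 and later behavior described above (in most cases you should still supply one).

```shell
#!/bin/sh
# Hypothetical helper: assemble (but do not run) a nim bos_inst command.
# Usage: build_bos_inst client spot [source] [lpp_source] [mksysb] [boot_client]
build_bos_inst() {
    client=$1 spot=$2 src=${3:-rte} lpp=$4 mksysb=$5 boot=${6:-no}

    # The client name and the SPOT are always required.
    if [ -z "$client" ] || [ -z "$spot" ]; then
        echo "error: client and spot are required" >&2
        return 1
    fi

    cmd="nim -o bos_inst -a source=$src -a spot=$spot"
    case $src in
        rte)
            # An rte install reads the BOS images from the lpp_source.
            if [ -z "$lpp" ]; then
                echo "error: an rte install needs an lpp_source" >&2
                return 1
            fi
            cmd="$cmd -a lpp_source=$lpp" ;;
        mksysb)
            # source=mksysb must name the mksysb resource to restore.
            if [ -z "$mksysb" ]; then
                echo "error: source=mksysb needs a mksysb resource" >&2
                return 1
            fi
            cmd="$cmd -a mksysb=$mksysb"
            # lpp_source is optional here but usually still recommended.
            if [ -n "$lpp" ]; then
                cmd="$cmd -a lpp_source=$lpp"
            fi ;;
        *)
            echo "error: source must be rte or mksysb" >&2
            return 1 ;;
    esac

    # boot_client=yes asks the master to reboot the client automatically.
    echo "$cmd -a boot_client=$boot $client"
}
```

For example, build_bos_inst lpar5 spot71 mksysb '' gold_mksysb yes prints a mksysb install command for the hypothetical client lpar5, while omitting the mksysb name makes the helper fail instead of producing an incomplete command.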
Instructor notes:
Purpose — Cover how to set up for installing an operating system on a machine.
Details —
Additional information — Note: In the CSM environment, boot_client is normally set to
no and the client is rebooted using the netboot script from the management server.
Transition statement — Having run a bos_inst operation on a machine object, NIM is
now prepared to respond to a network boot request from that machine. Network booting an
LPAR is something that was covered in previous courses, so we will not repeat that
discussion here (though the later lab exercises will have you practice this).
This unit was just a high level introduction to NIM. To properly use NIM, there is much more
you will need to understand. Let’s look at how you can build your skills beyond what has
been taught here.

More information about NIM

Documentation
NIM from A to Z in AIX 5L
(http://www.redbooks.ibm.com/ )
AIX Version 7.1 Installation Guide and Reference
IBM Training class (AN22)

AIX Network Installation Manager (NIM)
(http://www.ibm.com/services/learning/index.html )
EZ NIM
nim_master_setup
nim_client_setup
Figure 4-20. More information about NIM AN152.2
Notes:
More information about NIM

NIM is a very powerful tool; it can be used in many different ways.
In this topic, we introduced some basic NIM concepts and terminology. If you plan to
make use of NIM in your cluster, we strongly recommend that you get more information
so that you can use NIM most effectively.
Documentation and Redbook

The following books provide in depth information about using NIM:
- AIX Version 7.1 Installation and migration
- SG24-7296 NIM from A to Z in AIX 5L (Redbook: http://www.redbooks.ibm.com/)
Classes
You should also consider the following class.
- AN220 - AIX Network Installation Management (NIM)
(IBM Learning Services training course:
http://www.ibm.com/services/learning/index.html)


Purpose — Cover sources for additional NIM information.
Details —
Transition statement — One of the sources for additional NIM skills is the Network
Installation Management course. Let’s briefly look at some of the additional skills covered
in that course.
Additional topics in NIM course

Push operations and unattended installations

lppsource and SPOT management issues
Problem determination
Customization scripts
Resource creation (lppsource, mksysb) options
Group definitions
Client software maintenance and bundles
Alternate disk migration
Security and networking issues
NIM based backup, recovery, and cloning
Figure 4-21. Additional topics in NIM course AN152.2
Notes:


Purpose — Explain what additional skills are covered in the NIM course.
Details —
Transition statement — Let’s review what we have covered with a checkpoint question.
Checkpoint
1. True or false: NIM can be used to fix an LPAR which fails to
boot because of a problem with the /etc/inittab file.
Notes:


Purpose —
Details —

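The answer is true. A maint_boot operation boots the LPAR to maintenance mode over the network, where you can access rootvg and correct the /etc/inittab file.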
Transition statement — Having explained how to do a basic installation and configuration
of NIM, let’s actually implement this in the lab exercise.
Exercise: Basic NIM configuration

Configure an LPAR to be a NIM Master
Setup for a network installation of a client
Figure 4-23. Exercise: Basic NIM configuration AN152.2
Notes:


Purpose — Direct the students to practice NIM in the lab exercise.
Details —
Unit summary

Configure an AIX partition for use as a NIM master
Set up NIM to support the installation of AIX onto a client
Notes:


Purpose — Review material through the use of checkpoint questions.
Details — Before continuing to the next unit, stop and ask the students whether they
have any additional questions.
Transition statement — Let’s move on to the next unit.
Unit 5. System initialization: Accessing a boot image
Estimated time
01:30

This unit describes the boot process up to the point of loading the boot
logical volume. It describes the content of the boot logical volume and
how it can be recreated if it becomes corrupted.

• Describe the boot process through to the loading of the boot logical
volume
• Describe the contents of the boot logical volume
• Recreate the boot logical volume on a system which is failing to
boot
• Adjust the bootlist for the desired order of search

Accountability:
• Exercise
References
Online AIX Version 7.1 Operating system and device
management
© Copyright IBM Corp. 2009, 2012 Unit 5. System initialization: Accessing a boot image 5-1
Unit objectives

Describe the boot process through to the loading of the boot
logical volume
Describe the contents of the boot logical volume
Recreate the boot logical volume on a system which is failing
to boot
Adjust the bootlist for the desired order of search
Notes:

Instructor Notes:

Details — Explain that boot errors are among the most frequent errors. Fixing these
problems requires a good knowledge of the boot process.
Transition statement — Let’s start with an overview of the AIX boot process.
5.1. System startup process

What students will do — The students will identify the steps of the boot process up to
loading the boot logical volume. Additionally, students will be able to explain how bootlists
are managed on the different hardware architectures and how to create a new boot logical volume.
How students will do it — Through lecture and review questions.
What students will learn — Students will:
• Discover how an AIX system boots
• Identify how the boot logical volume is used during the boot process
• Identify how to manage bootlists
• Identify how to create a new boot logical volume
How this will help students on their job — By having a good understanding of the boot
process, solving any boot problem is much easier.
How does a Power server or LPAR boot?

Boot step                                   Possible failures
Check and initialize the hardware (POST)    Hardware error (only for physical server power-on)
Locate boot image using the boot list       Unable to find any boot image
Load and pass control to boot image         Boot image corrupted
Start AIX software initialization
Figure 5-2. How does a Power server or LPAR boot? AN152.2
Notes:
Check and initialize hardware (POST)

After powering on a machine, the hardware is checked and initialized. This phase is
called the Power On Self Test (POST). The goal of the POST is to verify the
functionality of the hardware.
Locate and load the boot image

After the POST is complete, a boot image is located from the bootlist and is loaded into
memory. During a normal boot, the location of the boot image is usually a hard drive.
Besides hard drives, the boot image can be loaded from tape, CD-ROM, or the
network; these alternate devices are typically used when booting into maintenance mode.
If working with the Network Installation Manager (NIM), the boot image is loaded over the network.

To use an alternate boot location, you must invoke the appropriate bootlist by pressing
function keys during the boot process. There is more information on bootlists later in
the unit.
Last steps
Passing control to the operating system means that the AIX kernel (which has just been
loaded from the boot image) takes over from the system firmware that was used to find
and load the boot image. The operating system is then responsible for completing the
boot sequence. The components of the boot image are discussed later in this unit.
All devices are configured during the boot process. This is performed in different
phases of the boot by the cfgmgr utility.
Towards the end of the boot sequence, the init process is started and processes the
/etc/inittab file.
Instructor notes:
Purpose — Introduce the AIX boot process. Keep this at the overview level.
Details —
Additional information — You might mention at this point that logical key switches are
used to determine which bootlist is used. If you press F5 or numeric 5, the system tries to
boot from a default bootlist that contains the diskette, CD-ROM, hard drive, and network. If
it boots from the hard drive, it will load AIX diagnostics rather than perform a normal boot.
Transition statement — Let’s show how the boot image is loaded from the boot logical
volume when booting from disk.

Loading of a boot image

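Figure content: the firmware searches the boot devices in bootlist order ((1) Diskette, (2) CD-ROM, (3) Internal disk, (4) Network). For the internal disk (hdisk0), the bootstrap code is loaded through the boot controller into RAM, which then loads the Boot Logical Volume (hd5).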
Figure 5-3. Loading of a boot image AN152.2
Notes:
Introduction
This visual shows how the boot logical volume is found during the AIX boot process.
Machines use one or more bootlists to identify a boot device. The bootlist is part of the
firmware.
Bootstrap code
System p and pSeries systems can run several different operating systems; the
hardware is not bound to the software. The first block of the boot disk contains
bootstrap code that is loaded into RAM during the boot process. This part is sometimes
referred to as System Read Only Storage (ROS). The bootstrap code then gets control. Its
task is to locate the boot logical volume on the disk and load the boot
image. In some technical manuals, this second part is called the Software ROS. In the
case of AIX, the AIX boot image is loaded.
Compression of boot image

To save disk space, the boot image is compressed on the disk. During the boot process
the boot image is uncompressed and the AIX kernel gets boot control.


Purpose — Explain the loading of a boot image.
Details —
Additional information — Explain that many different error situations may come up. The
bootlist might be incorrect, the disk could be damaged, the boot record might be wrong and
more. LED codes should help to analyze the different kinds of errors.
Transition statement — Let’s see what parts belong to the boot logical volume.
Contents of the boot logical volume (hd5)

AIX Kernel RAMFS Reduced ODM
Figure 5-4. Contents of the boot logical volume (hd5) AN152.2
Notes:
Contents of the boot logical volume

The boot logical volume contains three main components: An AIX kernel, a mountable
RAM file system, and a reduced ODM.
AIX kernel
The AIX kernel is the core of the operating system and provides basic services like
process, memory, and device management. The AIX kernel is always loaded from the
boot logical volume. There is a copy of the AIX kernel in the hd4 file system (under the
name /unix), but this program has no role in system initialization. Never remove /unix,
because it is used for rebuilding the kernel in the boot logical volume.

RAMFS
This RAMFS is a reduced or miniature root file system which is loaded into memory
and used as if it were a disk-based file system. The contents of the RAMFS are slightly
different depending on the type of system boot:
Type of boot: Contents of RAM file system
- Boot from system hard disk: Programs and data necessary to access rootvg and
bring up the rest of AIX. When booted in service mode, it boots a diagnostics facility.
- Boot from the Installation CD-ROM: Programs and data necessary to install AIX or
perform software maintenance.
- Boot from Diagnostics CD-ROM: Programs and data necessary to execute standalone
diagnostics.
Reduced ODM
The boot logical volume contains a reduced copy of the ODM. During the boot process,
many devices are configured before hd4 is available. For these devices, the
corresponding ODM files must be stored in the boot logical volume.
Instructor notes:
Purpose — Describe the components of the BLV.
Details — Introduce the different components as described in the student material.
Describe that the AIX kernel from the BLV is used during the boot process.
Additional information — Describe what the term reduced ODM means. Explain that
device support is available only for devices that are marked as base devices in PdDv.
The protofiles (in /usr/lib/boot and /usr/lib/boot/protoext) are used by the bosboot
command to determine which files should be put into the RAMFS image that is included in
the boot image.
Transition statement — Many system boot problems involve being unable to locate a
good boot image. In order to fix these problems, we often need to boot into special modes.
Let’s look at what determines which boot device is used.

5.2. Unable to find boot image

What students will do — Learn how to control the boot sequence and how to boot to SMS
mode.
How students will do it — Lecture, discussion, review questions, and lab exercises
What students will learn — How to boot to SMS mode and fix bootlist problems
How this will help students on their job — They will be better prepared to deal with
problems booting a system.
Working with bootlists

Normal bootlist:
# bootlist -m normal hdisk0 hdisk1
# bootlist -m normal -o
hdisk0 blv=hd5
hdisk1 blv=hd5
Customized service bootlist (numeric 6 key):

# bootlist -m service -o
rmt0
hdisk0 blv=hd5
Default bootlist (numeric 5 key):

> Hard coded in firmware
cd0
hdisk0 blv=hd5
ent0
Figure 5-5. Working with bootlists AN152.2
Notes:
Introduction
You can use the bootlist or diag command from the command line to change or
display the bootlists. You can also use the System Management Services (SMS)
programs. SMS is covered on the next visual.
bootlist command
The bootlist command is the easiest way to change the bootlist. The first example
shows how to change the bootlist for a normal boot. In this example, the system boots from
hdisk0 or, if that fails, from hdisk1. To query the bootlist, use the bootlist -o option.
The blv=hd5 part of the bootlist entry identifies which boot logical volume to use on
that disk. This is related to the AIX multibos capability that is covered later in this
course.
The second example shows how to display the customizable service bootlist.

The bootlist command also allows you to specify IP parameters to use when
specifying a network adapter. For example:
# bootlist -m service ent0 gateway=192.168.1.1 bserver=192.168.10.3 client=192.168.1.57
Using the service bootlist in this way allows you to boot to maintenance mode or diagnostics
from a NIM server without having to use SMS to specify the network adapter as the
boot device.
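When checking a bootlist from a script, it can help to number the entries so the search order is explicit. The function below is a minimal sketch (show_bootlist is a hypothetical name, not an AIX command) that reads bootlist -o style output on standard input, one device per line with optional blv= and pathid= attributes, and prints each entry with its position; attributes that are absent are shown as "-".

```shell
#!/bin/sh
# Hypothetical helper: number the entries of "bootlist -m <mode> -o" output.
# Input lines look like "hdisk0 blv=hd5" or "hdisk0 blv=hd5 pathid=0";
# non-disk devices such as "rmt0" carry no attributes.
show_bootlist() {
    pos=0
    # The "|| [ -n ... ]" also handles a last line with no trailing newline.
    while read -r dev rest || [ -n "$dev" ]; do
        [ -n "$dev" ] || continue              # skip blank lines
        pos=$((pos + 1))
        blv=- pathid=-
        for attr in $rest; do                  # split the attributes on blanks
            case $attr in
                blv=*)    blv=${attr#blv=} ;;
                pathid=*) pathid=${attr#pathid=} ;;
            esac
        done
        echo "$pos: $dev blv=$blv pathid=$pathid"
    done
}
```

On an AIX system you would feed it the real command, for example bootlist -m normal -o | show_bootlist.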
Types of bootlists
The normal bootlist is used during a normal boot.
The default bootlist (hard coded in the firmware) is called when numeric 5 is pressed
during the boot sequence.
Most machines, in addition to the default bootlist and the customized normal bootlist,
allow for a customized service bootlist. This is set with the bootlist -m service
command. The service bootlist is called when the numeric 6 key is pressed
during boot.
For machines which are partitioned into logical partitions, the HMC is used to boot the
partitions and it provides for specifying boot modes, thus eliminating the need to time
the pressing of special keys. Since pressing either numeric 5 or numeric 6 keys causes
a service mode boot and since a service mode boot using a boot logical volume will
result in booting to diagnostics, these options are referred to in the HMC as booting to
diagnostics with either the default bootlist or the stored (customizable) bootlist.
Here is a list summarizing the boot modes and the manual keys associated with them
(this can vary depending on the model of your machine):
• Numeric 1: Start an SMS (System Management Services) mode boot
• Numeric 5: Start a service mode boot using the default service bootlist
The default service bootlist is:
cd0
hdisk0 blv=hd5
ent0
• Numeric 6: Start a service mode boot using the customized service bootlist.
The key assignments can vary between models of AIX systems. Refer to the documentation
for your specific model at: http://publib.boulder.ibm.com/eserver. Look for your model under IBM
Systems Hardware.
Instructor notes:
Purpose — Describe how to work with the bootlists.
Details —
Additional information — The bootlist command accepts one more mode, called
both. As you might suspect, the both mode sets the service and normal bootlists at the
same time to the same value.
Transition statement — Let’s continue with discussion of the bootlist command but
looking at an AIX 7 enhancement that helps us manage multi-path I/O situations.

AIX 7: Bootlist pathid enhancements

The bootlist command now allows specification of the pathid of a
device when setting the bootpath:
# bootlist -m normal hdisk0 blv=hd5 pathid=0
The pathid argument may be repeated for multiple paths in the desired
order:
# bootlist -m normal hdisk0 blv=hd5 pathid=0 pathid=1
or
# bootlist -m normal hdisk0 blv=hd5 pathid=0,1
The bootlist command will now also display the pathid with the
device:
# bootlist -m normal -o
hdisk0 blv=hd5 pathid=0
hdisk0 blv=hd5 pathid=1
Figure 5-6. AIX 7: Bootlist pathid enhancements AN152.2
Notes:
The benefit of the pathid enhancement is the ability to operate at the level of
individual paths. This is very important with the bootlist command, where users used to
have to selectively delete and reconfigure device paths to generate bootlists on systems
with MPIO disks. The operation can now be performed with a single command.
There have also been situations where the bootlist was too long. When the bootlist
specifies disks without any pathid restriction, it adds all paths, with each path taking an
entry in the bootlist. Because the bootlist has a limited capacity, this could leave no room
for an alternate disk. Using the pathid specification avoids this type of problem.
The bootlist command preserves the order in which paths are specified. If a user wants
the system to boot from paths 1, 0, and 2, in that order, the pathid=1,0,2 argument
does exactly that.
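Because path order matters, a script that builds the pathid argument must keep the order it is given. The following hypothetical helper (pathid_bootlist is not an AIX command) only prints the resulting bootlist command so it can be reviewed before use; it joins the path IDs with commas, the single-attribute form shown on the visual, and rejects non-numeric IDs.

```shell
#!/bin/sh
# Hypothetical helper: print a bootlist command that restricts one disk
# and BLV to an ordered list of path IDs. Nothing is executed.
# Usage: pathid_bootlist mode disk blv pathid [pathid ...]
pathid_bootlist() {
    if [ $# -lt 4 ]; then
        echo "usage: pathid_bootlist mode disk blv pathid..." >&2
        return 1
    fi
    mode=$1 disk=$2 blv=$3
    shift 3
    # Every remaining argument must be a non-negative integer path ID.
    for p in "$@"; do
        case $p in
            ''|*[!0-9]*)
                echo "error: pathid must be numeric: $p" >&2
                return 1 ;;
        esac
    done
    # Join the remaining arguments with commas, keeping their order.
    ids=$(IFS=,; echo "$*")
    echo "bootlist -m $mode $disk blv=$blv pathid=$ids"
}
```

For example, pathid_bootlist normal hdisk0 hd5 1 0 2 prints the command that tries path 1, then 0, then 2.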
Instructor notes:
Purpose — Discuss AIX 7 bootlist pathid enhancements.
Details — Once upon a time, the only information about a disk in the bootlist was the name
of the disk.
Then with the advent of multibos, we could have two boot logical volumes in the same
rootvg, each at a different fix pack level. As a result, the bootlist needed to identify both the
disk name and the BLV that should be used.
We have also discovered that using disks which are defined in SAN attached storage
subsystems provides multiple paths to that boot disk. To address this situation, AIX can
now include the pathid as part of the information in the bootlist.
The examples which are shown assume that there is only one BLV on the specified boot
disk, which is indicated with blv=hd5.
The first example shows setting the bootlist to a single disk and restricting it to only one
path. This can be useful when you need to restrict the number of entries in the bootlist. By
default, if the system administrator only identifies the logical name of the disk in specifying
the bootlist, the bootlist automatically includes one entry for each pathid. There is a
maximum size to a bootlist and having all possible paths can fill up the bootlist fairly quickly.
The second example shows how the bootlist command can be used to specify multiple
paths to a given boot disk. Notice that there are two different syntaxes, both valid. The first
syntax has the pathid=# specified multiple times. The second syntax shows that you can
specify the pathid attribute a single time with the assignment of a comma-delimited list of
pathids.
As explained in the student notes, the order in which the pathids are specified will
determine the order in which these paths are tried in accessing the boot image.
The last bullet illustrates how the bootlist display (-o for output) will list each unique
combination that is defined in the bootlist.
Transition statement — The SMS programs provide another method to set a bootlist.
Let’s take a look at how to start SMS.

Starting System Management Services

During AIX partition activation

Press numeric 1 or specify SMS on HMC activate
IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM
1 = SMS Menu 5 = Default Boot List

8 = Open Firmware Prompt 6 = Stored Boot List
Memory Keyboard Network SCSI

...
Figure 5-7. Starting System Management Services AN152.2
Notes:
Booting to SMS
If you cannot boot AIX because the bootlist needs correcting, then you need to use
the System Management Services (SMS) to modify the bootlist. The SMS programs are
integrated into the hardware (they reside in NVRAM).
The visual shows how to start the System Management Services. There is an
equivalent graphic menu seen on older systems. During system boot, shortly before the
firmware looks for a boot image, it discovers some basic hardware on the system. At
this point the LED usually will display a value of E1F1. As the devices are discovered,
either a text name or graphic icon for the resource will display on the screen. The
second device discovered is usually the keyboard. When the keyboard is discovered, a
unique double beep tone is usually sounded. Having discovered the keyboard, the
system is ready to accept input that will override the default behavior of conducting a
normal boot. But once the last icon or name is displayed, the system starts to use the
bootlist to find the boot image and it is too late to change it. One of the keyboard actions
you can do during this brief period of time is to press the numeric 1 key to request that
the system boot using SMS firmware code.
SMS on LPAR systems

To start SMS using the Advanced Option for Power On under a POWER5, POWER6, or
POWER7 HMC:
Activate the partition using the SMS boot mode. Do this by clicking the Advanced
button when activating the partition. In the Boot Mode drop down list, select SMS.
Do not forget to choose to open a terminal window. The partition will stop at the SMS
menu.
To start the SMS profile under a POWER4 HMC:
From the Server and Partition: Server Management application, select the profile
for the partition and change the boot mode to SMS. Then, activate the partition
using this profile. Be sure to check the Open Terminal box when activating.


Purpose — Explain how to start SMS.
Details —
Transition statement — How do you change the bootlist in SMS?
Note for classes taught with Elluminate: you can play material from the multimedia library of Elluminate in place of the next two visuals.
Working with bootlists in SMS (1 of 2)

Screens shown on the visual:
System Management Services Main Menu: 1. Select Language, 2. Setup Remote IPL (Initial Program Load), 3. Change SCSI Settings, 4. Select Console, 5. Select Boot Options ===> 5
Multiboot: 1. Select Install/Boot Device, 2. Configure Boot Device Order, 3. Multiboot Startup <OFF> ===> 2
Configure Boot Device Order: 1. Select 1st Boot Device, 2. Select 2nd Boot Device, 3. Select 3rd Boot Device, 4. Select 4th Boot Device, 5. Select 5th Boot Device, 6. Display Current Setting, 7. Restore Default Setting ===> 1
Select Device Type: 1. Diskette, 2. Tape, 3. CD/DVD, 4. IDE, 5. Hard Drive, 6. Network, 7. None, 8. List All Devices ===> 8
Figure 5-8. Working with bootlists in SMS (1 of 2) AN152.2
Notes:
Working with the bootlist

The System Management Services Main Menu lists:
1. Select Language
2. Setup Remote IPL (Initial Program Load)
3. Change SCSI Settings
4. Select Console
5. Select Boot Options
In the System Management Services Main Menu, pick Select Boot Options to work with the
bootlist.
The next screen is the Multiboot menu which lists:
1. Select Install/Boot Device
2. Configure Boot Device Order
3. Multiboot Startup <OFF>

This menu lets you specify a device to boot from right now, modify the
customized bootlists (with the intent of booting using one of them), or request that
you be prompted at each boot for the device to boot from (multiboot option).
The focus here is the second option, used to modify the customized bootlist. The
Configure Boot Device Order panel lists:
1. Select 1st Boot Device
2. Select 2nd Boot Device
3. Select 3rd Boot Device
4. Select 4th Boot Device
5. Select 5th Boot Device
6. Display Current Setting
7. Restore Default Setting
It allows you to either list or modify the bootlist. You select which position in the bootlist
you wish to modify, and SMS then lists the possible device types; choosing a type displays
the devices of that type to select from:
1. Diskette
2. Tape
3. CD/DVD
4. IDE
5. Hard Drive
6. Network
7. None
8. List All Devices
Select the device type. If you do not have many bootable devices it is sometimes easier
to use the List All Devices option.
Finally, you would select a specific device to place in that position of the bootlist, as
illustrated on the next visual.
It is important to understand that when SMS is used to modify the bootlist, both the
normal bootlist and the service bootlist are modified. If you want them to be different,
you need to recustomize them later, when you have a command prompt (such as in
multiuser mode).
Instructor notes:
Purpose — Show how to change the bootlist in SMS
Details — When you use SMS to change the bootlist, you are changing both the normal
and service customizable bootlists. After fixing the problem at hand, you may wish to use
the bootlist command to recustomize them if you want them to be different.
Additional information — The following keys are used (follow with the HMC identifying
text):
- F1 or numeric 1: Start System Management Services
- F5 or numeric 5: Boot in diagnostic mode, use default bootlist
- F6 or numeric 6: Boot in diagnostic mode, use nondefault bootlist
The default bootlist is set to diskette, CD-ROM, internal disk and any communication
adapter.
To boot diagnostics from disk, do not insert a CD and request to use the default bootlist
(press the appropriate key (F5/numeric 5) or specify with HMC).
The other options:
Boot versus Multiboot
Under Select Boot Options, there is a multiboot mode item. This is a toggle that turns
multiboot mode either on or off. If you turn it on, the system will boot to an SMS menu every
time you boot the system in normal mode. This is to allow you to choose where to boot from
each time. For example, you might have different versions of AIX on different hard disks
and want to alternate boot between them. If an SMS menu is displayed when performing a
normal boot, this might be the reason.
Transition statement — Once we have selected the category of boot device, we need to
select the particular device we wish to use in the identified position in the bootlist. Let’s see
how we do this.

V7.0.1
Instructor Guide
Uempty
Working with bootlists in SMS (2 of 2)

IBM Power Systems
Select Device
Device  Current   Device
Number  Position  Name
1.      -         IBM 10/100/1000 Base-TX PCI-X Adapter
                  ( loc=U789D.001.DQDWAYT-P1-C5-T1 )
2.      -         SAS 73407 MB Harddisk, part=2 (AIX 7.1.0)
                  ( loc=U789D.001.DQDWAYT-P3-D1 )
3.      1         SATA CD-ROM
                  ( loc=U789D.001.DQDWAYT-P1-T3-L8-L0 )
4.      None
===> 2

Select Task
SAS 73407 MB Harddisk, part=2 (AIX 7.1.0)
( loc=U789D.001.DQDWAYT-P3-D1 )
1. Information
2. Set Boot Sequence: Configure as 1st Boot Device
===> 2

Current Boot Sequence
1. SAS 73407 MB Harddisk, part=2 (AIX 7.1.0)
2. None
3. None
4. None
Figure 5-9. Working with bootlists in SMS (2 of 2) AN152.2
Notes:
Selecting bootlist devices

You are presented with a list of devices to select from. For example:
1. - IBM 10/100/1000 Base-TX PCI-X Adapter
( loc=U789D.001.DQDWAYT-P1-C5-T1 )
2. - SAS 73407 MB Harddisk, part=2 (AIX 7.1.0)
( loc=U789D.001.DQDWAYT-P3-D1 )
3. 1 SATA CD-ROM
( loc=U789D.001.DQDWAYT-P1-T3-L8-L0 )
4. None
For each position in the bootlist, you can select a device. The location code provided
with each device in the list allows you to uniquely identify devices that otherwise might
be confused.
Next, you are presented with a Select Task panel, which provides the following options:
1. Information
2. Set Boot Sequence: Configure as 1st Boot Device
Once you have selected a device, you need to confirm that selection with Set Boot Sequence.
You can repeat this for each position in the bootlist. You can also clear a position by
selecting None for it.
Exiting SMS always triggers a boot attempt. If you have not specified a particular device
for this boot, the system uses the bootlist you have set in SMS.


Purpose — Complete the walkthrough of how to change a bootlist in SMS.
Details —
Transition statement — Let’s next discuss how to handle a corruption of the boot logical
volume.
5.3. Corrupted boot logical volume

What students will do — The students will learn to boot to maintenance mode, access the
rootvg, and repair a corrupted BLV.
How students will do it — Through lecture and lab exercise
• Learn how to boot a system in maintenance mode
• Learn how to select the correct disk to be accessed
• Learn how to rebuild the BLV
How this will help students on their job — They will learn how to fix situations where the
system will not boot due to a corrupted BLV.
Boot device alternatives (1 of 2)

IBM Power Systems
Boot device is either:

First one found with a boot image in the bootlist
Device specified in SMS Select Install/Boot Device
If the boot device is removable media (CD, DVD, Tape):
Boots to the Install and Maintenance menu
If the boot device is a network adapter:
Boot result depends on NIM configuration for client machine:
• nim -o bos_inst : Install and Maintenance menu
• nim -o maint_boot : Maintenance menu
• nim -o diag : Diagnostic menu
Figure 5-10. Boot device alternatives (1 of 2) AN152.2
Notes:
Boot alternatives
The system boots from the first device in the designated bootlist that contains a boot image, unless you select a specific device with the SMS Select Install/Boot Device option.
Whenever the effective boot device is bootable media, such as a mksysb tape/CD/DVD
or installation media, the system will boot to the Install and Maintenance menu.
If the booting device is a network adapter, the mode of boot depends on the
configuration of the NIM server that services the network boot request. If the NIM
server is configured to support an AIX installation or a mksysb restore, then the system
boots to Install and Maintenance. If the NIM server is configured to serve out a
maintenance image, then the system boots to a Maintenance menu (a submenu of
Install and Maintenance). If the NIM server is configured to serve out a diagnostic
image, then the system boots to diagnostic mode.
There are other ways to boot to a diagnostic utility. If the booting device is a CD drive with a
diagnostics CD in it, the system boots into that diagnostic utility. If a service mode boot is
requested and the booting device is a hard drive with a boot logical volume, then the
system boots into the diagnostic utilities.
You can signal the system which bootlist to use during the boot process. The default
is to use the normal bootlist and boot in normal mode. This can be changed during a
window of opportunity between when the system discovers the keyboard and when it
commits to the default boot mode. The signal may be generated from the system
console (this may be an HMC-provided virtual terminal) or from a workstation attached
to the service processor (such as an HMC), which can simulate a keyboard signal at the
right moment.
The keyboard signal that is used can vary from firmware to firmware, but the most
common is numeric 5 to indicate that the firmware should use the default service bootlist
and numeric 6 to indicate that it should use the customizable (stored) service bootlist.
Either of these keyboard signals results in a service mode boot, which, as stated above,
causes a boot to diagnostic mode when booting from a boot logical volume on your
hard drive.
With an HMC, you can specify which signal to send as part of the LPAR activation. Even
if you forget to override the default boot mode (usually normal to multiuser), you can still
use the virtual console keyboard as described to override, once the keyboard has been
discovered.
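The keyboard conventions described above can be summarized as a small lookup. This is a sketch only: the function name `boot_request` is hypothetical, and, as the text says, the actual keys can vary between firmware levels.

```shell
#!/bin/sh
# Hypothetical lookup: what a keyboard signal sent during the boot window
# (from the console, or simulated by the HMC) requests from the firmware.
# Reflects the common conventions described in this unit; firmware may differ.
boot_request() {
    case "$1" in
        1|F1) echo "start System Management Services (SMS)" ;;
        5|F5) echo "service mode boot, default service bootlist" ;;
        6|F6) echo "service mode boot, customized service bootlist" ;;
        "")   echo "normal mode boot, normal bootlist" ;;
        *)    echo "unrecognized key: $1" >&2
              return 1 ;;
    esac
}

# Example:
#   boot_request 6   # -> service mode boot, customized service bootlist
```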
Instructor notes:
Purpose — Explain how the boot mode is controlled.
Details —
Transition statement — Let’s continue to look at the factors that affect boot behavior.

Boot device alternatives (2 of 2)

IBM Power Systems
If the boot device is a disk:

Boot depends on use of service mode:
   Normal mode boot: Boot to multiuser
   Service mode boot: Diagnostic menu
Two types of service mode boots:
   Requesting default service bootlist (key 5)
   Requesting customized service bootlist (key 6)
HMC advanced boot options support all of the above:
   Normal boot
   Diagnostic with default bootlist
   Diagnostic with stored bootlist
Figure 5-11. Boot device alternatives (2 of 2) AN152.2
Notes:
Booting off a disk with a boot logical volume (BLV)

When the boot device is a disk on your system, the disk must have a valid boot logical
volume to be successful. The result of the boot depends upon the mode of the boot. If
booting in normal mode, the system is booted up into multiuser mode (the default run
level of the inittab). If executing a service mode boot (using either default bootlist or the
customizable service mode bootlist), then the system will execute a diagnostics
program and present a diagnostics menu.
Note that when using the HMC advanced activation options, you can set the mode of
your boot and, if service mode, which boot list to use: default or stored (customized
service).
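Taken together, the two Boot device alternatives visuals describe a small decision table. The sketch below encodes it as a shell function. The function name and the argument keywords (`cd`, `dvd`, `tape`, `network`, `disk`, `normal`, `service`) are hypothetical labels. It assumes that removable media hold installation or mksysb images (not a diagnostics CD) and that the disk has a valid boot logical volume.

```shell
#!/bin/sh
# Hypothetical decision table for the boot results described in this unit.
#   $1: boot device type: cd, dvd, tape, network, or disk
#   $2: for network, the NIM operation prepared for this client
#       (bos_inst, maint_boot, diag); for disk, the boot mode
#       (normal or service)
# Assumptions: removable media hold installation or mksysb images (not a
# diagnostics CD), and the disk has a valid boot logical volume.
boot_outcome() {
    case "$1" in
        cd|dvd|tape)
            echo "Install and Maintenance menu" ;;
        network)
            case "$2" in
                bos_inst)   echo "Install and Maintenance menu" ;;
                maint_boot) echo "Maintenance menu" ;;
                diag)       echo "Diagnostic menu" ;;
                *)          echo "unknown NIM operation: $2" >&2
                            return 1 ;;
            esac ;;
        disk)
            case "$2" in
                normal)  echo "multiuser mode" ;;
                service) echo "Diagnostic menu" ;;
                *)       echo "unknown boot mode: $2" >&2
                         return 1 ;;
            esac ;;
        *)
            echo "unknown boot device type: $1" >&2
            return 1 ;;
    esac
}
```

For example, `boot_outcome disk service` prints `Diagnostic menu`, matching a key 5 or key 6 boot from a disk with a BLV.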
Instructor notes:
Purpose — Continue covering the factors that affect boot behavior.
Details —
Transition statement — Let’s use what we have just learned to effect a boot to
maintenance mode.

Accessing a system that will not boot

IBM Power Systems
HMC Advanced Activate options: Default bootlist
Boot the system from the BOS CD-ROM, tape, or network device (NIM)
Select maintenance mode:
Maintenance
1. Access a Root Volume Group
2. Copy a System Dump to Media
3. Access Advanced Maintenance
4. Install from a System Backup
Perform corrective actions
Recover data
Figure 5-12. Accessing a system that will not boot AN152.2
Notes:
Introduction
The visual shows an overview of how we access a system that will not boot normally.
The maintenance mode can be started from an AIX CD, an AIX bootable tape (such as a
mksysb), or a network device that has been prepared to access a NIM master. The
device that contains the boot media must be in the bootlist being used, unless you select
it directly through SMS.
Boot into maintenance mode

To boot into maintenance mode:
- AIX 5L V5.3, AIX 6.1 and AIX 7.1 systems support the bootlist command and
booting from a mksysb tape, but the tape device is, by default, not part of the boot
sequence.
- If planning to boot from media in an LPAR environment, check that the device adapter
slot is allocated to the LPAR in question. If not, you may need to update the partition
profile to allocate that device. If the device is currently allocated to another LPAR,
you first need to deallocate it from that other LPAR. Use a dynamic LPAR
operation on the HMC to allocate that slot.
- If using the default bootlist, the sequence is fixed and the CD drive is the first
practical device.
- If using a tape drive or a network adapter as your boot device and not selecting a
boot device through SMS for this particular boot, then you will need to use one of the
customizable bootlists, usually the service bootlist.
Verify your bootlist, but do not forget that some machines do not have a service
bootlist. Check that your boot device is part of the bootlist:
# bootlist -m service -o
- If you want to boot from your internal tape device, you need to change the bootlist
because the tape device by default is not part of the bootlist. For example:
# bootlist -m service rmt0 hdisk0
- Whichever bootlist you are using, insert the boot media (either tape or CD) into the
drive.
- Power on the system (or activate the LPAR). The system begins booting from the
installation media. After several minutes, c31 is displayed in the LED/LCD panel (or
as the reference code on the HMC display), which means that the software is
prompting on the console for input (normally to select the console device and then
select the language). For an LPAR, you need to have the virtual console started
to interact with the prompts.
- Normally, you are prompted to select the console device and then select the
language. After making these selections, you see the Installation and
Maintenance menu.
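Checking whether the boot device is already in the bootlist can also be scripted. A minimal sketch, assuming that each line of `bootlist -m <mode> -o` output begins with the device name, optionally followed by attributes (for example `hdisk0 blv=hd5`). The function name `in_bootlist` is hypothetical. Reading the listing from standard input lets you try it without an AIX system.

```shell
#!/bin/sh
# Hypothetical helper: succeed if device $1 appears in a bootlist listing
# read from standard input. Assumption: the device name is the first
# whitespace-separated field of each line.
in_bootlist() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }'
}

# Example on AIX (rmt0 and hdisk0 as in the text):
#   bootlist -m service -o | in_bootlist rmt0 ||
#       bootlist -m service rmt0 hdisk0
```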
For partitioned systems with an HMC, you would normally use the HMC to access SMS
and then select the bootable device, which would bypass the use of a bootlist.
You can also use a NIM server to boot to maintenance. For this, you would need to
place your system’s network adapter in your customized service bootlist before any
other bootable devices, or use SMS to specifically request boot over that adapter (the
latter option is most common). Here is an example of setting the service boot list:
# bootlist -m service ent0 gateway=192.168.1.1 bserver=192.168.10.3 client=192.168.1.57
You would also need to set up the NIM server to provide a boot image for doing a
maintenance boot. For example, at the NIM server:
# nim -o maint_boot -a spot=<spotname> <client machine object name>
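Both preparation steps can be collected into one plan that you review before typing the commands on each system. This minimal sketch only prints the commands and uses the `-a spot=` attribute form of the nim command. The function name and the SPOT name, client object name, adapter, and addresses in the example are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the commands that prepare a NIM maintenance
# boot, so they can be reviewed before being typed on each system.
#   $1 SPOT name   $2 NIM client object name   $3 client network adapter
#   $4 gateway IP  $5 NIM server (bserver) IP  $6 client IP
nim_maint_plan() {
    if [ $# -ne 6 ]; then
        echo "usage: nim_maint_plan spot client adapter gateway bserver clientip" >&2
        return 1
    fi
    echo "# On the NIM master:"
    echo "nim -o maint_boot -a spot=$1 $2"
    echo "# On the client (or select the adapter in SMS instead):"
    echo "bootlist -m service $3 gateway=$4 bserver=$5 client=$6"
}

# Example (hypothetical values):
#   nim_maint_plan spot71 lpar1 ent0 192.168.1.1 192.168.10.3 192.168.1.57
```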

Use the correct installation media or SPOT

Be careful to use the correct AIX installation CD (or NIM spot, or mksysb tape) to
boot your machine. For example, you should not boot an AIX 5L V5.3-00 installed
machine with an AIX 5L V5300-03 installation CD. You must match the version,
release, and maintenance level. The same applies to the NIM spot level when using
a network boot with NIM as the server of the boot image. A common error you may
experience, if there is a mismatch, is an infinite loop of /etc/getrootfs errors
when trying to access the rootvg in maintenance mode.
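One way to apply this rule is to compare the two levels before booting. A minimal sketch, assuming both levels are written in the `oslevel -s` style (for example `7100-01-03-1207`), where the first field is the base version/release and the second is the technology (maintenance) level. The function name is hypothetical, and how you obtain the level of the media or SPOT is not shown.

```shell
#!/bin/sh
# Hypothetical check: do an installed AIX level and a boot media/SPOT level
# agree on version/release and maintenance (technology) level?
# Assumption: levels look like "BASE-TL-SP-BUILD", e.g. 7100-01-03-1207;
# only the first two fields are compared, following the rule in the text.
levels_match() {
    a=$(printf '%s\n' "$1" | cut -d- -f1-2)
    b=$(printf '%s\n' "$2" | cut -d- -f1-2)
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Example (illustrative level strings):
#   levels_match 5300-00-00-0000 5300-03-00-0000 || echo "level mismatch"
```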
Instructor notes:
Purpose — Identify how to access a system that does not boot.
Details — Emphasize that what causes us to boot into the Installation and Maintenance
menu is the fact that we booted off of installation media. It does not matter if we boot in
normal mode (using the normal bootlist) or service mode (using the default or customizable
service bootlists). It is only important that we find bootable installation media (tape, CD, or
network server) in the bootlist before anything else (such as a BLV or a Diagnostic CD).
With some SMS facilities, we can specify a particular device to use and bypass the
bootlists entirely.
Transition statement — Let’s show the maintenance mode menus that are available.
To access the multimedia library, click the CD button on the Elluminate toolbar.

Booting in maintenance mode

IBM Power Systems
Define the System Console

Welcome to Base Operating System
Installation and Maintenance
Type the number of your choice and press Enter.
Choice is indicated by >>>.
1. Start Install Now with Default Settings
2. Change/Show Installation Settings and Install
3. >>>Start Maintenance Mode for System Recovery
4. Configure Network Disks (iSCSI)
>>> Choice [1]: 3
Maintenance
1. >>> Access a Root Volume Group
2. Copy a System Dump to Removable Media
3. Access Advanced Maintenance Functions
4. Erase Disks
5. Configure Network Disks (iSCSI)
6. Install from a System Backup
Choice [1]: 1
Figure 5-13. Booting in maintenance mode AN152.2
Notes:
First steps
When booting in maintenance mode, you first have to identify the system console that
will be used, for example your virtual console (vty), graphic console (lft), or serial
attached console (tty that is attached to the S1 port).
After selecting the console, the Installation and Maintenance menu is shown:
1 Start Install Now with Default Settings
2 Change/Show Installation Settings and Install
3 Start Maintenance Mode for System Recovery
4 Configure Network Disks (iSCSI)
Because we want to work in maintenance mode, we use selection 3 to start the
Maintenance menu:
1 Access a Root Volume Group
2 Copy a System Dump to Removable Media
3 Access Advanced Maintenance Functions
4 Erase Disks
5 Configure Network Disks (iSCSI)
6 Install from a System Backup
In a network boot using NIM, the console goes straight to the maintenance menu.
From this point, we access our rootvg to execute any system recovery steps that may
be necessary.


Purpose — Explain the first maintenance menus that are shown.
Details — Describe how to start up the maintenance mode.
Additional information — You could, optionally, provide a brief explanation of the other
steps that can be executed in the Maintenance menu: copying a dump to removable media
such as a tape, accessing an advanced maintenance shell when no rootvg is available,
and restoring a mksysb tape.
Transition statement — Let’s describe how to access the rootvg.
Working in maintenance mode

IBM Power Systems
Access a Root Volume Group

Type the number for a volume group to display the logical volume information
and press Enter.
1) Volume Group 00c35ba000004c00000001153ce1c4b0 contains these disks:
   hdisk1 70006 02-08-00   hdisk0 70006 02-08-00
Choice: 1
Volume Group Information
Volume Group ID 00c35ba000004c00000001153ce1c4b0 includes the following logical volumes:
   hd5 hd6 hd8 hd4 hd2 hd9var
   hd3 hd1 hd10opt

1) Access this Volume Group and start a shell
2) Access this Volume Group and start a shell before mounting filesystems
99) Previous Menu
Choice [99]: 1
Figure 5-14. Working in maintenance mode AN152.2
Notes:
Select the correct volume group

When accessing the rootvg in maintenance mode, you need to select the volume
group that is the rootvg. The Access a Root Volume Group panel displays all
detected volume groups and the disks that make up these volume groups. Note that
only the volume group IDs are shown, not the names of the volume groups. Check
your system documentation to make sure you select the correct disk. Do not rely on the
physical volume name; rely instead on the PVID, VGID, or SCSI ID.
After selecting the volume group, it will show the list of logical volumes contained in the
volume group. This is how you confirm you have selected rootvg. Two selections are
then offered:
- Access this Volume Group and start a shell
- Access this Volume Group and start a shell before mounting file systems

Access this volume group and start a shell

When you choose this selection the rootvg will be activated (with the varyonvg
command), and all file systems belonging to the rootvg will be mounted. A shell will be
started which can be used to execute any system recovery steps.
Typical scenarios where this selection must be chosen are:
- Changing a forgotten root password
- Recreating the boot logical volume
- Changing a corrupted bootlist
Access this volume group and start a shell before mounting file systems
When you choose this selection, the rootvg is activated, but the file systems
belonging to the rootvg are not mounted.
A typical scenario where this selection is chosen is when a corrupted file system needs
to be repaired by the fsck command. Repairing a corrupted file system is only possible
if the file system is not mounted.
Another scenario might be a corrupted hd8 transaction log. Any changes that take place
in the superblock or i-nodes are stored in the log logical volume. When these changes
are written to disk, the corresponding transaction logs are removed from the log logical
volume.
A corrupted transaction log must be reinitialized with the logform command, which is
only possible when no file system is mounted. After initializing the log device, you need
to do a file system repair for all file systems that use this transaction log. Beginning with
AIX 5L V5.1, you have to explicitly specify the file system type, JFS or JFS2:
# logform -V jfs2 /dev/hd8
# fsck -y -V jfs2 /dev/hd1
# fsck -y -V jfs2 /dev/hd9var
# fsck -y -V jfs2 /dev/hd10opt
# exit
Keep in mind that the US keyboard layout is used. You can enable command recall and
editing by running set -o emacs or set -o vi.
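The command sequence above can be generated for whichever file systems use the log. A minimal sketch that only prints the commands for review. The function name is hypothetical. It assumes the log is /dev/hd8 and that the listed logical volumes are JFS2 file systems using that log; the default list is the rootvg file system logical volumes shown on the previous visual. Verify both assumptions on your system, and remember that logform destroys the contents of the log.

```shell
#!/bin/sh
# Hypothetical helper: print the log-repair sequence for the
# "start a shell before mounting file systems" maintenance shell.
#   $1: log device (default /dev/hd8)
#   remaining args: logical volume names of the JFS2 file systems using it
# Assumption: the default list below matches your rootvg; check it first.
jfs2_log_repair_plan() {
    if [ $# -ge 1 ]; then
        log=$1
        shift
    else
        log=/dev/hd8
    fi
    if [ $# -eq 0 ]; then
        set -- hd4 hd2 hd9var hd3 hd1 hd10opt
    fi
    echo "logform -V jfs2 $log"   # reinitializes (destroys) the log
    for lv in "$@"; do
        echo "fsck -y -V jfs2 /dev/$lv"
    done
    echo "exit"
}
```

For example, `jfs2_log_repair_plan /dev/hd8 hd1 hd9var hd10opt` prints the same five commands as the listing above.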
Instructor notes:
Purpose — Describe how to access the rootvg.
Details —
Additional information — Point out that the logform command can result in data loss.
Transition statement — Let’s check where to find information about boot errors.

How to fix a corrupted BLV

IBM Power Systems
1 Boot to maintenance mode from bootable media: CD, tape, or NIM
2 Select the volume group that contains hd5:
   Maintenance
   1 Access a Root Volume Group
3 Rebuild BLV:
   # bosboot -ad /dev/hdisk0
   # sync
   # sync
   # reboot
Figure 5-15. How to fix a corrupted BLV AN152.2
Notes:
Maintenance mode
If the boot logical volume is corrupted (for example, bad blocks on a disk might cause a
corrupted BLV), the machine will not boot.
To fix this situation, you must boot your machine in maintenance mode from a CD or
tape. If NIM has been set up for the machine, you can also boot it in maintenance mode
from a NIM master. NIM is actually a common way to do special boots in a
logical partition environment.
Recreating the boot logical volume

After booting from CD, tape, or NIM, an Installation and Maintenance menu is shown
and you can start maintenance mode, as described earlier in this unit. After
accessing the rootvg, you can repair the boot logical volume with the bosboot
command. You need to specify the corresponding disk device, for example hdisk0:
# bosboot -ad /dev/hdisk0
# sync
# sync
# reboot
The sync commands flush any file data in the memory cache to disk. While you would
normally use the shutdown command, in maintenance mode it is appropriate to use the
reboot command.
The bosboot command requires that the boot logical volume (hd5) exists. If you need to
recreate the BLV from scratch (for example, because it was deleted by mistake or the
LVCB of hd5 has been damaged), follow these steps:
1. Boot your machine in maintenance mode (from CD or tape using numeric 5, or use
numeric 1 to access the System Management Services (SMS) and select the boot
device).
2. Remove the old hd5 logical volume.
# rmlv hd5
3. Clear the boot record at the beginning of the disk.
# chpv -c hdisk0
4. Create a new hd5 logical volume: one physical partition in size, in rootvg, with
outer edge as the intra-physical volume allocation policy. Specify boot as the logical volume type.
# mklv -y hd5 -t boot -a e rootvg 1
5. Run the bosboot command as described on the visual.
6. Check the actual bootlist.
# bootlist -m normal -o
7. Write data immediately to disk.
# sync
# sync
8. Reboot the system.
# reboot
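The steps above can also be printed as a checklist for a specific disk before you start typing in the maintenance shell. In this minimal sketch the function name is hypothetical and the disk is a parameter (hdisk0 in the text is only an example). Nothing is executed, which matters because rmlv and chpv -c are destructive.

```shell
#!/bin/sh
# Hypothetical helper: print the maintenance-shell commands for recreating
# the boot logical volume hd5 on a given disk (steps 2 to 8 above).
# It does not run anything: rmlv and chpv -c are destructive, so review
# the output before typing the commands.
recreate_blv_plan() {
    case "$1" in
        hdisk[0-9]*) disk=$1 ;;
        *) echo "usage: recreate_blv_plan hdiskN" >&2
           return 1 ;;
    esac
    cat <<EOF
rmlv hd5
chpv -c $disk
mklv -y hd5 -t boot -a e rootvg 1
bosboot -ad /dev/$disk
bootlist -m normal -o
sync
sync
reboot
EOF
}
```

For example, `recreate_blv_plan hdisk0` prints the eight commands from steps 2 to 8 in order.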
By using the internal command ipl_varyon -i, you can check the state of the boot
record.


Purpose — Describe the bosboot command.
Details — Describe the steps that are necessary to recreate the boot logical volume. Tell
the students that working in maintenance mode is explained later in this unit.
Point out that an hd5 boot logical volume must exist on the system.
Additional information — Be careful to use the correct AIX installation CD to boot your
machine. Consider installing AIX base media and then applying patches to the OS. The
patches make changes to both kernel routines AND libc. This invalidates using the
installation CDs to boot the system into maintenance mode and access the disks. This is
because when we boot, we use the /unix and libraries from the CD. Since they all match
each other, this is not an issue at first. But as we activate the rootvg, the root (/) file
system from the CD is overlaid with the root (/) file system from the disks. Now, any
reference to /unix is resolved to the DISK! If this /unix does not match what we actually
booted from on the CD, bad things happen. The same applies to libraries being referenced.
Transition statement — Time for checkpoint questions.
Checkpoint (1 of 2)
IBM Power Systems
1. True or false: You must have AIX loaded on your system to use
the System Management Services programs.
2. Your AIX system is currently powered off. AIX is installed on
hdisk1 but the bootlist is set to boot from hdisk0. How can you fix
the problem and make the machine boot from hdisk1?
3. Your machine is booted and is at the # prompt.

a. What is the command that will display the normal bootlist?
b. How could you change the normal bootlist?
Notes:


Purpose — Review and test the students' understanding of this first part of the unit.
Details — A suggested approach is to give the students about five minutes to answer the
questions on this page. Then, go over the questions and answers with the class.
IBM Power Systems
1. The answer is false: SMS is part of the built-in firmware.
2. The answer is you need to boot the SMS programs and set the new
bootlist to include hdisk1.
3. a. The answer is # bootlist -m normal -o.
   b. The answer is # bootlist -m normal device1 device2.
Transition statement — Let’s continue with more Checkpoint questions.
Checkpoint (2 of 2)
IBM Power Systems
4. What command is used to build a new boot image and write it to the
boot logical volume?
5. What script controls the boot sequence?
6. True or false: During the AIX boot process, the AIX kernel is loaded
from the root file system.
7. How do you boot an AIX machine into maintenance mode?
Notes:


Purpose — Review and test the students' understanding of this unit.
IBM Power Systems
4. The answer is bosboot -ad /dev/hdiskx.
5. The answer is rc.boot.
6. The answer is false: the AIX kernel is loaded from hd5.
7. The answer is you need to boot from an AIX CD, mksysb, or NIM
server.
Transition statement — Now, let’s do an exercise.
Exercise: System initialization: Accessing a boot image
IBM Power Systems
Identify information on your system

Prepare NIM to support booting to
maintenance mode
Boot to maintenance mode
Repair a corrupted boot logical volume
Work with multi-path bootlists
Figure 5-18. Exercise: System initialization: Accessing a boot image AN152.2
Notes:


Details —
Transition statement — Let’s summarize.
Unit summary
IBM Power Systems

Describe the boot process through to the loading of the boot
logical volume
Describe the contents of the boot logical volume
Recreate the boot logical volume on a system which is failing
to boot
Adjust the bootlist for the desired order of search
Notes:

Unit 6. System initialization: rc.boot and inittab
Estimated time
01:25

This unit describes the final stages of the boot process and outlines
how devices are configured for the system.
Common boot errors are described, along with how they can be analyzed to
fix boot problems.

• Identify the steps in system initialization from loading the boot
image to boot completion
• Identify how devices are configured during the boot process
• Analyze and solve boot problems

Accountability:
• Lab exercise
References
management
© Copyright IBM Corp. 2009, 2012 Unit 6. System initialization: rc.boot and inittab 6-1
Unit objectives
IBM Power Systems

Identify the steps in system initialization from loading the boot image to boot completion
Identify how devices are configured during the boot process
Analyze and solve boot problems
Notes:
There are many reasons for boot failures. The hardware might be damaged or, due to
user errors, the operating system might not be able to complete the boot process.
A good knowledge of the AIX boot process is a prerequisite for all AIX system
administrators.


Details — Explain that boot errors are very common, and that fixing these
problems requires a good knowledge of the boot process.
Transition statement — Let's start with an overview of the AIX boot process.
... multimedia library of Elluminate in place of visuals 6-2 to 6-7. You can then continue your
lecture normally to reinforce the topics.
Note that you can also use this activity as a review of the information covered in
visuals 6-2 to 6-7.
6.1. AIX initialization part 1

What students will do — The students will identify the boot process after the AIX kernel
has been loaded from the boot logical volume. Additionally, students will be able to explain
how devices are configured by the cfgmgr.
How students will do it — Through discussion, lecture, and checkpoint questions.
• Detect how AIX boots after loading the AIX kernel
• Identify the role of the rc.boot script
• Identify how the ODMs in hd4 and hd5 are synchronized
• Identify how the cfgmgr is used to configure devices
How this will help students on their job — Many boot errors occur during the AIX boot
process. By having a good knowledge of this process, fixing any boot problem is much
easier.
System software initialization overview

IBM Power Systems
Load kernel and pass control
Restore RAM file system from boot image (RAMFS: / with etc, dev, mnt, usr)
Start init process (from RAMFS)
   rc.boot 1: Configure base devices
   rc.boot 2: Activate rootvg
Start "real" init process (from rootvg)
   /etc/inittab
   rc.boot 3: Configure remaining devices
Figure 6-2. System software initialization overview AN152.2
Notes:
Boot sequence
The visual shows the boot sequence after loading the AIX kernel from the boot image.
The AIX kernel gets control and executes the following steps:
1. The kernel restores a RAM file system into memory by using information
provided in the boot image. At this stage the rootvg is not available, so the
kernel needs to work with commands provided in the RAM file system. You can
consider this RAM file system as a small AIX operating system.
2. The kernel starts the init process which was provided in the RAM file system
(not from the root file system). This init process executes a boot script
rc.boot.
3. rc.boot controls the boot process. In the first phase (it is called by init with
rc.boot 1), the base devices are configured. In the second phase (rc.boot 2),
the rootvg is activated (or varied on).
4. After activating the rootvg at the end of rc.boot 2, the kernel overmounts the
RAM file system with the file systems from rootvg. The init from the boot
image is replaced by the init from the root file system, hd4.
5. This init processes the /etc/inittab file. Out of this file, rc.boot is called a third
time (rc.boot 3) and all remaining devices are configured.
Instructor notes:
Purpose — Introduce the AIX software boot process. Keep this on the overview level.
Details — Explain as described in the student notes.
Additional information — Stress that at the beginning of the boot process, no rootvg
is available. Before activating the rootvg, all devices that are needed to vary on the rootvg
must be configured.
Transition statement — Let's look at what rc.boot does.

rc.boot 1
IBM Power Systems
rc.boot 1 (rootvg is not active)
Process 1: init                          LED F05 (failure LED c06)
rc.boot 1:
   restbase: copy ODM from boot image to RAM file system
                                         LED 510 (failure LED 548)
   cfgmgr -f: run Config_Rules with phase=1
   bootinfo -b                           LED 511
Devices to activate rootvg are configured!
Figure 6-3. rc.boot 1 AN152.2
Notes:
rc.boot phase 1 actions

The init process started from the RAM file system executes the boot script
rc.boot 1. If init fails for some reason (for example, a bad boot logical volume), c06
is shown on the LED display. The most common defect in the boot logical volume is
missing device drivers; this is solved by rebuilding the boot image with the needed device
drivers included.
The following steps are executed when rc.boot 1 is called:
1. The restbase command is called which copies the ODM from the boot image
into the RAM file system. After this step, an ODM is available in the RAM file
system. The LED shows 510 (DEV CFG 1 START) if restbase completes
successfully, otherwise LED 548 (RESTBASE FAILED) is shown.
2. When restbase has completed successfully, the configuration manager, cfgmgr,
is run with the option -f (first). cfgmgr reads the Config_Rules object class and
executes all methods that are stored under phase=1. Phase 1 configuration
methods result in the configuration of base devices into the system, so that the
rootvg can be activated in the next rc.boot phase.
3. Base devices are all devices that are necessary to access the rootvg. If the
rootvg is stored on hdisk0, all devices from the motherboard to the disk itself
must be configured in order to access the rootvg.
4. At the end of rc.boot 1, the system determines the last boot device (used to
establish the /dev/ipldevice link) by calling bootinfo -b. The LED shows 511
(DEV CFG 1 END), followed by 553 (PHASE 1 COMPLETE).
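When watching the LED or HMC reference code during rc.boot 1, a small lookup of the values above can help. This sketch is hypothetical (function name included) and covers only the sample values described in this unit; LED values can differ between machines.

```shell
#!/bin/sh
# Hypothetical lookup of the rc.boot 1 LED values described in this unit.
# These are sample values; codes can differ between machines.
rcboot1_led() {
    case "$1" in
        F05|f05) echo "init started from the RAM file system" ;;
        c06|C06) echo "init failed (for example, bad boot logical volume)" ;;
        510) echo "DEV CFG 1 START: restbase completed" ;;
        548) echo "RESTBASE FAILED" ;;
        511) echo "DEV CFG 1 END" ;;
        553) echo "PHASE 1 COMPLETE" ;;
        *)   echo "not an rc.boot 1 code listed here: $1" >&2
             return 1 ;;
    esac
}
```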


Purpose — Explain rc.boot 1.
Details — When init starts, F05 will be shown on the PCI machines.
Note: The LED values shown in the visual are sample LED values and may be different
with different architectures. Remind students that some LED values tend to be different on
different machines. Not all values are shown here.
Additional information — Stress the following important information:
1. When rc.boot 1 executes, there is no access to rootvg.
2. When rc.boot 1 is finished, all devices are configured to activate the rootvg in
rc.boot 2.
Transition statement — Let’s see what happens in rc.boot 2.
rc.boot 2 (1 of 2)
IBM Power Systems
rc.boot 2                                LED 551
   ipl_varyon (varies on rootvg)         failure LEDs 552, 554, 556
   fsck -f /dev/hd4                      LED 517; failure LED 555
   mount /dev/hd4 /                      failure LED 557
   fsck -f /dev/hd2, mount /usr          failure LED 518
   fsck -f /dev/hd9var, mount /var       failure LED 518
   copycore: if dump, copy
   umount /var
   swapon /dev/hd6
The rootvg file systems hd4 (/), hd2 (/usr), and hd9var (/var) are mounted over the
RAM file system (/ with dev, etc, mnt, usr, var); hd6 is the paging space.
Figure 6-4. rc.boot 2 (1 of 2) AN152.2
Notes:
rc.boot phase 2 actions (part 1)

rc.boot is run for the second time and is passed the parameter 2. The LED shows 551
(VARYON_IPLDEV). The following steps take place in this boot phase:
1. The rootvg is varied on with a special version of the varyonvg command
designed to handle rootvg. If ipl_varyon completes successfully, 517 (MOUNT
ROOT) is shown on the LED, otherwise one of the following is shown and the boot
process stops:
- 552 (IPLVARYON ERROR)
- 554 (UNKNOWN BOOT DISK)
- 556 (LVM_QUERY ERROR)
2. The root file system, hd4, is checked by fsck. The option -f means that the file
system is checked only if it was not unmounted cleanly during the last shutdown.
This improves the boot performance. If the check fails, LED 555 (FSCK ERROR) is
shown.
3. Afterwards, /dev/hd4 is mounted directly onto the root (/) in the RAM file system.
If the mount fails, for example due to a corrupted JFS log, the LED 557 (ROOT
MNT FAILED) is shown and the boot process stops.
4. Next, /dev/hd2 is checked and mounted (again with option -f, it is checked only
if the file system wasn't unmounted cleanly). If the mount fails, LED 518 (/USR
MOUNT FAILED) is displayed and the boot stops.
5. Then, the /var file system is checked and mounted. This is necessary at this
stage because the copycore command checks whether a dump occurred. If a dump
exists on a paging space device, it is copied from the dump device, /dev/hd6, to
the copy directory, which by default is /var/adm/ras. /var is unmounted afterwards.
If the /var mount fails, LED 518 (/VAR MOUNT FAILED) is displayed and the boot
stops.
6. The primary paging space /dev/hd6 is made available.
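The failure handling in the steps above follows one pattern: the steps run in a fixed order, and the first failure stops the boot with a specific LED code. The following minimal sketch simulates only that control flow. It runs no AIX command; the functions rcboot2_sim and step and the step names are hypothetical, and only the LED codes come from the steps above (ipl_varyon can also stop with 554 or 556, which the sketch folds into 552).

```sh
#!/bin/sh
# Hypothetical simulation of the rc.boot 2 (part 1) control flow.
# No AIX commands are run; a step "fails" only if its name is passed
# as the first argument to rcboot2_sim.

step() {
    # $1 = step name, $2 = LED code displayed if this step fails
    if [ "$1" = "$fail_at" ]; then
        echo "boot stopped, LED $2"
        return 1
    fi
    echo "ok: $1"
}

rcboot2_sim() {
    fail_at=$1
    step ipl_varyon 552 || return 1   # 554 or 556 are also possible here
    step fsck_hd4   555 || return 1   # fsck -f /dev/hd4
    step mount_hd4  557 || return 1   # mount /dev/hd4 over / in the RAMFS
    step mount_usr  518 || return 1   # fsck -f /dev/hd2, mount /usr
    step mount_var  518 || return 1   # fsck -f /dev/hd9var, mount /var
    echo "ok: copycore, umount /var, swapon /dev/hd6"
}

# Example: a corrupted log prevents mounting /dev/hd4.
# "rcboot2_sim mount_hd4" prints two "ok" lines, then "boot stopped, LED 557".
```

The point of the sketch is the ordering: a hang or failure at one LED code tells you which earlier steps already succeeded.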
Special root syntax in RAMFS

Once the disk-based root file system is mounted over the RAMFS, a special syntax is
used in rc.boot to access the RAMFS files:
• RAMFS files are accessed using a prefix of /../ . For example, to access the fsck
command in the RAMFS (before the /usr file system is mounted), rc.boot uses
/../usr/sbin/fsck.
• Disk-based files are accessed using normal AIX file syntax. For example, to
access the fsck command on the disk (after the /usr file system is mounted)
rc.boot uses /usr/sbin/fsck.
Note: This syntax only works during the boot process. If you boot from the CD-ROM
into maintenance mode and need to mount the root file system by hand, you will need
to mount it over another directory, such as /mnt, or you will be unable to access the
RAMFS files.
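The prefix rule can be written as a small helper. This is a hypothetical sketch (boot_cmd_path is not part of rc.boot); it only builds the path string for a command, choosing the RAMFS copy or the disk-based copy depending on whether the disk-based /usr is already mounted. It does not check that the file exists.

```sh
#!/bin/sh
# Hypothetical helper illustrating the RAMFS path rule used in rc.boot.
#   $1 = command path relative to /, for example usr/sbin/fsck
#   $2 = "yes" if the disk-based /usr is already mounted, otherwise "no"
boot_cmd_path() {
    if [ "$2" = "yes" ]; then
        echo "/$1"       # disk-based copy: normal AIX path
    else
        echo "/../$1"    # RAMFS copy: reached through the /../ prefix
    fi
}

boot_cmd_path usr/sbin/fsck no     # prints /../usr/sbin/fsck
boot_cmd_path usr/sbin/fsck yes    # prints /usr/sbin/fsck
```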
Instructor notes:
Purpose — Describe the first part of rc.boot 2.
Details — Introduce this boot phase as described in the student material. There are two
categories of status codes: SHOWLED codes and loopled codes.
The SHOWLED codes are inside the graphic boxes and denote the progress of the script.
If one of these codes remains on the display, the corresponding operation is hung.
The loopled codes are shown outside of the graphic boxes and denote specific failures that
stop the boot process.
The text in capitals and in parentheses in the student notes is part of the LED message that
you would see on the server's physical LED display if the server were in the manufacturing
default configuration with only a single operating system. With a logically partitioned
system, the HMC shows only the numeric code.
Additional information — Beginning with AIX 5L V5.1, the rootvg file system is mounted
directly over the root directory in the RAMFS. This simplifies several steps during phase 2
and eliminates the need to remount the rootvg file systems at the end of phase 2.
In many reference documents, LED 518 is defined as indicating that the /usr file system
could not be mounted over the network. This is incorrect. LED 518 is displayed whenever
/usr cannot be mounted.
Transition statement — Let’s describe the second part of rc.boot 2.

rc.boot 2 (2 of 2)
[Visual: flowchart of rc.boot 2, part 2: after swapon /dev/hd6, copy the RAM /dev files to disk (mergedev); copy the RAM ODM files to disk (cp /../etc/objrepos/Cu* /etc/objrepos); mount /var; copy the boot messages to the alog; the kernel removes the RAMFS.]
Notes:

After the paging space /dev/hd6 has been made available, the following tasks are
executed in rc.boot 2:
1. To understand this step, remember two things:
- /dev/hd4 is mounted onto root (/) in the RAM file system.
- In rc.boot 1, cfgmgr was called and all base devices were
configured. This configuration data was written into the ODM of the
RAM file system.
Now, mergedev is called and all /dev files from the RAM file system are copied to
disk.
2. All customized ODM files from the RAM file system ODM are copied to disk as
well. At this stage, both ODMs (in hd5 and hd4) are in sync.
3. The /var file system (hd9var) is mounted.

4. All messages during the boot process are copied into a special file. You must use
the alog command to view this file:
# alog -t boot -o
Because no console is available at this stage, all boot information is collected in this file.
When rc.boot 2 is finished, the /, /usr, and /var file systems in rootvg are active.
Final stage
At this stage, the AIX kernel removes the RAM file system (returns the memory to the
free memory pool) and starts the init process from the root (/) file system in rootvg.

Instructor notes:
Purpose — Describe the second part of rc.boot 2.
Details —
Transition statement — Let’s describe rc.boot 3.
rc.boot 3 (1 of 2)
[Visual: flowchart of rc.boot 3, part 1: init (process 1) reads /etc/inittab and runs /sbin/rc.boot 3 (LED 553); fsck -f /dev/hd3 and mount /tmp (LEDs 517, 518); syncvg rootvg &; cfgmgr -p2 (normal) or cfgmgr -p3 (service) runs the Config_Rules methods for phase=2 or phase=3; cfgcon (LEDs c31 to c34); rc.dt boot; savebase copies the ODM to hd5.]
Notes:

If rc.boot phase 2 completes as indicated by LED 553 (BOOT PHASE 1 COMPLETE), you
can assume that rc.boot phase 3 has begun. At this boot stage, the /etc/init
process is started. It reads the /etc/inittab file and executes the commands line-by-line.
It runs rc.boot for the third time, passing the argument 3 that indicates the last boot
phase.
rc.boot 3 executes the following tasks:
1. The /tmp file system is checked and mounted.
2. The rootvg is synchronized by syncvg rootvg. If rootvg contains any stale
partitions (for example, a disk that is part of rootvg was not active), these
partitions are updated and synchronized. syncvg is started as a background job.

3. The configuration manager is called again. If the key switch or boot mode is
normal, cfgmgr is called with option -p2 (phase 2). If the key switch or boot
mode is service, cfgmgr is called with option -p3 (phase 3).
4. The configuration manager reads the ODM class Config_Rules and executes
all methods for either phase=2 or phase=3. All remaining devices that are not
base devices are configured in this step.
5. The console will be configured by cfgcon. The numbers c31, c32, c33 or c34
are displayed depending on the type of console:
- c31: Console not yet configured. Provides instruction to select a console.
- c32: Console is an lft (graphic display) terminal.
- c33: Console is a tty.
- c34: Console is a file on the disk.
If CDE is specified in /etc/inittab, CDE is started and you get a graphical
boot on the console.
6. To synchronize the ODM in the boot logical volume with the ODM from the root
(/) file system, savebase is called.
Instructor notes:
Purpose — Describe the first part of rc.boot 3.
Details — Describe as explained in the student notes.
Additional information — The savebase command is necessary to synchronize the
ODMs from hd4 (rootvg ODM repository) and hd5 (reduced ODM in the BLV).
Transition statement — Let’s describe the second part of rc.boot 3.

rc.boot 3 (2 of 2)
[Visual: flowchart of rc.boot 3, part 2: savebase copies the ODM from /etc/objrepos to hd5; syncd 60; errdemon; turn off LEDs; rm /etc/nologin; if CuDv contains devices with chgstatus=3, the message "A device that was previously detected could not be found. Run "diag -a"." is displayed; "System initialization is completed." is displayed; init executes the next line in /etc/inittab.]
Notes:

After the ODMs have been synchronized again, the following steps take place:
1. The syncd daemon is started. All data that is written to disk is first stored in a
cache in memory. The syncd daemon writes the data from this cache to disk every
60 seconds.
Another daemon process, the errdemon daemon, is started. This process allows
errors triggered by applications or the kernel to be written to the error log.
2. The LED display is turned off.
3. If the file /etc/nologin exists, it is removed. While this file exists, logins to the
AIX system are not possible, so a system administrator can create it to block
logins. Removing it during boot makes logins possible again.
4. If devices exist that are flagged as missing in CuDv (chgstatus=3), a message
is displayed on the console. For example, this could happen if external devices
are not powered on during system boot.
5. The last message, System initialization completed, is written to the
console. rc.boot 3 is finished. The init process executes the next command in
/etc/inittab.
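Step 4 amounts to a query of CuDv for objects whose chgstatus is 3. The following sketch shows the idea with awk on odmget-style stanza output. It is an illustration, not the code that rc.boot runs: the missing_devices function is hypothetical, and the sample stanzas are made up and trimmed to the fields the sketch needs. On an AIX system, the input could come from odmget CuDv.

```sh
#!/bin/sh
# Hypothetical sketch: read odmget-style CuDv stanzas on stdin and print
# the name of every device whose chgstatus is 3 (flagged as missing).
missing_devices() {
    awk '
        function flush() {
            if (name != "" && cs != "" && cs + 0 == 3) print name
            name = ""; cs = ""
        }
        /^CuDv:/          { flush(); next }   # a new stanza starts
        $1 == "name"      { name = $3; gsub(/"/, "", name) }
        $1 == "chgstatus" { cs = $3 }
        END               { flush() }
    '
}

# Made-up sample input, trimmed to the relevant fields.
missing_devices <<'EOF'
CuDv:
        name = "hdisk1"
        status = 0
        chgstatus = 3

CuDv:
        name = "ent0"
        status = 1
        chgstatus = 2
EOF
```

With the sample input, only hdisk1 is reported.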

Instructor notes:
Purpose — Describe the second part of rc.boot 3.
Additional information — The /etc/nologin file is used to prevent logging in to a system.
Only the existence of the file matters. If the file contains text, that text is displayed
when a user attempts to log in.
Transition statement — Let’s summarize the rc.boot script.
rc.boot summary
Command    Executed from                Config_Rules phase     Primary actions
rc.boot 1  RAM file system (/dev/ram0)  1                      restbase; cfgmgr -f
rc.boot 2  RAM file system (/dev/ram0)  -                      ipl_varyon; mount /, /usr, /var file systems; mergedev; copy ODM files
rc.boot 3  rootvg                       2=normal or 3=service  mount /tmp; cfgmgr -p2 or cfgmgr -p3; savebase
Figure 6-8. rc.boot summary AN152.2
Notes:
Summary
During rc.boot 1, all base devices are configured. This is done by cfgmgr -f which
executes all phase 1 methods from Config_Rules.
During rc.boot 2, the rootvg is varied on. All /dev files and the customized ODM files
from the RAM file system are merged to disk.
During rc.boot 3, all remaining devices are configured by cfgmgr -p2 or cfgmgr -p3. The
configuration manager reads the Config_Rules class and executes the corresponding
methods. To synchronize the ODMs, savebase is called, which writes the ODM from the
disk back to the boot logical volume.

Instructor notes:
Purpose — Summarize rc.boot script.
Details — Describe the highlights from the rc.boot phases that are shown in the table.
Transition statement — Let’s look at a common problem in rc.boot phase 2 - the failure
to mount the file systems because of corruption.
Fixing corrupted file systems and logs

Boot to maintenance mode
Access rootvg without mounting file systems
Rebuild file system log and run fsck:
# logform -V jfs2 /dev/hd8
# fsck -y -V jfs2 /dev/hd11admin
Figure 6-9. Fixing corrupted file systems and logs AN152.2
Notes:
JFS log or JFS2 log corrupt?

To fix a corrupted JFS or JFS2 log, boot in maintenance mode and access the rootvg,
but do not mount the file systems. In the maintenance shell, issue the logform
command and do a file system check for all file systems that use this JFS or JFS2 log.
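Keep in mind which file system type your rootvg uses: JFS or JFS2.
For JFS:
# logform -V jfs /dev/hd8
# fsck -y -V jfs /dev/hd1
# fsck -y -V jfs /dev/hd2
# fsck -y -V jfs /dev/hd3
# fsck -y -V jfs /dev/hd4
# fsck -y -V jfs /dev/hd9var
# fsck -y -V jfs /dev/hd10opt
# fsck -y -V jfs /dev/hd11admin
# exit
For JFS2:
# logform -V jfs2 /dev/hd8
# fsck -y -V jfs2 /dev/hd1
# fsck -y -V jfs2 /dev/hd2
# fsck -y -V jfs2 /dev/hd3
# fsck -y -V jfs2 /dev/hd4
# fsck -y -V jfs2 /dev/hd9var
# fsck -y -V jfs2 /dev/hd10opt
# fsck -y -V jfs2 /dev/hd11admin
# exit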
The logform command initializes a new JFS transaction log and this may result in loss
of data because JFS transactions may be destroyed. Your machine will boot after the
JFS log has been repaired.
JFS log corruption typically happens when the system crashes or is taken down in a
hard manner by the administrator.
Note that the JFS log recovery described here does not ensure that disk updates that were
in progress were completed. Determining what has been processed and what needs
reprocessing is the responsibility of the applications, using their transaction logs and any
checkpoint processing that was completed.
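Because the same logform and fsck sequence applies to every file system that uses the log, the command list can be generated for either file system type. This minimal sketch only prints the commands (a dry run) so they can be reviewed before they are typed in the maintenance shell. It assumes the default rootvg log device /dev/hd8 and the default rootvg logical volumes hd1, hd2, hd3, hd4, hd9var, hd10opt, and hd11admin; repair_cmds is a hypothetical name.

```sh
#!/bin/sh
# Hypothetical dry-run helper: print (do not run) the log repair commands
# for a rootvg whose file systems are all of one type.
# Assumes the default log /dev/hd8 and default rootvg LV names.
repair_cmds() {
    case "$1" in
        jfs|jfs2) ;;
        *) echo "usage: repair_cmds jfs|jfs2" >&2; return 1 ;;
    esac
    echo "logform -V $1 /dev/hd8"          # re-create the log first
    for lv in hd1 hd2 hd3 hd4 hd9var hd10opt hd11admin; do
        echo "fsck -y -V $1 /dev/$lv"      # then check each file system
    done
}

repair_cmds jfs2
```

Printing rather than executing keeps the sketch harmless; logform discards the contents of the existing log, so the real commands should only be run in maintenance mode with the file systems unmounted.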
Instructor notes:
Purpose — Explain how to fix a corrupted file system.
Details — Point out that a common cause of this type of corruption is the use of the HMC
shutdown immediate option for an LPAR with a running operating system. This is the
equivalent of cutting power to a computer while the operating system is running, which
does not allow for a proper shutdown. Whenever possible, an administrator should use
the HMC OS shutdown option or issue the shutdown command from the LPAR command
prompt.
Transition statement — Let’s review the phases of rc.boot.

Let’s review: rc.boot (1 of 3)
[Visual: rc.boot 1 diagram with answer boxes (1) to (5)]
Figure 6-10. Let’s review: rc.boot (1 of 3) AN152.2
Notes:
Instructions
Using the following questions, put the solutions into the visual.
1. What calls rc.boot 1? Is it:
• /etc/init from hd4
• /etc/init from the RAMFS in the boot image
2. Which command copies the ODM files from the boot image into the RAM file
system?
3. Which command triggers the execution of all phase 1 methods in Config_Rules?
4. Which ODM files contain the devices that have been configured in rc.boot 1?
• ODM files in hd4
• ODM files in RAM file system
5. How can you determine the last boot device?
Instructor notes:
Purpose — Review and test the students’ understanding of rc.boot phase 1.
Details — This is the first of three reviews. You can review each one separately, or have
the students do all three, then review them all.
Let’s review solution: rc.boot (1 of 3)
(1) /etc/init from the RAMFS in the boot image
(2) restbase
(3) cfgmgr -f
(4) ODM files in RAM file system
(5) bootinfo -b
Transition statement — Now, let’s review rc.boot phase 2.

[Visual: rc.boot 2 diagram with answer boxes (1) to (8); box (8) is marked 557]
Notes:
Instructions
Please put the following seven expressions in the correct sequence.
1. Turn on paging
2. Merge RAM /dev files.
3. Copy boot messages to alog
4. Activate rootvg
5. Mount /var; copy dump; unmount /var
6. Mount /dev/hd4 onto / in RAMFS
7. Copy RAM ODM files
8. Finally, answer the following question. Put the answer in box 8:
Your system stops booting with an LED 557. Which command failed?
Instructor notes:
Purpose — Review and test the students’ understanding of rc.boot phase 2.
Details — This is the second of three reviews. You can review each one separately, or
have the students do all three, then review them all.

(1) Activate rootvg
(2) Mount /dev/hd4 on / in RAMFS
(3) Mount /var, copy dump, unmount /var
(4) Turn on paging
(5) Merge RAM /dev files
(6) Copy RAM ODM files
(7) Copy boot messages to alog
(8) 557: mount /dev/hd4
Additional information — Question 8 is important for the lab. The command that failed is
the mount of /dev/hd4. One reason for this might be a damaged log logical volume.
Transition statement — Now, let’s review rc.boot phase 3.

[Visual: rc.boot 3 diagram with blanks to complete: the file from which rc.boot 3 is started, the /tmp check and mount, the rootvg synchronization, the cfgmgr calls, the console and CDE start, the ODM update in the BLV, the daemons started, the LEDs, /etc/nologin, and the missing-device check in CuDv]
Notes:
Instructions
Please complete the missing information in the picture.
Your instructor will review the activity with you.
Instructor notes:
Purpose — Review and test the students’ understanding of rc.boot phase 3.
Details — This is the last of three reviews. You can review each one separately, or have
the students do all three, then review them all.

From which file is rc.boot 3 started? /etc/inittab
/sbin/rc.boot 3
fsck -f /dev/hd3
mount /tmp
syncvg rootvg &
cfgmgr -p2 or cfgmgr -p3
Start Console: cfgcon
Start CDE: rc.dt boot
Update ODM in BLV: savebase
syncd 60
errdemon
Turn off LEDs
rm /etc/nologin
Missing devices? chgstatus=3 in CuDv?
Execute next line in /etc/inittab
Transition statement — Now, let’s switch over to the next topic.

6.2. AIX initialization part 2

What students will do — The students will review important components from the AIX
software boot process.
How students will do it — Through lecture, exercise, and checkpoint questions
• Review the configuration manager (cfgmgr)
• Review the Config_Rules object class
• Review cfgmgr output in the boot log using alog
• Review the /etc/inittab file
• Identify important LED codes
How this will help students on their job — The components that are introduced or
reviewed are vital for the AIX operating system. A good knowledge of these components is
a prerequisite for all system administrators.
Configuration manager
[Visual: cfgmgr reads Config_Rules and works with the Predefined object classes (PdDv, PdAt, PdCn) and the Customized object classes (CuDv, CuAt, CuDep, CuDvDr, CuVPD) through device methods: Define, Configure (driver load), Change, Unconfigure (driver unload), and Undefine]
Figure 6-13. Configuration manager AN152.2
Notes:
When the configuration manager is invoked

During system boot, the configuration manager is invoked to configure all devices
detected as well as any device whose device information is stored in the configuration
database. At run time, you can configure a specific device by directly invoking the
cfgmgr command.
If you encounter problems during the configuration of a device, use cfgmgr -v. With this
option, cfgmgr shows the devices as they are configured.
Automatic configuration
Many devices are automatically detected by the configuration manager. For this to
occur, device entries must exist in the predefined device object classes. The
configuration manager uses the methods from PdDv to manage the device state, for
example, to bring a device into the defined or available state.

Installing new device support

cfgmgr can be used to install new device support. If you invoke cfgmgr with the -i
flag, the command attempts to install device software support for each newly detected
device.
High-level device commands like mkdev invoke methods and allow the user to add,
delete, show, or change devices and their attributes.
Define method
When a device is defined through its define method, the information from the predefined
database for that type of device is used to create the information describing the device
specific instance. This device specific information is then stored in the customized
database.
Configure method steps

The process of configuring a device is often device-specific. The configure method for a
kernel device must:
1. Load the device driver into the kernel
2. Pass device-dependent information describing the device instance to the driver
3. Create a special file for the device in the /dev directory
Many devices, such as logical volumes and volume groups, are not physical devices;
these are pseudodevices. For this type of device, the configured state is less
meaningful. However, a pseudodevice still has a configure method, which either simply
marks the device as configured or performs more complex operations to determine
whether any devices are attached to it.
Configuration order
The configuration process requires that a device be defined or configured before a
device attached to it can be defined or configured. At system boot time, the
configuration manager configures the system in a hierarchical fashion. First the
motherboard is configured, then the buses, then the adapters that are attached, and
finally the devices that are connected to the adapters. The configuration manager then
configures any pseudodevices (volume groups, logical volumes, and so forth) that need
to be configured.
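The parent-before-child rule can be illustrated with a small queue-based walk. As the sample boot log later in this unit shows, each configure method prints the names of the child devices it detected, and those children are configured next. This is a hypothetical sketch: the children_of table is modeled on that sample log, and the first-in, first-out (breadth-first) order is a simplifying assumption; the notes only guarantee that a parent is configured before its children.

```sh
#!/bin/sh
# Hypothetical child table modeled on the sample boot log in this unit:
# each configure method prints the names of the child devices it found.
children_of() {
    case "$1" in
        sys0) echo "bus0" ;;
        bus0) echo "bus1 scsi0" ;;
        bus1) echo "fda0 ppa0 sa0 sioka0 kbd0" ;;
        *)    echo "" ;;                  # no children
    esac
}

# Configure parents before children using a first-in, first-out queue.
config_order() {
    queue=$1
    order=""
    while [ -n "$queue" ]; do
        set -- $queue                     # split the queue into words
        dev=$1
        shift
        queue="$* $(children_of "$dev")"  # append the children found
        queue=$(echo $queue)              # collapse extra blanks
        order="$order $dev"               # "configure" dev
    done
    echo $order
}

config_order sys0
```

For sys0, the sketch prints sys0 bus0 bus1 scsi0 fda0 ppa0 sa0 sioka0 kbd0: every device appears after its parent.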
Instructor notes:
Purpose — Summarize how cfgmgr works.
Details — Explain that cfgmgr can detect devices automatically. The devices must be
defined in the predefined ODM classes. When they get defined, they are stored in the
customized ODM classes.
cfgmgr is method or rule driven: it simply runs methods to define or configure a device.
These methods are device specific and are listed in PdDv.
During the boot process, cfgmgr uses the Config_Rules class to configure the devices in
the correct sequence.
Note that the actual Config_Rules object class has more objects in each phase than are
listed in the visual.
Additional information — The output from the configuration manager is viewable in the
boot alog. During run-time, cfgmgr can be started with the flag -v, to get more information
about the devices that are configured.
Transition statement — Let’s have a look in the Config_Rules ODM class.

Config_Rules object class

Phase  seq  boot  rule                      Called by
1      10   0     /etc/methods/defsys       cfgmgr -f
1      12   0     /usr/lib/methods/deflvm   cfgmgr -f
2      12   0     /usr/lib/methods/deflvm   cfgmgr -p2 (normal boot)
2      19   0     /etc/methods/ptynode      cfgmgr -p2 (normal boot)
2      20   0     /etc/methods/startlft     cfgmgr -p2 (normal boot)
3      12   0     /usr/lib/methods/deflvm   cfgmgr -p3 (service boot)
3      19   0     /etc/methods/ptynode      cfgmgr -p3 (service boot)
3      20   0     /etc/methods/startlft     cfgmgr -p3 (service boot)
3      25   0     /etc/methods/starttty     cfgmgr -p3 (service boot)
Figure 6-14. Config_Rules object class AN152.2
Notes:
Introduction
The Config_Rules ODM object class is used by cfgmgr during the boot process. The
phase attribute determines when the respective method is called.
Phase 1
All methods with phase=1 are executed when cfgmgr -f is called. The first method that
is started is /etc/methods/defsys, which is responsible for the configuration of all
base devices. The second method /usr/lib/methods/deflvm loads the logical volume
device driver (LVDD) into the AIX kernel.
If you have devices that must be configured in rc.boot 1 (that is, before the
rootvg is active), you need to add phase 1 configuration methods to Config_Rules.
Afterwards, run bosboot to rebuild the boot image.
Phase 2
All methods with phase=2 are executed when cfgmgr -p2 is called. This takes place in
the third rc.boot phase, when the key switch is in normal position or for a normal boot
on a PCI machine. The seq attribute controls the sequence of the execution: The lower
the value, the higher the priority.
Phase 3
All methods with phase=3 are executed when cfgmgr -p3 is called. This takes place in
the third rc.boot phase, when the key switch is in service position, or a service boot
has been issued on a PCI system.
Sequence number
Each configuration method has an associated sequence number. When executing the
methods for a particular phase, cfgmgr sorts the methods based on the sequence
number. The methods are then invoked, one by one, starting with the smallest
sequence number. Methods with a sequence number of zero are invoked last, after
those with non-zero sequence numbers.
Boot mask
Each configuration method has an associated boot mask:
- If the boot_mask is zero, the rule applies to all types of boot.
- If the boot_mask is non-zero, the rule then only applies to the boot type specified.
For example, a rule with boot_mask = DISK_BOOT is used only for boots from disk,
whereas a rule with boot_mask = NETWORK_BOOT applies only when booting through the network.
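The selection and ordering rules above (match the phase, honor the boot mask, ascending seq with seq 0 last) can be sketched with awk and sort over lines in the "phase seq boot rule" layout of the visual. This is a hypothetical illustration, not cfgmgr: rules_for_phase is an invented name, the numeric boot types are arbitrary, and matching the mask by equality is a simplification of a real bit mask. The /hypothetical/seq0_method rule is made up to show the seq 0 case.

```sh
#!/bin/sh
# Hypothetical sketch: list the Config_Rules methods for one phase in the
# order described above (ascending seq, seq 0 last).
#   $1 = phase (1, 2 or 3)
#   $2 = boot type as a number; rules with boot_mask 0 apply to every boot.
#        Matching by equality is a simplification of the real bit mask.
# Input on stdin: lines of "phase seq boot_mask rule".
rules_for_phase() {
    awk -v phase="$1" -v boot="$2" '
        $1 == phase && ($3 == 0 || $3 == boot) {
            key = ($2 == 0) ? 999999 : $2   # push seq 0 to the end
            print key, $4
        }' | sort -n -k1,1 | awk '{ print $2 }'
}

# Phase 2 rules from the visual, plus one hypothetical seq 0 rule.
rules_for_phase 2 0 <<'EOF'
1 10 0 /etc/methods/defsys
2 20 0 /etc/methods/startlft
2 0 0 /hypothetical/seq0_method
2 12 0 /usr/lib/methods/deflvm
2 19 0 /etc/methods/ptynode
3 25 0 /etc/methods/starttty
EOF
```

The output lists deflvm, ptynode, and startlft in seq order, followed by the seq 0 rule.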

Instructor notes:
Purpose — Explain how cfgmgr uses the Config_Rules object class.
Details — Review the methods that are called when the cfgmgr is executed. Explain as
described in the notes. Keep this at an introductory level. Note that we are showing only a
sampling of the objects in this object class.
Additional information — If you have devices that must be configured before the rootvg
is active, you need to add these configuration methods to the Config_Rules object class.
You also have to ensure the methods you want run are included in the RAMFS image
created by bosboot. This involves adding information to the “proto” files that are in
/usr/lib/boot.
While the visual shows cfgmgr verbose output in the log, the log would also have output
from other commands executed and status messages written by the rc.boot script for
each step that has been outlined in the lecture.
Transition statement — Let’s introduce the boot alog.
cfgmgr output in the boot log using alog

# alog -t boot -o
-------------------------------------------------------
attempting to configure device 'sys0'
invoking /usr/lib/methods/cfgsys_rspc -l sys0
return code = 0
******* stdout *******
bus0
******* no stderr *****
-------------------------------------------------------
attempting to configure device 'bus0'
invoking /usr/lib/methods/cfgbus_pci bus0
return code = 0
******** stdout *******
bus1, scsi0
****** no stderr ******
-------------------------------------------------------
invoking /usr/lib/methods/cfgbus_isa bus1
return code = 0
******** stdout ******
fda0, ppa0, sa0, sioka0, kbd0
****** no stderr *****
Figure 6-15. cfgmgr output in the boot log using alog AN152.2
Notes:
The boot log

Because no console is available during the boot phase, the boot messages are
collected in a special file, which, by default, is /var/adm/ras/bootlog. As shown in the
visual, you have to use the alog command to view the contents of this file.
To view the boot log, issue the command as shown, or use the smit alog fastpath.
Here is an example command and output:
# alog -t boot -o
-------------------------------------------------------
attempting to configure device 'sys0'
invoking /usr/lib/methods/cfgsys_rspc -l sys0
return code = 0
******* stdout *******
bus0
******* no stderr *****
-------------------------------------------------------
invoking /usr/lib/methods/cfgbus_pci bus0
return code = 0
******** stdout *******
bus1, scsi0
****** no stderr ******
-------------------------------------------------------
invoking /usr/lib/methods/cfgbus_isa bus1
return code = 0
******** stdout ******
fda0, ppa0, sa0, sioka0, kbd0
****** no stderr *****
If you have boot problems, it is always a good idea to check the boot alog file for
potential boot error messages. All output from cfgmgr is shown in the boot log, as well
as other information that is produced in the rc.boot script.
The default boot log file size in AIX 5L V5.1 (8 KB) was too small to capture the entire
output of a system boot in AIX 5L. The default boot log size in AIX 5L V5.2 is 32 KB and
in AIX 5L V5.3 (and later) it is 128 KB. If you want to increase the size of the boot log,
for example to 256 KB, issue the following command:
# print "Resizing boot log" | alog -C -t boot -s 262144
Instructor notes:
Purpose — Describe the alog command to identify boot messages.
Details — Describe how boot messages produced during the boot process are written to
an alog file. Show how the alog command can be used. The bootlog shows more than
the output of the cfgmgr -v during rc.boot execution. The various rc.boot steps which
we have covered have messages written to the boot log.
Additional information — Describe how the boot log, /var/adm/ras/bootlog, might be
increased. This often had to be done prior to AIX 5L V5.2, because the default
size of 8 KB was very small. To display the size of the log, run: alog -t boot -L
The alog is circular, meaning that the oldest information is automatically overwritten by
the newest information.
Transition statement — Let’s review the /etc/inittab file.

/etc/inittab file
init:2:initdefault:
brc::sysinit:/sbin/rc.boot 3 >/dev/console 2>&1 # Phase 3 of system boot
powerfail::powerfail:/etc/rc.powerfail 2>&1 | alog -tboot > /dev/console #
mkatmpvc:2:once:/usr/sbin/mkatmpvc >/dev/console 2>&1
atmsvcd:2:once:/usr/sbin/atmsvcd >/dev/console 2>&1
tunables:23456789:wait:/usr/sbin/tunrestore -R > /dev/console 2>&1 # Set tunab
securityboot:2:bootwait:/etc/rc.security.boot > /dev/console 2>&1
rc:23456789:wait:/etc/rc 2>&1 | alog -tboot > /dev/console # Multi-User checks
rcemgr:23456789:once:/usr/sbin/emgr -B > /dev/null 2>&1
fbcheck:23456789:wait:/usr/sbin/fbcheck 2>&1 | alog -tboot > /dev/console # ru
srcmstr:23456789:respawn:/usr/sbin/srcmstr # System Resource Controller
rctcpip:23456789:wait:/etc/rc.tcpip > /dev/console 2>&1 # Start TCP/IP daemons
mkcifs_fs:2:wait:/etc/mkcifs_fs > /dev/console 2>&1
sniinst:2:wait:/var/adm/sni/sniprei > /dev/console 2>&1
rcnfs:23456789:wait:/etc/rc.nfs > /dev/console 2>&1 # Start NFS Daemons
cron:23456789:respawn:/usr/sbin/cron
piobe:2:wait:/usr/lib/lpd/pioinit_cp >/dev/null 2>&1 # pb cleanup
cons:0123456789:respawn:/usr/sbin/getty /dev/console
qdaemon:23456789:wait:/usr/bin/startsrc -sqdaemon
writesrv:23456789:wait:/usr/bin/startsrc -swritesrv
uprintfd:23456789:respawn:/usr/sbin/uprintfd
shdaemon:2:off:/usr/sbin/shdaemon >/dev/console 2>&1 # High availability
Do not use an editor to change /etc/inittab

Use mkitab, chitab, rmitab instead
Figure 6-16. /etc/inittab file AN152.2
Notes:
Purpose of /etc/inittab
The /etc/inittab file supplies information for the init process. Note how the rc.boot
script is executed out of the inittab file to configure all remaining devices in the boot
process.
Modifying /etc/inittab
Do not use an editor to change the /etc/inittab file. One small mistake in /etc/inittab,
and your machine will not boot. Instead use the commands mkitab, chitab, and
rmitab to edit /etc/inittab. These commands check the syntax of each entry, which
protects /etc/inittab from corruption. If your machine stops booting with LED 553, this
indicates a bad /etc/inittab file in most cases.
Consider the following examples:

- To add a line to /etc/inittab, use the mkitab command. For example:
# mkitab "myid:2:once:/usr/local/bin/errlog.check"
- To change /etc/inittab so that init will ignore the line tty1, use the chitab
command:
# chitab "tty1:2:off:/usr/sbin/getty /dev/tty1"
- To remove the line tty1 from /etc/inittab, use the rmitab command. For example:
# rmitab tty1
Viewing /etc/inittab
The lsitab command can be used to view the /etc/inittab file. For example:
# lsitab dt
dt:2:wait:/etc/rc.dt
If you issue lsitab -a, the complete /etc/inittab file is shown.
telinit and run levels

Use the telinit command to signal the init daemon:
- To tell the init daemon to re-read /etc/inittab, use:
# telinit q
- To tell the init daemon to reset the environment to match a different (or the same) run
level, use:
# telinit n (where n is the desired run level)
- To query the current run level, use:
# who -r
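Each /etc/inittab record has four colon-separated fields: identifier, run level, action, and command. The following hypothetical sketch (it is not mkitab or init) shows how a record splits on colons, how a record without an action field can be flagged, and which entries apply to a given run level. Two simplifications are assumptions of the sketch: entries with an empty run level field, such as brc::sysinit, are skipped because the sketch does not model how init treats them, and entries with the off action are skipped because init ignores them.

```sh
#!/bin/sh
# Hypothetical sketch: read inittab-style records on stdin and print the
# identifier and action of every entry that applies to run level $1.
entries_for_level() {
    level=$1
    while IFS=: read -r id rl action cmd; do
        [ -z "$id" ] && continue              # blank line or ":" comment
        if [ -z "$action" ]; then             # fewer than three fields
            echo "malformed: $id" >&2
            continue
        fi
        [ "$action" = off ] && continue       # init ignores "off" entries
        case "$rl" in
            *"$level"*) echo "$id ($action)" ;;  # level listed in field 2
        esac
    done
}

entries_for_level 2 <<'EOF'
init:2:initdefault:
brc::sysinit:/sbin/rc.boot 3 >/dev/console 2>&1
cron:23456789:respawn:/usr/sbin/cron
shdaemon:2:off:/usr/sbin/shdaemon >/dev/console 2>&1
EOF
```

Because read assigns the rest of the line to the last variable, a command that itself contains colons stays intact in cmd.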

Example inittab:

init:2:initdefault:
brc::sysinit:/sbin/rc.boot 3 >/dev/console 2>&1 # Phase 3 of system boot
powerfail::powerfail:/etc/rc.powerfail 2>&1 | alog -tboot > /dev/console #
mkatmpvc:2:once:/usr/sbin/mkatmpvc >/dev/console 2>&1
atmsvcd:2:once:/usr/sbin/atmsvcd >/dev/console 2>&1
tunables:23456789:wait:/usr/sbin/tunrestore -R > /dev/console 2>&1
securityboot:2:bootwait:/etc/rc.security.boot > /dev/console 2>&1
rc:23456789:wait:/etc/rc 2>&1 | alog -tboot > /dev/console
rcemgr:23456789:once:/usr/sbin/emgr -B > /dev/null 2>&1
fbcheck:23456789:wait:/usr/sbin/fbcheck 2>&1 | alog -tboot > /dev/console
srcmstr:23456789:respawn:/usr/sbin/srcmstr # System Resource Controller
rctcpip:23456789:wait:/etc/rc.tcpip > /dev/console 2>&1
mkcifs_fs:2:wait:/etc/mkcifs_fs > /dev/console 2>&1
sniinst:2:wait:/var/adm/sni/sniprei > /dev/console 2>&1
rcnfs:23456789:wait:/etc/rc.nfs > /dev/console 2>&1 # Start NFS Daemons
piobe:2:wait:/usr/lib/lpd/pioinit_cp >/dev/null 2>&1 # pb cleanup
cons:0123456789:respawn:/usr/sbin/getty /dev/console
writesrv:23456789:wait:/usr/bin/startsrc -swritesrv
uprintfd:23456789:respawn:/usr/sbin/uprintfd
shdaemon:2:off:/usr/sbin/shdaemon >/dev/console 2>&1 # High availability
Instructor notes:
Purpose — Describe the /etc/inittab file and some important commands to view and
manipulate this file.
Details — Show that rc.boot is executed out of /etc/inittab. Describe that it is risky to edit
the /etc/inittab file. It is always better to use the commands described in the notes.
Additional information — Point out that a corrupted /etc/inittab file is indicated by LED
553. The students will see this in their exercise.
The mkitab, chitab, and rmitab commands provide automatic syntax checking: the line
must match the proper format for /etc/inittab.
The -i option of mkitab inserts the new line after an existing entry, which you name by its
identifier. Without -i, the line is appended to the end of the file.
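You can preview where mkitab -i places a new entry with a few lines of awk. The helper `insert_after` below is hypothetical and is not an AIX command. It prints the result to standard output instead of changing the file. Unlike mkitab, it performs no syntax checking, so use mkitab itself to make the actual change.

```shell
#!/bin/sh
# insert_after ID NEW_ENTRY FILE
# Hypothetical preview helper: print FILE with NEW_ENTRY placed after the
# entry whose identifier is ID, or appended at the end if ID is not found.
# Values are passed through the environment so awk does not alter backslashes.
insert_after() {
    ID=$1 ENTRY=$2 awk '
        BEGIN { id = ENVIRON["ID"]; entry = ENVIRON["ENTRY"] }
        { print }
        !done && index($0, id ":") == 1 { print entry; done = 1 }
        END { if (!done) print entry }
    ' "$3"
}
```

For example, `insert_after rctcpip 'myid:2:once:/usr/local/bin/errlog.check' /etc/inittab` shows where the entry would land. `mkitab -i rctcpip "myid:2:once:/usr/local/bin/errlog.check"` then makes the change with syntax checking.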
Transition statement — Let’s describe the basics for system hang detection.

Boot problem management

Symptoms or LED code | Possible causes | User action
AA060011 (message: Can't find OS Image) | Bad bootlist | Boot SMS, update bootlist.
                                        | Damaged BLV | Boot to maintenance, access the rootvg, re-create the BLV: # bosboot -ad /dev/hdiskx
551, 555, 557 | File system or log corrupted | Rebuild journal log and fsck the file systems.
              | rootvg locked (only if 551) | Unlock rootvg (chvg -u rootvg).
552, 554, 556 | File system superblock corrupted | Rebuild journal log and fsck the file systems, or recover the superblock from the secondary copy.
              | Reduced ODM corrupted | If that fails, recover from mksysb.
553 | Corrupt /etc/inittab or /etc/environment | Access the rootvg. Check /etc/inittab (empty, missing, or corrupt?). Check /etc/environment.
523 - 534 | ODM files missing | ODM files are missing or inaccessible. Restore the missing files from a system backup.
518, 517 | Failed or hung file system mount (/usr, /var, /tmp) | Check /etc/filesystems. Check network (if remote mount), file systems (fsck), and hardware.
Figure 6-17. Boot problem management AN152.2
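For quick reference, the table can be condensed into a lookup function. This is a sketch only. The name `led_hint` is hypothetical, and the messages paraphrase the table rows. Codes that do not appear in the table are reported as unknown rather than guessed.

```shell
#!/bin/sh
# led_hint CODE
# Hypothetical lookup that condenses the boot problem management table.
# Returns 1 for codes the table does not cover.
led_hint() {
    case $1 in
        AA060011)        echo "Bad bootlist or damaged BLV: fix the bootlist in SMS, or re-create the BLV with bosboot -ad /dev/hdiskx" ;;
        551)             echo "File system or log corrupted, or rootvg locked: rebuild the log, fsck, chvg -u rootvg" ;;
        555|557)         echo "File system or log corrupted: rebuild the journal log and fsck the file systems" ;;
        552|554|556)     echo "Superblock or reduced ODM corrupted: rebuild the log and fsck, or recover the superblock; else restore from mksysb" ;;
        553)             echo "Check /etc/inittab and /etc/environment" ;;
        52[3-9]|53[0-4]) echo "ODM files missing: restore them from a system backup" ;;
        517|518)         echo "Mount of /usr, /var, or /tmp failed: check /etc/filesystems, network, fsck, hardware" ;;
        *)               echo "No entry in this table for $1"; return 1 ;;
    esac
}
```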
Notes:
Introduction
The visual shows some common boot errors that might happen during the AIX software
boot process.
Bootlist wrong?
If the bootlist is wrong, the system cannot boot. This is easy to fix. Boot in SMS and
select the correct boot device. Keep in mind that only hard disks with boot records are
shown as selectable boot devices.
/etc/inittab corrupt? /etc/environment corrupt?

An LED of 553 usually indicates a corrupted /etc/inittab file, but in some cases a bad
/etc/environment may also lead to a 553 LED. To fix this problem, boot in maintenance
mode and check both files. Consider using a mksysb to retrieve these files from a
backup tape.
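Before you replace the files, a quick sanity check of /etc/inittab can rule out the most obvious failure modes. The sketch below is hypothetical (`check_inittab` is not an AIX command) and checks only three things. First, the file exists and is not empty. Second, each non-comment line has four colon-separated fields. Third, an initdefault entry is present. It does not validate /etc/environment or the individual actions.

```shell
#!/bin/sh
# check_inittab FILE
# Hypothetical sanity check for the LED 553 symptoms described above.
# Prints each problem found; returns non-zero if there is any problem.
check_inittab() {
    f=$1
    if [ ! -s "$f" ]; then
        echo "$f is missing or empty"
        return 1
    fi
    awk '
        /^[ \t]*$/ || /^:/ { next }       # skip blank and comment lines
        {
            # gsub returns the number of colons (the line itself is unchanged)
            if (gsub(/:/, ":") < 3) { printf "line %d: fewer than 4 fields\n", NR; bad = 1 }
            else if ($0 ~ /^[^:]*:[^:]*:initdefault:/) seen = 1
        }
        END {
            if (!seen) { print "no initdefault entry"; bad = 1 }
            exit (bad ? 1 : 0)
        }' "$f"
}
```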
Boot logical volume or boot record corrupt?

The next thing to try, if your machine does not boot, is to check the boot logical volume.
To fix a corrupted boot logical volume, boot in maintenance mode and use the bosboot
command:
# bosboot -ad /dev/hdiskx
JFS log or JFS2 log corrupt?

To fix a corrupted JFS or JFS2 log, boot in maintenance mode and access the rootvg,
but do not mount the file systems. In the maintenance shell, issue the logform
command and do a file system check for all file systems that use this JFS or JFS2 log.
Keep in mind which file system type your rootvg uses: JFS or JFS2.
The logform command initializes a new JFS (or JFS2) transaction log. This may result in
loss of data, because JFS transactions in the old log may be destroyed. After the log has
been repaired, your machine should boot.
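It can help to write the repair sequence out as a checklist before you type it in the maintenance shell. The helper below only prints commands and runs nothing, and its name is hypothetical. The usage example assumes the default rootvg layout: /dev/hd8 as the log, and /dev/hd4, /dev/hd2, /dev/hd9var, and /dev/hd3 as the file systems that use it. The sketch uses logform -V jfs2 for a JFS2 log and fsck -y to answer yes to repair prompts. Check the logform and fsck man pages on your AIX level before relying on these exact flags.

```shell
#!/bin/sh
# log_repair_plan FSTYPE LOGDEV LV...
# Hypothetical helper that only PRINTS the maintenance-shell commands;
# it does not run them. FSTYPE is jfs or jfs2.
log_repair_plan() {
    fstype=$1
    logdev=$2
    shift 2
    case $fstype in
        jfs)  echo "logform $logdev" ;;
        jfs2) echo "logform -V jfs2 $logdev" ;;
        *)    echo "unknown file system type: $fstype" >&2; return 1 ;;
    esac
    # Check every file system that uses this log.
    for lv in "$@"; do
        echo "fsck -y $lv"
    done
}
```

For example, `log_repair_plan jfs2 /dev/hd8 /dev/hd4 /dev/hd2 /dev/hd9var /dev/hd3` prints the logform command followed by one fsck line per file system. Note that logform asks for confirmation before it destroys the old log.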
Superblock corrupt?
Another thing you can try is to check the superblocks of your rootvg file systems. If you
boot in maintenance mode and you get error messages like Not an AIX file system
or Not a recognized file system type, it is probably due to a corrupt superblock in
the file system.
Each file system has two superblocks. Running fsck should automatically recover the
primary superblock by copying it from the backup superblock. The following procedure is
provided in case you need to do this manually.
For JFS, the primary superblock is in logical block 1 and a copy is in logical block 31. To
manually copy the superblock from block 31 to block 1 for the root file system (in this
example), issue the following command:
# dd count=1 bs=4k skip=31 seek=1 if=/dev/hd4 of=/dev/hd4
For JFS2, the locations are different. To manually recover the primary superblock from
the backup superblock for the root file system (in this example), issue the following
command:
# dd count=1 bs=4k skip=15 seek=8 if=/dev/hd4 of=/dev/hd4
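Both dd commands use bs=4k, so the skip and seek values are block numbers in 4 KB units. The sketch below (hypothetical name `superblock_dd`) prints the dd command from this section for either file system type, along with the byte offsets those block numbers correspond to. It does not run dd.

```shell
#!/bin/sh
# superblock_dd FSTYPE DEVICE
# Prints (does not run) the dd command that copies the backup superblock
# over the primary one, using the block numbers given in this section.
superblock_dd() {
    case $1 in
        jfs)  primary=1; backup=31 ;;
        jfs2) primary=8; backup=15 ;;
        *)    echo "unknown file system type: $1" >&2; return 1 ;;
    esac
    echo "# primary at byte $((primary * 4096)), backup at byte $((backup * 4096))"
    echo "dd count=1 bs=4k skip=$backup seek=$primary if=$2 of=$2"
}
```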

rootvg locked?

Many LVM commands place a lock into the ODM to prevent other commands from
working at the same time. If a command crashes and its lock remains in the ODM,
this may lead to a hanging system.
To unlock the rootvg, boot in maintenance mode and access the rootvg with file
systems. Issue the following command to unlock the rootvg:
# chvg -u rootvg
ODM files missing?

If you see LED codes in the range 523 to 534, ODM files are missing on your machine.
Use a mksysb tape of the system to restore the missing files.
Mount of /usr or /var failed?

An LED of 518 indicates that the mount of the /usr or /var file system failed. If /usr is
mounted from a network, check the network connection. If /usr or /var are locally
mounted, use fsck to check the consistency of the file systems. If this does not help,
check the hardware by running diagnostics from the Diagnostics CD.
Instructor notes:
Purpose — Describe some common causes of boot problems.
Details — Describe as explained in the student notes. Describe the meaning of 553 and
557 as they are part of the exercise.
Transition statement — Let’s review the /etc/inittab file which was described in the basic
administration course.

Let's review: /etc/inittab file

init:2:initdefault:
brc::sysinit:/sbin/rc.boot 3
rc:2:wait:/etc/rc
fbcheck:2:wait:/usr/sbin/fbcheck
srcmstr:2:respawn:/usr/sbin/srcmstr
rctcpip:2:wait:/etc/rc.tcpip
rcnfs:2:wait:/etc/rc.nfs
dt:2:wait:/etc/rc.dt
tty0:2:off:/usr/sbin/getty /dev/tty1
myid:2:once:/usr/local/bin/errlog.check
Figure 6-18. Let's review: /etc/inittab file AN152.2
Notes:
Instructions
Answer the following questions as they relate to the /etc/inittab file shown in the visual:
1. Which process is started by the init process only one time?
The init process does not wait for this process to finish.
2. Which process is involved in print activities on an AIX system?
3. Which line is ignored by the init process?
4. Which line determines that multiuser mode is the initial run level of the system?
5. Where is the System Resource Controller started?
6. Which line controls network processes?
7. Which component allows the execution of programs at a certain date or time?
8. Which line executes /etc/firstboot, if it exists?
9. Which script controls starting of the CDE desktop?
10. Which line is executed in all run levels?
11. Which line takes care of varying on the volume groups, activating paging spaces,
and mounting file systems that are to be activated during boot?


Purpose — Review the /etc/inittab file which was described in the basic administration
course.
Details — Give the students 10 minutes to answer the questions, then review them.
When reviewing the answers, complete the empty boxes in the visual with the highlighted
expressions. After reviewing all questions, the completed visual should look like the
following table:
Let's review solution: /etc/inittab file

init:2:initdefault:                          Determine initial run level
brc::sysinit:/sbin/rc.boot 3                 Start up last boot phase
rc:2:wait:/etc/rc                            Multiuser initialization
fbcheck:2:wait:/usr/sbin/fbcheck             Execute /etc/firstboot, if it exists
srcmstr:2:respawn:/usr/sbin/srcmstr          Start the System Resource Controller
cron:2:respawn:/usr/sbin/cron                Start the cron daemon
rctcpip:2:wait:/etc/rc.tcpip                 Start up communication daemon processes
rcnfs:2:wait:/etc/rc.nfs                       (nfsd, biod, ypserv, and so forth)
qdaemon:2:wait:/usr/bin/startsrc -sqdaemon   Start up spooling subsystem
dt:2:wait:/etc/rc.dt                         Start up CDE desktop
tty0:2:off:/usr/sbin/getty /dev/tty1         Line ignored by init
myid:2:once:/usr/local/bin/errlog.check      Process started only one time
1. The myid line is started only one time
The action once tells the init process to start the process and not wait for it to
finish. When the process ends, it is not restarted.
2. The qdaemon line
The qdaemon controls the queueing subsystem in AIX. It manages jobs in queues and
their assignment to the different queues in the system.
3. The tty0 line is ignored by init

The action off tells the init process to ignore this line. However, if you change the action
of a running process to off and then issue the command telinit q, the init process sends a
SIGTERM signal to the process. If the process still exists after 20 seconds, init sends it a
SIGKILL signal.
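The SIGTERM-then-SIGKILL sequence is a common graceful-stop pattern, and it takes only a few lines of POSIX shell to reproduce. This is a generic sketch, not init's implementation. The name `stop_with_grace` is hypothetical, and the grace period defaults to the 20 seconds quoted above.

```shell
#!/bin/sh
# stop_with_grace PID [GRACE_SECONDS]
# Hypothetical helper: send SIGTERM, wait up to GRACE_SECONDS for the
# process to disappear, then send SIGKILL if it is still present.
stop_with_grace() {
    pid=$1
    grace=${2:-20}                                # default matches the 20 s described above
    kill -s TERM "$pid" 2>/dev/null || return 0   # nothing to signal
    waited=0
    while [ "$waited" -lt "$grace" ]; do
        kill -0 "$pid" 2>/dev/null || return 0    # process exited after SIGTERM
        sleep 1
        waited=$((waited + 1))
    done
    kill -s KILL "$pid" 2>/dev/null               # still present: force termination
    return 0
}
```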
4. The init line determines initial run level
The init command uses this entry to determine which run level to enter initially. Run
level 2 means multiuser. Run levels 1, s, m, and M mean single-user mode, often called
maintenance mode. The brc line runs at all run levels.
5. The srcmstr line starts the System Resource Controller
6. The rctcpip line starts the communication daemon processes (inetd, named, and so
forth)
The rcnfs line starts the NFS daemon processes (nfsd, biod, ypserv, and so forth).
TCP/IP and NFS daemons are started in these scripts. Typical examples are the inetd
(the Internet super-daemon, which starts many socket-based services on demand), biod
(for the NFS client), nfsd (for the NFS server), and ypserv (for the NIS server) processes.
7. The cron line starts the cron daemon
The cron daemon runs shell commands at specified dates and times. Use the crontab
command to manage cron jobs.
8. The fbcheck line executes /etc/firstboot, if it exists
This process executes the script /etc/firstboot, if it exists. This script is used after the
installation of an AIX system to run any customization steps after the system reboots.
The install_assist program is an example of such a program that is started after the
installation.
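A small run-once hook illustrates the "run it if it exists" behavior. This sketch is hypothetical and is not fbcheck's code. It assumes the script is renamed before it runs, here with a .run suffix chosen arbitrarily, so that a later boot does not execute it again. This unit does not cover how fbcheck itself prevents reruns.

```shell
#!/bin/sh
# run_once SCRIPT
# Hypothetical run-once hook: if SCRIPT exists, rename it first (so it can
# never be picked up twice) and then execute the renamed copy.
run_once() {
    script=$1
    [ -f "$script" ] || return 0          # nothing to do
    mv "$script" "$script.run" || return 1
    sh "$script.run"
}
```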
9. The dt line controls the startup of the CDE desktop
This script controls the startup of the graphical desktop.
10. The brc line starts the last boot phase
This process starts rc.boot 3, which is responsible for the final configuration of all
devices on the system. This script is executed in all run levels and before the console is
configured (sysinit).
11. The rc line activates volume groups, paging spaces, and file systems during boot.
Varyon all automatic volume groups. Activate all automatic paging spaces. Mount all file
systems marked mount=true in /etc/filesystems.

Transition statement —

For an ILO (Instructor-Led Online) class: In place of this checkpoint visual, play file
AN152U06F19 in Elluminate.
Checkpoint (1 of 2)
1. From where is rc.boot 3 run?
2. Your system stops booting with LED 557. In which rc.boot phase does the system stop?
3. What are some reasons for this problem (LED 557)?
4. Which ODM file is used by the cfgmgr during boot to configure the devices in the correct sequence?
Notes:


Purpose — Review and test the students' understanding of this unit.
1. From where is rc.boot 3 run?
The answer is from the /etc/inittab file in rootvg.
2. Your system stops booting with LED 557. In which rc.boot phase does the system stop?
The answer is rc.boot 2.
3. What are some reasons for this problem (LED 557)?
The answer is a corrupted JFS log or a damaged file system.
4. Which ODM file is used by the cfgmgr during boot to configure the devices in the correct sequence?
The answer is Config_Rules.
Checkpoint (2 of 2)
5. What is the likely cause if your system stops booting with LED 553?
6. What does the line init:2:initdefault: in /etc/inittab mean?
Notes:


Purpose —
Details —
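5. What is the likely cause if your system stops booting with LED 553?
The answer is that there is a problem processing /etc/inittab.
6. What does the line init:2:initdefault: in /etc/inittab mean?
The answer is that this line is used by the init process to determine the initial run level
(2 = multiuser).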
Exercise: System initialization: rc.boot and inittab
Repair a corrupted log logical volume
Analyze and fix a boot failure
Explore the rc.boot script
Figure 6-21. Exercise: System initialization: rc.boot and inittab AN152.2
Notes:


Purpose — Prepare the students for the lab.
Details —
Unit summary

Identify the steps in system initialization from loading the boot
Identify how devices are configured during the boot process
Analyze and solve boot problems
Notes:
Highlights
- After the boot image is loaded into RAM, the rc.boot script is executed three times
to configure the system.
- During rc.boot 1, the devices needed to vary on the rootvg are configured.
- During rc.boot 2, the rootvg is varied on.
- In rc.boot 3, the remaining devices are configured.
- Processes defined in the /etc/inittab file are initiated by the init process.

Unit 7. LVM metadata and related problems
Estimated time
01:25

This unit explains how metadata concepts are important in
understanding and working with AIX logical volume manager
problems.

• Explain where LVM metadata information is stored
• Use importvg and exportvg to manage LVM metadata
• Solve ODM-related LVM problems

Accountability:
• Lab exercises
References
management
SG24-5422-00 AIX Logical Volume Manager from A to Z: Introduction and Concepts (Redbook)
SG24-5433-00 AIX Logical Volume Manager from A to Z: Troubleshooting and Commands (Redbook)
GG24-4484-00 AIX Storage Management (Redbook)
© Copyright IBM Corp. 2009, 2012 Unit 7. LVM metadata and related problems 7-1
Unit objectives

Explain where LVM metadata information is stored
Use importvg and exportvg to manage LVM metadata
Solve ODM-related LVM problems
Notes:


Details — The AIX Storage Management Redbook listed under “References” was
published in 1994, but it is still useful.
Transition statement — Let’s start with a review of LVM terms.
7.1. LVM data representation: Overview

What students will do — The students will learn where LVM information is kept and which
part of this information resides in the ODM.
• Where LVM information is stored
• Which LVM information resides in the ODM and on disk control blocks
• How to solve ODM-related problems
How this will help students on their job — Knowing where LVM data is stored in AIX will
make it easier for the students to analyze and avoid LVM errors.
Review: LVM terms

[Diagram: physical volumes divided into physical partitions and grouped into a volume group; logical volumes made up of logical partitions]
Figure 7-2. Review: LVM terms AN152.2
Notes:
Introduction
This visual and the associated student notes will provide a review of basic LVM terms.
Volume groups, physical volumes, and physical partitions

A volume group (VG) consists of one or more physical volumes (PV) that are divided
into physical partitions (PP). When a volume group is created, a physical partition size
has to be specified. This physical partition size is the smallest allocation unit for the
LVM. The partition size is specified in megabytes, from 1 (1 MB) through 1,024 (1 GB)
for normal or big volume groups (more on these later). The physical partition size for
scalable volume groups can be up to 131,072 (128 GB). The physical partition size must
be a power of 2 (for example, 1, 2, 4, 8). The default physical partition size for normal
and big volume groups is the lowest value that can be used to remain within the limit of
1016 physical partitions per physical volume. The default value for scalable volume
groups (introduced in AIX 5L V5.3) is the lowest value that can be used to accommodate
2040 physical partitions per physical volume.
For scalable volume groups, the maximum number of physical partitions is no longer
defined on a per disk basis but applies to the entire volume group. The scalable volume
group can hold up to 2097152 (2048 K) physical partitions.
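You can express the default-size rule as a small search over powers of 2. This is a sketch under stated assumptions, not mkvg's actual algorithm. The name `default_pp_size` is hypothetical, and DISK_MB is the disk size in MB. The upper bounds used here are 1,024 MB for normal and big volume groups and 131,072 MB (128 GB) for scalable ones. The function reports an error if no allowed size fits.

```shell
#!/bin/sh
# default_pp_size DISK_MB VGTYPE
# Smallest power-of-2 PP size (MB) that keeps the disk within 1016 PPs
# (normal/big) or 2040 PPs (scalable). Sketch only; mkvg may differ.
default_pp_size() {
    disk_mb=$1
    case $2 in
        normal|big) limit=1016; max=1024 ;;
        scalable)   limit=2040; max=131072 ;;
        *) echo "unknown volume group type: $2" >&2; return 1 ;;
    esac
    pp=1
    while [ "$pp" -le "$max" ]; do
        # number of PPs the disk needs at this size, rounded up
        if [ $(( (disk_mb + pp - 1) / pp )) -le "$limit" ]; then
            echo "$pp"
            return 0
        fi
        pp=$((pp * 2))
    done
    echo "disk too large for a $2 volume group at the maximum PP size" >&2
    return 1
}
```

For example, a disk of about 70 GB (70006 MB) yields 128 for a normal volume group, because 64 MB partitions would need 1094 PPs. It yields 64 for a scalable volume group.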
Logical volumes and logical partitions

The LVM provides logical volumes (LVs), which can be created, extended, moved, and
deleted at run time. Logical volumes may span several disks, which is one of the
biggest advantages of the LVM.
Logical volumes contain the JFS and JFS2 file systems, paging spaces, journal logs,
the boot logical volume, or nothing (when used as a raw logical volume).
Logical volumes are divided into logical partitions (LPs), where each logical partition is
associated with at least one physical partition.
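Each logical partition maps to one physical partition per copy, so three numbers determine the space a logical volume consumes: the number of LPs, the PP size, and the number of copies. A small arithmetic sketch follows. The name `lv_footprint` is hypothetical.

```shell
#!/bin/sh
# lv_footprint LPS PP_SIZE_MB COPIES
# Usable size is LPS * PP size; with mirroring, each LP consumes one PP
# per copy, so the PP count and raw disk usage scale with COPIES.
lv_footprint() {
    lps=$1
    pp_mb=$2
    copies=$3
    echo "size_mb=$((lps * pp_mb)) pps=$((lps * copies)) disk_mb=$((lps * copies * pp_mb))"
}
```

For example, a two-copy LV of 10 LPs with 128 MB partitions offers 1280 MB but uses 20 PPs.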
Instructor notes:
Purpose — Introduce some basic LVM terms.
Details — Use the student notes to guide your presentation.
Additional information — If no physical partition size is specified when creating the
volume group, the mkvg command attempts to figure out an appropriate physical partition
size based on the disks in the volume group.
Transition statement — Let’s look at the unique identifiers used by LVM for the volume
groups, logical volumes, and physical volumes.

LVM identifiers
Goal: Unique worldwide identifiers for volume groups, hard disks, and logical volumes
# lsvg rootvg | grep IDENT
... VG IDENTIFIER: 00c35ba000004c00000001157f54bf78      (32 bytes long)
# lspv
hdisk0 00c35ba07b2e24f0 rootvg active                   (32 bytes long; 16 are shown)
# lsattr -El hdisk# -a unique_id
unique_id 3321360050768019102C0F000000000006E2904214503IBMfcp
# lslv hd4 | grep IDENT
LV IDENTIFIER: 00c35ba000004c00000001157f54bf78.4 ...   (VGID.minor number)
# uname -m
00C35BA04C00
Figure 7-3. LVM identifiers AN152.2
Notes:
Use of identifiers
The LVM uses identifiers for disks, volume groups, and logical volumes. Because volume
groups can be exported and imported between systems, these identifiers must be
unique worldwide.
AIX-generated identifiers are based on the CPU ID of the creating host and a
timestamp.
Volume group identifiers

As shown on the visual, volume group identifiers (VGIDs) have a length of 32 bytes.
Disk identifiers
Disk identifiers have a length of 32 bytes, but currently the last 16 bytes are unused and
are all set to 0 in the ODM. Notice that, as shown on the visual, only the first 16 bytes of
this identifier are displayed in the output of the lspv command.
In a SAN environment, path management needs a method for recognizing that a disk
discovered over two different paths is actually the same disk. Some storage solutions
in an AIX environment use the PVID for this purpose. Other storage solutions use an
IEEE volume identifier (ieee_volname) or a unique device identifier (UDID, the unique_id
attribute) for this purpose. Each of these is an attribute of the disk in the ODM.
The PVID attribute is set the first time a disk is assigned to a volume group.
If you ever have to manually update the disk identifiers in the ODM, do not forget to add
16 zeros to the physical volume ID.
Logical volume identifiers

The logical volume identifiers consist of the volume group identifier, a period, and the
minor number of the logical volume.
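The identifier formats described above are simple string transformations. The helpers below are hypothetical (they are not AIX commands) and manipulate strings only. They pad a 16-character PVID with 16 zeros, matching how the ODM stores it, and they build or split an LVID of the form VGID.minor.

```shell
#!/bin/sh
# Hypothetical string helpers for LVM identifier formats.

# pvid_odm_form PVID16: append 16 zeros to a 16-character PVID (as shown by lspv)
pvid_odm_form() {
    printf '%s%016d\n' "$1" 0
}

# lvid VGID MINOR: build a logical volume identifier "VGID.minor"
lvid() {
    printf '%s.%s\n' "$1" "$2"
}

# vgid_of_lvid LVID: strip the ".minor" suffix to recover the VGID
vgid_of_lvid() {
    printf '%s\n' "${1%.*}"
}
```

For example, `lvid 00c35ba000004c00000001157f54bf78 4` prints the hd4 identifier shown on the visual.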


Purpose — Introduce the LVM identifiers.
Details — Explain using the information provided in the student notes. Emphasize that
these identifiers are important: the logical names we use may not be associated with the
correct objects, and in various problem scenarios we need to work with the unique identifiers instead.
Additional information — Be sure to explain that physical volume IDs are 32 bytes long.
The last 16 bytes are currently set to zeros. That is important for the lab.
Transition statement — Let’s talk about where LVM stores its information.
LVM data on disk control blocks

Volume Group Descriptor Area (VGDA)

Most important data structure of LVM
Global to the volume group (same on each disk)
One or two copies per disk
Volume Group Status Area (VGSA)

Tracks the state of mirrored copies
One or two copies per disk
Logical Volume Control Block (LVCB)

Has historically occupied the first 512 bytes of each logical volume
Contains logical volume attributes (policies, number of copies)
Scalable volume groups: The information is merged into VGDA
Figure 7-4. LVM data on disk control blocks AN152.2
Notes:
Disk control blocks used by LVM

The LVM uses three different disk control blocks:
- The Volume Group Descriptor Area (VGDA) is the most important data structure of
the LVM. A redundant copy is kept on each disk that is contained in a volume group.
Each disk contains the complete allocation information of the entire volume group.
- The Volume Group Status Area (VGSA) tracks the status of all physical volumes in
the volume group (active or missing) and the state of all allocated physical
partitions in the volume group (active or stale). Each disk in a volume group
contains a VGSA.
- The Logical Volume Control Block (LVCB) traditionally resides in the first 512 bytes
of each logical volume. If raw devices are used (for example, many database
systems use raw logical volumes), be careful that these programs do not destroy the
LVCB. However, in scalable volume groups the LVCB is not kept at this location, but
instead in the same reserved disk area as the VGDA. Also, the administrator
of a big volume group can use the -T O option of the mklv command to request that
the LVCB not be stored at the beginning of the logical volume, but instead as part of the
VGDA.
VGSA for scalable volume groups

The VGSA for scalable volume groups consists of three areas: physical volume missing
area (PVMA), mirror write consistency dirty bit area (MWC_DBA), and PP status area
(PPSA).
- Physical volume missing area: The PVMA tracks if any of the disks are missing
- MWC dirty bit area: The MWC_DBA holds the status for each logical volume if
passive mirror write consistency is used
- PP status area: The PPSA logs any stale PPs
The overall size reserved for the VGSA is independent of the configuration parameters
of the scalable volume group and stays constant. However, the size of the contained
PPSA changes in proportion to the configured maximum number of PPs.
LVCB-related considerations
For normal volume groups, the LVCB resides in the first block of the user data within the
logical volume. Big volume groups keep additional LVCB information in the VGDA. The
LVCB structure on the first logical volume user block and the LVCB structure within the
VGDA are similar but not identical. If a logical volume in a big volume group was created
with the -T O option of the mklv command, no LVCB will occupy the first block of the logical volume.
With scalable volume groups, logical volume control information is no longer stored on
the first user block of any logical volume. Therefore, no precautions have to be taken
when using raw logical volumes, because there is no longer a need to preserve the
information held by the first 512 bytes of the logical device.
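If you write directly to raw logical volumes, the rules above reduce to a small decision table. The function below encodes those rules as stated in these notes. Its name is hypothetical, and so is the `T_O` argument, which is a convention of this sketch meaning "LV created with mklv -T O".

```shell
#!/bin/sh
# lvcb_in_first_block VGTYPE [T_O]
# Prints "yes" if the first 512 bytes of the logical volume hold an LVCB
# that raw-device users must preserve, "no" otherwise.
lvcb_in_first_block() {
    case $1 in
        normal)   echo yes ;;
        big)      if [ "$2" = "T_O" ]; then echo no; else echo yes; fi ;;
        scalable) echo no ;;
        *)        echo "unknown volume group type: $1" >&2; return 1 ;;
    esac
}
```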
Instructor notes:
Purpose — Introduce the disk control blocks.
Transition statement — Let’s see which other locations are used to store LVM data.

LVM data in the operating system

AIX files
/etc/vg/vgVGID      Handle to the VGDA copy in memory
/dev/hdiskX         Special file for a disk
/dev/VGname         Special file for administrative access to a volume group
/dev/LVname         Special file for a logical volume
/etc/filesystems    Used by the mount command to associate logical volume name,
                    file system log, and mount point
Object Data Manager (ODM)

Metadata on physical volumes, volume groups, and logical volumes
CuDv, CuAt, CuDvDr, CuDep
Figure 7-5. LVM data in the operating system AN152.2
Notes:
LVM information stored in the ODM

Physical volumes, volume groups, and logical volumes are handled as devices in AIX.
Every physical volume, volume group, and logical volume is defined in the customized
object classes in the ODM.
LVM information stored in AIX files

As shown on the visual, many AIX files also contain LVM-related data.
The kernel always keeps the VGDA in memory to increase performance, using a
technique called a memory-mapped file. The handle is always a file in the /etc/vg
directory, and the filename always reflects the volume group identifier.
Instructor notes:
Purpose — Describe where LVM data is stored.
Details — Explain using the information in the student notes. Keep this on an overview
level.
Transition statement — Let's look (at a high level) at what ODM classes hold LVM
metadata.

LVM related ODM objects

CuDv - Identifies as devices:
  Volume groups, physical volumes, logical volumes
CuAt - Attributes for each LVM entity, including:
  Physical volumes: PVID value (pvid)
  Logical volumes: LVID value (lvserial_id)
  Volume groups: VGID value (vgserial_id)
  Volume groups: PVIDs (pv), one for each PV
CuDep - One object per logical volume dependency
CuDvDr - Device driver information:
  Object for each volume group, physical volume, and logical volume
Figure 7-6. LVM related ODM objects AN152.2
Notes:
Overview
The LVM metadata maintained in the ODM database overlaps heavily with the
information maintained in the VGDA and LVCB control blocks. Yet some information
in the control blocks (such as the mapping of logical partitions) is not kept in the
ODM, and some information in the ODM (such as device drivers and logical names) is not
kept in the control blocks. Each metadata location plays a special role. For the
information they have in common, there are mechanisms to ensure that they do not
conflict.
LVM related ODM object classes

The visual provides an overview of the ODM objects. Each of these will be covered in
much more detail later in the unit.
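To see some of this ODM metadata directly, you can read the pv attributes of a volume group from CuAt. odmget prints objects as stanzas. The sketch below (hypothetical name `cuat_pvids`) assumes that stanza layout: each object begins with `CuAt:`, contains one `attribute = "..."` line and one `value = "..."` line, and is separated from the next object by a blank line. It prints the value of every object whose attribute is pv.

```shell
#!/bin/sh
# cuat_pvids: read odmget CuAt stanzas on stdin and print the value of
# every object whose attribute is "pv" (one PVID per physical volume).
cuat_pvids() {
    awk -F'"' '
        /^CuAt:/                     { attr = ""; val = "" }   # new stanza
        $1 ~ /^[ \t]*attribute = $/  { attr = $2 }
        $1 ~ /^[ \t]*value = $/      { val = $2 }
        /^[ \t]*$/                   { if (attr == "pv") print val; attr = ""; val = "" }
        END                          { if (attr == "pv") print val }
    '
}
```

For example: `odmget -q "name=myvg and attribute=pv" CuAt | cuat_pvids`.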
Instructor notes:
Purpose — Provide an overview of the ODM object classes that hold LVM metadata
information.
Details —
Transition statement — Let’s look at how the importvg and exportvg commands relate
to these two LVM metadata locations.

7.2. Export and import

What students will do — The students will identify how to export and import volume
groups.
• Identify how to export a volume group
• Identify how to import a volume group
How this will help students on their job — Export and import are important features in
AIX. They can be used to easily transfer data between systems and provide a method to fix
ODM problems.
Exporting a volume group

[Diagram: system moon, with disk hdisk9 in volume group myvg containing lv10, lv11, and loglv01]
To export a volume group:
1. Unmount all file systems from the volume group:
# umount /dev/lv10
# umount /dev/lv11
2. Vary off the volume group:
# varyoffvg myvg
3. Export the volume group:
# exportvg myvg
The complete volume group is removed from the ODM.
Figure 7-7. Exporting a volume group AN152.2
Notes:
The scenario
The exportvg and importvg commands can be used to fix ODM problems. These
commands also provide a way to transfer data between different AIX systems. This
visual provides an example of how to export a volume group.
The disk, hdisk9, is connected to the system moon. This disk belongs to the myvg
volume group. This volume group needs to be transferred to another system.
Procedure to export a volume group

Execute the following steps to export the volume group:
1. Unmount all file systems from the volume group. In the example, there are three
logical volumes in myvg: lv10, lv11, and loglv01. The loglv01 logical volume is
the JFS log device for the file systems in myvg, which is closed when all file
systems are unmounted.

2. When all logical volumes are closed, use the varyoffvg command to vary off the
volume group.
3. Finally, export the volume group, using the exportvg command. After this point,
the complete volume group (including all file systems and logical volumes) is
removed from the ODM.
After exporting the volume group, the disks in the volume group can be
transferred to another system.
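Step 2 fails if any logical volume is still open, so it can help to check first. The sketch assumes the usual column layout of `lsvg -l` output: two header lines, then LV STATE in the sixth column with values such as open/syncd or closed/syncd. The name `open_lvs` is hypothetical.

```shell
#!/bin/sh
# open_lvs: read `lsvg -l VGNAME` output on stdin and print the logical
# volumes whose LV STATE starts with "open". Returns 1 if any are open,
# meaning the volume group is not yet ready for varyoffvg.
open_lvs() {
    awk 'NR > 2 && $6 ~ /^open/ { print $1; found = 1 }
         END { exit (found ? 1 : 0) }'
}
```

For example, `lsvg -l myvg | open_lvs` lists anything that still has to be unmounted or closed.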
Instructor notes:
Purpose — Explain how to export a volume group.
Details —
Transition statement — Let’s describe how to import a volume group.

Importing a volume group

[Diagram: system mars, with disk hdisk3 in volume group myvg containing lv10, lv11, and loglv01]
To import a volume group:
1. Configure the disks.
2. Import the volume group:
# importvg -y myvg hdisk3
3. Mount the file systems:
# mount /dev/lv10
# mount /dev/lv11
The complete volume group is added to the ODM.
Figure 7-8. Importing a volume group AN152.2
Notes:
Procedure to import a volume group

To import a volume group into a system, for example into a system named mars,
execute the following steps:
1. Connect all disks (in our example we have only one disk) and reboot the system
so that cfgmgr will configure the added disks.
2. You only have to specify one disk (using either hdisk# or the PVID) in the
importvg command. Because all disks contain the same VGDA information, the
system can determine this information by querying any VGDA from any disk in
the volume group.
If you do not specify the -y flag, the command will generate a new volume group
name.
The importvg command generates completely new ODM entries.
In AIX V4.3 and subsequent releases, the volume group is automatically varied
on.
3. Finally, mount the file systems.


Purpose — Explain how to import a volume group.
Details —
Additional information — Prior to AIX V4.3, you had to check whether the volume group
is varied on after the importvg. If the volume group is not automatically varied on, execute
the varyonvg command to vary on the volume group.
As of AIX 5L V5.2, importvg was enhanced to accept a PVID as a command line
argument. For example:
# importvg -y myvg 0001810fd3838c5e
Beginning with AIX 5L V5.3, the default algorithm for the importvg command was
enhanced to reduce the execution time while maintaining a maximum of integrity
protection. It is no longer the default to scan every disk of a system for an import operation.
Beginning with AIX 5L V5.3, the importvg command uses the redefinevg command to get
all the PVIDs by reading the VGDA of the disk that is related to the given volume group.
Then, only the initial LVM records for those physical volumes are examined. The default
method of previous AIX releases used to read the LVM record of every disk in the system
trying to match the disks that are listed in the VGDA. Beginning with AIX 5L V5.3, this full
scan is used only as an error path, to try other disks in the system if needed.
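The matching step described above takes the PVIDs recorded in a VGDA and finds which configured disks carry them. You can illustrate it against lspv output. In this minimal sketch, `match_pvids` is a hypothetical name, and you supply the PVID list yourself (for example, the pv attributes from CuAt). Only the first 16 characters are compared, because that is all lspv displays.

```shell
#!/bin/sh
# match_pvids PVID... < lspv_output
# For each PVID, print "PVID hdiskN" if lspv shows a disk with that PVID,
# or "PVID MISSING" otherwise. Returns 1 if any PVID is missing.
match_pvids() {
    lspv_out=$(cat)
    rc=0
    for pvid in "$@"; do
        short=$(printf '%.16s' "$pvid")     # lspv shows only the first 16 characters
        disk=$(printf '%s\n' "$lspv_out" | awk -v p="$short" '$2 == p { print $1; exit }')
        if [ -n "$disk" ]; then
            printf '%s %s\n' "$pvid" "$disk"
        else
            printf '%s MISSING\n' "$pvid"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, `lspv | match_pvids 00c35ba07b2e24f0`.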
Transition statement — What happens if logical volumes already exist during the
importvg?
importvg and duplicate names

• Avoid duplicate logical volume names and file system names between systems
  - Avoid default names such as fslv00 and lv01
  - Use functionally meaningful names such as db2pay00
• importvg will generate a new logical volume name for a duplicate
• importvg will not create an /etc/filesystems entry for a duplicate label (mount point)
Figure 7-9. importvg and duplicate names AN152.2
Notes:
Duplicate names during importvg

If a logical volume name or a file system name (label) already exists on the system to
which you are importing a volume group, you run into problems. The best way to avoid
this situation is to have a naming convention for your logical volume and file system
names that ensures uniqueness across systems. The most common reason for duplicates
is accepting the AIX default names.
Duplicate logical volume names

If you are importing a volume group with logical volumes that already exist on the
system, the importvg command renames the logical volumes from the volume group
that is being imported.

Duplicate file system names and /etc/filesystems stanzas

Normally the importvg command creates new stanzas in /etc/filesystems for file
systems in the imported volume group. If importvg finds that the new file system’s label
duplicates the label of an existing stanza, it will not create the new stanza and provides
an error message to that effect.
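You can predict label collisions before running importvg by comparing the labels of the incoming file systems with the stanza names already in /etc/filesystems. This sketch assumes that stanza headers are lines that start with a slash and end with a colon (for example /home/michael:). The function name is hypothetical, and the administrator must supply the list of incoming labels.

```shell
#!/bin/sh
# colliding_labels FILESYSTEMS_FILE LABEL...
# Print each LABEL (mount point) that already has a stanza in the given
# /etc/filesystems-format file; importvg would not create a stanza for it.
colliding_labels() {
    fsfile=$1
    shift
    existing=$(awk '/^\/[^ \t]*:[ \t]*$/ { sub(/:[ \t]*$/, ""); print }' "$fsfile")
    for label in "$@"; do
        if printf '%s\n' "$existing" | grep -Fqx -- "$label"; then
            echo "$label"
        fi
    done
}
```

For example, `colliding_labels /etc/filesystems /home/peter /home/michael`.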
Instructor notes:
Purpose — Explain the problems and solutions related to duplicate names when using
importvg.
Details —
Transition statement — What happens if logical volumes already exist during the
importvg?

importvg and existing logical volumes

[Diagram: system mars with volume group datavg on hdisk2 and volume group myvg on hdisk3; both contain logical volumes named lv10, lv11, and loglv01]
importvg: changing LV name lv10 to fslv00
importvg: changing LV name lv11 to fslv01
importvg can also accept the PVID in place of the hdisk name.
Figure 7-10. importvg and existing logical volumes AN152.2
Notes:
Renaming logical volumes

If you are importing a volume group that contains logical volumes whose names already
exist on the system, the importvg command renames the conflicting logical volumes in
the volume group that is being imported.
The logical volumes /dev/lv10 and /dev/lv11 exist in both volume groups. During the
importvg command, the logical volumes from myvg are renamed to /dev/fslv00 and
/dev/fslv01.
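It can help to capture these rename messages, for example to update scripts or documentation that refer to the old names. A minimal sketch, assuming the message text shown on the visual (changing LV name OLD to NEW); the wording might differ between AIX levels, and lv_renames is a hypothetical name.

```sh
# lv_renames  (hypothetical helper)
# Reads importvg output on stdin and prints "OLD NEW" for every
# "changing LV name OLD to NEW" message; all other lines are ignored.
lv_renames() {
    sed -n 's/.*changing LV name \([^ ]*\) to \([^ ]*\).*/\1 \2/p'
}
```

For example, run importvg -y myvg hdisk3 2>&1 | tee /tmp/importvg.out (the output file name is only an example), then lv_renames < /tmp/importvg.out.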
Instructor notes:
Purpose — Explain what happens if a logical volume already exists on a system during the
import.
Details —
Transition statement — Let’s describe what happens if a file system already exists during
an import.

importvg and existing file systems (1 of 2)

datavg (existing)                myvg (imported)
/dev/lv10: /home/sarah           /dev/lv23: /home/peter
/dev/lv11: /home/michael         /dev/lv24: /home/michael
/dev/loglv00: log device         /dev/loglv01: log device

Warning: mount point /home/michael already exists in /etc/filesystems

# umount /home/michael
# mount -o log=/dev/loglv01 /dev/lv24 /home/michael
Figure 7-11. importvg and existing file systems (1 of 2) AN152.2
Notes:
Using umount and mount

If a file system with the same mount point (for example, /home/michael) already exists
on the system, importvg does not create an /etc/filesystems stanza for the imported file
system, so you cannot mount the imported file system by its mount point alone.
One method to get around this problem is:
1. Unmount the existing file system; in this example, /home/michael from datavg.
2. Mount the imported file system. Note that you have to specify the:
- Log device (-o log=/dev/loglv01)
- Logical volume name (/dev/lv24)
- Mount point (/home/michael)
If the file system type is jfs2, you have to specify this as well
(-v jfs2). You can get this information by running the command
getlvcb -AT lv24
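The decision can be scripted. A minimal sketch: mount_cmd (a hypothetical name) reads getlvcb -AT output on stdin, looks for the type = line, and prints, without running, the mount command for the imported file system. It uses the -v VfsName flag of the AIX mount command to name the jfs2 type.

```sh
# mount_cmd LV MOUNTPOINT LOGDEV  (hypothetical helper)
# Reads "getlvcb -AT LV" output on stdin and prints (does not run) the
# mount command. The vfs type is added only for jfs2 file systems.
mount_cmd() {
    _type=$(awk -F' = ' '$1 ~ /^[ \t]*type$/ { print $2; exit }')
    if [ "$_type" = jfs2 ]; then
        echo "mount -v jfs2 -o log=$3 /dev/$1 $2"
    else
        echo "mount -o log=$3 /dev/$1 $2"
    fi
}
```

For example, getlvcb -AT lv24 | mount_cmd lv24 /home/michael /dev/loglv01 prints the command for review before you run it.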
Another method is to add a new stanza to the /etc/filesystems file. This is covered in
the next visual.


Purpose — Describe what happens if a file system already exists during the import.
Details —
Transition statement — Let’s see how to add a stanza to the /etc/filesystems file.
importvg and existing file systems (2 of 2)

# vi /etc/filesystems
/home/michael:
        dev     = /dev/lv11
        vfs     = jfs
        log     = /dev/loglv00
        mount   = false
        options = rw
        account = false

/home/michael_moon:
        dev     = /dev/lv24
        vfs     = jfs
        log     = /dev/loglv01
        mount   = false
        options = rw
        account = false

datavg                           hdisk3 (myvg)
/dev/lv10: /home/sarah           /dev/lv23: /home/peter
/dev/lv11: /home/michael         /dev/lv24: /home/michael
/dev/loglv00: log device         /dev/loglv01: log device

# mount /home/michael
# mount /home/michael_moon        (mount point must exist)
Figure 7-12. importvg and existing file systems (2 of 2) AN152.2
Notes:
Create a new stanza in /etc/filesystems

If you need both file systems (the imported one and the one that already exists) mounted
at the same time, you must create a new stanza in /etc/filesystems that uses a different
mount point. In our example, we create a second stanza, /home/michael_moon, for the
imported logical volume /dev/lv24. The fields in the new stanza are:
- dev specifies the logical volume, in our example /dev/lv24.
- vfs specifies the file system type, in our example a journaled file system.
- log specifies the JFS log device for the file system.
- mount specifies whether this file system should be mounted by default. The value
false specifies no default mounting during boot. The value true indicates that a file
system should be mounted during the boot process.
- options specifies the mount options; rw mounts the file system with read and write
access.

- account specifies whether the file system should be processed by the accounting
system. A value of false indicates no accounting.
Before mounting the file system /home/michael_moon, the corresponding mount point
must be created.
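A stanza like the one on the visual can also be generated and reviewed before it is appended. A minimal sketch; fs_stanza is a hypothetical name, and the fixed values (mount = false, options = rw, account = false) are the ones used in the example, not general defaults.

```sh
# fs_stanza MOUNTPOINT LV VFS LOGDEV  (hypothetical helper)
# Prints an /etc/filesystems stanza with the fields described above.
# printf reuses its format string for each attribute/value pair.
fs_stanza() {
    printf '%s:\n' "$1"
    printf '\t%-8s= %s\n' dev "/dev/$2" vfs "$3" log "$4" \
        mount false options rw account false
}
```

For example, after backing up /etc/filesystems: fs_stanza /home/michael_moon lv24 jfs /dev/loglv01 >> /etc/filesystems, then create the /home/michael_moon mount point before mounting it.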
Instructor notes:
Purpose — Describe how to add a stanza to /etc/filesystems.
Details — This might be a good place to stop and have the students complete the first two
parts of the matching lab exercise, which focus on using exportvg and importvg. The other
option is to do all of the exercises at the end of the unit.
Additional information — To discover the information contained in the newly imported
volume group, use the standard LVM tools:
To see the logical volume names:
# lsvg -l myvg
To see details of the logical volumes:
# lslv lvname
These commands will assist in creating the new stanza in /etc/filesystems.
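When several logical volumes are involved, the lsvg -l output can be reduced to a list of logical volumes and mount points. A minimal sketch, assuming the usual lsvg -l layout (a volume group name line, a column header line, then one line per logical volume with the mount point in the last column); lv_mounts is a hypothetical name.

```sh
# lv_mounts  (hypothetical helper)
# Reads "lsvg -l VG" output on stdin and prints "LV MOUNTPOINT" for each
# logical volume, skipping the two header lines. Log logical volumes
# normally show N/A as their mount point.
lv_mounts() {
    awk 'NR > 2 && NF >= 7 { print $1, $NF }'
}
```

For example, lsvg -l myvg | lv_mounts, followed by lslv or getlvcb for each logical volume of interest.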
Transition statement — Let’s next look at the details of the metadata that is stored in
the VGDA, the LVCB, and the ODM object classes.

7.3. LVM Metadata details

What students will do — Study the details of what LVM related metadata is stored in the
VGDA, LVCB, and ODM objects.
How students will do it — Lecture, discussion, and exercises.
What students will learn — How to display the metadata information and, for the ODM,
details that help when working with individual objects to fix problems.
How this will help students on their job — LVM problems can develop that require
patching the LVM metadata. Procedures for working with such problems are covered
later in this unit and again in the next unit. This topic builds the basis for understanding
these procedures.
Contents of the VGDA

Header time stamp        Updated when volume group is changed
Physical volume list     PVIDs only (no physical volume names)
                         VGDA count and physical volume state
Logical volume list      LVIDs and logical volume names
                         Number of copies
Physical partition map   Maps logical partitions to physical partitions
Trailer time stamp       Must contain same value as header time stamp
Figure 7-13. Contents of the VGDA AN152.2
Notes:
Introduction
The table in the visual shows the contents of the VGDA. The individual items listed are
discussed in the paragraphs that follow.
Time stamps
The time stamps are used to check if a VGDA is valid. If the system crashes while
changing the VGDA, the time stamps will differ. The next time the volume group is
varied on, this VGDA is marked as invalid. The latest intact VGDA will then be used to
overwrite the other VGDAs in the volume group.
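The selection rule can be modeled in a few lines. This is only an illustration of the rule described above, not how LVM reads the VGDAs (varyonvg does that itself); the input format, one line per VGDA copy with its header and trailer time stamps, is hypothetical, as is the name pick_vgda.

```sh
# pick_vgda  (hypothetical model of the rule above)
# Input lines: COPY HEADER_TS TRAILER_TS (time stamps as fixed-width hex).
# A copy is intact only if its header and trailer time stamps match;
# the intact copy with the latest time stamp is printed. Appending ""
# forces string comparison, which orders equal-width hex values correctly.
pick_vgda() {
    awk '($2 "") == ($3 "") && ($2 "") > best { best = $2 ""; copy = $1 }
         END { if (copy == "") exit 1; print copy }'
}
```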

Physical volume list

The VGDA contains the physical volume list. Note that no disk names are stored; only
the unique disk identifiers (PVIDs) are used. For each disk, the number of VGDAs on the
disk and the physical volume state are stored. We will talk about physical volume states
later in this unit.
Logical volume list

The VGDA contains a record of the logical volumes that are part of the volume group. It
stores the logical volume identifiers and the corresponding logical volume names.
Additionally, the number of copies is stored for each logical volume.
Physical partition map

The most important data structure is the physical partition map. It maps each logical
partition to a physical partition. The size of the physical partition map is determined at
volume group creation time.
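The lslv -m command displays this map for one logical volume. A minimal sketch that flattens it into one line per copy, assuming the usual lslv -m layout (two header lines, then the LP number followed by up to three PP/PV pairs, one per mirror copy); lp_map is a hypothetical name.

```sh
# lp_map  (hypothetical helper)
# Reads "lslv -m LV" output on stdin and prints "LP PP PV" for each copy
# of each logical partition, skipping the two header lines.
lp_map() {
    awk 'NR > 2 { for (i = 2; i + 1 <= NF; i += 2) print $1, $i, $(i + 1) }'
}
```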
Instructor notes:
Purpose — Describe the contents of the VGDA.
Details — Use the student notes to guide your explanation. The students do not need to
know the detailed structure of the VGDA; this page only reinforces the concepts of the type
of information maintained in the VGDA, and that the time stamps help identify a VGDA copy
that is out of date.
Additional information — The -d flag of the mkvg command is ignored in AIX 5L V5.2,
AIX 5L V5.3, and AIX 6.1.
Transition statement — Let’s have a look into the VGDA.

VGDA example
# lqueryvg -p hdisk1 -At
Max LVs:        256
PP Size:        20          1: ____________
Free PPs:       12216
LV count:       3           2: ____________
PV count:       1           3: ____________
Total VGDAs:    2           4: ____________
MAX PPs per PV: 32768
MAX PVs:        1024
                            5: ____________
Logical:        00c35ba000004c00000001157fcf6bdf.1 lv00 1
                00c35ba000004c00000001157fcf6bdf.2 lv01 1
                00c35ba000004c00000001157fcf6bdf.3 lv02 1
Physical:       00c35ba07fcf6b93 2 0
                6: ____________ 7: ____________
Figure 7-14. VGDA example AN152.2
Notes:
The lqueryvg command

The lqueryvg command is a low-level command that shows an extract from the VGDA
on a specified disk, for example, hdisk1.
In the command shown on the visual, -p hdisk1 will read the VGDA on hdisk1, -A will
display all available information, and -t will display descriptive tags.
The visual only shows selected fields from the report; a more complete example output
is below in these notes.
Interpreting lqueryvg output

As an exercise in interpreting the output of lqueryvg, match each of the following
expressions to the appropriate numbered location on the visual.
a. VGDA count on this disk
b. 2 VGDAs in volume group

c. 3 logical volumes in volume group
d. PP size = 2^20 (2 to the 20th power) bytes, or 1 MB (for this volume group)
e. LVIDs (VGID.minor_number)
f. 1 physical volume in volume group
g. PVIDs
Output of lqueryvg on AIX 7.1

The output of lqueryvg on recent AIX versions gives more information than shown in
the example on the visual. An example of lqueryvg (for the rootvg disk) output from an
AIX 7.1 system is given below:
Max LVs: 256
PP Size: 24
Free PPs: 512
LV count: 12
PV count: 2
Total VGDAs: 3
Conc Allowed: 0
MAX PPs per PV 1016
MAX PVs: 32
Quorum (disk): 1
Quorum (dd): 1
Auto Varyon ?: 1
Conc Autovaryo 0
Varied on Conc 0
Logical: 00f6060300004c000000012d097cb46a.1 hd5 1
00f6060300004c000000012d097cb46a.2 hd6 1
00f6060300004c000000012d097cb46a.3 hd8 1
00f6060300004c000000012d097cb46a.4 hd4 1
00f6060300004c000000012d097cb46a.5 hd2 1
00f6060300004c000000012d097cb46a.6 hd9var 1
00f6060300004c000000012d097cb46a.7 hd3 1
00f6060300004c000000012d097cb46a.8 hd1 1
00f6060300004c000000012d097cb46a.9 hd10opt 1
00f6060300004c000000012d097cb46a.10 hd11admin 1
00f6060300004c000000012d097cb46a.11 lg_dumplv 1
00f6060300004c000000012d097cb46a.12 livedump 1
Physical: 000bf81121b8ef00 2 0
00f606036452e4f9 1 0
Total PPs: 1022
LTG size: 128

HOT SPARE: 0

AUTO SYNC: 0
VG PERMISSION: 0
SNAPSHOT VG: 0
IS_PRIMARY VG: 0
PSNFSTPP: 4352
VARYON MODE: 0
VG Type: 0
Max PPs: 32512
Mirror Pool St n
Sys Mgt Mode: 0
VG Reserved: 1
PV RESTRICTION 0
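Individual fields can be picked out of this report by their tag. A minimal sketch; vgda_field is a hypothetical name, and it only matches tags that are followed by a colon (some tags in the AIX 7.1 output above, such as MAX PPs per PV, are truncated without one).

```sh
# vgda_field TAG  (hypothetical helper)
# Reads "lqueryvg -p DISK -At" output on stdin and prints the first word
# after "TAG:" on the first line that starts with that tag.
vgda_field() {
    awk -v tag="$1:" 'index($0, tag) == 1 {
        v = substr($0, length(tag) + 1)
        sub(/^[ \t]+/, "", v)      # drop spaces after the colon
        sub(/[ \t].*$/, "", v)     # keep only the first word
        print v
        exit
    }'
}
```

For example, lqueryvg -p hdisk1 -At | vgda_field 'Total VGDAs'.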
Instructor notes:
Purpose — Examine a VGDA.
Details — Implement this page as a sort of activity. Give the students 10 minutes to match
the expressions to the numbered locations. Then review the page:
1. PP size = 2^20 (2 to the 20th power) bytes, or 1 MB (for this volume group)
2. 3 logical volumes in volume group
3. 1 physical volume in volume group
4. 2 VGDAs in volume group
5. LVIDs (VGID.minor_number)
6. PVIDs
7. VGDA count on this disk
Additional information — The lqueryvg command displays the PP size as the exponent of
the power of 2 that gives the number of bytes in a PP. In the example on the visual, the
value of 20 given for PP Size means that the PP size is 2^20 bytes, which is 1 MB. In the
AIX 7.1 example in the student notes, the value of 24 shown for PP Size means that the PP
size is 2^24 bytes, which is 16 MB.
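The conversion is a single shift. A minimal sketch; pp_size_mb is a hypothetical name.

```sh
# pp_size_mb EXPONENT  (hypothetical helper)
# lqueryvg reports PP Size as an exponent: a PP holds 2^EXPONENT bytes.
# Dividing by 2^20 bytes (1 MB) gives the size in MB.
pp_size_mb() {
    echo $(( (1 << $1) / 1048576 ))
}
```

pp_size_mb 20 prints 1 and pp_size_mb 24 prints 16, matching the two examples.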
The best resource for information about intermediate-level LVM commands such as
lqueryvg, lvm_query, and getlvcb is the IBM Redbook AIX Logical Volume Manager from
A to Z: Troubleshooting and Commands (SG24-5433-00).
The output of lqueryvg might vary a bit, depending on the version of AIX. The notes
include an example of output from this command from an AIX 7.1 system.
Another command that can be used to examine the VGDA is readvgda or readvgda_svg if
you want to read the VGDA for a scalable volume group.
You might mention that this VGDA seems to belong to a scalable volume group (1024 MAX
PVs) and not a normal volume group (MAX LVs: 256, MAX PVs: 32) or a big volume group
(MAX LVs: 512, etc.)
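The same reasoning can be written down as a rough heuristic. A minimal sketch; vg_kind is a hypothetical name. It uses only the limits quoted above (scalable: 1024 MAX PVs; big: 512 MAX LVs; normal: 256 MAX LVs), so treat its answer as a hint, not a definitive volume group type.

```sh
# vg_kind MAX_LVS MAX_PVS  (hypothetical helper)
# Rough classification from the lqueryvg Max LVs and MAX PVs values.
# MAX PVs is checked first because a scalable VG can also show 256 LVs.
vg_kind() {
    if [ "$2" -eq 1024 ]; then
        echo scalable
    elif [ "$1" -eq 512 ]; then
        echo big
    elif [ "$1" -eq 256 ]; then
        echo normal
    else
        echo unknown
    fi
}
```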
Transition statement — Let’s look at the LVCB.

The logical volume control block

# getlvcb -AT hd2

AIX LVCB
intrapolicy = c
copies = 1
interpolicy = m
lvid = 00c35ba000004c00000001157f54bf78.5
lvname = hd2
label = /usr
machine id = 35BA04C00
number lps = 102
relocatable = y
strict = y
stripe width = 0
stripe size in exponent = 0
type = jfs2
upperbound = 32
fs =
time created = Mon Feb 28 11:16:49 2011
time modified = Mon Feb 28 07:00:09 2011
Figure 7-15. The logical volume control block AN152.2
Notes:
The logical volume control block (LVCB) and the getlvcb command
The LVCB stores attributes of a logical volume. The getlvcb command queries an
LVCB.
Example report:
# getlvcb -AT hd2
AIX LVCB
intrapolicy = c
copies = 1
interpolicy = m
lvid = 00c35ba000004c00000001157f54bf78.5
lvname = hd2
label = /usr
machine id = 35BA04C00
number lps = 102

relocatable = y
strict = y
stripe width = 0
stripe size in exponent = 0
type = jfs2
upperbound = 32
fs =
time created = Mon Feb 28 11:16:49 2011
time modified = Mon Feb 28 07:00:09 2011
The example, taken from the logical volume hd2, includes the following:
- intrapolicy, which specifies what strategy should be used for choosing physical
partitions on a physical volume. The five general strategies are edge (sometimes
called outer-edge), inner-edge, middle (sometimes called outer-middle),
inner-middle, and center (c = Center).
- copies (1 = No mirroring)
- interpolicy, which specifies the number of physical volumes to extend across (m =
Minimum).
- lvid
- lvname - Logical volume name (hd2)
- number lps - Number of logical partitions (102)
- Can the partitions be reorganized? (relocatable = y)
- Each mirror copy on a separate disk (strict = y)
- Number of disks involved in striping (stripe width)
- Stripe size (stripe size in exponent)
- Logical volume type (type = jfs2)
- JFS file system information (fs=)
- Creation and last update time (time created, time modified)


Purpose — Describe the LVCB.
Details — Explain that the LVCB stores logical volume attributes. Do not explain each
attribute shown; just do a short overview of the LVCB.
Additional information — If your logical volume interpolicy is set to maximum, the
getlvcb command will show interpolicy = x.
The values for intrapolicy are:
ie inner edge
im inner middle
c center
m outer middle
e outer edge
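If students want to decode these letters in their own getlvcb output, a lookup is enough. A minimal sketch; intra_name and inter_name are hypothetical names, and the codes are the ones listed above (plus m = minimum and x = maximum for interpolicy).

```sh
# intra_name CODE / inter_name CODE  (hypothetical helpers)
# Translate the intrapolicy and interpolicy codes shown by getlvcb.
intra_name() {
    case $1 in
        ie) echo "inner edge" ;;
        im) echo "inner middle" ;;
        c)  echo "center" ;;
        m)  echo "outer middle" ;;
        e)  echo "outer edge" ;;
        *)  echo "unknown" ;;
    esac
}
inter_name() {
    case $1 in
        m) echo "minimum" ;;
        x) echo "maximum" ;;
        *) echo "unknown" ;;
    esac
}
```

For example, getlvcb -AT hd2 | awk -F' = ' '$1 ~ /intrapolicy$/ { print $2 }' extracts the code (c for hd2 in the example), which intra_name translates.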
Transition statement — Let’s identify how LVM uses the ODM and the VGDA/LVCB.
How LVM interacts with the ODM and the VGDA

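High-level commands (mkvg, extendvg, mklv, crfs, chfs, rmlv, reducevg, ...):
- Match IDs by name (ODM)
- Change the VGDA and LVCB, using low-level commands
- Update the ODM and /etc/filesystems
importvg: VGDA and LVCB -> ODM
exportvg: removes the volume group from the ODM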
Figure 7-16. How LVM interacts with the ODM and the VGDA AN152.2
Notes:
High-level commands
Most of the LVM commands that are used when working with volume groups, physical
volumes, or logical volumes are high-level commands. These high-level commands (like
mkvg, extendvg, mklv, and others listed on the visual) are implemented as shell scripts
and use names to reference a particular LVM object. The ODM is consulted to match a
name, for example, rootvg or hdisk0, to an identifier.
Interaction with disk control blocks and the ODM

The high-level commands call intermediate or low-level commands that query or
change the disk control blocks VGDA or LVCB. Additionally, the ODM has to be
updated; for example, to add a new logical volume. The high-level commands contain
signal handlers to clean up the configuration if the program is stopped abnormally. If a
system crashes, or if high-level commands are stopped by kill -9, the system can
end up in a situation where the VGDA/LVCB and the ODM are not in sync. The same
situation may occur when low-level commands are used incorrectly.
The importvg and exportvg commands

The visual shows two very important commands that are explained in detail later. The
command importvg imports a complete new volume group based on a VGDA and
LVCB on a disk. The command exportvg removes a complete volume group from the
ODM.
VGDA and LVCB corruption

The focus in this course is on situations where the ODM is corrupted and we assume
that the LVM control data (for example, the VGDA or the LVCB) are correct. If an
attempted execution of LVM commands (for example: lsvg, varyonvg, reducevg)
results in a failure with a core dump, that could be an indication that the LVM control
data on one of the disks has become corrupted. In this situation, do not attempt to resync
the ODM using the procedures covered in this unit. In most cases, you will need to recover
from a volume group backup. If recovery from backup is not a viable option, it is suggested
that you work with AIX Support in dealing with the problem. The procedures covered in
this unit will not solve the problem; worse, they will likely propagate the corruption to
other disks in the volume group.
Instructor notes:
Purpose — Explain how LVM interacts with ODM and VGDA/LVCB.
Details — Use the student notes to guide your explanation.
Additional information — The commands exportvg/importvg are covered later in this
course. Therefore, just mention briefly what these commands do.
Transition statement — Let’s see how the LVM-related device ODM objects look. This is
important, because you will have to repair ODM entries in the next part of the exercise we
started earlier. We will start with the entries that store information about physical volumes.

ODM entries for physical volumes (1 of 4)

# odmget -q "name like hdisk[02]" CuDv
CuDv:
name = "hdisk0"
status = 1
chgstatus = 2
ddins = "scsidisk"
location = ""
parent = "vscsi0"
connwhere = "810000000000"
PdDvLn = "disk/vscsi/vdisk"
CuDv:
name = "hdisk2"
status = 1
chgstatus = 0
ddins = "scdisk"
location = "01-08-01-8,0"
parent = "scsi1"
connwhere = "8,0"
Figure 7-17. ODM entries for physical volumes (1 of 4) AN152.2
Notes:
CuDv entries for physical volumes

The CuDv object class contains information about each physical volume.
Example report:
# odmget -q "name like hdisk[02]" CuDv
CuDv:
name = "hdisk0"
status = 1
chgstatus = 2
ddins = "scsidisk"
location = ""
parent = "vscsi0"
connwhere = "810000000000"
PdDvLn = "disk/vscsi/vdisk"
CuDv:
name = "hdisk2"
status = 1
chgstatus = 0
ddins = "scdisk"
location = "01-08-01-8,0"
parent = "scsi1"
connwhere = "8,0"
Key attributes
Remember the most important attributes:
- status = 1 means the disk is available
- chgstatus = 2 means the status has not changed since last reboot
- location specifies the location code of the device
- parent specifies the parent device
Physical versus virtual disks

The two disks have different device drivers and different Predefined Device object class
links. This is because hdisk2 is a physical disk that has been directly allocated to the
logical partition (which this example came from), while hdisk0 is a virtual disk that is
mapped through the Advanced Power Virtualization feature to a backing physical disk
that is allocated to a Virtual I/O Server partition on the same machine.
The virtual disk does not have an AIX location code. Rather, its location is the physical
location code of its parent virtual SCSI adapter (vscsi0) supplemented with the LUN
number for the backing device which is recorded in the connwhere field. The physical
location code of the parent adapter is recorded in the CuVPD object for the adapter.


Purpose — Explain that information about all disks is stored in the CuDv object class.
Transition statement — Let’s look at CuAt.

# odmget -q "name=hdisk1" CuAt | egrep -p "pvid|unique_id"
CuAt:
name = "hdisk1"
attribute = "unique_id"
value = "3321360050768019102C0F000000000006E2A04214503IBMfcp"
type = "R"
generic = "D"
rep = "nl"
nls_index = 79
CuAt:
name = "hdisk1"
attribute = "pvid"
value = "00f606036452e56a0000000000000000"
type = "R"
generic = "D"
rep = "s"
nls_index = 2
Notes:
The pvid attribute

The disk’s most important attribute is its PVID.
In the ODM, the PVID is stored as 32 hexadecimal characters, where the last 16
characters are zeros. Whenever you must manually update a PVID in the ODM, you must
specify the complete 32-character PVID of the disk.
The pvid attribute is usually assigned the first time the disk is added to a volume
group.
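When typing a PVID into the ODM by hand, it is easy to drop or add zeros. A minimal sketch that turns the 16-character PVID shown by lspv into the 32-character ODM form and rejects anything else; pvid_odm is a hypothetical name, and it accepts lowercase hexadecimal only, as PVIDs are displayed.

```sh
# pvid_odm PVID  (hypothetical helper)
# Prints the 32-character ODM form of a PVID: a 16-character PVID is
# padded with 16 zeros; a 32-character one must already end in 16 zeros.
pvid_odm() {
    _zeros=0000000000000000
    case $1 in
        ''|*[!0-9a-f]*) return 1 ;;
    esac
    case ${#1} in
        16) echo "$1$_zeros" ;;
        32)
            case $1 in
                *"$_zeros") echo "$1" ;;
                *) return 1 ;;
            esac
            ;;
        *) return 1 ;;
    esac
}
```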
The unique_id attribute

When working with a disk that is a LUN accessed by way of a Storage Area Network,
the unique_id is often an important identifier.

Example report:

# odmget -q "name=hdisk1" CuAt | egrep -p "pvid|unique_id"
CuAt:
name = "hdisk1"
attribute = "unique_id"
value = "3321360050768019102C0F000000000006E2A04214503IBMfcp"
type = "R"
generic = "D"
rep = "nl"
nls_index = 79
CuAt:
name = "hdisk1"
attribute = "pvid"
value = "00f606036452e56a0000000000000000"
type = "R"
generic = "D"
rep = "s"
nls_index = 2
Other information stored in CuAt

Other attributes of physical volumes (for example, the reserve_policy or the
queue_depth) may be stored in CuAt.
Other methods of displaying attribute information stored in CuAt

In the visual, a grep was used to pick out the stanzas for the objects of interest. The
odmget query restriction allows multiple descriptors:
# odmget -q "name=hdisk1 and attribute=pvid" CuAt
# odmget -q "name=hdisk1 and attribute=unique_id" CuAt
Normally, the easier way to obtain attribute information is to use the lsattr command:
# lsattr -E -l hdisk1 -a pvid
# lsattr -E -l hdisk1 -a unique_id
Instructor notes:
Purpose — Explain that the PVID is stored in CuAt.
Additional information — A previous version of the course specified to use the
chdev command with pv=yes, in order to create a missing physical volume. If a physical
volume in a volume group is missing, the actual recommended method of recovery is to:
1. Vary off the volume group with varyoffvg
2. Export the volume group with exportvg
3. Remove the disk definition with rmdev
4. Run cfgmgr
5. Import the volume group with importvg
Transition statement — Let’s look at the ODM information for a Fibre Channel attached
LUN.


# odmget -q "name=hdisk1" CuDv
CuDv:
name = "hdisk1"
status = 1
chgstatus = 2
ddins = "scsidisk"
location = "31-T1-01"
parent = "fscsi0"
connwhere = "W_0"
PdDvLn = "disk/fcp/mpioosdisk"
# lsattr -El hdisk1 | egrep "ww_name|lun_id"
lun_id 0x1000000000000 Logical Unit Number ID FALSE

ww_name 0x500507680140581e FC World Wide Name FALSE
# lscfg -l hdisk1
hdisk1 U8233.E8B.100603P-V16-C31-T1-W500507680140581E-
L1000000000000 MPIO IBM 2145 FC Disk
Notes:
Discussion:
For Fibre Channel accessed LUNs, the location field identifies the parent FC adapter,
and the connwhere field has a placeholder value of W_0, which indicates that the disk
identity is stored in the ww_name attribute of the disk.
The physical location code is composed of the location code of the parent adapter,
followed by the ww_name and the LUN ID (obtained from the lun_id attribute of the
disk).
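The composition can be reproduced from the attribute values. A minimal sketch; fc_loc is a hypothetical name, and it assumes the caller supplies the parent adapter's location code (here U8233.E8B.100603P-V16-C31-T1, the prefix of the lscfg output in this example).

```sh
# fc_loc ADAPTER_LOC WW_NAME LUN_ID  (hypothetical helper)
# Builds the physical location code of an FC LUN: adapter location code,
# then W + ww_name, then L + lun_id (0x prefix removed, hex upper-cased).
fc_loc() {
    _ww=$(echo "${2#0x}" | tr 'a-f' 'A-F')
    _lun=$(echo "${3#0x}" | tr 'a-f' 'A-F')
    echo "$1-W$_ww-L$_lun"
}
```

With the values above, fc_loc U8233.E8B.100603P-V16-C31-T1 0x500507680140581e 0x1000000000000 reproduces the location code shown by lscfg -l hdisk1.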
Example reports:
# odmget -q "name=hdisk1" CuDv
CuDv:
name = "hdisk1"
status = 1
chgstatus = 2
ddins = "scsidisk"
location = "31-T1-01"
parent = "fscsi0"
connwhere = "W_0"
PdDvLn = "disk/fcp/mpioosdisk"
# lsattr -El hdisk1 | egrep "ww_name|lun_id"
lun_id 0x1000000000000 Logical Unit Number ID FALSE

ww_name 0x500507680140581e FC World Wide Name FALSE
# lscfg -l hdisk1
hdisk1  U8233.E8B.100603P-V16-C31-T1-W500507680140581E-L1000000000000  MPIO IBM 2145 FC Disk


Purpose — Explain how ODM entries for fibre channel accessed LUNs differ from SCSI
attached disks.
Details —
Transition statement — Let’s look at CuDvDr.

# odmget -q "value3 like hdisk[03]" CuDvDr

CuDvDr:
resource = "devno"
value1 = "17"
value2 = "0"
value3 = "hdisk0"
CuDvDr:
resource = "devno"
value1 = "36"
value2 = "0"
value3 = "hdisk3"
# ls -l /dev/hdisk[03]
brw------- 1 root system 17, 0 Oct 08 06:17 /dev/hdisk0
brw------- 1 root system 36, 0 Oct 08 09:19 /dev/hdisk3
Notes:
Major and minor numbers

The ODM class CuDvDr is used to store the major and minor numbers of the devices.
The output shown on the visual, for example, indicates that CuDvDr has stored the
major number 17 (value1) and the minor number 0 (value2) for hdisk0.
The major numbers for the two disks are different because hdisk0 is a virtual disk,
served from a Virtual I/O Server partition, while hdisk3 is a physical disk allocated to
this logical partition.
Special files
Applications or system programs use the special files to access a certain device. For
example, the visual shows the special files used to access hdisk0 (/dev/hdisk0) and
hdisk3 (/dev/hdisk3).
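To cross-check the ODM against /dev, the CuDvDr stanzas can be reduced to name, major number, and minor number and compared with the ls -l output. A minimal sketch; devno_list is a hypothetical name, and it assumes value3 is the last field of each stanza, as on the visual.

```sh
# devno_list  (hypothetical helper)
# Reads "odmget ... CuDvDr" output on stdin and prints "NAME MAJOR MINOR"
# for each devno stanza. Splitting on double quotes puts each value in $2.
devno_list() {
    awk -F'"' '
        /resource =/ { res = $2 }
        /value1 =/   { maj = $2 }
        /value2 =/   { min = $2 }
        /value3 =/   { if (res == "devno") print $2, maj, min }'
}
```

For example, odmget -q "value3 like hdisk[03]" CuDvDr | devno_list, compared with ls -l /dev/hdisk[03].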


Purpose — Explain that major and minor numbers are stored in CuDvDr.
Details — Explain that this ODM class is used to build the special files in /dev.
If it seems appropriate for the particular group of students you are teaching, you might
provide the major number (17) and minor number (0) for hdisk0 (as shown on the visual)
and then ask the students what the major number (36) and minor number (0) are for
hdisk3.
Transition statement — Let’s see how volume group information is stored in the ODM.
ODM entries for volume groups (1 of 2)

# odmget -q "name=rootvg" CuDv

CuDv:
name = "rootvg"
status = 0
chgstatus = 1
ddins = ""
location = ""
parent = ""
connwhere = ""
PdDvLn = "logical_volume/vgsubclass/vgtype"
# odmget -q "name=rootvg" CuAt

CuAt:
name = "rootvg"
attribute = "vgserial_id"
value = "00c35ba000004c00000001157f54bf78"
type = "R"
generic = "D"
rep = "n"
nls_index = 637
Output continued on next visual
Figure 7-21. ODM entries for volume groups (1 of 2) AN152.2
Notes:
CuDv entries for volume groups

Information indicating the existence of a volume group is stored in CuDv, which means
all volume groups must have an object in this class. The visual shows an example of a
CuDv entry for rootvg.
VGID
One of the most important pieces of information about a volume group is the VGID. As
shown on the visual, this information is stored in CuAt.
Disks belonging to a volume group

An entry for each disk that belongs to a volume group is stored in CuAt. That is shown
on the next page.
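High-level commands look these identifiers up by name; the same lookup can be done from odmget output. A minimal sketch; odm_value is a hypothetical name, and it assumes the field order shown on the visuals (attribute before value in each CuAt stanza).

```sh
# odm_value ATTRIBUTE  (hypothetical helper)
# Reads "odmget ... CuAt" output on stdin and prints the value field of
# every stanza whose attribute field equals ATTRIBUTE.
odm_value() {
    awk -F'"' -v want="$1" '
        /^[^ \t].*:$/ { attr = "" }      # stanza header such as "CuAt:"
        /attribute =/ { attr = $2 }
        /value =/     { if (attr == want) print $2 }'
}
```

For example, odmget -q "name=rootvg" CuAt | odm_value vgserial_id prints the VGID, and odm_value pv prints one 32-character PVID per disk in the volume group.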


Purpose — Describe how volume group information is stored in CuDv and CuAt.
Transition statement — The CuAt output continues on the next page.
ODM entries for volume groups (2 of 2)

# odmget -q "name=rootvg" CuAt

...
CuAt:
name = "rootvg"
attribute = "timestamp"
value = "470a1bc9243ed693"
type = "R"
generic = "DU"
rep = "s"
nls_index = 0
CuAt:
name = "rootvg"
attribute = "pv"
value = "00c35ba07b2e24f00000000000000000"
type = "R"
generic = ""
rep = "sl"
nls_index = 0
Figure 7-22. ODM entries for volume groups (2 of 2) AN152.2
Notes:
Disks belonging to a volume group

The CuAt object class contains an object for each disk that belongs to a volume group.
The visual shows an example of a CuAt object for a disk in rootvg.
Length of PVID
Remember that the PVID is stored as a 32-character hexadecimal field, where the last 16
characters are zeros.


Purpose — Describe additional objects for volume groups in CuAt.
Emphasize that PVIDs for disks are stored with a length of 32 characters.
Ensure the students understand that a CuAt object is created for each disk in a volume
group. For example, if there were two physical volumes in rootvg, there would be two
entries with name = "rootvg" and attribute = "pv" in CuAt.
Transition statement — Let’s consider logical volumes.
ODM entries for logical volumes (1 of 2)

# odmget -q "name=hd2" CuDv

CuDv:
name = "hd2"
status = 0
chgstatus = 1
ddins = ""
location = ""
parent = "rootvg"
connwhere = ""
PdDvLn = "logical_volume/lvsubclass/lvtype"
# odmget -q "name=hd2" CuAt
CuAt:
name = "hd2"
attribute = "lvserial_id"
value = "00c35ba000004c00000001157f54bf78.5"
type = "R"
generic = "D"
rep = "n"
nls_index = 648
(Other attributes include intra, stripe_width, type, and so on.)
Figure 7-23. ODM entries for logical volumes (1 of 2) AN152.2
Notes:
CuDv entries for logical volumes

The CuDv object class contains an entry for each logical volume.
Attributes of a logical volume

Attributes of a logical volume, for example, its LVID (lvserial_id), are stored in the
object class CuAt. Other attributes that belong to a logical volume are the intra-physical
policy (intra), stripe_width, type, size, and label.


Purpose — Explain how logical volume data is stored in the ODM.
Additional information — Remind the students that the LVID is created from the VGID
and the minor number of the special file entry of the logical volume.
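The relationship can be checked directly. A minimal sketch; lvid_split and lvid_make are hypothetical names. For hd2, the LVID 00c35ba000004c00000001157f54bf78.5 splits into the rootvg VGID and minor number 5.

```sh
# lvid_split LVID  (hypothetical helper): prints "VGID MINOR"
lvid_split() {
    case $1 in
        *.*) ;;
        *) return 1 ;;               # an LVID always contains a dot
    esac
    echo "${1%.*} ${1##*.}"
}

# lvid_make VGID MINOR  (hypothetical helper): prints "VGID.MINOR"
lvid_make() {
    echo "$1.$2"
}
```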
Transition statement — The CuDvDr and CuDep object classes also contain logical
volume data.
ODM entries for logical volumes (2 of 2)

# odmget -q "value3=hd2" CuDvDr

CuDvDr:
resource = "devno"
value1 = "10"
value2 = "5"
value3 = "hd2"
# ls -l /dev/hd2
brw------- 1 root system 10,5 08 Jan 06:56 /dev/hd2
# odmget -q "dependency=hd2" CuDep

CuDep:
name = "rootvg"
dependency = "hd2"
Figure 7-24. ODM entries for logical volumes (2 of 2) AN152.2
Notes:
CuDvDr logical volume objects
Each logical volume has an object in CuDvDr that is used to create the special file entry
for that logical volume in /dev. As an example, the sample output on the visual shows
the CuDvDr object for hd2 and the corresponding /dev/hd2 (major number 10, minor
number 5) special file entry in the /dev directory.
CuDep logical volume entries

The ODM class CuDep (customized dependencies) stores dependency information for
software devices. For example, the sample output on the visual indicates that the
logical volume hd2 is contained in the rootvg volume group.


Purpose — Continue the explanation of where logical volume data is stored in the ODM.
Details — Explain logical volume objects in CuDvDr and CuDep.
Transition statement — Let’s look at some LVM metadata related problems.
7.4. LVM metadata related problems

What students will do — Learn how LVM metadata problems can develop and the
procedures for correcting them.
How students will do it — Lecture and discussion, followed by exercises.
What students will learn — How LVM metadata problems can develop and the procedures
for correcting them.
How this will help students on their job — These procedures can be used to fix LVM
metadata problems that the students may experience in their jobs.
ODM-related LVM problems

High-level commands (protected by a signal handler and a lock) update:
1. The VGDA and LVCB
2. The ODM
What can cause problems?
• kill -9, shutdown, system crash
• Improper use of low-level commands
• Hardware changes without software actions, or with the wrong ones
• Full root file system
Figure 7-25. ODM-related LVM problems AN152.2
Notes:
Normal functioning of high-level commands

As already mentioned, most of the time administrators use high-level commands to
create or update volume groups or logical volumes. These commands use signal
handlers to set up a proper cleanup in case of an interruption. Additionally, LVM
commands use a locking mechanism to block other commands while a change is in
progress.
Causes of problems
The signal handlers used by high-level LVM commands do not work with a kill -9, a
system shutdown, or a system crash. You might end up in a situation where the VGDA
has been updated, but the change has not been stored in the ODM.
Problems might also occur because of the improper use of low-level commands or
hardware changes that are not followed by correct administrator actions.

Another common problem is ODM corruption caused by performing LVM operations while
the root file system (which contains /etc/objrepos) is full. Always check the root file
system free space before attempting LVM recovery operations.
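The free-space check can be scripted at the start of a recovery procedure. A minimal sketch, assuming df supports the POSIX -P output format; root_free_kb and root_space_ok are hypothetical names, and the threshold is whatever safety margin you choose.

```sh
# root_free_kb [PATH]  (hypothetical helper)
# Prints the free space, in KB, of the file system containing PATH
# (default /, which holds /etc/objrepos), from POSIX-format df output.
root_free_kb() {
    df -Pk "${1:-/}" | awk 'NR == 2 { print $4 }'
}

# root_space_ok MIN_KB  (hypothetical helper)
# Succeeds only if / has at least MIN_KB KB free.
root_space_ok() {
    _free=$(root_free_kb /)
    [ -n "$_free" ] && [ "$_free" -ge "$1" ]
}
```

For example, root_space_ok 10240 || echo "Free up space in / before continuing", where 10240 KB is only an example margin.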
Instructor notes:
Purpose — Explain how ODM-related problems might come up.
Details — Explain the student material.
Transition statement — Let’s identify ways that ODM problems can be fixed.

Fixing ODM problems (1 of 2)

If the ODM problem is not in the rootvg, for example in volume group homevg, do the following:
# varyoffvg homevg
# exportvg homevg              (remove complete volume group from the ODM)
# importvg -y homevg hdiskX    (import volume group and create new ODM objects)
Figure 7-26. Fixing ODM problems (1 of 2) AN152.2
Notes:
Determining which volume group has the problem

If you detect ODM problems, you must determine whether the volume group with the
problem is the rootvg or not. Because the rootvg cannot be varied off, the procedure
given here applies only to non-rootvg volume groups.
Steps in ODM repair procedure (for problem not in rootvg)

1. In the first step, you vary off the volume group, which requires that all file systems be
unmounted first. To vary off a volume group, use the varyoffvg command.
2. In the next step, you export the volume group by using the exportvg command. This
command removes the complete volume group from the ODM. The VGDA and
LVCB are not touched by exportvg.
3. In the last step, you import the volume group by using the importvg command.
Specify the volume group name with option -y; otherwise, importvg generates a new
volume group name.
You need to specify only one intact physical volume of the volume group that you
import. The importvg command reads the VGDA and LVCB on that disk and
creates completely new ODM objects.
Note that this procedure makes the data unavailable during the repair, even if the file
systems are currently mounted and accessible despite the problem: the logical volumes
must be closed before the volume group can be varied offline.
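The three steps can be wrapped in a small shell function. This is a sketch under stated assumptions: `rebuild_vg_odm` and the `DRY_RUN` convention (print each command instead of running it) are hypothetical, and the file systems of the volume group must already be unmounted.

```shell
#!/bin/sh
# DRY_RUN=1 prints each command instead of executing it (sketch convention).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Rebuild the ODM objects of a non-rootvg volume group from its VGDA/LVCB.
rebuild_vg_odm() {
    vg=$1 pv=$2    # volume group name and one intact physical volume of it
    if [ -z "$vg" ] || [ -z "$pv" ]; then
        echo "usage: rebuild_vg_odm vgname hdiskX" >&2
        return 2
    fi
    if [ "$vg" = rootvg ]; then
        echo "rootvg cannot be varied off; use the rvgrecover procedure" >&2
        return 1
    fi
    run varyoffvg "$vg" || return 1        # file systems must be unmounted
    run exportvg "$vg" || return 1         # removes ODM objects only
    run importvg -y "$vg" "$pv"            # reads VGDA/LVCB, new ODM objects
}
```

Running it first with `DRY_RUN=1` shows the exact sequence for review. importvg varies on the imported volume group by default, so the file systems can be mounted again afterwards.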

Instructor notes:
Purpose — Describe how to fix ODM problems in non-rootvg volume groups.
Transition statement — Let’s discuss how to fix ODM problems in rootvg.
Fixing ODM problems (2 of 2)

If the ODM problem is in the rootvg, try using the rvgrecover procedure:
PV=hdisk0
VG=rootvg
# Save copies of the ODM object classes that are modified below
cp /etc/objrepos/CuAt /etc/objrepos/CuAt.$$
cp /etc/objrepos/CuDep /etc/objrepos/CuDep.$$
cp /etc/objrepos/CuDv /etc/objrepos/CuDv.$$
cp /etc/objrepos/CuDvDr /etc/objrepos/CuDvDr.$$
# Uses odmdelete to export rootvg: first the logical volume objects...
lqueryvg -Lp $PV | awk '{print $2}' | while read LVname
do
  odmdelete -q "name=$LVname" -o CuAt
  odmdelete -q "name=$LVname" -o CuDv
  odmdelete -q "value3=$LVname" -o CuDvDr
done
# ...then the volume group objects
odmdelete -q "name=$VG" -o CuAt
odmdelete -q "parent=$VG" -o CuDv
odmdelete -q "name=$VG" -o CuDv
odmdelete -q "name=$VG" -o CuDep
odmdelete -q "dependency=$VG" -o CuDep
odmdelete -q "value1=10" -o CuDvDr
odmdelete -q "value3=$VG" -o CuDvDr
# Uses importvg to import rootvg
importvg -y $VG $PV # ignore lvaryoffvg errors
varyonvg $VG
Figure 7-27. Fixing ODM problems (2 of 2) AN152.2
Notes:
Problems in rootvg
For ODM problems in rootvg, finding a solution is more difficult because rootvg cannot
be varied off or exported. However, it may be possible to fix the problem using one of
the techniques described below.
The rvgrecover procedure

If you detect ODM problems in rootvg, you can try using the procedure called
rvgrecover. You may want to code this in a script (shown on the visual) in /bin and
make it executable.
The rvgrecover procedure removes all ODM entries that belong to your rootvg by
using odmdelete. That is the same way exportvg works.

After deleting all ODM objects for rootvg, the procedure imports the rootvg by reading the
VGDA and LVCB from the boot disk. This results in completely new ODM objects that
describe your rootvg.
RAM disk maintenance mode

With the rootvg, the corruption problem may prevent a normal boot to multiuser mode.
Thus, you may need to handle this situation in RAM Disk Maintenance Mode (boot into
Maintenance mode from the CD-ROM or NIM). Before attempting this, you should make
sure you have a current mksysb backup.
Use the steps in the following table (which are similar to those in the rvgrecover script
shown on the visual) to recover the rootvg volume group after booting into maintenance
mode and mounting the file systems.
Step Action
1    Delete all of the ODM information about logical volumes.
     Get the list of logical volumes from the VGDA of the physical volume.
     # lqueryvg -p hdisk0 -L | awk '{print $2}' \
     > | while read LVname; do
     > odmdelete -q "name=$LVname" -o CuAt
     > odmdelete -q "name=$LVname" -o CuDv
     > odmdelete -q "value3=$LVname" -o CuDvDr
     > done
2    Delete the volume group information from ODM.
     # odmdelete -q "name=rootvg" -o CuAt
     # odmdelete -q "parent=rootvg" -o CuDv
     # odmdelete -q "name=rootvg" -o CuDv
     # odmdelete -q "name=rootvg" -o CuDep
     # odmdelete -q "dependency=rootvg" -o CuDep
     # odmdelete -q "value1=10" -o CuDvDr
     # odmdelete -q "value3=rootvg" -o CuDvDr
3    Add the volume group associated with the physical volume back to the ODM.
     # importvg -y rootvg hdisk0
4    Recreate the device configuration database in the ODM from the information
     on the physical volume.
     # varyonvg -f rootvg
This assumes that hdisk0 is part of rootvg.
In CuDvDr:
value1 = major number
value2 = minor number
value3 = object name for major/minor number
rootvg always has value1 = 10.
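Before deleting anything in step 2, it can help to list the CuDvDr objects that the value1 and value3 queries would match, for example with `odmget -q "value1=10" CuDvDr`. The awk-based sketch below summarizes such output as "object major minor". It is not an AIX tool: `summarize_cudvdr` is a hypothetical name, and it assumes the usual `odmget` stanza format (`attribute = "value"`, with value1 and value2 listed before value3 in each stanza).

```shell
#!/bin/sh
# Summarize CuDvDr stanzas read on stdin as "object major minor", using
# value1 = major number, value2 = minor number, value3 = object name.
summarize_cudvdr() {
    awk -F'"' '
        $1 ~ /value1 =/ { major = $2 }
        $1 ~ /value2 =/ { minor = $2 }
        $1 ~ /value3 =/ { print $2, major, minor }
    '
}
```

Usage: `odmget -q "value1=10" CuDvDr | summarize_cudvdr` lists every object that step 2 would remove by major number.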

The steps can also be used to recover other volume groups by substituting the
appropriate physical volume and volume group information. It is suggested that this
example be made a script.
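One way to turn the table into a reusable script for another volume group is to generate the odmdelete commands, review them, and only then execute them. In this sketch `gen_vg_odm_purge` is a hypothetical name; the caller supplies the volume group's major number (which replaces the rootvg-specific `value1=10`) and the logical volume names, which on a live system come from `lqueryvg -p hdiskX -L`. A wrong major number would delete CuDvDr objects of another device, so verify it first, for example with `odmget -q "value3=<vgname>" CuDvDr`.

```shell
#!/bin/sh
# Print (do not run) the odmdelete commands of steps 1 and 2 for any VG.
# Arguments: VG name, VG major number, then the logical volume names.
gen_vg_odm_purge() {
    vg=$1 major=$2
    if [ -z "$vg" ] || [ -z "$major" ]; then
        echo "usage: gen_vg_odm_purge vgname major [lvname ...]" >&2
        return 2
    fi
    shift 2
    for lv in "$@"; do                      # step 1: logical volumes
        echo "odmdelete -q \"name=$lv\" -o CuAt"
        echo "odmdelete -q \"name=$lv\" -o CuDv"
        echo "odmdelete -q \"value3=$lv\" -o CuDvDr"
    done
    echo "odmdelete -q \"name=$vg\" -o CuAt"  # step 2: volume group
    echo "odmdelete -q \"parent=$vg\" -o CuDv"
    echo "odmdelete -q \"name=$vg\" -o CuDv"
    echo "odmdelete -q \"name=$vg\" -o CuDep"
    echo "odmdelete -q \"dependency=$vg\" -o CuDep"
    # value1 is the major number: 10 for rootvg, different for other VGs.
    echo "odmdelete -q \"value1=$major\" -o CuDvDr"
    echo "odmdelete -q \"value3=$vg\" -o CuDvDr"
}
```

For example, `gen_vg_odm_purge datavg 45 lv01 lv02 > /tmp/purge.sh` writes the commands to a file that can be inspected and then run with `sh /tmp/purge.sh`, followed by importvg and varyonvg for that volume group and disk, as in steps 3 and 4.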

Instructor notes:
Purpose — Describe how to fix ODM problems in rootvg by using the rvgrecover script
and other techniques.
Breaking down the script into the sections described in the table in the student notes is a
good way to help students understand what is being done.
Ensure students understand that they do not need to reboot in maintenance mode to fix
non-rootvg inconsistencies. Remind them of the importance of backing up rootvg (if
possible) before attempting repair on rootvg.
Additional information — The man page entries (and corresponding entries in the AIX 7.1
Commands Reference) for redefinevg and synclvodm are brief but helpful.
Intermediate level ODM commands

High level LVM commands may not be a viable option:
• ODM corruption prevents high level commands from running
• varyoffvg and exportvg will disrupt availability
• redefinevg -d <hdisk#> <vgname>
  - Identifies and reenters physical volume data for the volume group in the ODM
  - Checks for inconsistencies between LVM data areas and ODM
  - Recovers some, but not all of the logical volume data
• synclvodm <vgname>
  - Synchronizes the VGDA, LVCB, ODM, and special device files
  - Volume group must be active
  - First run the redefinevg command if ODM does not have the minimum required
    information about the volume group
Figure 7-28. Intermediate level ODM commands AN152.2
Notes:
Overview
There are situations where you are unable to run the exportvg or importvg commands
because they depend on finding a minimal level of information in the ODM. Even if
these high level LVM commands can be run, they require that the volume group be
taken offline, which would be disruptive. In these situations it is useful to know some
intermediate level LVM commands. These commands are primarily intended to be used
by the high level LVM commands, but they can be useful in solving tough problems.
The synclvodm command

Syntax: synclvodm <VG> [<LV> ...]
Use of the synclvodm command is yet another way that you might be able to fix ODM
problems in rootvg. If, for some reason, the ODM is not consistent with on-disk
information, the synclvodm command can be used to resynchronize the database. It
synchronizes or rebuilds the LVCB, the ODM, and the VGDAs. The volume group must
be active for the resynchronization to occur. If logical volume names are specified, only
the information related to those logical volumes is updated.
The synclvodm command, by itself, can do a fairly complete job of resynchronizing the
ODM with the LVM data areas on the disk. It will also synchronize the information
between the LVM data areas. As such, it can worsen a situation where only one disk in
the volume group has corrupted data areas. The command can be restricted to
synchronizing only specific logical volumes. Otherwise, it synchronizes all logical
volumes. The synclvodm command depends upon a minimal amount of information in
the ODM; most importantly, the ODM needs to know the volume group name plus the
physical volume and logical volume memberships.
The redefinevg command

The redefinevg command redefines the set of physical volumes of the given volume
group in the device configuration database. If inconsistencies occur between the
physical volume information in the ODM and the on-disk metadata, the redefinevg
command determines which physical volumes belong to the specified volume group
and re-enters this information in the ODM. The redefinevg command checks for
inconsistencies by reading the reserved areas of all the configured physical volumes
attached to the system.
It is sometimes necessary to run the redefinevg command to obtain the minimum
information about the volume group. It will create new ODM objects for the provided
volume group name and it will use the LVM data areas in the specified disk to obtain the
correct LVM information. The redefinevg command is not designed to fully rebuild all
of the logical volume information. Thus, after running the redefinevg command, it is
often necessary to run the synclvodm command to obtain the rest of the logical volume
information.
These commands can be run while the volume group is still online. This is important,
because the ODM corruption may prevent any attempt to vary the volume group offline.
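The order described above (redefinevg first when the ODM lacks the minimum information, then synclvodm) can be captured in a short function. This is a hedged sketch: `resync_vg_odm` and the `DRY_RUN` convention are hypothetical, and the redefinevg step can be omitted when the ODM already knows the volume group. Keep in mind that synclvodm also synchronizes the on-disk LVM data areas, which can worsen a case where only one disk has corrupted data areas.

```shell
#!/bin/sh
# DRY_RUN=1 prints each command instead of executing it (sketch convention).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# The volume group stays online (varied on) throughout.
resync_vg_odm() {
    vg=$1 pv=$2    # volume group, and one of its disks with intact LVM data
    if [ -z "$vg" ] || [ -z "$pv" ]; then
        echo "usage: resync_vg_odm vgname hdiskX [lvname ...]" >&2
        return 2
    fi
    shift 2
    # Re-enter the minimal PV membership information in the ODM.
    run redefinevg -d "$pv" "$vg" || return 1
    # Rebuild the rest; with LV names, only those LVs are updated.
    run synclvodm "$vg" "$@"
}
```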
Using chdev for the PVID

The chdev command accepts an attribute of pvid=clear (to delete the PVID) and
pvid=yes (to create a PVID). While this can be useful in some circumstances, it is
generally recommended that any problem in the ODM be resolved by setting its value to
match what is stored on the disk. For example, the exportvg and importvg commands
could be used.
If there is no PVID set, either on the disk or in the ODM, then the PVID is normally
established when that disk becomes a member of a volume group (mkvg, extendvg).
Instructor notes:
Purpose — Explain the use of LVM intermediate level commands.
Details — Note that there is an optional part of the exercise where they explore the use of
these intermediate level commands.
Transition statement — Checkpoint.

Checkpoint
1. True or false: All LVM information is stored in the ODM.
2. True or false: You detect that a physical volume hdisk1 that is contained in your
rootvg is missing in the ODM. This problem can be fixed by exporting and importing rootvg.
Notes:
Instructor notes:
Purpose — Discuss the checkpoint questions.
1. True or false: All LVM information is stored in the ODM.
The answer is false: Information is also stored in other AIX files and in
disk control blocks (like the VGDA and LVCB).
2. True or false: You detect that a physical volume hdisk1 that is contained in your
rootvg is missing in the ODM. This problem can be fixed by exporting and importing rootvg.
The answer is false: Use the rvgrecover procedure instead. This
script creates a complete set of new rootvg ODM entries.

Transition statement — Let’s move on to an exercise.

Exercise: LVM metadata and related problems

Export and import a volume group

Analyze import messages
Fix LVM ODM problems using exportvg and
importvg
Fix LVM ODM problems using rvgrecover
Use intermediate LVM commands
Manually fix an LVM ODM problem (optional)
Figure 7-30. Exercise: LVM metadata and related problems AN152.2
Notes:
Instructor notes:
Purpose — Transition to the lab.
Details — Explain the goals of this part of the exercise. If the students executed the first
two parts of the lab after the first two unit topics, then they would now continue with part
three of the exercise.
Transition statement — Let’s finish with a brief summary of what we have discussed in
this unit.

Unit summary

Explain where LVM metadata information is stored
Use importvg and exportvg to manage LVM metadata
Solve ODM-related LVM problems
Notes:
Discussion:
The LVM information is held in a number of different places on the disk, including the
ODM and the VGDA.
ODM-related problems can be solved by:
• exportvg and importvg (non-rootvg volume groups)
• rvgrecover (rootvg)
• LVM intermediate commands
• Manually fixing using ODM commands.
Instructor notes:
Purpose — Summarize key points from the unit.

Unit 8. Disk management procedures
Estimated time
00:25 Topic 1
00:50 Topic 2

This unit describes different disk management procedures:
• Managing quorum with mirrored logical volumes
• Disk replacement procedures
• Procedures to solve problems caused by an incorrect disk
replacement

• Manage volume group quorum issues
• Explain the physical volume states used by the LVM
• Replace a disk under different circumstances
• Recover from a total volume group failure

Accountability:
• Lab exercises
References
management
GG24-4484 AIX Storage Management (Redbook)
© Copyright IBM Corp. 2009, 2012 Unit 8. Disk management procedures 8-1
SG24-5432 AIX Logical Volume Manager from A to Z: Introduction and Concepts (Redbook)
SG24-5433 AIX Logical Volume Manager from A to Z: Troubleshooting and Commands (Redbook)
Unit objectives

Manage volume group quorum issues
Explain the physical volume states used by the LVM
Replace a disk under different circumstances
Recover from a total volume group failure
Notes:
Instructor notes:
Details —
Transition statement — Let's start with a discussion of mirroring and quorum issues.

8.1. Failed disks: Mirroring and quorum issues

What students will do — The students will review how mirroring helps with failed disks,
learn what the term quorum means, how it relates to situations where a volume group is
mirrored, and what physical volume states are defined by LVM.
• How mirrored volume groups handle failed disk situations
• What stale partitions are
• How the quorum mechanism works
• What physical volume states are defined by LVM
How this will help students on their job — By learning about these advanced topics,
students will be able to increase the availability or the performance of AIX systems.
Mirroring
Figure content: a mirrored logical volume whose logical partitions have copies on hdisk0, hdisk1, and hdisk2. VGSA entry for logical partition 5:
LP: 5    PP1: hdisk0, 5    PP2: hdisk1, 8    PP3: hdisk2, 9
Figure 8-2. Mirroring AN152.2
Notes:
Using mirroring to increase availability

The visual above shows a mirrored logical volume, where each logical partition is
mirrored to three physical partitions. In this example, each of the physical partitions
which are related to a given logical partition is on a separate physical volume. More
than three copies are not possible.
If one of the disks fails, at least two copies of the data are still available. Thus,
mirroring increases the availability of a system or a logical volume.
Role of VGSA
The information about the mirrored partitions is stored in the VGSA, which is contained
on each disk. In the example shown on the visual, we see that logical partition 5 points
to physical partition 5 on hdisk0, physical partition 8 on hdisk1, and physical partition 9
on hdisk2.
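The LP-to-PP mapping of a logical volume can be displayed with `lslv -m LVname`. A small awk filter can then check the property described above, that every copy of a logical partition is on a separate physical volume. This is a hedged sketch: `check_mirror_separation` is a hypothetical name, and it assumes the usual `lslv -m` layout (a title line, a header line, then one line per LP with PP/PV column pairs).

```shell
#!/bin/sh
# Read "lslv -m LVname" output on stdin; report any logical partition with
# two copies on the same physical volume. Exit status 1 if one is found.
check_mirror_separation() {
    awk '
        BEGIN { bad = 0 }
        $1 ~ /^[0-9]+$/ {                  # LP lines start with a number
            split("", seen)                # clear the per-LP PV set
            for (i = 3; i <= NF; i += 2) { # PV names: fields 3, 5, 7
                if ($i in seen) { print "LP " $1 ": two copies on " $i; bad = 1 }
                seen[$i] = 1
            }
        }
        END { exit bad }
    '
}
```

Usage: `lslv -m hd4 | check_mirror_separation` prints nothing and succeeds when every logical partition of hd4 has its copies on distinct disks.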

Instructor notes:
Purpose — Review the concept of mirroring.
Details — This is a review of concepts that were covered in the prerequisite course.
Mirroring is being discussed to support the discussion of failed disks and why we turn off
quorum checking. Use the student notes to guide your explanation.
Transition statement — Let’s describe what stale partitions are.
Stale partitions
Figure content: a mirrored logical volume on hdisk0, hdisk1, and hdisk2, with a stale partition on hdisk2.
After the repair of hdisk2:
• varyonvg VGName (calls syncvg -v VGName)
• Only stale partitions are updated
Figure 8-3. Stale partitions AN152.2
Notes:
How data becomes stale

If a disk that contains a mirrored logical volume (such as hdisk2 on the visual) fails, the
data on the failed disk becomes stale (not current, not up-to-date).
How state information is kept

State information (active or stale) is kept for each physical partition. A physical
volume is shown as stale (which can be seen with the command lsvg VGName) as
long as it has at least one stale partition.

Updating stale partitions

If a disk with stale partitions has been repaired (for example, after a power failure), you
should issue the varyonvg command, which starts the syncvg command to synchronize
the stale partitions. The syncvg command runs as a background job that updates
all stale partitions in the volume group.
Always use the varyonvg command to update stale partitions. After a power failure, a
disk forgets its reservation. The syncvg command cannot reestablish the reservation,
whereas varyonvg does this before calling syncvg. The term reservation means that a
disk is reserved for one system. The disk driver puts the disk in a state where you can
work with the disk, and at the same time the control LED of the disk turns on.
The varyonvg command works even if the volume group is already varied on, and it also
works for rootvg.
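To decide whether a resynchronization is needed at all, the stale partition count can be read from lsvg. The sketch below is an assumption-laden helper, not an AIX tool: `stale_pps` and `resync_if_stale` are hypothetical names, and the parser assumes the `STALE PPs:` label of standard `lsvg VGName` output.

```shell
#!/bin/sh
# Print the "STALE PPs:" value from "lsvg VGName" output read on stdin.
stale_pps() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "STALE" && $(i + 1) == "PPs:") { print $(i + 2); exit }
    }'
}

# Resynchronize only when something is stale. varyonvg is used instead of
# syncvg because it also re-establishes the disk reservation.
resync_if_stale() {
    vg=$1
    n=$(lsvg "$vg" | stale_pps)
    if [ "${n:-0}" -gt 0 ]; then
        echo "$vg: $n stale PPs, running varyonvg"
        varyonvg "$vg"
    else
        echo "$vg: no stale PPs"
    fi
}
```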
Instructor notes:
Purpose — Explain what stale partitions are.
Details — Explain using the information in the student notes. The prerequisite course
discusses stale partitions as a stage in the creation of mirroring. Remind them that this can
also happen as a result of disk failure. Once a disk is recovered, the syncvg command will
need to be run to resynchronize the copies.
Additional information — Explain that using varyonvg is better than using syncvg
directly. The varyonvg command works if the volume group is already varied on and if the
volume group is the rootvg.
Transition statement — Let’s see how mirrored logical volumes can be created.

Mirroring rootvg
Figure content: mirrorvg copies the rootvg logical volumes (hd9var, hd8, hd5, ..., hd1) from hdisk0 to hdisk1.
1. bootinfo -B hdisk1
2. extendvg
3. mirrorvg -m
4. bosboot -a
5. bootlist
6. bootinfo -b
• Make a copy of all rootvg logical volumes using mirrorvg and place copies on the second disk
• Execute bosboot and change your bootlist
Figure 8-4. Mirroring rootvg AN152.2
Notes:
Reason to mirror rootvg

What is the reason to mirror the rootvg?
If your rootvg is on a single disk, that disk is a single point of failure: if it
fails, your machine is no longer available.
If you mirror rootvg to a second (or third) disk, and one disk fails, there will be another
disk that contains the mirrored rootvg. You increase the availability of your system.
Procedure for mirroring rootvg

The following steps show how to mirror the rootvg.
a. Select a disk for the mirror copies. It needs to be large enough to hold these copies,
plus enough room to handle future growth. The bootinfo -s command displays the size
of a disk in megabytes:
# bootinfo -s hdisk1
Check that the disk is actually bootable, since it will hold the alternate boot logical
volume.
# bootinfo -B hdisk1
Any returned value other than 1 indicates that the disk is not bootable.
b. If not already part of the rootvg, add the new disk to the volume group (for example,
hdisk1):
# extendvg [ -f ] rootvg hdisk1
c. Use the mirrorvg command to mirror all of the logical volumes in the rootvg to the
new disk. By default, the mirrorvg command disables quorum and mirrors the
existing logical volumes in the specified volume group. A change to the volume
group quorum attribute is effective immediately, without having to vary off and then
vary on the volume group. By default, mirrorvg also synchronizes the copies,
although you can suppress synchronization by using the -s flag. It is recommended
that you use the exact mapping option (-m) to ensure that the mirror copy of the boot
logical volume (hd5) is allocated contiguous physical partitions. To mirror rootvg, use
the command:
# mirrorvg -m rootvg hdisk1
Restrictions:
• You cannot use the mirrorvg command on a snapshot volume group
• You cannot use the mirrorvg command on a volume group that has an active
firmware assisted dump logical volume
• You cannot use the mirrorvg command if ALL of the following conditions exist:
- The target system is a logical partition (LPAR)
- A copy of the boot logical volume (by default, hd5) resides on the failed
physical volume
- The replacement physical volume's adapter was dynamically configured into
the LPAR since the last cold boot
An alternative to running mirrorvg is to execute the component tasks separately:
• If you use only one mirror disk, disable quorum checking:
# chvg -Qn rootvg
• Add the mirrors for all rootvg logical volumes:
# mklvcopy hd1 2 hdisk1
# mklvcopy hd2 2 hdisk1
# mklvcopy hd3 2 hdisk1
# mklvcopy hd4 2 hdisk1
# mklvcopy hd5 2 hdisk1
# mklvcopy hd6 2 hdisk1
# mklvcopy hd8 2 hdisk1
# mklvcopy hd9var 2 hdisk1
# mklvcopy hd10opt 2 hdisk1
# mklvcopy hd11admin 2 hdisk1
• If you have other logical volumes in your rootvg, be sure to create copies for
them as well.
• Now, synchronize the new copies you created:
# syncvg -v rootvg
d. Because we want to be able to boot from either disk, we need to run bosboot:
# bosboot -a
Because hd5 is mirrored, there is no need to do this for each disk.
e. Update the bootlist so that, in case of a disk failure, the system can boot from the
other disk:
# bootlist -m normal hdisk0 hdisk1
f. Check that the system booted from the first boot disk (bootinfo -b displays the last
boot device):
# bootinfo -b
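Steps a to e can be collected into one function. This is a sketch, not the course's procedure verbatim: `mirror_rootvg` and the `DRY_RUN` convention are hypothetical, it assumes the normal-mode boot list should contain the current boot disk followed by the new disk, and it stops at the first failing command.

```shell
#!/bin/sh
# DRY_RUN=1 prints each command instead of executing it (sketch convention).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

mirror_rootvg() {
    new=$1 old=$2   # new: disk for the mirror copies; old: current boot disk
    if [ -z "$new" ] || [ -z "$old" ]; then
        echo "usage: mirror_rootvg newdisk bootdisk" >&2
        return 2
    fi
    # bootinfo -B prints 1 for a bootable disk; a dry run assumes it is.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        bootable=1
    else
        bootable=$(bootinfo -B "$new")
    fi
    if [ "$bootable" != 1 ]; then
        echo "$new is not bootable" >&2
        return 1
    fi
    run extendvg rootvg "$new" || return 1
    # mirrorvg disables quorum and synchronizes; -m = exact mapping (hd5)
    run mirrorvg -m rootvg "$new" || return 1
    run bosboot -a || return 1
    run bootlist -m normal "$old" "$new"
}
```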
Instructor notes:
Purpose — Review how to mirror rootvg.
Details — Again, this is review from the prerequisite course. Use the information in the
student material to guide your presentation.
Additional information — Mirroring of the paging space and dump logical volumes:
When mirroring rootvg, hd6 should be mirrored because the paging space availability is
critical to keeping the system online. hd6 serves both as paging space and as the default
dump device. In AIX V4.3.3 and subsequent releases, there is no problem with mirroring
dump devices.
In releases prior to 4.3.3, dump devices did not work correctly if mirrored. On these older
releases, a separate dump device should be created and not mirrored.
Before 4.3.3, if the dump device was mirrored, the dump data was written to only one copy
of the mirror, yet no partitions were marked stale. When the machine rebooted, the system
would attempt to copy the dump data from hd6 to /var/adm/ras (by default). Because LVM
considered the mirror to be in sync, it could read the data from any copy of hd6, which
corrupted the dump. In AIX 4.3.3, the intermediate command readlvcopy was provided,
which allows reading from only the primary copy even though the scheduling policy is
parallel. Dump processing (snap reading of the dump logical volume) was recoded to use
readlvcopy. At that point, mirroring of a dump logical volume could be supported.
But there was another problem. Sometimes with a mirrored dump logical volume, the dump
would not be reported. This was fixed in AIX 5.2 TL08 (or later) and in AIX 5.3 TL04 (or
later).
Thus a mirrored dump logical volume is currently supported and the mirrorvg command
automatically mirrors the paging space, even when it is also acting as the dump logical
volume.
On the other hand, mirroring the dump logical volume is not recommended, because of the
resulting performance impact when creating the dump and some surmountable but
irritating complications in reading the dump. Because of this recommendation, the
mirrorvg command does not mirror the dump logical volume if there is a separate logical
volume for the dump (not using the paging space).
In order to protect against the scenario of the disk holding the dump logical volume being
unavailable (when not mirrored) at the time of the dump, the recommendation is that you
should define a secondary dump device on a different disk than the primary dump device.
It is good to have a separate logical volume for the dump, instead of using the paging
space logical volume. By having a separate dump logical volume, it separates the dump
logical volume issues from the paging space issues. For example: it is definitely desirable
to mirror the paging space logical volume, while it is recommended that you do not mirror
the dump logical volume.

Regarding mirroring during mksysb backups and restores, the official advisory
recommendation is that the mirroring of the rootvg be broken. Be certain that the
remaining disk is the same disk as the last disk used to boot (bootinfo -b).
Transition statement — Let’s show another way to mirror the rootvg.
VGDA count
Two-disk volume group (PV1 holds two VGDAs, PV2 holds one):
• Loss of PV1: only 33% of VGDAs available (no quorum)
• Loss of PV2: 66% of VGDAs available (quorum)
Three-disk volume group (PV1, PV2, PV3):
• Loss of 1 PV: 66% of VGDAs still available (quorum)
Figure 8-5. VGDA count AN152.2
Notes:
Reservation of space for VGDAs

Each disk that is contained in a volume group contains at least one VGDA. The LVM
always reserves space for two VGDAs on each disk.
Volume groups containing two disks

If a volume group consists of two disks, one disk contains two VGDAs and the other disk
contains only one (as shown on the visual). If the disk with the two VGDAs fails, only
33% of the VGDAs are available, which is less than 50%. In this case the quorum, which
requires that more than 50% of the VGDAs be available, is not fulfilled.
Volume groups containing more than two disks
If a volume group consists of more than two disks, each disk contains one VGDA. If one
disk of a three-disk volume group fails, 66% of the VGDAs are still available and the
quorum is fulfilled.
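The quorum rule is simple integer arithmetic. The sketch below (`vgda_status` is a hypothetical name) checks "more than 50%" as available × 2 > total, which avoids rounding, and prints the truncated percentage, reproducing the 33% and 66% figures from the visual.

```shell
#!/bin/sh
# Quorum: more than 50% of all VGDAs must be available.
# Arguments: number of available VGDAs, total number of VGDAs.
vgda_status() {
    available=$1 total=$2
    pct=$((available * 100 / total))       # truncated, as in 33% / 66%
    if [ $((available * 2)) -gt "$total" ]; then
        echo "$pct% quorum"
    else
        echo "$pct% no quorum"
    fi
}
```

For the two-disk case, `vgda_status 1 3` (the disk with two VGDAs is lost) prints `33% no quorum` and `vgda_status 2 3` prints `66% quorum`. Exactly half is not enough: `vgda_status 2 4` prints `50% no quorum`.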

Instructor notes:
Purpose — Describe how VGDAs are stored on disks in a volume group and how these
VGDAs are involved in determining whether quorum exists.
Details — Use the information in the student material to guide your presentation.
Transition statement — Let’s discuss what happens if a quorum is not available.
Quorum not available

datavg: hdisk1 (two VGDAs), hdisk2 (one VGDA)
If hdisk1 fails, datavg has no quorum:
• VG not active: # varyonvg datavg FAILS
• VG active: closed during operation; no more access to logical volumes;
  LVM_SA_QUORCLOSE in error log
Figure 8-6. Quorum not available AN152.2
Notes:
Introduction
What happens if quorum checking is enabled for a volume group and a quorum is not
available?
Consider the following example (illustrated on the visual and discussed in the following
paragraphs): In a two-disk volume group datavg, the disk hdisk1 is not available due to
a hardware defect. hdisk1 is the disk that contains the two VGDAs; that means the
volume group does not have a quorum of VGDAs.
Result if volume group not varied on

If the volume group is not varied on and the administrator tries to vary on datavg, the
varyonvg command will fail.

Volume group already varied on

If the volume group is already varied on when quorum is lost, the LVM deactivates
the volume group, and there is no longer access to any logical volume in this volume
group. At this point, the system may show unexpected behavior. The situation is
posted to the error log as an LVM_SA_QUORCLOSE error entry. After losing the
quorum, the volume group may still be listed as active (as seen with the lsvg -o
command); however, all application data access and LVM functions requiring data
access to the volume group will fail. The volume group is dropped from the active list as
soon as the last logical volume is closed. If you close the logical volumes with commands
such as fuser -k /dev/LVname or umount /dev/LVname, no data is actually written to the disk.
Instructor notes:
Purpose — Describe the quorum mechanism.
Details — Describe what happens when the quorum is not available. Make sure they
understand the difference between quorum checking of an active volume group and the
quorum mechanisms involved with trying to vary on an inactive volume group.
Additional information — Some of this discussion applies to rootvg. However, there are
some differences, as we will see later.
Transition statement — Let’s describe how to set up non-quorum volume groups.

Nonquorum volume groups

With single mirroring, always disable the quorum:
  chvg -Qn datavg
Additional considerations for rootvg:
  chvg -Qn rootvg
  bosboot -ad /dev/hdiskX
Turning off the quorum checking:
• Requires 100% VGDAs for normal varyonvg
• Allows the volume group to stay active if quorum is lost
Figure 8-7. Nonquorum volume groups AN152.2
Notes:
Loss of quorum in a nonquorum volume group

When a nonquorum volume group loses its quorum, it is not deactivated; it stays
active until it loses all of its physical volumes.
Recommendations when using single mirroring

When working with single mirroring, always disable quorum checking using the
command chvg -Qn (and VGname as the argument). In AIX 6 and later, the change in
quorum checking is effective immediately. In older versions of AIX you will need to vary
off and vary on the volume group to make the change effective.
Note that the mirrorvg command now automatically disables quorum checking for a
mirrored volume group.
Recommendations for rootvg

When turning off the quorum checking for rootvg, you must run bosboot (or savebase)
to reflect the ODM change in the boot logical volume. In versions of AIX prior to
AIX 6, you then need to reboot the machine to make the change effective (it takes
effect at varyonvg).
Varying on a nonquorum volume group

It is important that you know that turning off the quorum checking does not allow a
varyonvg without a quorum. It just prevents the closing of an active volume group when
losing its quorum.
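The commands for disabling quorum checking, including the extra bosboot step for rootvg, can be sketched as follows. `disable_quorum` and the `DRY_RUN` convention are hypothetical; the boot disk name must be supplied for rootvg so that bosboot -ad can write to it.

```shell
#!/bin/sh
# DRY_RUN=1 prints each command instead of executing it (sketch convention).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

disable_quorum() {
    vg=$1 bootdisk=$2
    if [ -z "$vg" ]; then
        echo "usage: disable_quorum vgname [bootdisk]" >&2
        return 2
    fi
    if [ "$vg" = rootvg ] && [ -z "$bootdisk" ]; then
        echo "rootvg: name the boot disk so bosboot can be run" >&2
        return 2
    fi
    run chvg -Qn "$vg" || return 1
    if [ "$vg" = rootvg ]; then
        # Record the ODM change in the boot logical volume.
        run bosboot -ad "/dev/$bootdisk" || return 1
    fi
    # AIX 6 and later: effective immediately. Older releases: vary the
    # volume group off and on (for rootvg, reboot).
}
```

Remember that this only keeps an active volume group open after quorum loss; it does not allow a normal varyonvg without a quorum.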

Instructor notes:
Purpose — Describe nonquorum volume groups.
Details — Cover the material in the student notes.
Transition statement — What can you do if a varyonvg fails?
Forced vary on (varyonvg -f)

datavg: hdisk1 (two VGDAs, marked "removed"), hdisk2 (one VGDA)
# varyonvg datavg      Fails (even when quorum is disabled)
Check the reason for the failure (cable, adapter, power) before doing the following:
# varyonvg -f datavg
Failure accessing hdisk1. Set PV STATE to removed.
Volume group datavg is varied on.
Figure 8-8. Forced vary on (varyonvg -f) AN152.2
Notes:
When normal vary on may fail

If a quorum of VGDAs is not available during vary on, the varyonvg command fails,
even when quorum checking is disabled. In fact, when quorum checking is disabled, the
varyonvg command requires that 100% of the VGDAs be available instead of more than 50%.
Doing a forced vary on

Before doing a forced vary on (varyonvg -f), always check the reason for the failure. If
the physical volume appears to be permanently damaged, use a forced varyonvg.
All physical volumes that are missing during this forced vary on will be changed to
physical volume state removed. This means that all the VGDA and VGSA copies will be
removed from these physical volumes. Once this is done, these physical volumes will
no longer take part in quorum checking, nor will they be allowed to become active within
the volume group until you return them to the volume group.

Change in VGDA distribution

In the example on the visual, the active disk hdisk2 becomes the disk with the two
VGDAs. This does not change, even if the failed disk can be brought back.
Quorum checking on
With quorum checking on, you always need > 50% of the VGDAs available (except to
vary on rootvg).
Quorum checking off

With quorum checking off, you have to make a distinction between an already active
volume group and varying on a volume group.
An active volume group will be kept open as long as there is at least one VGDA
available.
Set MISSINGPV_VARYON=true in /etc/environment if a volume group needs to be
varied on with missing disks at boot time.
When using varyonvg -f or using MISSINGPV_VARYON=true, you take full responsibility
for the volume group integrity.
Instructor notes:
Purpose — Describe forced vary on of a volume group.
Transition statement — Let’s discuss what’s meant by physical volume state.

Physical volume states

Visual: state diagram of the physical volume states active, missing, and removed, with transitions labeled "varyonvg VGName", "Quorum OK?", "Quorum lost?", "varyonvg -f VGName", "Hardware repair", and "Hardware repair followed by: varyonvg VGName, chpv -v a hdiskX".
Figure 8-9. Physical volume states AN152.2
Notes:
Introduction
This page introduces physical volume states (not device states). Physical volume states
can be displayed with lsvg -p VGName.
Active state
If a disk can be accessed during a varyonvg, it gets a physical volume state of active.
Missing state
If a disk cannot be accessed during a varyonvg, but quorum is available, the failing
disk gets a physical volume state of missing. If the disk can be repaired (for example,
after a power failure), you just have to issue a varyonvg VGName to bring the disk into the
active state again. Any stale partitions will be synchronized.
Removed state
If a disk cannot be accessed during a varyonvg and the quorum of disks is not
available, you can issue the command, varyonvg -f VGName, and force the vary on of
the volume group.
The failing disk gets a physical volume state of removed, and it will not be used for
quorum checks any longer.
Recovery after repair

If you are able to repair the disk (for example, after a power failure), executing a
varyonvg alone does not bring the disk back into the active state. It maintains the
removed state.
At this stage, you have to announce the fact that the failure is over by using the
following command:
# chpv -va hdiskX
This defines the disk hdiskX as active.
Note that you have to do a varyonvg VGName afterwards to synchronize any stale
partitions.
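The recovery rules for the missing and removed states can be sketched as a small shell helper. This is a minimal sketch, not an AIX tool. The function name pv_recovery_cmds is hypothetical, and the helper assumes the lsvg -p layout shown in its comments: two header lines, then the PV name and PV state as the first two columns. It only prints the commands, so you can review them before running them after the hardware repair.

```sh
#!/bin/sh
# Hypothetical helper (not part of AIX): read "lsvg -p VGName" output on
# stdin and print the recovery commands for a repaired disk.
# Assumed input layout:
#   datavg:
#   PV_NAME   PV STATE   TOTAL PPs   FREE PPs   FREE DISTRIBUTION
#   hdisk1    removed    542         0          00..00..00..00..00
pv_recovery_cmds() {
    awk -v vg="$1" '
        NR <= 2 { next }                    # skip the VG name and header lines
        $2 == "removed" {                   # removed: declare the disk active again
            print "chpv -va " $1
            need = 1
        }
        $2 == "missing" { need = 1 }        # missing: a varyonvg is enough
        END {
            # varyonvg reactivates the disks and resyncs stale partitions
            if (need) print "varyonvg " vg
        }
    '
}

# Usage on AIX, after the hardware repair:
#   lsvg -p datavg | pv_recovery_cmds datavg
```

The chpv command is printed before varyonvg, following the order given in the notes above.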
The chpv -vr command

The opposite of chpv -va is chpv -vr which brings the disk into the removed state. This
works only when all logical volumes have been closed on the disk that will be defined as
removed. Additionally, chpv -vr does not work when the quorum will be lost in the
volume group after removing the disk.


Purpose — Introduce physical volume states.
Details — Use the student notes to guide your presentation. Distinguish between physical
volume states and device states.
Transition statement — Let us next examine different techniques for handling disk
replacement.
8.2. Disk replacement techniques

What students will do — The students will identify how to replace a disk under different
conditions.
• Identify how to replace a disk under different conditions
• Recover from a total volume group failure
How this will help students on their job — Replacing a disk is not always an easy job.
System administrators must know the procedures to replace a disk without corrupting the
systems.
Disk replacement: Starting point

A disk must be replaced ...
Disk mirrored? Yes: Procedure 1. No: next question.
Disk still working? Yes: Procedure 2. No: next question.
Volume group lost? No: Procedure 3. Yes: rootvg: Procedure 4; not rootvg: Procedure 5.
Figure 8-10. Disk replacement: Starting point AN152.2
Notes:
Reasons to replace a disk

Many reasons might require the replacement of a disk, for example:
- Disk is too small
- Disk is too slow
- Disk produces many DISK_ERR4 error log entries
Flowchart
Before starting the disk replacement, always follow the flowchart that is shown in the
visual. This will help you whenever you have to replace a disk.
1. If the disk that must be replaced is completely mirrored onto another disk, follow
procedure 1.
2. If a disk is not mirrored, but still works, follow procedure 2.
3. If you are absolutely sure that a disk failed and you are not able to repair the
disk, do the following:
- If the volume group can be varied on (normal or forced), use procedure 3.
- If the volume group is totally lost after the disk failure (that is, the volume
group could not be varied on, either normal or forced):
• If the volume group is rootvg, follow procedure 4.
• If the volume group is not rootvg, follow procedure 5.
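The decision logic of the flowchart is small enough to write down as code. This is a sketch only. choose_procedure is a hypothetical name, and the yes/no answers must come from your own analysis (for example lspv, lsvg, and the error log). The function just prints the procedure number.

```sh
#!/bin/sh
# Hypothetical encoding of the flowchart in Figure 8-10.
# Each argument is "yes" or "no":
#   $1  disk mirrored?
#   $2  disk still working?
#   $3  volume group lost (no normal or forced varyonvg possible)?
#   $4  volume group is rootvg?
choose_procedure() {
    if [ "$1" = yes ]; then
        echo 1      # synchronize from the remaining good mirror copy
    elif [ "$2" = yes ]; then
        echo 2      # migrate the data off the suspect disk
    elif [ "$3" = no ]; then
        echo 3      # recreate LVs and file systems, restore from backups
    elif [ "$4" = yes ]; then
        echo 4      # restore rootvg from a mksysb image
    else
        echo 5      # restvg from a savevg backup, or recreate and restore
    fi
}

# Example: unmirrored but still working rootvg disk
#   choose_procedure no yes no yes    # prints 2
```

The mirrored question is tested first, as in the flowchart. A mirrored volume group that needs a forced varyonvg therefore still leads to procedure 1.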
Instructor notes:
Purpose — Provide considerations before a disk replacement.
Details — Explain as described in the student material.
Additional information — This flowchart is a method to offer disk replacement procedures
for many types of disk failures. It is not guaranteed that 100% of all disk failures are
covered.
A good way to distinguish between the various procedures is to focus on where we recover
the data from:
1. Procedure 1 - We synchronize from a remaining good mirror copy
2. Procedure 2 - We migrate the data off the suspect disk to the new disk before removing
the suspect disk
3. Procedure 3 - We recover the data from the file system backups (or logical volume
backup provided by the using application)
4. Procedure 4 - We recover using the mksysb backup of the rootvg
5. Procedure 5 - We recover using the savevg backup for the non-rootvg
Transition statement — Let’s start with procedure 1

Procedure 1 (1 of 4): Disk mirrored

The replacepv command simplifies procedure 1.
Use of replacepv has restrictions:
- Not rootvg
- Snapshot volume group mechanism not being used
- Replacement physical volume at least as large as the failed physical volume
- Both physical volumes must be on the system at the same time
Otherwise, use a variation without the replacepv command.
Figure 8-11. Procedure 1 (1 of 4): Disk mirrored AN152.2
Notes:
When to use this procedure

Use Procedure 1 when the disk that must be replaced is mirrored.
Disk state
This procedure requires that the disk state of the failed disk be either missing or
removed. Refer to Physical Volume States (covered earlier in this unit) for more
information on disk states. Use the command, lspv hdiskX, to check the state of your
physical volume. If the disk is still in the active state, you cannot remove any copies or
logical volumes from the failing disk. In this case, one way to bring the disk into a
removed or missing state is to run the reducevg -d command or to do a varyoffvg
and a varyonvg on the volume group by rebooting the system.
Alternative approaches
The two main alternatives for this procedure are to use the replacepv command or not to
use it. The replacepv command greatly simplifies the procedure.
The restrictions are:
- The volume group cannot be rootvg.
- The snapshot volume group mechanism must not be in use.
- The replacement physical volume must be at least as large as the failed physical
volume.
- Both physical volumes must be on the system at the same time. In other words, you
cannot remove the failed disk and then place the new disk in the same position.


Purpose — Explain the options when working in a mirrored environment.
Details —
Transition statement — Let us first look at the use of the replacepv command.
Procedure 1 (2 of 4): Disk mirrored with replacepv
1. Provide a replacement disk
2. If a new disk, discover and configure:
# cfgmgr
Disk discovered as: hdiskY
3. Run replacepv:
# replacepv hdiskX hdiskY
4. Remove the failed disk from the ODM:
# rmdev -l hdiskX -d
Figure 8-12. Procedure 1 (2 of 4): Disk mirrored with replacepv AN152.2
Notes:
The replacepv command greatly simplifies the procedure.
1) Provide a replacement disk. It may be an unused disk that is already known
to AIX. Otherwise, you need to provide a new disk. There are several ways to
provide a disk that is new to AIX:
• Directly allocate a PCI storage adapter to the LPAR. If the adapter does
not already have an available disk under it, one needs to be provided
through a hot add (if a local disk) or by zoning a LUN (if it is a Fibre
Channel adapter).
• Use PowerVM to provision a virtual SCSI disk.
2) Discover the new disk by executing the cfgmgr command.
3) Execute the replacepv command to allocate physical partitions on the replacement
disk for the problem disk. Effectively, the new disk replaces the failing disk in the
mirroring configuration. In the example, hdiskX is the failing disk.
4) Remove the failing disk from the ODM with the rmdev command.
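The four steps can be collected into a small script. This is a sketch under assumptions. The function names are hypothetical, and the disk names are placeholders. By default (DRYRUN=1), the script only prints each command instead of running it. Set DRYRUN=0 only on AIX, after checking the replacepv restrictions listed earlier.

```sh
#!/bin/sh
# Print each command when DRYRUN=1 (the default); execute it when DRYRUN=0.
run() {
    if [ "${DRYRUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper for Procedure 1 with replacepv.
#   $1 = failing disk (hdiskX)   $2 = replacement disk (hdiskY)
replace_mirrored_disk() {
    run cfgmgr || return 1                  # discover a newly provided disk
    run replacepv "$1" "$2" || return 1     # new disk takes over the failing disk's partitions
    run rmdev -l "$1" -d                    # delete the failed disk from the ODM
}

# Usage (prints the three commands):
#   replace_mirrored_disk hdiskX hdiskY
```

The script stops at the first failing command. If replacepv does not complete, the failed disk therefore stays in the ODM and can still be inspected.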


Purpose — Explain procedure to use the replacepv command.
Details —
Transition statement — What if you cannot use the replacepv command?
Procedure 1 (3 of 4): Disk mirrored without replacepv
1. Remove all copies from the disk:
# unmirrorvg vg_name hdiskX
2. Remove the disk from the volume group:
# reducevg vg_name hdiskX
3. Remove the disk from the ODM:
# rmdev -l hdiskX -d
4. Provide a replacement disk. If a new disk, discover and configure:
# cfgmgr
5. Add the new disk to the volume group:
# extendvg vg_name hdiskY
6. Create new copies:
# mirrorvg vg_name hdiskY
Figure 8-13. Procedure 1 (3 of 4): Disk mirrored without replacepv AN152.2
Notes:
The goal of each disk replacement is to remove all logical volumes from a disk.
1. Start by removing all logical volume copies from the disk. Use either the SMIT
fastpath smit unmirrorvg or the unmirrorvg command as shown in the visual.
This unmirrors each logical volume that is mirrored on the disk.
If you have additional unmirrored logical volumes on the disk, you have to either
move them to another disk (migratepv), or remove them if the disk cannot be
accessed (rmlv).
2. When the disk is completely empty, remove the disk from the volume group. Use the
SMIT fastpath smit reducevg or the reducevg command.
3. After the disk has been removed from the volume group, you can remove it from
the ODM. Use the rmdev command as shown in the visual.
4. Use a hot-swap procedure to replace the failed or failing disk. (On older
machines, disk replacement effectively required the system to be shut down
for the procedure.) Execute cfgmgr to discover and configure the new disk.

5. Add the new disk to the volume group. Use either the SMIT fastpath
smit extendvg or the extendvg command.
6. Finally, create new copies for each logical volume on the new disk. Use either
the SMIT fastpath smit mirrorvg or the mirrorvg command. If synchronization
was suppressed during mirroring, then remember to eventually synchronize the
volume group (or each logical volume), using the syncvg command.
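The six steps can be sketched in the same dry-run style. The function name is hypothetical. The sketch assumes that every logical volume on the failing disk is mirrored, so unmirrored logical volumes must already have been moved (migratepv) or removed (rmlv). It also assumes that the replacement disk can be configured by cfgmgr.

```sh
#!/bin/sh
# Print each command when DRYRUN=1 (the default); execute it when DRYRUN=0.
run() {
    if [ "${DRYRUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper for Procedure 1 without replacepv.
#   $1 = volume group   $2 = failing disk   $3 = replacement disk
replace_without_replacepv() {
    run unmirrorvg "$1" "$2" || return 1    # drop the copies on the failing disk
    run reducevg "$1" "$2" || return 1      # the disk is now empty
    run rmdev -l "$2" -d || return 1        # delete it from the ODM
    run cfgmgr || return 1                  # configure the replacement disk
    run extendvg "$1" "$3" || return 1      # add it to the volume group
    run mirrorvg "$1" "$3"                  # recreate the copies (synchronized unless -s is used)
}

# Usage:
#   replace_without_replacepv datavg hdiskX hdiskY
```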
Instructor notes:
Purpose — Explain Procedure 1, without use of the replacepv command.
Details —
Additional information — When you read the student notes you might think that removing
a logical volume from a disk that fails is not possible. The important thing is: it is possible,
but it requires the disk to be either in a missing or removed state. If the disk is active, the
LVM does not allow you to unmount a file system or remove a logical volume from the
failing disk.
Now the problem is: how do you bring a disk into the missing or removed state? The
answer is that you have to do a reducevg -d or to force a new varyonvg, either in a normal
or a forced mode. Because you cannot do a varyoffvg when file systems are mounted
(and you cannot unmount them from the failing disk), the only way to recover from this bad
situation is to reboot your system. This might cause other problems if the failing disk is in
rootvg and the quorum has not been disabled in a two-disk volume group.
Transition statement — Let’s describe additional considerations for when the volume
group is the rootvg.

Procedure 1 (4 of 4): Special steps for rootvg

Prior to the reducevg step:
1. Remove the failed disk from the bootlist:
# bootlist -m normal hdisk1
2. Ensure primary dump logical volume is on the good disk:
# mklv -t sysdump -y dump rootvg 64 hdisk1
# sysdumpdev -P -p /dev/dump
At the mirrorvg step and after:
1. Exact mapping for mirrorvg:
# mirrorvg -m rootvg hdiskX
2. Rebuild the boot image:
# bosboot -a
3. Add the new disk to the boot list:
# bootlist -m normal hdisk1 hdiskX
Figure 8-14. Procedure 1 (4 of 4): Special steps for rootvg AN152.2
Notes:
Special steps for rootvg

The rootvg has special considerations, because it contains the boot logical volume and
the dump device.
The new disk has to replace the old disk in the bootlist.
The boot image needs to be rebuilt (bosboot) so that the replacement disk, instead of the
bad disk, contains a valid boot image.
The main reason for exact mapping (mirrorvg -m) is to make sure that the boot logical
volume has contiguous allocations.
If a dedicated dump device is being used, it is commonly not mirrored. If the
dump logical volume is on the failing disk, redefine it on the good disk
instead.
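The rootvg-specific steps can be wrapped in two helper functions. One covers the steps before reducevg, and the other covers the mirrorvg step and after. This is only a sketch. The function names are hypothetical, and hdisk1 and hdiskX are the placeholders from the visual. The mklv and sysdumpdev lines are needed only if the dump logical volume was on the failing disk. As in the other sketches, DRYRUN=1 (the default) prints the commands instead of running them.

```sh
#!/bin/sh
# Print each command when DRYRUN=1 (the default); execute it when DRYRUN=0.
run() {
    if [ "${DRYRUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helpers for the rootvg steps of Procedure 1.
#   $1 = remaining good disk (hdisk1 in the visual)   $2 = new disk (hdiskX)

# Before reducevg: boot only from the good disk and keep a dump device on it.
# The LV name "dump" and the size of 64 LPs follow the visual.
rootvg_before_reducevg() {
    run bootlist -m normal "$1" || return 1
    run mklv -t sysdump -y dump rootvg 64 "$1" || return 1
    run sysdumpdev -P -p /dev/dump
}

# At the mirrorvg step and after: exact mapping, boot image, boot list.
rootvg_after_extendvg() {
    run mirrorvg -m rootvg "$2" || return 1
    run bosboot -a || return 1
    run bootlist -m normal "$1" "$2"
}
```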
Instructor notes:
Purpose — Explain special considerations for rootvg situations.
Details —
Transition statement — Let’s describe procedure 2.

Procedure 2 (1 of 2): Disk still working

1. Connect the new disk to the system.
2. Add new disk to volume group:
# extendvg vg_name hdiskY
3. Migrate old disk to new disk: (*)
# migratepv hdiskX hdiskY
4. Remove old disk from volume group:
# reducevg vg_name hdiskX
5. Remove old disk from ODM:
# rmdev -l hdiskX -d
(*): Is the disk in rootvg? See next visual for further considerations.
Figure 8-15. Procedure 2 (1 of 2): Disk still working AN152.2
Notes:

Procedure 2 applies to a disk replacement where the disk is unmirrored but can still be
accessed. If the disk that must be replaced is in rootvg, follow the instructions on the
next visual.
The goal and how to do it

The goal is the same as always. Before we can replace a disk, we must remove
everything from the disk.
1. Shut down your system if you need to physically attach a new disk to the system.
Boot the system so that cfgmgr will configure the new disk.
2. Add the new disk to the volume group. Use either the SMIT fastpath
smit extendvg or the extendvg command.
3. Before executing the next step, it is necessary to distinguish between the rootvg
and a non-rootvg volume group.
- If the disk that is replaced is in rootvg, execute the steps that are shown on
the next visual, Procedure 2 (2 of 2): Special steps for rootvg.
- If the disk that is replaced is not in the rootvg, use the migratepv command:
# migratepv hdisk_old hdisk_new
This command moves all logical volumes from one disk to another. You can
do this during normal system activity. The migratepv command requires that
the disks be in the same volume group.
4. After the old disk has been completely migrated, remove it from the volume group.
Use either the SMIT fastpath smit reducevg or the reducevg command.
5. If you need to remove the disk from the system, remove it from the ODM using
the rmdev command as shown. Finally, remove the physical disk from the
system.
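For a non-rootvg volume group, steps 2 to 5 can be sketched as follows. The function name is hypothetical, and the new disk is assumed to be already configured by cfgmgr. The function refuses rootvg because of the extra steps described on the next visual. With DRYRUN=1 (the default), it only prints the commands.

```sh
#!/bin/sh
# Print each command when DRYRUN=1 (the default); execute it when DRYRUN=0.
run() {
    if [ "${DRYRUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper for Procedure 2 (disk still working, not rootvg).
#   $1 = volume group   $2 = old disk   $3 = new disk
migrate_working_disk() {
    if [ "$1" = rootvg ]; then
        echo "rootvg needs the special steps for hd5 and the dump device" >&2
        return 1
    fi
    run extendvg "$1" "$3" || return 1
    run migratepv "$2" "$3" || return 1     # can run during normal system activity
    run reducevg "$1" "$2" || return 1
    run rmdev -l "$2" -d
}
```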


Purpose — Explain procedure 2.
Details — Describe the procedure as explained in the student material.
Additional information — Make it clear to the students that step 3 is different for rootvg.
Transition statement — Let’s describe the special considerations for rootvg.
Procedure 2 (2 of 2): Special steps for rootvg

Visual: rootvg with the old disk hdiskX and the new disk hdiskY
1. Connect new disk to system
2. Add new disk to volume group
3. Migrate old disk to new disk. Disk contains hd5?
# migratepv -l hd5 hdiskX hdiskY
# bosboot -ad /dev/hdiskY
# chpv -c hdiskX
# bootlist -m normal hdiskY
4. Remove old disk from volume group
5. Remove old disk from ODM
Figure 8-16. Procedure 2 (2 of 2): Special steps for rootvg AN152.2
Notes:
Additional steps for rootvg

Procedure 2 requires some additional steps if the disk that must be replaced is in
rootvg.
1. Connect the new disk to the system as described in procedure 2.
2. Add the new disk to the volume group. Use smit extendvg or the extendvg
command.
3. This step requires special considerations for rootvg:
- Check whether your disk contains the boot logical volume. The default
location for the boot logical volume is /dev/hd5.
Use the lspv -l hdiskX command to check the logical volumes on the disk that
must be replaced.

If the disk contains the boot logical volume, migrate the logical volume to the
new disk and update the boot logical volume on the new disk. To avoid a
potential boot from the old disk, clear the old boot record by using the
chpv -c command. Then, change your bootlist:
# migratepv -l hd5 hdiskX hdiskY
# bosboot -ad /dev/hdiskY
# chpv -c hdiskX
# bootlist -m normal hdiskY
If the disk contains the primary dump device, you must deactivate the dump
before migrating the corresponding logical volume:
# sysdumpdev -p /dev/sysdumpnull
- Migrate the complete old disk to the new one:
# migratepv hdiskX hdiskY
If the primary dump device has been deactivated, you have to activate it
again:
# sysdumpdev -p /dev/hdX
4. After the disk has been migrated, remove it from the rootvg volume group.
# reducevg rootvg hdiskX
5. If the disk must be removed from the system, remove it from the ODM (use the
rmdev command), shut down AIX, and then remove the disk from the system.
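The rootvg variant can be sketched in the same dry-run style. The function name is hypothetical. The sketch assumes that the old disk holds both the boot logical volume hd5 and the primary dump device. Drop the corresponding lines if lspv -l hdiskX shows otherwise. The dump device to reactivate is passed as a parameter because the notes leave it as /dev/hdX. The value /dev/hd6 in the usage line is only an example.

```sh
#!/bin/sh
# Print each command when DRYRUN=1 (the default); execute it when DRYRUN=0.
run() {
    if [ "${DRYRUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper for Procedure 2 on rootvg.
#   $1 = old disk   $2 = new disk   $3 = primary dump device to reactivate
migrate_rootvg_disk() {
    run extendvg rootvg "$2" || return 1
    run migratepv -l hd5 "$1" "$2" || return 1      # move the boot LV first
    run bosboot -ad "/dev/$2" || return 1           # boot image on the new disk
    run chpv -c "$1" || return 1                    # clear the old boot record
    run bootlist -m normal "$2" || return 1
    run sysdumpdev -p /dev/sysdumpnull || return 1  # deactivate the primary dump
    run migratepv "$1" "$2" || return 1             # move everything else
    run sysdumpdev -p "$3" || return 1              # reactivate the primary dump
    run reducevg rootvg "$1" || return 1
    run rmdev -l "$1" -d
}

# Usage:
#   migrate_rootvg_disk hdiskX hdiskY /dev/hd6
```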
Instructor notes:
Purpose — Describe the special considerations for rootvg.
Details — Describe as provided in the student material.

Procedure 3: Disk in missing or removed state

1. Identify all logical volumes and file systems on failing disk:
# lspv -l hdiskY
2. Unmount all file systems on failing disk:
# umount /dev/lv_name
3. Remove all file systems from failing disk:
# rmfs filesystem
4. Remove all logical volumes from failing disk:
# rmlv logical-volume
5. Remove disk from volume group:
# reducevg vg_name hdiskY
6. Remove disk from system:
# rmdev -l hdiskY -d
7. Add new disk to volume group:
# extendvg vg_name hdiskZ
8. Recreate all logical volumes and file systems on new disk:
# mklv -y lv_name
# smit crfs
9. Restore file systems from backup:
# restore -rvqf /dev/rmt0
Side panel: # lspv hdiskY shows either PV STATE: removed or PV STATE: missing
Figure 8-17. Procedure 3: Disk in missing or removed state AN152.2
Notes:

Procedure 3 applies to a disk replacement where a disk could not be accessed but the
volume group is intact. The failing disk is either in a state (not device state) of missing
(normal varyonvg worked) or removed (forced varyonvg was necessary to bring the
volume group online).
If the failing disk is in an active state (this is not a device state), this procedure will not
work. In this case, one way to bring the disk into a removed or missing state is to run
the reducevg -d command or to do a varyoffvg and a varyonvg on the volume group
by rebooting the system. The reboot is necessary because you cannot vary off a
volume group with open logical volumes. Because the failing disk is active, there is no
way to unmount file systems.
Procedure steps
If the failing disk is in a missing or removed state, start the procedure:
1. Identify all logical volumes and file systems on the failing disk. Use commands
like lspv, lslv or lsfs to provide this information. These commands will work on
a failing disk.
2. If you have mounted file systems on logical volumes on the failing disk, you must
unmount them. Use the umount command.
3. Remove all file systems from the failing disk using smit rmfs or the rmfs
command. If you remove a file system, the corresponding logical volume and
stanza in /etc/filesystems is removed as well.
4. Remove the remaining logical volumes (those not associated with a file system)
from the failing disk using smit rmlv or the rmlv command.
5. Remove the disk from the volume group, using the reducevg command or the
SMIT fastpath smit reducevg.
6. Remove the disk from the ODM and from the system using the rmdev command.
7. Add the new disk to the system and extend your volume group. Use the SMIT
fastpath smit extendvg or the extendvg command.
8. Recreate all logical volumes and file systems that have been removed due to the
disk failure. Use smit mklv, smit crfs or the commands directly.
9. Due to the total disk failure, you lost all data on the disk. This data has to be
restored, either by the restore command or any other tool you use to restore
data (for example, Tivoli Storage Manager) from a previous backup.
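Steps 2 to 4 can be derived mechanically from the lspv -l listing of the failing disk. The following awk-based helper is a sketch under assumptions. removal_cmds is a hypothetical name. The helper assumes the lspv -l layout shown in its comments: two header lines, the mount point in the last column, and N/A for logical volumes without a file system. It only prints commands. Review the list before running it, because a log logical volume, for example, may also serve file systems on other disks.

```sh
#!/bin/sh
# Hypothetical helper for Procedure 3, steps 2 to 4: read "lspv -l hdiskY"
# output on stdin and print umount/rmfs commands first, then rmlv commands.
# Assumed input layout:
#   hdiskY:
#   LV NAME    LPs   PPs   DISTRIBUTION         MOUNT POINT
#   datalv     10    10    00..10..00..00..00   /data
#   loglv00    1     1     00..01..00..00..00   N/A
removal_cmds() {
    awk '
        NR <= 2 { next }                      # skip the disk name and header lines
        $NF == "N/A" { raw[++n] = $1; next }  # LV without a file system: remove later
        {
            print "umount " $NF               # file systems must be unmounted first
            print "rmfs " $NF                 # rmfs also removes the LV and the stanza
        }
        END { for (i = 1; i <= n; i++) print "rmlv -f " raw[i] }
    '
}

# Usage on AIX:
#   lspv -l hdiskY | removal_cmds
```

The remaining logical volumes are printed last, following the order of steps 3 and 4.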


Purpose — Describe procedure 3.
Additional information — This procedure requires the volume group to be brought online,
either by a varyonvg or a varyonvg -f. If it is forced, the failed disk will be in a removed
state. Use lspv to analyze physical volume states. If it is a normal varyonvg, the disk will
be in a missing state.
Note that removing logical volumes is possible on a disk that could not be accessed.
Procedure 4: Total rootvg failure

1. Replace bad disk
2. Boot in maintenance mode
3. Restore from a mksysb image
4. Import each volume group into the new ODM (importvg) if needed
Visual: rootvg on a single disk (hdiskX) or on two disks (hdiskX, hdiskY), where the failed disk contains OS logical volumes; datavg on hdiskZ; mksysb backup
Figure 8-18. Procedure 4: Total rootvg failure AN152.2
Notes:

Procedure 4 applies to a total rootvg failure.
This situation might come up when your rootvg consists of one disk that fails. Or, your
rootvg is installed on two disks and the disk that contains operating system logical
volumes (for example, /dev/hd4) fails.
Procedure steps
Follow these steps:
1. Replace the bad disk
2. Boot your system in maintenance mode
3. Restore your system from a mksysb

If any rootvg file systems were not mounted when the mksysb was made, those
file systems are not included on the backup image. You will need to create and
restore those as a separate step.
4. If your mksysb does not contain user volume group definitions (for example, you
created a volume group after saving your rootvg), you have to import the user
volume group after restoring the mksysb. For example:
# importvg -y datavg hdisk9
Only one disk from the volume group (in our example, hdisk9) needs to be
selected.
Export and import of volume groups is discussed in more detail in the next topic.
Instructor notes:
Purpose — Explain how to recover a total rootvg failure.

Procedure 5: Total non-rootvg failure

Visual: datavg on the failed disk hdiskX, replacement disk hdiskY, tape backup
1. Export the volume group from the system:
# exportvg vg_name
2. Check /etc/filesystems
3. Remove bad disk from ODM and the system:
# rmdev -l hdiskX -d
4. Connect the new disk
5. If volume group backup is available (savevg):
# restvg -f /dev/rmt0 hdiskY
6. If no volume group backup is available: Recreate ...
- Volume group (mkvg)
- Logical volumes and file systems (mklv, crfs)
7. Restore data from a backup:
# restore -rqvf /dev/rmt0
Figure 8-19. Procedure 5: Total non-rootvg failure AN152.2
Notes:

Procedure 5 applies to a total failure of a non-rootvg volume group. This situation might
come up if your volume group consists of only one disk that fails. Before starting this
procedure, make sure this is not just a temporary disk failure (for example, a power
failure).
Procedure steps
Follow these steps:
1. To fix this problem, export the volume group from the system. Use the command
exportvg as shown. During the export of the volume group, all ODM objects that
are related to the volume group will be deleted.
2. Check your /etc/filesystems. There should be no references to logical volumes
or file systems from the exported volume group.
3. Remove the bad disk from the ODM (use rmdev as shown). Shut down your
system and remove the physical disk from the system.
4. Connect the new drive and boot the system. cfgmgr will configure the new
disk.
5. If you have a volume group backup available (created by the savevg command),
you can restore the complete volume group with the restvg command (or the
SMIT fastpath smit restvg). All logical volumes and file systems are recovered.
If you have more than one disk that should be used during restvg, you must
specify these disks:
# restvg -f /dev/rmt0 hdiskY hdiskZ
The savevg and restvg commands will be discussed in a future chapter.
6. If you have no volume group backup available, you have to recreate everything
that was part of the volume group.
Recreate the volume group (mkvg or smit mkvg), all logical volumes (mklv or
smit mklv) and all file systems (crfs or smit crfs).
7. Finally, restore the lost data from backups, for example with the restore
command or any other tool you use to restore data in your environment.
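Step 2 of the procedure (checking /etc/filesystems) can be scripted. This is a sketch under assumptions. stale_stanzas is a hypothetical name. Because exportvg removes the ODM objects, the sketch assumes that you recorded the logical volume names of the volume group beforehand (for example, from lsvg -l). It also assumes that stanzas use the "attribute = value" layout with spaces around the equal sign. Any output means a stanza still refers to the exported volume group.

```sh
#!/bin/sh
# Hypothetical check: print the stanzas of a stanza file that still reference
# any of the given logical volumes in their dev or log attribute.
#   $1 = stanza file (normally /etc/filesystems)   $2... = LV names
# Assumed layout:
#   /data:
#           dev             = /dev/datalv
#           log             = /dev/loglv00
stale_stanzas() {
    file=$1
    shift
    awk -v lvs="$*" '
        BEGIN {
            n = split(lvs, names, " ")
            for (i = 1; i <= n; i++) want["/dev/" names[i]] = 1
        }
        /^\*/ { next }                                   # comment line
        NF == 1 && $1 ~ /:$/ && $0 ~ /^[^ \t]/ {         # stanza header, e.g. "/data:"
            stanza = substr($1, 1, length($1) - 1)
            next
        }
        ($1 == "dev" || $1 == "log") && $2 == "=" && ($3 in want) { print stanza }
    ' "$file" | sort -u
}

# Usage:
#   stale_stanzas /etc/filesystems datalv loglv00
```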


Purpose — Explain procedure 5.
Transition statement — Let’s discuss some common disk replacement failures.
ODM errors from LVM commands

# lsvg -p datavg
ODM failure: unable to find device id ...734... in device configuration database
Analyze failure:
1. Typing error in the command?
2. Analyze the ID of the device: Which physical volume or logical volume causes problems?
ODM problem in rootvg?
- Yes: rvgrecover
- No: Export and import volume group
Figure 8-20. ODM errors from LVM commands AN152.2
Notes:
ODM failure
After an incorrect disk replacement, you might detect ODM failures. For example, when
issuing the command lsvg -p datavg, a typical error message could be:
unable to find device id 00837734 in device configuration database
In this case, a device could not be found in the ODM.
Analyze the failure

Before trying to fix it, check the command you typed in. Maybe it just contains a typing
error.
Find out what device corresponds to the ID that is shown in the error message.

Fix the ODM problem

We have already discussed two ways to fix an ODM problem:
- If the ODM problem is related to the rootvg, execute the rvgrecover procedure.
- If the ODM problem is not related to the rootvg, export the volume group with the
exportvg command and import it again with the importvg command. Export and
import will be explained in more detail in the next topic.
Instructor notes:
Purpose — Describe how to fix ODM failures (this is largely a review page).
Details —
Transition statement — Let us look at an example of a procedural error that would cause
an ODM problem.

Removal of disk without reducevg (1 of 2)

Before rmdev:
VGDA: ...221..., ...555...
ODM CuAt: hdisk4 pvid=221, hdisk5 pvid=555, datavg pv=221, datavg pv=555
# rmdev -l hdisk5 -d
After rmdev:
VGDA: ...221..., ...555...
ODM CuAt: hdisk4 pvid=221, hdisk5 object deleted, datavg pv=221, datavg pv=555
ODM is in conflict with itself and with the VGDA
Unable to varyonvg or reducevg due to this problem
Figure 8-21. Removal of disk without reducevg (1 of 2) AN152.2
Notes:
The problem
A frequent error occurs when the administrator removes a disk from the ODM (by
executing rmdev) and physically removes the disk from the system, without first
executing the reducevg command to remove volume group references to that disk (in
the VGDA and in the ODM).
The VGDA stores information about all physical volumes of the volume group. ODM
disk references include the physical volume attributes for the volume group.
Note: Throughout this discussion the physical volume ID (PVID) is abbreviated in the
visuals for simplicity. The physical volume ID is actually 32 characters.
The result of this mistake is that the volume group cannot be varied on. Attempts to
use reducevg after the fact fail, because the command requires that the volume group be
active.
Instructor notes:
Purpose — Introduce the VGDA corruption if a disk is removed from the ODM but not from
the volume group.
Additional information — It is not possible to remove a disk from the ODM as long as it
has open logical volumes. If any process is using a logical volume from a disk, you cannot
remove the disk with rmdev.
Transition statement — Let’s describe the fix for this error.

Removal of disk without reducevg (2 of 2)

1. Repair the ODM enough to varyonvg. Two options to repair the ODM:
- Use ODM commands (such as odmdelete)
- Use exportvg and importvg (using hdisk4)
2. # varyonvg datavg
Succeeds but reports errors due to VGDA reference to PVID
# lsvg -l datavg also complains:
Unable to find device id 555
555 missing
3. Use reducevg to remove disk reference from VGDA:
# reducevg datavg hdisk5
Fails: unable to find physical volume hdisk5
# reducevg datavg …555…
Succeeds
Figure 8-22. Removal of disk without reducevg (2 of 2) AN152.2
Notes:
The fix
Before fixing the problem, be sure you have correctly recorded the PVID for the
removed disk. The previous lsvg listing of physical volumes for datavg would have
provided that. A previously executed lspv would also have provided the PVID.
This problem can be fixed by executing the reducevg command, but the volume group
needs to be active, and varyonvg does not work while the volume group has a PVID
value that cannot be resolved to a disk.
You could use odmdelete to remove the bad PVID attribute object, but this is not as
simple as it sounds, and a mistake could make matters worse. An easier way to clean up
the bad ODM reference is to export the volume group and then import it again
using the VGDA on the remaining disk.
Once the volume group is active, we can then use reducevg to properly remove the
bad PVID reference from the VGDA. Instead of specifying the disk name, the PVID of
the removed disk is specified. If you did not record the PVID earlier, you will need
to obtain it from the VGDA itself.
To obtain the PVID of the removed disk from the VGDA run:
# lqueryvg -p hdisk4 -At (Use any disk from the volume group)
You need to compare this with the lsvg -p datavg output to identify which PVID is for
the missing disk.
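Once both lists of PVIDs are in files, comparing the VGDA with the ODM can be scripted. This is a sketch only. orphan_pvids and the file names are hypothetical, and extracting the PVIDs from the lqueryvg -p hdisk4 -At output is left to you. The helper compares only the first 16 characters. That choice is an assumption, based on lspv displaying shorter PVIDs than the 32 characters held in the VGDA.

```sh
#!/bin/sh
# Hypothetical helper: print the PVIDs that appear in the VGDA list but not in
# the ODM list. Both files hold one PVID per line in the first column.
#   $1 = PVIDs from the VGDA (extracted from "lqueryvg -p hdisk4 -At")
#   $2 = PVIDs known to the ODM (for example: lspv | awk '{ print $2 }')
# Only the first 16 characters are compared and printed (an assumption).
orphan_pvids() {
    awk 'FILENAME == ARGV[1] { odm[substr($1, 1, 16)] = 1; next }
         NF && !(substr($1, 1, 16) in odm) { print substr($1, 1, 16) }' "$2" "$1"
}

# Usage, printing the reducevg commands for review:
#   orphan_pvids vgda_pvids odm_pvids | while read -r pvid; do
#       echo reducevg datavg "$pvid"
#   done
```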


Purpose — Describe how to fix this VGDA corruption.
Details —
Transition statement — Now, it’s time for some checkpoint questions.
Checkpoint
1. Although everything seems to be working fine, you detect error log entries for disk hdisk0 in your rootvg. The disk is not mirrored to another disk. You decide to replace this disk. Which procedure would you use to migrate this disk?
2. You detect an unrecoverable disk failure in volume group datavg. This volume group consists of two disks that are completely mirrored. Because of the disk failure you are not able to vary on datavg. How do you recover from this situation?
3. After disk replacement, you recognize that a disk has been removed from the system but not from the volume group. How do you fix this problem?
Notes:


Purpose — Review and test the students' understanding of this unit.
1. Although everything seems to be working fine, you detect error log entries for disk hdisk0 in your rootvg. The disk is not mirrored to another disk. You decide to replace this disk. Which procedure would you use to migrate this disk?
The answer is procedure 2: Disk still working. There are some additional steps necessary for hd5 and the primary dump device hd6.
2. You detect an unrecoverable disk failure in volume group datavg. This volume group consists of two disks that are completely mirrored. Because of the disk failure you are not able to vary on datavg. How do you recover from this situation?
The answer is forced varyon: varyonvg -f datavg. Use procedure 1 for mirrored disks.
3. After disk replacement, you find that a disk has been removed from the system but not from the volume group. How do you fix this?
The answer is to repair the ODM, for example through exportvg and importvg, and then execute reducevg using the PVID instead of the disk name.
Instructor Guide
Exercise: Disk management procedures

• Work with LVM mirroring and quorum
• rootvg disk replacement
• User volume group disk replacement procedure
Figure 8-24. Exercise: Disk management procedures AN152.2
Notes:


Details —
Transition statement — Let’s summarize the unit.
Unit summary
Manage volume group quorum issues
Explain the physical volume states used by the LVM
Replace a disk under different circumstances
Recover from a total volume group failure
Notes:
Discussion:
Different procedures are available to fix disk problems under different circumstances:
Procedure 1: Mirrored disk
Procedure 2: Disk still working (rootvg specials)
Procedure 3: Total disk failure
Procedure 4: Total rootvg failure
Procedure 5: Total non-rootvg failure
exportvg and importvg can be used to easily transfer volume groups between
systems.

Unit 9. Install and cloning techniques
Estimated time
00:25 Topic 1
00:24 Topic 2

This unit describes techniques to reduce the size of a maintenance
window. Specific techniques are taught for installing system updates
while cloning the rootvg.

• Use alternate disk installation techniques for applying AIX
maintenance
• Use multibos to apply AIX maintenance

Accountability:
• Lab exercise
Reference
management
Online AIX Version 7.1 Installation and migration
SG24-7910 AIX Version 7.1 Differences Guide (Redbook)
SC23-6742 AIX Version 7.1 Understanding the Diagnostic Subsystem for AIX
http://www.ibm.com/developerworks/aix/library/au-alt_disk_copy
© Copyright IBM Corp. 2009, 2012 Unit 9. Install and cloning techniques 9-1

Unit objectives
Use alternate disk installation techniques for applying AIX
maintenance
Use multibos to apply AIX maintenance
Notes:
Instructor notes:
Purpose — List this unit’s objectives.
Details —
Additional information — In the previous unit, students learned when volume group
backups must be used after a disk failure. This unit explains how to use alternate disk
installation and multibos to apply AIX maintenance while reducing downtime.
Transition statement — Let’s start with alternate disk installation.

9.1. Alternate disk installation

What students will do — The students will identify how alternate disk installation
techniques can be used.
How students will do it — Through lecture and activity
What students will learn — Students will learn how to use alternate disk installation
techniques.
How this will help students on their job — Being able to work with alternate disk
installation allows students to manage installations in large facilities. Systems can be
installed over a longer period of time while they are still running at the current
version. The switchover can then happen on all systems at the same time.
Topic 1 objectives
After completing this topic, you should be able to:

Install a mksysb onto an alternate disk
Clone an existing rootvg to an alternate disk
Remove an alternate disk
Figure 9-2. Topic 1 objectives AN152.2
Notes:

Purpose — Cover the topic objectives.
Details —
Alternate disk installation

# smit alt_install
Installing a mksysb on another disk:
# smit alt_mksysb -OR- # alt_disk_mksysb
Cloning the running rootvg to another disk:
# smit alt_clone -OR- # alt_disk_copy
Figure 9-3. Alternate disk installation AN152.2
Notes:
Benefits of alternate disk techniques

Alternate disk installation lets you install the operating system while the system is still
up and running, which reduces installation or upgrade downtime considerably. It also
allows large facilities to better manage an upgrade because systems can be installed
over a longer period of time. The systems continue to run at the previous version in the
meantime, and the switch to the newer version can then happen on all of them at the same time.
When to use alternate disk techniques

Alternate disk installation can be used in one of two ways:
- Installing a mksysb image on another disk
- Cloning the current running rootvg to an alternate disk

Filesets
An alternate disk installation uses the following filesets:
- bos.alt_disk_install.boot_images must be installed for alternate disk mksysb
installations
- bos.alt_disk_install.rte must be installed for rootvg cloning and alternate disk
mksysb installations
How to use alternate disk techniques

All modes of alternate disk installations are available through the SMIT fastpath:
smit alt_install.
To install a mksysb image on an alternate disk, you can either use the SMIT fastpath
smit alt_mksysb or directly run the alt_disk_mksysb command.
To clone the running rootvg to an alternate disk, you can either use the SMIT fastpath
smit alt_clone or directly run the alt_disk_copy command.
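Before choosing a method, it helps to confirm that the filesets listed under "Filesets" are present. The sketch below only encodes that list; the function name required_filesets and the mode names "mksysb" and "clone" are hypothetical labels chosen for illustration.

```shell
# Hypothetical helper: print the alternate disk filesets that the text
# lists as required for each mode ("mksysb" or "clone").
required_filesets() {
    case "$1" in
        mksysb) echo "bos.alt_disk_install.rte bos.alt_disk_install.boot_images" ;;
        clone)  echo "bos.alt_disk_install.rte" ;;
        *)      echo "usage: required_filesets mksysb|clone" >&2
                return 1 ;;
    esac
}
```

On an AIX system, you could then run lslpp -L $(required_filesets mksysb) to list these filesets and see whether any of them is reported as not installed.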
Instructor notes:
Purpose — Introduce alternate disk installation.
Details — Alternate disk installation has been available since AIX V4.3.
How current commands relate to the pre-AIX 5L V5.3 command

Prior to AIX 5L V5.3, all alternate disk functions were invoked through a single
command: alt_disk_install. The alt_disk_install command is still supported, but it now
simply invokes the new replacement commands to do the actual work.
The following three commands were added in AIX 5L V5.3:
- alt_disk_copy will create copies of rootvg on an alternate set of disks
- alt_disk_mksysb will install an existing mksysb on an alternate set of disks
- alt_rootvg_op will perform wake, sleep, and customize operations
The alt_disk_install module will continue to ship as a wrapper to the new modules.
However, it will not support any new functions, flags, or features.
The following table shows how the existing operation flags for alt_disk_install map
to the new modules. The alt_disk_install command now calls the new modules after
printing an attention notice that it is obsolete. All other flags apply as currently
defined.
alt_disk_install arguments     New command
-C args disk                   alt_disk_copy args -d disks
-d mksysb args disks           alt_disk_mksysb -m mksysb args -d disks
-W args disk                   alt_rootvg_op -W args -d disk
-S args                        alt_rootvg_op -S args
-P2 args disks                 alt_rootvg_op -C args -d disks
-X args                        alt_rootvg_op -X args
-v args disk                   alt_rootvg_op -v args -d disk
-q args disk                   alt_rootvg_op -q args -d disk
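When converting old scripts, the operation-flag column of this table can be encoded as a lookup. This is a sketch of the table only: the function name is hypothetical, and the remaining arguments and disk names still have to be moved into place (for example, behind -d) as the table shows.

```shell
# Hypothetical helper: print the command (and leading flag) that replaces
# a given alt_disk_install operation flag, following the table above.
alt_disk_install_replacement() {
    case "$1" in
        -C)  echo "alt_disk_copy" ;;
        -d)  echo "alt_disk_mksysb -m" ;;
        -W)  echo "alt_rootvg_op -W" ;;
        -S)  echo "alt_rootvg_op -S" ;;
        -P2) echo "alt_rootvg_op -C" ;;
        -X)  echo "alt_rootvg_op -X" ;;
        -v)  echo "alt_rootvg_op -v" ;;
        -q)  echo "alt_rootvg_op -q" ;;
        *)   echo "not an operation flag in the table: $1" >&2
             return 1 ;;
    esac
}
```

For example, alt_disk_install_replacement -P2 prints alt_rootvg_op -C, because phase 2 customization now lives in alt_rootvg_op.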
Transition statement — Let’s discuss alternate mksysb disk installation.

Alternate mksysb disk installation (1 of 2)

hdisk0: rootvg (AIX 5L V5.3)
hdisk1: AIX 6.1
# alt_disk_mksysb -m /dev/rmt0 -d hdisk1
Example installs an AIX 6.1 mksysb on hdisk1
Bootlist will be set to alternate disk (hdisk1)
Changing the bootlist allows you to boot different AIX levels
(hdisk0 boots AIX 5L V5.3, hdisk1 boots AIX 6.1)
Figure 9-4. Alternate mksysb disk installation (1 of 2) AN152.2
Notes:
Introduction
An alternate mksysb installation involves installing a mksysb image that has already
been created from another system onto an alternate disk of the target system. The
mksysb image must have been created on a system running AIX V4.3 or subsequent
versions of the operating system.
Example
In the example, an AIX V6.1 mksysb tape image is installed on an alternate disk, hdisk1,
by executing the following command:
# alt_disk_mksysb -m /dev/rmt0 -d hdisk1
The system now contains two rootvgs on different disks. In the example, hdisk0 holds an
AIX 5L V5.3 rootvg and hdisk1 holds an AIX 6.1 rootvg.
Which disk does the system use to boot?

The alt_disk_mksysb command changes the bootlist by default. During the next
reboot, the system will boot from the new rootvg. If you do not want to change the
bootlist, use the -B option of alt_disk_mksysb.
By changing the bootlist, you determine which AIX version you want to boot.
Filesets within the mksysb being installed

The mksysb image used for the installation must either be created on a system that has
the same hardware configuration as the target system, or contain all the device and
kernel support needed for a different machine type or platform. In the latter case, the
following filesets must be contained in the mksysb:
- devices.*
- bos.mp
- bos.up
- bos.64bit (if necessary)
alt_disk_mksysb options
The alt_disk_mksysb command has the following options:
-m device
-d target_disks
-B : do not change the bootlist
-i image.data
-s script
-R resolv.conf
-p platform
-L mksysb_level
-n : remain a NIM client
-P phase
-c console
-r : reboot after install
-k : keep mksysb device customization
-y : import non-rootvg volume groups


Purpose — Introduce alternate mksysb disk installation.
Details —
Transition statement — Let’s introduce the SMIT interface.
Alternate mksysb disk installation (2 of 2)

# smit alt_mksysb
Install mksysb on an Alternate Disk

[Entry Fields]
* Target Disk(s) to install [hdisk1] +

* Device or image name [/dev/rmt0] +
Phase to execute all +
image.data file [] /
Customization script [] /
Set bootlist to boot from this disk
on next reboot? yes +
Reboot when complete? no +
Verbose output? no +
Debug output? no +
resolv.conf file [] /
Figure 9-5. Alternate mksysb disk installation (2 of 2) AN152.2
Notes:
SMIT panel example

The alternate disk install function can also be executed from the user-friendly smit
dialog panel.
When you execute smit alt_mksysb, you get the SMIT menu shown on the visual.


Purpose — Describe the SMIT interface for alternate mksysb disk installation.
Details — Keep it very brief - we only want to show that this can be easily executed from a
SMIT dialog panel.
Additional information — The installation on the alternate disk is broken into three
phases:
1. Phase 1 creates the altinst_rootvg volume group, the alt_ logical volumes, and the
/alt_inst file systems, and restores the mksysb data.
2. Phase 2 runs any specified customization script and copies a resolv.conf file, if
specified.
3. Phase 3 unmounts the /alt_inst file systems, renames the file systems and logical
volumes, and varies off altinst_rootvg. It sets the bootlist and reboots, if
specified.
Each phase can be run separately. Phase 3 must be run to get a usable rootvg volume
group.
Transition statement — Let’s describe alternate disk rootvg cloning.
Alternate disk rootvg cloning (1 of 2)

hdisk0: rootvg (AIX 7.1 TL01), cloned to hdisk1: rootvg (AIX 7.1 TL02)
# alt_disk_copy -b update_all -l /dev/cd0 -d hdisk1
Example creates a copy of the current rootvg on hdisk1
Installs a maintenance level on the clone (AIX 7.1 TL02)
Changing the bootlist allows you to boot different AIX levels
(hdisk0 boots AIX 7.1 TL01, hdisk1 boots AIX 7.1 TL02)
Figure 9-6. Alternate disk rootvg cloning (1 of 2) AN152.2
Notes:
Benefits of cloning rootvg

Cloning the rootvg to an alternate disk can have many advantages. One advantage is
having an online backup available in case of a disk failure. Another benefit of rootvg
cloning is in applying new maintenance levels or updates. A copy of the rootvg is made
to an alternate disk (in our example hdisk1) followed by the installation of a
maintenance level on that copy. The active system runs uninterrupted during this time.
When the system is rebooted, it will boot from the newly updated rootvg for testing. If
the maintenance level causes problems, the old rootvg can be retrieved by simply
resetting the bootlist and rebooting.
Example
In the example (alt_disk_copy -b update_all -l /dev/cd0 -d hdisk1), rootvg, which
resides on hdisk0, is cloned to the alternate disk hdisk1. Additionally, a new
maintenance level is applied to the cloned version of AIX.
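The clone, update, and fallback steps described above can be collected into a small script. This is a sketch, not a supplied procedure: the run wrapper and the function names are hypothetical, the disk and device names follow the example, and by default the commands are only printed (export DRY_RUN=0 on an AIX system to execute them).

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Clone rootvg to hdisk1 and apply the update_all bundle from /dev/cd0.
clone_and_update() {
    run alt_disk_copy -b update_all -l /dev/cd0 -d hdisk1
    # alt_disk_copy points the bootlist at the clone by default; show it
    run bootlist -m normal -o
}

# Return to the original rootvg on hdisk0 if the new level causes problems.
roll_back() {
    run bootlist -m normal hdisk0
    run shutdown -Fr
}
```

The reboot into the updated clone is deliberately left out of clone_and_update so that it can happen in the maintenance window; roll_back is only needed if testing on the new level fails.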


Purpose — Introduce alternate disk rootvg cloning.
Details —
Additional information — The alt_disk_copy options are (see man page):
-b bundle name
-f APAR_list file
-F list_of_APARs
-l path to location of installp images
-w list_of_filesets_to_install
-d target disks
-B : do not change bootlist
-r : reboot after cloning
-s script
-P phases
-R resolv.conf
-W filesets
Transition statement — Let’s show the SMIT fastpath.
Alternate disk rootvg cloning (2 of 2)

# smit alt_clone
Clone the rootvg to an Alternate Disk

[Entry Fields]
* Target Disk(s) to install [hdisk1] +
Phase to execute all +
image.data file [] /
Exclude list [] /
Bundle to install [update_all] +

-OR-
Fileset(s) to install []
Fix bundle to install []

-OR-
Fixes to install []
Directory or Device with images [/dev/cd0]

(required if filesets, bundles or fixes used)
...
Customization script [] /
Set bootlist to boot from this disk
on next reboot? yes +
Reboot when complete? no +
...
Figure 9-7. Alternate disk rootvg cloning (2 of 2) AN152.2
Notes:
Example with SMIT

The SMIT fastpath for alternate disk rootvg cloning is smit alt_clone.
The target disk in the example is hdisk1, which means that rootvg will be copied to that
disk. If you specify a bundle, a fileset, or a fix, the installation or update takes
place on the clone, not in the original rootvg.
By default, the bootlist will be set to the new disk.
Changing the bootlist allows you to boot from the original rootvg or the cloned rootvg.


Purpose — Describe the SMIT fastpath for alternate disk rootvg cloning.
Details — Keep it very brief.
Transition statement — Let’s show how to remove an alternate disk installation.
Removing an alternate disk installation

Original: hdisk0, rootvg (AIX 7.1 TL01)    Clone: hdisk1, rootvg (AIX 7.1 TL02)
Removing the clone (booted from hdisk0):
# shutdown -Fr
# lsvg
rootvg
altinst_rootvg
# alt_rootvg_op -X
Removing the original (booted from hdisk1):
# bootlist -m normal hdisk1
# shutdown -Fr
# lsvg
rootvg
old_rootvg
# alt_rootvg_op -X old_rootvg
• alt_rootvg_op -X removes the volume group definition from the ODM
• Do not use exportvg to remove the alternate volume group
Figure 9-8. Removing an alternate disk installation AN152.2
Notes:
Removing the alternate rootvg

If you have created an alternate rootvg with alt_disk_mksysb or alt_disk_copy, but
no longer wish to use it, first boot your system from the original disk (in our example,
hdisk0) and then use alt_rootvg_op.
When executing lsvg to list the volume groups in the system, the alternate rootvg is
shown with the name altinst_rootvg.
To remove the alternate rootvg, do not use the exportvg command. Simply run the
following command:
# alt_rootvg_op -X
This command removes the altinst_rootvg definition from the ODM database.
If exportvg is run by accident, you must recreate the /etc/filesystems file before
rebooting the system. The system will not boot without a correct /etc/filesystems.

Removing the original rootvg

If you have created an alternate rootvg with alt_disk_mksysb or alt_disk_copy,
and no longer wish to use the original disk, first boot your system from the cloned disk
(in our example, hdisk1) and then use the alt_rootvg_op command to remove it.
When executing lsvg to list the volume groups in the system, the original rootvg is
shown with the name old_rootvg.
To remove the original rootvg, do not use the exportvg command. Simply run the
following command:
# alt_rootvg_op -X old_rootvg
This command removes the old_rootvg definition from the ODM database.
If exportvg is run by accident, you must recreate the /etc/filesystems file before
rebooting the system. The system will not boot without a correct /etc/filesystems.
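Because the correct alt_rootvg_op invocation depends on which name lsvg shows, the choice can be derived from the lsvg output. The following is a minimal sketch with a hypothetical function name; it reads lsvg output on standard input and only prints the command to run, so it never touches exportvg.

```shell
# Hypothetical helper: read lsvg output (one volume group per line) and
# print the alt_rootvg_op command that removes the alternate rootvg.
alt_rootvg_remove_cmd() {
    while read -r vg; do
        case "$vg" in
            # Booted from the original disk: -X defaults to altinst_rootvg
            altinst_rootvg) echo "alt_rootvg_op -X"; return 0 ;;
            # Booted from the clone: name the original rootvg explicitly
            old_rootvg)     echo "alt_rootvg_op -X old_rootvg"; return 0 ;;
        esac
    done
    echo "no alternate rootvg found" >&2
    return 1
}
```

On AIX, lsvg | alt_rootvg_remove_cmd prints the command; review it before running it.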
Instructor notes:
Purpose — Describe how to remove an alternate disk installation.
Details —
Transition statement — You may have noted that, up to this point, we only talked about
applying maintenance to an existing version and release of AIX, but not about migrating to
a new version and release. To use the alternate disk capabilities with a migration install,
you need to use NIM. Let’s look at this briefly.

NIM alternate disk migration

• alt_disk_copy does not support migrating to a new version or release of AIX
• nimadm uses a NIM server to migrate to an alternate disk
NIM server (AIX 7.1)    NIM client lpar1: hdisk0 rootvg (AIX 6.1), cloned to hdisk1 rootvg (AIX 7.1)
# nimadm -c lpar1 -s spot1 -l lpp1 -d "hdisk1" -Y
Figure 9-9. NIM alternate disk migration AN152.2
Notes:
What is nimadm?
The nimadm command (Network Install Manager Alternate Disk Migration) is a utility that
allows the system administrator to create a copy of rootvg to a free disk (or disks) and
simultaneously migrate it to a new version or release level of AIX. The nimadm
command uses NIM resources to perform this function.
Advantages of nimadm
There are several advantages to using the nimadm command over a conventional
migration:
- Reduced downtime. The migration is performed while the system is up and
functioning normally. There is no requirement to boot from installation media, and
the majority of processing occurs on the NIM master.
- The nimadm command facilitates quick recovery in the event of migration failure.
Since the nimadm command uses alt_disk_install to create a copy of rootvg, all
changes are performed to the copy (altinst_rootvg). In the event of serious
migration installation failure, the failed migration is cleaned up and there is no need
for the administrator to take further action. In the event of a problem with the new
(migrated) level of AIX, the system can be quickly returned to the pre-migration
operating system by booting from the original disk.
- The nimadm command allows a high degree of flexibility and customization in the
migration process. This is done with the use of optional NIM customization
resources: image_data, bosinst_data, exclude_files, pre-migration script,
installp_bundle, and post-migration script.
Details of using NIM to perform an alternate disk migration are not covered in this
course.


Purpose — Introduce the use of nimadm.
Details — The intent is only to make the students aware of this NIM capability. You do not
even need to cover the displayed example of the nimadm command. Instead, refer them to
the full NIM training course. It is important that they understand that an alternate disk
migration cannot be done without using NIM.
Transition statement — Let’s do a review of this section.
Exercise: Alternate disk install

Clone the existing rootvg

Apply a new service pack
Alternate boot between different levels
Figure 9-10. Exercise: Alternate disk install AN152.2
Notes:


Purpose —
Details —
9.2. Using multibos
Instructor Topic Introduction

What students will do — The students will learn how to set up and use multibos to work
with an alternate BOS.
How students will do it — Through lecture and lab exercise
What students will learn — Students will learn how to set up and use multibos to work
with an alternate BOS.
How this will help students on their job — An alternate BOS provides a tool for making
AIX system modifications (such as applying a new technology level) without any effect on
the functionality of the active BOS. When the next maintenance window arrives, a quick
reboot can be used to switch over to the new technology level. If there is an unexpected
problem, a quick reboot returns the system to the prior state.
Topic 2 objectives
Clone an active BOS to a standby BOS
Customize a standby BOS
Alternate boot between an active BOS and a standby BOS
Mount a standby BOS
Start a standby BOS shell
Notes:


Purpose — Cover the objectives for the topic.
Details —
Transition statement — Let’s look at what multibos is and what it provides.
multibos overview
Two alternate AIX base operating systems (BOS) in a single rootvg
Standby BOS created as copy of active BOS
Modify standby BOS without affecting active BOS
Apply maintenance to standby BOS
Mount and modify standby BOS
Start interactive shell working in standby BOS
Can alternate on reboot which BOS is active
Figure 9-12. multibos overview AN152.2
Notes:
Overview
The main purpose of using multibos is to have the type of alternate BOS (base
operating system) capabilities that are available with the alternate disk technology,
without having to use another disk. The operating system filesets do not occupy enough
space to justify allocating another entire disk for that purpose. With multibos, you can
have the two BOS versions on the same disk.
This is accomplished by creating copies of the base operating system logical volumes
affected by an OS update (the active BOS), with different file system paths. Note that
these copies reside in the one and only rootvg.
Another advantage to multibos is that there is lower overhead to the cloning operation,
since it does not need to clone all the logical volumes in the rootvg.
Once you have created the alternate BOS, changes such as applying maintenance can be
made to these copies without changing the level of code being used in the active BOS.
In addition to applying maintenance, you can access and make configuration changes to
the standby BOS through two techniques: mounting the standby BOS and starting an
interactive shell (chroot) for the standby BOS.
When you would like to test the standby BOS, you simply reboot using the standby copy
of the boot logical volume (BLV). If there is a problem with the changes that were made,
configure the bootlist to use the original BLV and a reboot will return you to the original
version of the BOS.
Instructor notes:
Purpose — Provide an overview of multibos function and purpose.
Details —
Transition statement — Let’s first look at the file system structure of the alternate BOS,
when created.

Active and standby BOS logical volumes

Active BOS: BLV (hd5), jfslog (hd8), / (hd4), /home (hd1), /opt (hd10opt), /usr (hd2), /var (hd9var), /tmp (hd3)
Standby BOS: BLV (bos_hd5), jfslog (bos_hd8), /bos_inst (bos_hd4, if mounted), /bos_inst/opt (bos_hd10opt), /bos_inst/usr (bos_hd2), /bos_inst/var (bos_hd9var)
Figure 9-13. Active and standby BOS logical volumes AN152.2
Notes:
Standby BOS structure

The standby BOS needs to mimic the live BOS file system structure, yet we do not want it
to replace the active file systems. To handle this, multibos creates a logical volume to
match each of the BOS logical volumes. This includes not only the file systems, but also
the JFS logs and the boot logical volume. The names are modified by prepending bos_ to
the standard logical volume names. For the standby BOS file systems, the file system
mount point is changed to have a root path of /bos_inst/.
If we mount the standby BOS, then we use this modified path (beginning with /bos_inst).
If we use the chroot shell access, or if we reboot to make the standby BOS the active
BOS, then the (formerly standby BOS) file systems have a root path of /.
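The two naming rules can be stated as a pair of tiny functions. This sketch only restates the conventions described above for illustration; the function names are hypothetical and multibos does not expose them.

```shell
# Standby logical volume name: bos_ prepended to the active LV name
standby_lv() {
    echo "bos_$1"
}

# Standby mount point (when mounted): the active mount point under /bos_inst
standby_mount() {
    if [ "$1" = "/" ]; then
        echo "/bos_inst"
    else
        echo "/bos_inst$1"
    fi
}
```

For example, standby_lv hd9var gives bos_hd9var and standby_mount /var gives /bos_inst/var, matching the figure.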
Instructor notes:
Purpose — Explain the structure of the standby BOS.
Details —
Transition statement — Next, we will look at how we actually create a standby BOS using
the multibos command.

Setting up a standby BOS (1 of 2)

• multibos -s -X
Pre-validate that there is sufficient rootvg free space
Uses default image.data (can customize with -i)
Special logical volumes and file systems created for the standby OS
bos_<lvname>
/bos_inst/<mount point>
Figure 9-14. Setting up a standby BOS (1 of 2) AN152.2
Notes:
multibos space prerequisite

Since multibos needs sufficient space in rootvg to replicate the BOS logical volumes,
you must ensure that there is enough free space in the rootvg to accommodate this.
Display the current space used by these BOS logical volumes. Remember that user-defined
logical volumes, even if they are in the rootvg, are not cloned. Then check that there
is enough space on the rootvg disk. Note that the clone, by default, uses the default
/image.data file. This means that the cloned logical volumes are placed on the same
disk as the source logical volumes. If you need to obtain space by extending the volume
group, then you need to customize the image.data file that is used.
The creation of the standby BOS will require additional space in the active BOS during
the operation. It is recommended that you allow the multibos command to increase the
size of file systems as needed (using the -X flag).
image.data customization
If you want to change any characteristics of the cloned rootvg logical volumes or file
systems, you can create a copy of the image.data file, edit the copy, and then specify
that the multibos command should use your edited copy (by using the -i flag).
For example, if you wanted the cloned logical volumes to be placed on a disk that was
added to the rootvg, then you would first run the mkszfile command to obtain a
current capture of the characteristics, copy the created /image.data file to a different
name, and edit it to specify that the cloned logical volumes should be placed on the
additional disk. Then, you need to point to that new file by running the command:
# multibos -i <image.data copy> -Xs
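The image.data customization steps can be sketched as follows. The copy location /tmp/image.data.multibos, the use of $EDITOR, and the function name are assumptions for illustration; the commands are printed rather than run unless DRY_RUN=0 is exported.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

IMAGE_COPY=/tmp/image.data.multibos   # hypothetical name for the edited copy

create_standby_with_custom_image_data() {
    # Capture the current rootvg characteristics in /image.data
    run mkszfile
    # Work on a copy, as described above
    run cp /image.data "$IMAGE_COPY"
    # Edit the copy, for example to place the cloned LVs on the added disk
    run "${EDITOR:-vi}" "$IMAGE_COPY"
    # Create the standby BOS from the edited copy, allowing expansion (-X)
    run multibos -i "$IMAGE_COPY" -Xs
}
```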
Which logical volumes are cloned?

The multibos facility does not clone all the logical volumes in the rootvg, unlike the
alternate disk facility. Some of the system defined logical volumes and all user
defined logical volumes are accessed in common between the active BOS and the
standby BOS.
The logical volumes which are cloned are:
• /dev/hd5 (BLV)
• /dev/hd4 (root file system)
• /dev/hd2 (/usr)
• /dev/hd9var (/var)
• /dev/hd10opt (/opt)
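The space check described under "multibos space prerequisite" can be approximated by adding up the physical partitions of the logical volumes listed above. This is a rough sketch under stated assumptions: it reads lsvg -l rootvg output on standard input and assumes the PP count is the fourth field; the free PP count (from the FREE PPs line of lsvg rootvg) is passed as an argument. It ignores the JFS log and the temporary growth of active file systems that -X may cause, so treat the result as a lower bound.

```shell
# Hypothetical helper: compare PPs used by the cloned LVs with free PPs.
#   stdin: lsvg -l rootvg output (PP count assumed to be field 4)
#   $1:    free PPs in rootvg
check_multibos_space() {
    free_pps=$1
    needed=$(awk '$1 ~ /^(hd5|hd4|hd2|hd9var|hd10opt)$/ { sum += $4 }
                  END { print sum + 0 }')
    echo "needed=$needed free=$free_pps"
    # Succeed only if the free PPs cover the PPs of the cloned LVs
    [ "$needed" -le "$free_pps" ]
}
```

Usage on AIX: lsvg -l rootvg | check_multibos_space <free PPs>.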


Purpose — Explain how to set up a standby BOS using the multibos command.
Details —
Transition statement — Let’s continue our discussion on creating a standby BOS in the
next visual.
Setting up a standby BOS (2 of 2)

Copies BOS file systems using backup and restore
Non-BOS logical volumes are shared
Optional post-creation customization script
Bootlist updated (-t will block)
1st: Standby BOS
2nd: Active BOS
Figure 9-15. Setting up a standby BOS (2 of 2) AN152.2
Notes:
Tasks of multibos standby BOS creation

The multibos command, when requested to create a standby BOS, will:
• Collect the metadata information about the rootvg
• Create and define the standby logical volumes and file systems
• Use the backup and restore commands to copy the files from the active BOS
file systems to the standby file systems
• Set the bootlist to have the standby BOS BLV first and the active BOS BLV
second
• Run a post-creation customization script, if provided by the administrator


Purpose — Continue the discussion of what happens when we create a standby BOS.
Details —
Transition statement — Let’s briefly look at some of the operations that we can execute
once we have a standby BOS created, starting with customization and mounting a standby
BOS.
Other multibos operations (1 of 2)

Customizing a standby BOS
multibos -c { -a | -b <bundle> | -f <fixlist> } -l device
Can combine with standby BOS creation
Mounting and unmounting a standby BOS
multibos -m
Mounts to /bos_inst/
multibos -u
Figure 9-16. Other multibos operations (1 of 2) AN152.2
Notes:
Customizing standby BOS

You can use the multibos customization operation, with the -c flag, to update the
standby BOS. The customization operation requires a source for the fix filesets (-l
device or directory flag) and at least one installation option (installation by bundle,
installation by fix, or update_all).
The customization operation performs the following steps:
1) The standby BOS file systems are mounted, if not already mounted.
2) If you specify an installation bundle with the -b flag, the installation bundle is
installed using the geninstall utility. The installation bundle syntax should
follow geninstall conventions. If you specify the -p preview flag,
geninstall will perform a preview operation.

3) If you specify a fix list, with the -f flag, the fix list is installed using the
instfix utility. The fix list syntax should follow instfix conventions. If you
specify the -p preview flag, then instfix will perform a preview operation.
4) If you specify the update_all function, with the -a flag, it is performed using
the install_all_updates utility. If you specify the -p preview flag, then
install_all_updates performs a preview operation. Note: It is possible to
perform one, two, or all three of the installation options during a single
customization operation.
5) The standby boot image is created and written to the standby BLV using the
AIX bosboot command. You can block this step with the -N flag. You should
only use the -N flag if you are an experienced administrator and have a good
understanding of the AIX boot process.
6) Upon exit, if standby BOS file systems were mounted in step 1, they are
unmounted.
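Because the -p flag turns each installation option into a preview, a cautious update of the standby BOS is a preview followed by the real run. The sketch below uses the update_all option (-a) only; the function name and the default source /dev/cd0 are assumptions, and commands are printed unless DRY_RUN=0 is exported.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: preview or apply update_all on the standby BOS.
#   $1: preview | apply
#   $2: device or directory with the fix filesets (default /dev/cd0)
update_standby_bos() {
    mode=$1
    source_dir=${2:-/dev/cd0}
    case "$mode" in
        preview) run multibos -c -a -p -l "$source_dir" ;;
        apply)   run multibos -c -a -l "$source_dir" ;;
        *)       echo "usage: update_standby_bos preview|apply [device_or_dir]" >&2
                 return 1 ;;
    esac
}
```

Run the preview first, review its output, and only then run update_standby_bos apply with the same source.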
Mounting and unmounting standby BOS

It is possible to access and modify the standby BOS by mounting its file systems over
the standby BOS file system mount points. The multibos mount operation, using the -m
flag, mounts all standby BOS file systems in the appropriate order.
The multibos unmount operation, using the -u flag, unmounts all standby BOS file
systems in the appropriate order.
Instructor notes:
Purpose — Explain the various standby BOS operations.
Details — Provide a brief description of what each of these options provide and why they
might want to do them. Do not spend too much time here; they will experience these
first-hand in the lab exercises.
Transition statement — Let’s continue with using the standby shell, booting to a particular
boot logical volume, and finally removing a standby BOS.

Other multibos operations (2 of 2)

Standby BOS shell
multibos -S
Exit returns to active shell environment
Booting to either standby BOS or active BOS
bootlist -m normal hdisk# blv=<BLV name>
shutdown -Fr
Removing a standby BOS
multibos -R
Figure 9-17. Other multibos operations (2 of 2) AN152.2
Notes:
Standby BOS shell

The multibos shell operation (-S flag) enables you to start a limited interactive chroot
shell with the standby BOS file systems. This shell allows access to standby files using
standard paths. For example, /bos_inst/usr/bin/ls maps to /usr/bin/ls within the shell.
The active BOS files are not visible outside of the shell, unless they have been mounted
over the standby file systems. Limit shell operations to changing data files, and do not
make persistent changes to the kernel, process table, or other operating system
structures. Only use the BOS shell if you are experienced with the chroot environment.
The multibos shell operation performs the following steps:
1. The standby BOS file systems are mounted, if they are not already.
2. The chroot utility is called to start an interactive standby BOS shell. The shell runs
until an exit occurs.
3. If standby BOS file systems were mounted in step 1, they are unmounted.
Alternate boot
The bootlist command supports multiple BLVs. As an example, to boot from disk
hdisk0 and BLV bos_hd5, you would enter the following:
# bootlist -m normal hdisk0 blv=bos_hd5
After the system is rebooted from the standby BOS, the standby BOS logical volumes
are mounted over the usual BOS mount points, such as /, /usr, /var, and so on. The set
of BOS objects, such as the BLV, logical volumes, file systems, and so on that are
currently booted are considered the active BOS, regardless of logical volume names.
The previously active BOS becomes the standby BOS in the existing boot environment.
Some sites have been unable to switch to the standby BLV. When they tried to set the
bootlist to the standby BLV, they received the following error:
0514-226 bootlist: Invalid attribute value for blv
This indicates that either the BLV or its ODM entry is corrupted.
A suggested solution is to rebuild the standby BLV. This requires a special bosboot flag:
# bosboot -sd /dev/ipldevice -M standby -l bos_hd5
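The bootlist and bosboot recovery described above can be scripted. The sketch below is a hedged Python illustration, not an IBM tool: the function names and the retry-once policy are assumptions, while the command lines themselves are the ones shown in this section. The runner parameter lets you exercise the logic without an AIX system.

```python
import shlex
import subprocess

INVALID_BLV = "0514-226"  # bootlist: Invalid attribute value for blv


def run(argv):
    """Run one command; return (exit code, stderr text)."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return proc.returncode, proc.stderr


def boot_from_standby_blv(disk, blv, runner=run):
    """Point the normal-mode boot list at disk/blv (hypothetical helper).

    If bootlist rejects the BLV with 0514-226, rebuild the standby BLV with
    bosboot and retry once. Returns the command lines that were issued.
    """
    issued = []

    def do(argv):
        issued.append(shlex.join(argv))
        return runner(argv)

    bootlist = ["bootlist", "-m", "normal", disk, f"blv={blv}"]
    rc, err = do(bootlist)
    if rc != 0 and INVALID_BLV in err:
        # BLV or its ODM entry is suspect: rebuild the standby BLV, then retry.
        # bosboot's own exit status is not checked; the retried bootlist decides.
        do(["bosboot", "-sd", "/dev/ipldevice", "-M", "standby", "-l", blv])
        rc, err = do(bootlist)
    if rc != 0:
        raise RuntimeError(f"bootlist failed: {err.strip()}")
    return issued
```

Once the boot list is set, the reboot itself is still done with shutdown -Fr, as shown on the visual.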
Removing the standby BOS

The remove operation, using the -R flag, deletes all standby BOS objects, such as BLV,
logical volumes, file systems, and so on.
You can use the remove operation to make room for a new standby BOS, or to clean up
a failed multibos installation. The remove operation performs standby tag verification
on each object before removing it. The remove operation will only act on BOS objects
that multibos created, regardless of name or label. You always have the option of
removing additional BOS objects using standard AIX utilities, such as rmlv, rmfs, rmps,
and so on.
The multibos remove operation performs the following steps:
1) All boot references to the standby BLV are removed.
2) The bootlist is set to the active BLV. You can skip this step using the -t flag.
3) Any mounted standby file systems are unmounted.
4) Standby file systems are removed.
5) Remaining standby logical volumes are removed.
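The ordering of the remove operation, and the rule that it acts only on objects carrying the multibos standby tag, can be modeled as follows. This is a hypothetical Python sketch: the BosObject class, its fields, and the action strings are illustrative only, and step 3 is modeled as unmounting mounted standby file systems.

```python
from dataclasses import dataclass


@dataclass
class BosObject:
    name: str
    kind: str            # "blv", "fs" (file system plus its LV) or "lv"
    standby_tag: bool    # True only for objects multibos itself created
    mounted: bool = False


def remove_plan(objects, active_blv, skip_bootlist=False):
    """Ordered actions of a standby BOS removal (model of multibos -R).

    Only tagged objects are touched, regardless of their names.
    skip_bootlist=True models the -t flag.
    """
    standby = [o for o in objects if o.standby_tag]
    plan = []
    # 1) Remove all boot references to the standby BLV.
    plan += [f"remove boot references to {o.name}" for o in standby if o.kind == "blv"]
    # 2) Set the bootlist to the active BLV (skipped with -t).
    if not skip_bootlist:
        plan.append(f"set bootlist to {active_blv}")
    # 3) Unmount mounted standby file systems.
    plan += [f"unmount {o.name}" for o in standby if o.kind == "fs" and o.mounted]
    # 4) Remove standby file systems.
    plan += [f"remove file system {o.name}" for o in standby if o.kind == "fs"]
    # 5) Remove the remaining standby logical volumes (the BLV and any bare LVs).
    plan += [f"remove logical volume {o.name}" for o in standby if o.kind in ("blv", "lv")]
    return plan


if __name__ == "__main__":
    objs = [BosObject("bos_hd5", "blv", True),
            BosObject("/bos_inst", "fs", True, mounted=True),
            BosObject("/", "fs", False, mounted=True)]  # active BOS: untagged
    for step in remove_plan(objs, "hd5"):
        print(step)
```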

Instructor notes:
Purpose — Explain the various standby BOS operations.
Details — Provide a brief description of what each of these options provides and why students
might want to use them. Do not spend too much time here; they will experience these
first-hand in the lab exercises.
Transition statement — For an ILO (Instructor-Led Online) class: in place of this
checkpoint visual, you should play file AN152U09F19 in Elluminate.
Checkpoint (1 of 2)
1. Name the two ways alternate disk installation can be used.
2. What are the advantages of alternate disk rootvg cloning?
3. Why should you not use exportvg with an alternate disk volume group?
Notes:

Instructor notes:
Purpose —
Details —
1. Name the two ways alternate disk installation can be used.
The answer is installing a mksysb image on another disk, and cloning the current running
rootvg to an alternate disk.
2. What are the advantages of alternate disk rootvg cloning?
The answer is that it creates an online backup and allows maintenance and updates to
software on the alternate disk, helping to minimize downtime.
3. Why should you not use exportvg with an alternate disk volume group?
The answer is that this will remove rootvg-related entries from /etc/filesystems.
Checkpoint (2 of 2)
4. True or false: multibos provides for booting between alternate operating system
environments within a single rootvg.
5. True or false: A standby BOS can only be accessed by changing the bootlist and then
rebooting.
6. True or false: multibos requires cloning all of the logical volumes in the active rootvg.
Notes:

Instructor notes:
Purpose —
Details —
4. True or false: multibos provides for booting between alternate operating system
environments within a single rootvg.
The answer is true.
5. True or false: A standby BOS can only be accessed by changing the bootlist and then
rebooting.
The answer is false.
6. True or false: multibos requires cloning all of the logical volumes in the active rootvg.
The answer is false.
Transition statement — Let’s do a lab exercise using the multibos facility.
Exercise: multibos
Create and work with an alternate rootvg
Create and work with standby BOS using multibos
Figure 9-20. Exercise: multibos AN152.2
Notes:

Instructor notes:
Purpose —
Details —
Unit summary
Use alternate disk installation techniques for applying AIX maintenance
Use multibos to apply AIX maintenance
Notes:
Discussion:
Alternate disk installation techniques are available:
- Installing a mksysb onto an alternate disk
- Cloning the current rootvg onto an alternate disk
An alternate (standby) BOS can be created with multibos and maintenance applied to it

Unit 10. Advanced backup techniques
Estimated time
00:20 Topic 1
00:20 Topic 2
00:15 Topic 3

This unit describes techniques to ensure data integrity and
consistency while executing online backups.

• Explain factors related to online backup consistency
• Use JFS split mirror to back up file system data
• Use a snapshot volume group to back up file system data
• Use JFS2 snapshot to back up file system data
• Explain AIX considerations in using SAN Copy facilities

Accountability:
• Lab exercise
Reference
management
Online AIX Version 7.1 Installation and migration
© Copyright IBM Corp. 2009, 2012 Unit 10. Advanced backup techniques 10-1

SG24-7463 AIX 5L Differences Guide: V 5.3 Edition (Redbook)
SG24-7414 AIX 5L Differences Guide: V 5.3 Addendum
SG24-5765 AIX 5L Differences Guide: V 5.2 Edition (Redbook)
SG24-2014 AIX Version 4.3 Differences Guide (Redbook)

Unit objectives

Explain factors related to online backup consistency
Use JFS split mirror to back up file system data
Use a snapshot volume group to back up file system data
Use JFS2 snapshot to back up file system data
Explain AIX considerations in using SAN Copy facilities
Notes:
Instructor notes:
Purpose — List this unit’s objectives.
Details —
Transition statement — Let’s start with a discussion of data consistency issues.

Backup data inconsistency

Applications may have multiple data updates per transaction
Failure to capture all related updates results in inconsistent backups
  Application may use transaction logs to re-establish integrity during recovery
  Otherwise, backup needs to have consistency
Data states during one transaction:
  Start of transaction: X0, Y0
  Write X1: X1, Y0
  Backup: captures X1, Y0
  Write Y1: X1, Y1
Figure 10-2. Backup data inconsistency AN152.2
Notes:
Backing up data while a file system is active can lead to data consistency problems.
The backup utility copies files sequentially while applications may still be updating
their contents. For a collection of related updates, the backup utility may copy one
piece of data after it has been updated, but copy the other related data before it is
updated. The result can be a backup with two pieces of data that are not consistent
with one another.
Some applications, especially database engines, record the progress of related updates
in a transaction log. During the application recovery process, that log identifies
transactions for which not all related updates were confirmed. The recovery process
then backs out each such transaction, undoing any of its updates that were captured in
the backup.
If an application does not have this type of recovery logic, then use of the inconsistent
backup can result in serious problems. In that situation, we need to have a way to
ensure that the backup has consistency.
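The inconsistency in the figure can be reproduced with a toy model. In this hypothetical Python sketch, each item stores the generation of its last write (X0 is 0, X1 is 1), and a file-by-file backup reads each item at its own moment, expressed as how many of the transaction's writes had already happened. The consistency test only makes sense for this single-transaction model.

```python
# One transaction from the figure: write X1, then write Y1.
TRANSACTION = [("X", 1), ("Y", 1)]


def state_after(n_writes):
    """Data state after the first n_writes writes (0 means X0, Y0)."""
    state = {"X": 0, "Y": 0}
    for item, generation in TRANSACTION[:n_writes]:
        state[item] = generation
    return state


def sequential_backup(read_points):
    """Copy each item at its own point in time, as a file-by-file backup does.

    read_points maps an item to the number of writes already applied when
    the backup reads it.
    """
    return {item: state_after(n)[item] for item, n in read_points.items()}


def is_consistent(copy):
    """Consistent only if all items come from the same transaction boundary."""
    return len(set(copy.values())) == 1


if __name__ == "__main__":
    # Backup runs between "Write X1" and "Write Y1", as in the figure.
    fuzzy = sequential_backup({"X": 1, "Y": 1})
    print(fuzzy, is_consistent(fuzzy))        # X1 with Y0: inconsistent
    # Quiesced: the transaction completes before anything is read.
    quiesced = sequential_backup({"X": 2, "Y": 2})
    print(quiesced, is_consistent(quiesced))
```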
Instructor notes:
Purpose — Explain the issue of consistency in data backups.
Details —
Transition statement — Let’s look at some solutions for ensuring consistent data
backups.

Ensuring backup data consistency

Offline backup: Integrity assured by stopping the application and unmounting the file systems
Online backup: Integrity assured by quiescing application processing during backup
  Stops writing data to file system for new transactions
  Completes writes for previously started transactions
Problem: Time needed to back up is often too long to have the application stopped or quiesced
Solution: Provide a quick way to capture a stable data state, thus requiring only a brief quiesce
Figure 10-3. Ensuring backup data consistency AN152.2
Notes:
Traditionally, the best practice is to stop the application and unmount the file system,
followed by executing a backup by inode. This ensures that there are no updates
occurring during the backup and that all of the file system’s data has been flushed to disk.
If a backup takes a long time, having the application down for that long may be
unacceptable.
Some applications can be quiesced. In this state, either new transactions are not
accepted or they are only processed in user space without writing the updates to the file
system. Either way, the backup of the mounted file system may proceed without any file
system activity from the quiesced application. Again, if the backup takes a long time,
being quiesced for that long may still be unacceptable.
The solution is to use the quiesced state to quickly capture the state of the file system.
The captured state is not affected by ongoing updates to the actual file system.
Such a capture may take only a few seconds, and a quiesce that short is often
acceptable.
Instructor notes:
Purpose — Explain traditional solutions to providing consistent backups.
Details — Sometimes the quiesced state is referred to as the hot-backup mode, online
backup mode, or suspend mode of the application. The students should be encouraged to
work with the trained administrator of the application to obtain the proper state for online
backup.
Transition statement — We need to examine various ways to capture a point-in-time state
of a file system. The first methods we will look at require us to mirror the
file systems using LVM mirroring.

10.1. LVM mirror-based online backups
Instructor Topic Introduction

What students will do — The students will identify how splitting the logical volume mirror
can be useful when attempting a system backup.
How students will do it — Through lecture and checkpoint questions.
What students will learn — Students will learn how to handle splitting the mirror and
accessing the data for backup.
How this will help students in their job — Understanding how to split a logical volume
mirror allows an administrator to keep a file system online but stop activity to one of the
mirrors to provide a stable environment to back up the data. This ensures that no data is
changing on the copy to be used for backup.
Topic 1 objectives

Create a split mirror of a JFS file system
Back up data from the copy that was split off
Reintegrate the split copy with the remaining mirror copies
Notes:

Instructor notes:
Purpose — Explain the objectives of this topic.
Details —
Transition statement — Let us first look at JFS online backups.
Online JFS backup

Diagram: file system /fs1 and its jfslog, each with three mirror copies (Copy 1, Copy 2, Copy 3)
# lsvg -l newvg
newvg:
LV NAME    TYPE    LPs  PPs  PVs  LV STATE     MOUNT POINT
loglv00    jfslog  1    3    3    open/syncd   N/A
lv03       jfs     1    3    3    open/syncd   /fs1
Figure 10-5. Online JFS backup AN152.2
Notes:
Requirements
By splitting a mirror, you can perform a backup of the mirror that is not changing while
the other mirrors remain online.
To do this, it is best to have three copies of your data. You will need to stop one of the
copies but the other two will continue to provide redundancy for the online portion of the
logical volume.
You are also required to mirror the journal log for the file system.
The output from lsvg -l indicates that the logical volume and the log are both mirrored
(three physical partitions for each logical partition).
# lsvg -l newvg
newvg:
LV NAME    TYPE    LPs  PPs  PVs  LV STATE     MOUNT POINT
loglv00    jfslog  1    3    3    open/syncd   N/A
lv03       jfs     1    3    3    open/syncd   /fs1
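Before splitting, you can verify these requirements from lsvg -l output. The Python sketch below is a hypothetical helper, not an AIX utility; it assumes the column layout shown above and treats every jfslog in the volume group as one that must be mirrored.

```python
def parse_lsvg_l(text):
    """Parse `lsvg -l` output into one dict per logical volume."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows: name, type, LPs, PPs, PVs, LV state, mount point.
        if len(parts) >= 6 and parts[2].isdigit() and parts[3].isdigit():
            rows.append({
                "name": parts[0],
                "type": parts[1],
                "lps": int(parts[2]),
                "pps": int(parts[3]),
                "state": parts[5],
                "mount": " ".join(parts[6:]) or "N/A",
            })
    return rows


def ready_for_split(lsvg_output, mount_point, copies=3):
    """True if the file system's LV and the VG's jfslog(s) have `copies`
    synchronized copies (hypothetical pre-check before chfs -a splitcopy)."""
    rows = parse_lsvg_l(lsvg_output)
    fs_rows = [r for r in rows if r["mount"] == mount_point]
    logs = [r for r in rows if r["type"] == "jfslog"]
    if not fs_rows or not logs:
        return False
    return all(r["pps"] == copies * r["lps"] and r["state"].endswith("/syncd")
               for r in fs_rows + logs)


if __name__ == "__main__":
    SAMPLE = """newvg:
LV NAME    TYPE    LPs  PPs  PVs  LV STATE     MOUNT POINT
loglv00    jfslog  1    3    3    open/syncd   N/A
lv03       jfs     1    3    3    open/syncd   /fs1
"""
    print(ready_for_split(SAMPLE, "/fs1"))
```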

Instructor notes:
Purpose — To explain what this technique is used for.
Details — Using this technique creates a snapshot of the file system that can be used for
backup. Three copies is the recommended number of mirrors to have, so that there is
redundancy on the portion of the file system that remains active.
Additional information — If you want to back up multiple file systems at the same time,
you will need to create a separate log for each file system and mirror each log.
This technique was introduced in AIX V4.3.3.
Transition statement — Once your LVM mirroring is set up, let’s see how to split off a
copy.
Splitting the mirror

Diagram: /fs1 (Copy 1, Copy 2, Copy 3) and its mirrored jfslog; Copy 3 is split off as the file system /backup
# chfs -a splitcopy=/backup -a copy=3 /fs1
Figure 10-6. Splitting the mirror AN152.2
Notes:
Using chfs to split a mirror

The chfs command is used to split the mirror to form a snapshot of a JFS file system.
# chfs -a splitcopy=/backup -a copy=3 /fs1
This creates a read-only file system called /backup that can be accessed to perform a
backup. The journal log logical volume associated with the file system we are splitting
must also be mirrored.

Example
# lsvg -l newvg
newvg:
LV NAME      TYPE  LPs  PPs  PVs  LV STATE     MOUNT POINT
lv03         jfs   1    3    3    open/stale   /fs1
lv03copy00   jfs   0    0    0    open/syncd   /backup
The /fs1 file system still has three physical partitions, but lv03 is now shown as
open/stale because the split-off copy is no longer being updated. The stale copy is now
accessible through the newly created read-only file system /backup. That file system
resides on a newly created logical volume, lv03copy00. From the point of view of lv03,
this copy is not synchronized and is considered stale. Also, lv03copy00 does not show any
logical partitions (since the logical partitions really belong to lv03).
You can look at the content and interact with the /backup file system just like any other
read-only file system.
Instructor notes:
Purpose — To show how to split the mirror.
Details — It is important to emphasize that this split mirror technique can only be used with
JFS file systems and cannot be used with JFS2 file systems.
Emphasize that we still need to stop file system updates during the period when the
splitcopy operation is occurring. Otherwise, the splitcopy could have inconsistent data.
But, the splitcopy happens so quickly that the application only needs to be quiesced for a
very short period of time.
The chfs command is used to split the mirror. It will create a new file system that will
contain the contents of the snapshot. You can view or back up the content of the file
system.
Additional information — This unit assumes that the application uses file systems to hold
the data. While it is not very common, some applications may use raw logical volumes, in
which case the file system level command is not used. Instead there is an LVM command,
chlvcopy, that can be used to create a split mirror copy.
Transition statement — When you have completed these tasks and are ready to
reintegrate the mirror, you just need to use the rmfs command. Let’s see how that works.

Reintegrate a mirror backup copy

Diagram: /backup is removed, and Copy 3 of /fs1 and its jfslog is resynchronized (syncvg) with Copy 1 and Copy 2
# unmount /backup
# rmfs /backup
Figure 10-7. Reintegrate a mirror backup copy AN152.2
Notes:
Reintegrate the backup copy

To reintegrate the snapshot into the file system, unmount the /backup file system and
then remove it with the rmfs command.
The third copy will automatically re-sync and come online. The file system for the split
copy is removed.
The downside to this method is that all partitions in the split-off copy are considered
stale, and they must all be resynchronized when the copy is rejoined. For very large file
systems, this can take some time, during which the application must compete with the
syncvg operation for access to the data.
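The whole split, back up, and reintegrate cycle from the last three visuals can be collected into one script. This hypothetical Python sketch only prints the commands unless dry_run is False; the chfs, unmount, and rmfs lines are the ones shown above, while the backup line is an assumption that reuses the i-node backup form (backup -f device mount_point) used in the snapshot volume group example of this unit. Quiesce the application before the chfs step.

```python
import shlex
import subprocess


def jfs_split_backup(fs, split_mount, copy, device, dry_run=True):
    """Split one mirror copy of a JFS file system, back it up, and rejoin it.

    Hypothetical helper. Returns the command lines in the order they run.
    """
    if copy not in (1, 2, 3):
        raise ValueError("LVM supports at most three mirror copies")
    steps = [
        # Split copy N off as a read-only file system (quiesce first).
        ["chfs", "-a", f"splitcopy={split_mount}", "-a", f"copy={copy}", fs],
        # Assumed backup form: i-node based backup of the split copy.
        ["backup", "-f", device, split_mount],
        # Reintegrate: after rmfs, LVM resynchronizes the stale copy.
        ["unmount", split_mount],
        ["rmfs", split_mount],
    ]
    for argv in steps:
        if dry_run:
            print("#", shlex.join(argv))
        else:
            subprocess.run(argv, check=True)
    return [shlex.join(argv) for argv in steps]


if __name__ == "__main__":
    jfs_split_backup("/fs1", "/backup", 3, "/dev/rmt0")
```

With check=True, a failed backup stops the script before rmfs, so /backup is left in place for a retry; the split copy stays stale until it is removed.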
Instructor notes:
Purpose — To explain how to reintegrate the stale copy back into the active logical
volume.
Details —
Transition statement — The second LVM mirroring-based method relies on the
creation of a snapshot volume group. Let’s look at how to create a snapshot volume group.

Snapshot volume groups (1 of 2)

All logical volumes must be mirrored on disks that contain only those mirrors
Ensure there are no stale copies
Use the splitvg command to split a mirrored copy into a snapshot volume group
  Uses the recreatevg command to implement the split
  The split copy becomes a new volume group, called a snapshot volume group, with its own VGname
  New logical volumes and mount points are created in the snapshot volume group
  The snapshot file systems are not automatically mounted
Figure 10-8. Snapshot volume groups (1 of 2) AN152.2
Notes:
How it works
Snapshot support for a mirrored volume group lets you split one mirror copy of a
fully mirrored volume group into a snapshot volume group.
It is best practice to ensure that there are no stale copies in the original volume group.
The splitvg command refuses to split when the only remaining non-stale copy is on the
disk to be split, unless you use the force (-f) option.
When the volume group is split, the original volume group will stop using the disks that
are now part of the snapshot volume group.
The splitvg command uses the recreatevg command to implement the split. This is a
very different technique from the JFS split mirror. It creates a new volume group with
new file system and logical volume names.
Instructor notes:
Purpose — To describe the volume group snapshot support.
Details — Discuss the student notes.
Since the splitvg is not based on JFS split mirror, it does not have the same file system
type restriction. It can be used with either JFS or JFS2 file systems.
Also mention some or all of the following restrictions:
• The only allowable chvg options on the snapshot volume group are -a, -R, -S and -u
• The only allowable chvg options on the original volume group are -a, -R, -S, -u and -h
• Partition allocation changes will not be allowed on the snapshot volume group
• A volume group cannot be split if a disk is already missing
• A volume group cannot be split if the last non-stale partition would be on the snapshot
volume group
Transition statement — Once split, the two volume groups are still related to one another.
Let us look at the implications this has for later resynchronization of the copies.

Snapshot volume groups (2 of 2)

Physical partition changes in both volume groups are tracked
  Writes to a physical partition in the original volume group cause a corresponding
  physical partition in the snapshot volume group to be marked stale
  Writes to a physical partition in the snapshot volume group cause that physical
  partition to be marked stale
Use the joinvg command to rejoin the volume groups
  The stale physical partitions are resynchronized
  The user will see the same data in the rejoined volume group as was in the original
  volume group before the rejoin
Figure 10-9. Snapshot volume groups (2 of 2) AN152.2
Notes:
Both volume groups will keep track of changes in physical partitions within the volume
group so that when the snapshot volume group is rejoined with the original volume
group, the synchronization only needs to occur on the subset of physical partitions
which were touched during the split period. This is much faster and has less
performance impact than resynchronizing all physical partitions, as is needed with the
JFS split copy function.
Physical partition changes in both volume groups are tracked. Writes to a physical
partition in the original volume group cause a corresponding physical partition in the
snapshot volume group to be marked stale. Writes to a physical partition in the
snapshot volume group cause that physical partition to be marked stale.
To rejoin the volume groups, use the joinvg command. The stale physical partitions are
included in the original mirroring and the stale copies are automatically resynchronized.
The user will see the same data in the rejoined volume group as was in the original
volume group before the rejoin. In other words, the third copy will show the data
changes that occurred in the original volume group during the period it was split off.
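The tracking rules above can be modeled as a set of stale partition numbers. In this hypothetical Python sketch (the class and method names are illustrative, and partition contents are plain values), a rejoin copies only the tracked partitions from the original, which is why it is cheaper than the full resynchronization the JFS split copy needs.

```python
class SplitMirror:
    """Toy model of a split mirror copy whose changes are tracked per PP."""

    def __init__(self, n_partitions):
        self.original = [0] * n_partitions   # copies still in the original VG
        self.snapshot = list(self.original)  # copy split off into the snapshot VG
        self.stale = set()                   # PPs of the split copy to resync

    def write_original(self, pp, value):
        self.original[pp] = value
        self.stale.add(pp)  # the corresponding snapshot PP is now stale

    def write_snapshot(self, pp, value):
        self.snapshot[pp] = value
        self.stale.add(pp)  # the written snapshot PP is itself stale

    def rejoin(self):
        """Resync only the tracked PPs; the original VG's data wins."""
        synced = sorted(self.stale)
        for pp in synced:
            self.snapshot[pp] = self.original[pp]
        self.stale.clear()
        return synced


if __name__ == "__main__":
    m = SplitMirror(1000)
    m.write_original(5, "a")
    m.write_snapshot(7, "b")
    print("resynced:", m.rejoin(), "of", len(m.original), "partitions")
```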
Instructor notes:
Purpose — Continue the discussion on how snapshot volume groups work.
Details —
Transition statement — Now, let’s take a look at the commands.

Snapshot volume group commands

splitvg [-y SnapVGname] [-c copy] [-f] [-i] VGname
  -y Specifies the name of the snapshot volume group
  -c Specifies which mirror copy to split off (1, 2, or 3)
  -f Forces the split even if there are stale partitions
  -i Creates an independent volume group, which cannot be rejoined into the original
joinvg [-f] VGname
  -f Forces the join when disks in the snapshot volume group are missing or removed
Figure 10-10. Snapshot volume group commands AN152.2
Notes:
The splitvg command

The splitvg command splits a single mirror copy of a fully mirrored volume group into
a snapshot volume group. The original volume group will stop using the disks that are
now part of the snapshot volume group. Both volume groups will keep track of the
writes within the volume group so that, when the snapshot volume group is rejoined with
the original volume group, consistent data is maintained across the rejoined mirror
copies.
The joinvg command

The joinvg command joins a snapshot volume group that was created with the
splitvg command back into its original volume group. The snapshot volume group is
deleted and the disks reactivated in the original volume group. Any stale partitions will
be re-synchronized by a background process.
Instructor notes:
Purpose — To describe the volume group snapshot command and options.
Details — Discuss the command syntax.
Specify the -f flag to force the join when disks in the snapshot volume group are not
active. The mirror copy on the inactive disks will be removed from the original volume
group.
Transition statement — Let’s look at an example of working with a snapshot volume
group.

Snapshot volume group example

Example: File system /data is in the datavg volume group
These commands split the volume group, create a backup of the /data file system, and
then rejoin the snapshot volume group with the original.
1. splitvg -y snapvg datavg
   The volume group datavg is split and the volume group snapvg is created. The mount
   point /fs/data is created.
2. backup -f /dev/rmt0 /fs/data
   An i-node based backup of the unmounted file system /fs/data is created on tape.
3. joinvg datavg
   snapvg is rejoined with the original volume group and synced in the background.
Figure 10-11. Snapshot volume group example AN152.2
Notes:
The splitvg command creates a separate, point-in-time snapshot volume group. The splitvg
command will fail if any of the disks to be split are not active within the original volume
group.
This volume group can be used to perform backups or other operations. In the example,
one of the renamed file systems is backed up by inode (unmounted). You could also
mount the file system and back up by name instead.
Later, the joinvg command is used to rejoin the snapshot volume group to the original
volume group.
In the event of a system crash or loss of quorum while running this command, the
joinvg command must be run to rejoin the disks back to the original volume group.
You must have root authority to run these commands.
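The three-step example can be wrapped so that joinvg always runs, even when the backup fails, which avoids leaving the copy split off. This is a hypothetical Python sketch: the function name and the always-rejoin policy are assumptions, the command lines are the ones from the example above, and runner can be swapped for a test double. It must still be run with root authority.

```python
import shlex
import subprocess


def run(argv):
    """Run one command, raising CalledProcessError if it fails."""
    subprocess.run(argv, check=True)


def snapvg_backup(vg, snap_vg, fs_mount, device, runner=run):
    """splitvg, back up one snapshot file system, then joinvg (hypothetical helper).

    joinvg runs even if the backup raises, so the copy is not left split off.
    If splitvg itself fails, nothing was split, so joinvg is not attempted.
    Returns the command lines issued, in order.
    """
    issued = []

    def do(argv):
        issued.append(shlex.join(argv))
        runner(argv)

    do(["splitvg", "-y", snap_vg, vg])
    try:
        # i-node based backup of the unmounted snapshot file system
        do(["backup", "-f", device, fs_mount])
    finally:
        do(["joinvg", vg])
    return issued
```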
Instructor notes:
Purpose — Provide an example of using a split volume group for executing a backup.
Details — You may wish to review with the students the strengths and weaknesses of the
LVM mirror-based techniques. One of the big weaknesses is the need to use double or
triple the amount of storage that would be used without mirroring. Ask if any of the students
have actually used these in their shop (be careful: they may confuse the next topic with
what we have just covered).
This may be a good time to do Part 1 (snapshot volume groups) of the Lab Exercises for
this unit, instead of doing all of the exercises at the end of this unit. While JFS split mirror
was covered in this topic, Part 2 (JFS split mirror) of the exercise is an optional part.
Transition statement — Let’s next look at a backup technique which does not depend on
using LVM mirroring.

10.2. JFS2 snapshot

What students will do — The students will identify how creating a JFS2 snapshot can be
useful as a file system backup tool.
How students will do it — Through lecture and checkpoint questions.
What students will learn — Students will learn how to create and manage JFS2
snapshots.
How this will help students in their job — Understanding how to use a JFS2 snapshot
can allow for consistent backup with only a very brief quiesce of application activity.
Topic 2 objectives

Create either an internal or external JFS2 snapshot
List existing JFS2 snapshots
Recover lost or corrupted files from a JFS2 snapshot
Remove a JFS2 snapshot
Increase the size of an external JFS2 snapshot
Notes:

Instructor notes:
Purpose — Cover the topic objectives.
Details —
Transition statement — Let’s define a JFS2 snapshot and what it provides us.
JFS2 snapshot (1 of 2)
A point-in-time image of a JFS2 file system
  Source file system is called the snapped file system (snappedFS)
  Snapshot creation is very quick and requires little space
  A single snappedFS can have multiple snapshots, each taken at a different point in time
A snapshot image of a JFS2 file system can be used to:
  Restore files from a known point in time
  Access files or directories as they were at the time of the snapshot
  Back up a mounted snapshot to tape, DVD, or a remote server
Figure 10-13. JFS2 snapshot (1 of 2) AN152.2
Notes:
JFS2 snapshot
A point-in-time image for a JFS2 file system is called a snapshot. The file system which
is the source of this point-in-time image is referred to as the snapped file system or
snappedFS.
The snapshot view of the data remains static and retains the same security permissions
that the original snappedFS had when the snapshot was made. Also, a JFS2 snapshot
can be created without unmounting or quiescing the file system (though it may be
advisable for some applications to briefly quiesce during the snapshot). A snapshot can
be used to access files or directories as they existed when the snapshot was taken.
The snapshot can then be used to create a backup of the file system at the point in
time that the snapshot was taken.

Instructor notes:
Purpose — Describe the JFS2 snapshot function.
Details — Be very clear that while the term snapshot is used in both snapshot volume
group and JFS2 snapshot, the techniques are totally different. When someone starts talking
about using a snapshot, the listener needs to ask questions to clarify what type of snapshot
is being discussed. It might even be a storage subsystem based technique being discussed.
Transition statement — Let’s see how to create a JFS2 snapshot.
JFS2 snapshot (2 of 2)
Snapshot stays stable while snappedFS is changing
Using snapshot reduces application downtime
  Automatically freezes I/O while snapshot is created
  If intolerant of fuzzy backups, briefly quiesce the application
A snapshot typically needs 2% - 6% of snappedFS space requirements
  There are two options:
    Separate logical volume (ppsize unit of allocation)
    Allocate space out of snappedFS (called an internal snapshot)
At snapshot creation, only structure information is included
  When a write or delete occurs in the snappedFS, the affected blocks are copied into
  existing snapshots
Figure 10-14. JFS2 snapshot (2 of 2) AN152.2
Notes:
How the JFS2 snapshot works

During creation of a snapshot, the snappedFS I/O will be momentarily frozen, and all
new writes are blocked. This ensures that the snapshot really is a consistent view of the
file system at the time of snapshot.
When a snapshot is initially created, only structure information is included. When a write
or delete occurs, the original contents of the affected blocks are first copied into the
snapshot file system.
Every read of the snapshot will require a lookup to determine whether the block needed
should be read from the snapshot or from the snappedFS. For instance, the block will
be read from the snapshot file system if the block has been changed since the snapshot
took place. If the block is unchanged since the snapshot, it will be read from the
snappedFS.
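The copy-on-write behavior and the read lookup described above can be modeled with dictionaries of block numbers. This is a hypothetical Python sketch, not JFS2 code: it ignores inodes, extents, and internal versus external placement, and keeps only the rules stated here (preserve the original block on its first write or delete, read changed blocks from the snapshot and unchanged ones from the snappedFS, never copy blocks created after the snapshot).

```python
class Snapshot:
    """Point-in-time view: saves only the blocks changed after creation."""

    def __init__(self, fs):
        self.fs = fs
        self.existing = set(fs.blocks)  # structure info captured at creation
        self.saved = {}                 # original contents of changed blocks

    def preserve(self, blk):
        # Copy the original once, and only for blocks that existed at snapshot time.
        if blk in self.existing and blk not in self.saved and blk in self.fs.blocks:
            self.saved[blk] = self.fs.blocks[blk]

    def read(self, blk):
        if blk in self.saved:        # changed since the snapshot
            return self.saved[blk]
        if blk in self.existing:     # unchanged: read from the snappedFS
            return self.fs.blocks[blk]
        raise KeyError(blk)          # block did not exist at snapshot time


class SnappedFS:
    """Toy block store (block number -> data) with copy-on-write snapshots."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)
        self.snapshots = []

    def create_snapshot(self):
        snap = Snapshot(self)
        self.snapshots.append(snap)
        return snap

    def write(self, blk, data):
        for snap in self.snapshots:
            snap.preserve(blk)       # copy the original before overwriting
        self.blocks[blk] = data

    def delete(self, blk):
        for snap in self.snapshots:
            snap.preserve(blk)       # copy the original before freeing it
        self.blocks.pop(blk, None)


if __name__ == "__main__":
    fs = SnappedFS({0: "a", 1: "b"})
    snap = fs.create_snapshot()
    fs.write(0, "A")
    fs.delete(1)
    print(snap.read(0), snap.read(1), snap.saved)
```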
There are two types of JFS2 snapshots: internal and external. A JFS2 internal snapshot
uses space within the snappedFS. A JFS2 external snapshot is created in a separate
logical volume from the file system. The external snapshot can be mounted separately
from the file system at its own unique mount point. A given file system can only use
either internal or external snapshots; it cannot mix the different types.
Space requirements for a snapshot

Typically, a snapshot will need 2-6% of the space needed for the snappedFS. In the
case of a highly active snappedFS, this estimate could rise to 15%. This space is
needed if a block in the snappedFS is either written to or deleted. If this happens, the
block is copied to the snapshot. Any blocks associated with new files written after the
snapshot was taken will not be copied to the snapshot, as they were not current at the
time of the snapshot and therefore not relevant.
If the snapshot runs out of space, all snapshots associated with the snappedFS will be
discarded and an entry will be made in the AIX error log. If a snapshot file system fills up
before a backup is taken, the backup is not complete and will have to be rerun from a
new snapshot, with possibly a larger size, to allow for changes in the snappedFS.
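The 2-6% rule of thumb turns into a concrete logical volume size once you round up to whole physical partitions, since an external snapshot is allocated in ppsize units. This hypothetical Python sketch does that arithmetic; it ignores any snapshot metadata overhead, and the percentages are only the estimates quoted above.

```python
import math


def snapshot_size(fs_size_mb, pp_size_mb, pct=6.0):
    """Return (PPs, MB) for an external snapshot of pct percent of the snappedFS.

    Use 2-6 for typical file systems, up to 15 for a highly active one.
    """
    needed_mb = fs_size_mb * pct / 100
    pps = math.ceil(needed_mb / pp_size_mb)
    return pps, pps * pp_size_mb


def snapshots_survive(changed_mb, snapshot_mb):
    """Original blocks written or deleted since the snapshot must fit.

    If they do not, every snapshot of the snappedFS is discarded and any
    backup taken from them must be rerun from a new, larger snapshot.
    """
    return changed_mb <= snapshot_mb


if __name__ == "__main__":
    # 100 GB snappedFS with 256 MB physical partitions.
    for pct in (2, 6, 15):
        print(pct, "% ->", snapshot_size(102400, 256, pct))
```

For a 100 GB snappedFS with 256 MB partitions, this gives 8, 24, and 60 PPs (2 GB, 6 GB, and 15 GB) for 2%, 6%, and 15%.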
Instructor notes:
Purpose — Continue basic discussion of a JFS2 snapshot.
Details —
Additional information — A JFS2 snapshot is a file system that maps its contents to the
contents of the source snappedFS. If the snappedFS is not modified, the snapshot does not
store any of the files in its own physical partition allocations, and has content which is
identical to the snappedFS.
If the snappedFS is modified, the original values of the affected blocks are saved in the
allocated storage of the snapshot file system. When the snapshot is read, it either
retrieves the data from the snappedFS (if the data has not been modified) or it retrieves the
data from its own disk storage (if the snappedFS data was changed).
So, the snapshot always gives us the state of the data at the time the snapshot was
created, but only uses enough storage to hold the data that has been changed in the
snappedFS. When allocating space for a snapshot logical volume, we can typically allocate
as little as 2-6% of the size of the snappedFS (depending on the volatility of the snappedFS).
Note that when compared to using split mirror copies, the JFS2 snapshot has very little
overhead. We do not have to create a total copy of the existing data when creating the
snapshot (as we do in creating mirror copies), and instead of doing a resync of the data
before the next backup (as we need to do with the split mirror when rejoining), we simply
eliminate the snapshot and create a new one when needed for the next backup.
Transition statement — Let’s take a closer look at the mechanism behind a JFS2
snapshot.
For an ILO (Instructor-Led Online) class, you can play the corresponding file from the
multimedia library of Elluminate in place of the next two visuals.
JFS2 snapshot mechanism (1 of 2)

Diagram: inode1 and inode2 in the snappedFS (top) and in the snapshot (bottom)
Initially, the snapshot only points to data extents in snappedFS
Figure 10-15. JFS2 snapshot mechanism (1 of 2) AN152.2
Notes:
Data blocks in snappedFS

The diagram, at the top, shows two inodes anchoring file data blocks. The inode
accesses the data blocks through a B+ tree structure.
Data blocks in JFS2 snapshot

The diagram at the bottom shows the structure initially created in a JFS2 snapshot.
The snapshot has the metadata, but all of the pointers refer back to the snappedFS
data blocks. Thus, the snapshot requires very little space. Initially, data retrieved from a
mounted snapshot is identical to the current data in the snappedFS.
Instructor notes:
Purpose — Explain how a snapshot accesses the snappedFS data blocks.
Details —
Transition statement — Let’s look at what happens as data blocks in the snappedFS are
modified.

JFS2 snapshot mechanism (2 of 2)

snappedFS: inode1, inode2
snapshot: inode1, inode2
Original of modified data copied to snapshot
Figure 10-16. JFS2 snapshot mechanism (2 of 2) AN152.2
Notes:
Data blocks in snappedFS after data changes

In the diagram at the top, some of the data blocks have been modified. Because the
kernel file system logic knows that there is a snapshot for this file system, it copies the
original data blocks to the snapshot before modifying (or deleting) those data blocks in
the snappedFS.
Data blocks in JFS2 snapshot after data changes

The diagram at the bottom shows that the inode tree structure points to the copies of
the original data (now stored in the snapshot) rather than referring back to the
snappedFS data blocks. This ensures that access to the snapshot always returns the
original data (from the time the snapshot was created) for the snappedFS.
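The copy-on-write behavior shown in the last two visuals can be modeled with a toy shell sketch. This is purely an illustration, not how JFS2 is implemented: each "block" is a small file, snappedFS blocks live under ROOT/snappedfs, saved originals live under ROOT/snapshot, and all function names are hypothetical. To keep it short, the model ignores block deletion and blocks allocated after the snapshot was taken.

```sh
# Toy copy-on-write model (illustration only; not JFS2 internals).
# ROOT/snappedfs/<block> holds current data; ROOT/snapshot/<block> holds
# the original contents of blocks changed after the snapshot was taken.

cow_take_snapshot() {
    # An empty snapshot area: every snapshot read falls through to snappedfs
    mkdir -p "$1/snapshot"
}

cow_write() (
    # cow_write ROOT BLOCK DATA
    root=$1 blk=$2 data=$3
    # First change to this block since the snapshot: save the original
    if [ -d "$root/snapshot" ] && [ -f "$root/snappedfs/$blk" ] &&
       [ ! -f "$root/snapshot/$blk" ]; then
        cp "$root/snappedfs/$blk" "$root/snapshot/$blk"
    fi
    printf '%s\n' "$data" > "$root/snappedfs/$blk"
)

cow_read_snapshot() (
    # cow_read_snapshot ROOT BLOCK: the block as it was at snapshot time
    root=$1 blk=$2
    if [ -f "$root/snapshot/$blk" ]; then
        cat "$root/snapshot/$blk"    # original, saved before modification
    else
        cat "$root/snappedfs/$blk"   # unchanged, still shared with snappedfs
    fi
)
```

Only the first change to a block triggers a copy; later changes to the same block leave the saved original untouched, which is why the snapshot needs space only for blocks changed since it was created.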
Instructor notes:
Purpose — Explain how the original version of the data is copied to the snapshot when
modified.
Details —
Transition statement — Let’s look at how we can implement the JFS2 snapshot, starting
with the SMIT facility.

JFS2 snapshot SMIT menu

# smit jfs2
Enhanced Journaled File Systems
Move cursor to desired item and press Enter.
. . .
List Snapshots for an Enhanced Journaled File System
Create Snapshot for an Enhanced Journaled File System
Mount Snapshot for an Enhanced Journaled File System
Remove Snapshot for an Enhanced Journaled File System
Unmount Snapshot for an Enhanced Journaled File System
Change Snapshot for an Enhanced Journaled File System
Rollback an Enhanced Journaled File System to a Snapshot
Figure 10-17. JFS2 snapshot SMIT menu AN152.2
Notes:
The various JFS2 snapshot operations can be executed from SMIT dialog panels. The
SMIT JFS2 menu includes many items related to JFS2 snapshots.
The visual shows only the snapshot-related menu items.
Instructor notes:
Purpose — Show how all the JFS2 snapshot functions can be accessed from the SMIT
JFS2 menu.
Details —
Transition statement — Let’s first look at how we create an external snapshot.

Creating snapshots: External

# snapshot -o snapfrom=snappedFS -o size=Size

# snapshot -o snapfrom=/home/myfs -o size=16M
-OR-
# smit crsnapj2
Create Snapshot for an Enhanced Journaled File System in New Logical Volume
                                                      [Entry Fields]
  File System Name                                    /home/myfs
  SIZE of snapshot
    Unit Size                                         Megabytes      +
*   Number of units                                   [500]          #
Creating a snapshot as part of the mount option

# mount -o snapto=/dev/mysnaplv /home/myfs
Figure 10-18. Creating snapshots: External AN152.2
Notes:
Creating an external snapshot for a JFS2 file system that is already mounted
When creating a new external snapshot, you must provide the size of the logical volume
allocation (unless using a pre-existing LV).
If you want to create a snapshot for a mounted JFS2 file system, you can use the
following method:
• To create a snapshot in a new logical volume, specifying the size:
# snapshot -o snapfrom=snappedFS -o size=Size
For example:
# snapshot -o snapfrom=/home/myfs -o size=16M

This will create a 16 MB logical volume and create a snapshot for the
/home/myfs file system on the newly created logical volume.
Creating an internal snapshot for a JFS2 file system that is already mounted
If you want to create an internal snapshot for a mounted JFS2 file system, you can use
the following method:
• To create an internal snapshot, specify a snapshot name:
# snapshot -o snapfrom=snappedFS -n snapshotname
For example:
# snapshot -o snapfrom=/home/myfs -n mysnap
This will create a snapshot named mysnap which is internal to the snappedFS
/home/myfs.
Creating an internal snapshot for a JFS2 file system that is not mounted
First, note that you cannot use internal snapshots unless the file system was
enabled to support them when it was created.
• To enable the file system to support internal snapshots (at creation time only):
# crfs -a isnapshot=yes ....
The mount option, -o snapto=snapshotlv, can be used to create a snapshot for a
JFS2 file system that is not currently mounted:
# mount -o snapto=snapshotLV snappedFS MountPoint
or
# mount -o snapto=snapshotname snappedFS MountPoint
If the snapto value starts with a slash, then it is assumed to be a special device file for
an existing logical volume where the snapshot should be created. If the snapto value
does not start with a slash, then it is assumed to be the name of an internal snapshot to
be created.
For example:
# mount -o snapto=/dev/mysnaplv /dev/fslv00 /home/myfs
This mounts the file system contained in /dev/fslv00 on the mount point
/home/myfs and then creates a snapshot for the /home/myfs file system in
the logical volume /dev/mysnaplv.
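The rule for interpreting the snapto value can be expressed as a small shell function. This is a sketch that only mirrors the rule stated above; the function name snapto_kind is hypothetical, and the extra "invalid" result for an empty value is an added assumption, not documented mount behavior.

```sh
# snapto_kind VALUE
# Hypothetical helper mirroring the rule for mount -o snapto=VALUE:
#   starts with "/"  -> device file of an existing LV (external snapshot)
#   anything else    -> name of a new internal snapshot
# The "invalid" result for an empty value is an added assumption.
snapto_kind() {
    case $1 in
        "") echo invalid ;;
        /*) echo external ;;
        *)  echo internal ;;
    esac
}
```

For example, snapto_kind /dev/mysnaplv prints external and snapto_kind mysnap prints internal.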

Creating a snapshot using an existing logical volume

If you want to control details of the logical volume which holds an external snapshot,
you can use the following method:
• To create a snapshot using an existing logical volume:
# snapshot -o snapfrom=snappedFS snapshotLV
For example:
# snapshot -o snapfrom=/home/myfs /dev/mysnaplv
This will create a snapshot for the /home/myfs file system on the /dev/mysnaplv
logical volume, which already exists.
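The two external-snapshot forms described above (a new LV of a given size, or an existing LV) can be wrapped in one helper. This is a hypothetical sketch: create_ext_snap and run are illustrative names, and setting DRY_RUN=yes makes it print the snapshot command instead of running it, which is a convenient way to check the command line before using it on a real AIX system.

```sh
# run CMD [ARGS...]: print the command; execute it unless DRY_RUN=yes
run() {
    echo "+ $*"
    [ "${DRY_RUN:-no}" = yes ] || "$@"
}

# create_ext_snap SNAPPEDFS SIZE [SNAPSHOT_LV]
# Hypothetical wrapper around the two snapshot command forms shown above:
#   no SNAPSHOT_LV -> new LV of SIZE:  snapshot -o snapfrom=FS -o size=SIZE
#   SNAPSHOT_LV    -> existing LV:     snapshot -o snapfrom=FS LV (SIZE ignored)
create_ext_snap() {
    if [ -n "$3" ]; then
        run snapshot -o snapfrom="$1" "$3"
    else
        run snapshot -o snapfrom="$1" -o size="$2"
    fi
}
```

Usage: DRY_RUN=yes create_ext_snap /home/myfs 16M prints the command for a new 16 MB snapshot LV; create_ext_snap /home/myfs "" /dev/mysnaplv uses the existing LV.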
Instructor notes:
Purpose — Describe how to create a JFS2 snapshot
Details —
Transition statement — Let’s take a look at how you create a JFS2 internal snapshot.

Creating snapshots: Internal

# snapshot -o snapfrom=snappedFS -n snapshotName
# snapshot -o snapfrom=/home/myfs -n mysnap
-OR-
# smit crintsnapj2
Create Snapshot for an Enhanced Journaled File System in File System
                                                      [Entry Fields]
  File System Name                                    /home/myfs
* Snapshot Name                                       [mysnap]
Internal snapshot attribute must be set to yes on creation of the file system:
# smit crfs (in dialog panel: Allow Internal Snapshots [yes])
-OR-
# crfs -a isnapshot=yes
Figure 10-19. Creating snapshots: Internal AN152.2
Notes:
Internal JFS2 snapshot considerations:

Internal snapshots are preserved when the logredo command runs on a JFS2 file
system with an internal snapshot.
Internal snapshots are removed if the fsck command has to modify a JFS2 file system
to repair it.
If an internal snapshot runs out of space, or if a write to an internal snapshot fails, all
internal snapshots for that snappedFS are marked invalid. Further access to the internal
snapshots will fail. These failures write an entry to the error log.
Internal snapshots are not separately mountable.
Internal snapshots are not compatible with AIX releases prior to AIX 6.1. A JFS2 file
system created to support internal snapshots cannot be modified on an earlier release
of AIX.
A JFS2 file system with internal snapshots cannot be defragmented.
Instructor notes:
Purpose — Explain how to create a JFS2 internal snapshot.
Details —
Transition statement — Later, we will want to identify if a file system has a snapshot and
obtain information about those snapshots.

Listing snapshots
# smit lssnap (and select file system from list)

-OR-
# snapshot -q /home/myfs2
Snapshots for /home/myfs2
Current  Name      Time
         mysnap    Wed 19 Nov 08:44:33 2008
         mysnap2   Fri 21 Nov 09:33:33 2008
*        mysnap3   Mon 24 Nov 14:03:18 2008
# snapshot -q /home/myfs
Snapshots for /home/myfs
Current  Location      512-blocks  Free    Time
*        /dev/fslv06   262144      261376  Wed May 6 18:15:11 2009
Figure 10-20. Listing snapshots AN152.2
Notes:
The snapshot -q option can be used to display the snapshots related to the specified file
system.
If the file system uses internal snapshots, then the report provides the snapshot names
and creation times. The * indicates the current snapshot.
# snapshot -q /home/myfs2
Snapshots for /home/myfs2

Current Name Time
mysnap Wed 19 Nov 08:44:33 2008
mysnap2 Fri 21 Nov 09:33:33 2008
* mysnap3 Mon 24 Nov 14:03:18 2008
If the file system uses external snapshots, then the report provides, for each snapshot,
the logical volume special device file, the snapshot size, how much space is free in the
snapshot, and the creation time.
# snapshot -q /home/myfs
Snapshots for /home/myfs

Current Location 512-blocks Free Time
* /dev/fslv06 262144 261376 Wed May 6 18:15:11 2009
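Because the current snapshot is marked with an asterisk in both report formats, a script can pick it out with awk. A minimal sketch, assuming the report layout shown above (a leading * followed by the snapshot name or LV device file); the function name current_snapshot is hypothetical, and the report format is not a documented programming interface.

```sh
# current_snapshot: read "snapshot -q" output on stdin and print the
# second field of the line marked with "*" (the current snapshot).
# For internal snapshots this is the snapshot name; for external
# snapshots it is the logical volume device file.
current_snapshot() {
    awk '$1 == "*" { print $2 }'
}
```

Usage: snapshot -q /home/myfs2 | current_snapshot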

Instructor notes:
Purpose — Explain how to display snapshot information.
Details —
Transition statement — Let’s look at how we can use an existing snapshot to recover files
which were inadvertently deleted or incorrectly modified.
Using a JFS2 snapshot to recover

Recover entire file system to point of snapshot creation:

# umount /home/myfs
# rollback /home/myfs /dev/mysnaplv (for external)
# rollback -n mysnap /home/myfs (for internal)
Recover individual files from JFS2 snapshot image:

Mount the snapshot (if external):
# mount -v jfs2 -o snapshot /dev/mysnaplv /mntsnapshot
Change to the directory that contains the snapshot:

# cd /mntsnapshot
# cd /home/myfs/.snapshot/mysnap (if internal)
Copy the accurate file to overwrite the corrupted one:

# cp myfile /home/myfs (Copies only the file named myfile)
Figure 10-21. Using a JFS2 snapshot to recover AN152.2
Notes:
Rollback
The rollback command is an interface to revert a JFS2 file system to a point-in-time
snapshot. The snappedFS file system must be unmounted before the rollback
command is run and remains inaccessible for the duration of the command. Any
snapshots that are taken after the specified snapshot (snapshotObject for external or
snapshotName for internal) are removed. The associated logical volumes are also
removed for external snapshots.
Recover individual files

If you wish to restore individual files back to their original state, then you can mount the
snapshot and then manually copy the files back over. If the snapshot is internal, then no
mount is necessary. Instead, you need to explicitly specify the path to the snapshot
(/snappedFS-mount-point/.snapshot/snapshot-name) on a change directory
command.

As with any file copying, be careful about changing the attributes of the file (ownership,
permissions, sparseness, and so on). Using the backup and restore utilities to
copy files is often a safer technique.
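For an internal snapshot, recovering a single file amounts to copying it from the .snapshot path back to the live path. The sketch below is hypothetical (restore_from_isnap is an illustrative name). It uses cp -p so the file keeps its permission bits and timestamps, and its ownership where permitted. It makes no attempt to preserve sparseness, so the caution about preferring backup and restore still applies.

```sh
# restore_from_isnap MOUNTPOINT SNAPNAME RELPATH
# Hypothetical helper: copy one file from an internal snapshot back to the
# live file system, using the MOUNTPOINT/.snapshot/SNAPNAME layout above.
restore_from_isnap() (
    src="$1/.snapshot/$2/$3"
    dst="$1/$3"
    if [ ! -f "$src" ]; then
        echo "not found in snapshot: $src" >&2
        exit 1
    fi
    # -p keeps permission bits and timestamps (and ownership, if permitted)
    cp -p "$src" "$dst"
)
```

Usage: restore_from_isnap /home/myfs mysnap myfile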
Instructor notes:
Purpose — Explain how to use a JFS2 snapshot to recover data.
Details —
Transition statement — While using a snapshot directly to recover data is useful, it does
not address a situation in which the disk holding the snappedFS is lost, much less a site
disaster recovery situation. Let’s look at how we can use a snapshot as a stable source for
a backup to media or to a network server.

Using a JFS2 external snapshot to back up

The JFS2 snapshot can be a stable source for backup to media
Mount the external snapshot and use relative path backup:
# mount -v jfs2 -o snapshot /dev/mysnaplv /mntsnapshot
# cd /mntsnapshot
# find . | backup -i -d /servermnt/backup52
To create snapshot and backup in one operation:
# backsnap -m MountPoint -s Size BackupOptions snappedFS
For example:
# backsnap -m /mntsnapshot -s size=16M -i -f /dev/rmt0 /home/myfs
Figure 10-22. Using a JFS2 external snapshot to back up AN152.2
Notes:
Using an existing external snapshot to execute a backup

For an external snapshot, you first need to mount the snapshot. Then it is simply a
matter of changing to the mount point and running a backup by name relative to it.
Creating an external snapshot and backup in one operation

The backsnap command provides an interface to create a snapshot for a JFS2 file
system and perform a backup of the snapshot. The command syntax for an external
snapshot is:
# backsnap -m MountPoint -s Size BackupOptions snappedFS
For example:
# backsnap -m /mntsnapshot -s size=16M -i -f/dev/rmt0 \
/home/myfs
This will create a 16 MB logical volume and create a snapshot for the /home/myfs file
system on the newly created logical volume. It then mounts the snapshot logical volume
on /mntsnapshot. The remaining arguments are passed to the backup command. In
this case, the files and directories in the snapshot will be backed up by name (-i) to
/dev/rmt0.

Instructor notes:
Purpose — Explain how to use an external snapshot with a backup utility.
Details —
Transition statement — Let’s compare this to working with an internal snapshot.
Using a JFS2 internal snapshot to back up

• cd to the internal snapshot and use relative path backup:
# cd /home/myfs/.snapshot/mysnap
# find . | backup -i -d /servermnt/backup52
To create snapshot and backup in one operation:
# backsnap -n snapshotname BackupOptions snappedFS
For example:
# backsnap -n mysnap -i -f/dev/rmt0 /home/myfs
Figure 10-23. Using a JFS2 internal snapshot to back up AN152.2
Notes:
Using an existing internal snapshot to execute a backup

For an internal snapshot, you only need to know the hidden directory name for
accessing the snapshot. Then it is simply a matter of changing to that directory and
running a backup by name relative to it.
Creating an internal snapshot and backup in one operation
The backsnap command provides an interface to create a snapshot for a JFS2 file
system and perform a backup of the snapshot. The command syntax for an internal
snapshot is:
# backsnap -n snapshotname BackupOptions snappedFS
For example:
# backsnap -n mysnap -i -f/dev/rmt0 /home/myfs
This will create an internal snapshot named mysnap for the /home/myfs file system.
The remaining arguments are passed to the backup command. In this case, the files
and directories in the snapshot will be backed up by name (-i) to /dev/rmt0.
Instructor notes:
Purpose — Explain how to use an internal snapshot with a backup utility.
Details —
Transition statement — If you make a mistake and underestimate how quickly data is
modified or deleted, then you can have space allocation problems related to the JFS2
snapshot allocation. Let’s look at how to monitor and manage that situation.

JFS2 snapshot space management

List snapshots for the snappedFS
# snapshot -q snappedFS
External snapshot:
The snapshot report identifies the size and amount of free space
If the snapshot needs more space:
# snapshot -o size=+1 snapshotLV
Internal snapshot:
Shares logical volume with the snappedFS
# df -m snappedFS
If snappedFS is out of space, try to free up space, possibly by deleting
old snapshots
# snapshot -d -n snapshot_name snappedFS
Figure 10-24. JFS2 snapshot space management AN152.2
Notes:
It is useful to be able to identify situations where a snapshot is growing large. If a
snapshot runs out of space, then all snapshots are invalidated and become unusable. With
internal snapshots, the snapshots can also contribute to the entire file system
running out of space.
To monitor an external snapshot, use the query option of the snapshot command. An
alternative would be to mount the snapshot and use the df command, but that is more
complicated.
If an external snapshot needs more room, you can dynamically increase the size of the
snapshot logical volume by using the size option of the snapshot command.
For an internal snapshot, there is no mechanism for identifying the space usage of the
snapshots. Instead, you monitor the size of the snappedFS.
When a file system is running out of space, one way to free space is to delete old
snapshots. Keeping many generations of snapshots can be useful, but it can also be
expensive in terms of space usage.
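The query report for an external snapshot gives the allocation in 512-byte blocks and the free blocks. From these, a script can compute the percentage used and compare it against a threshold, for example from cron. A minimal sketch, assuming the external report layout shown earlier in this unit. The names snap_pct_used and snap_over are hypothetical, as is the second snapshot line in the test data.

```sh
# snap_pct_used: read "snapshot -q" output for external snapshots on stdin
# and print "LV PERCENT_USED" for each snapshot line.
snap_pct_used() {
    awk '
        { off = ($1 == "*") ? 1 : 0 }   # skip the current-snapshot marker
        $(2 + off) ~ /^[0-9]+$/ && $(3 + off) ~ /^[0-9]+$/ && $(2 + off) > 0 {
            total = $(2 + off); free = $(3 + off)
            printf "%s %d\n", $(1 + off), int((total - free) * 100 / total)
        }'
}

# snap_over LIMIT: show only snapshots that are at least LIMIT percent used
snap_over() {
    snap_pct_used | awk -v lim="$1" '$2 >= lim'
}
```

For example, snapshot -q /home/myfs | snap_over 80 lists any snapshot LV that is at least 80% used. Such an LV is a candidate for enlarging with the snapshot -o size option before it runs out of space.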
Instructor notes:
Purpose — Explain how to manage snapshot space allocation issues.
Details — You may choose to introduce Part 3 (JFS2 snapshots) of the matching exercise
at this point, rather than waiting until the end of the unit.
Transition statement — Let’s review what we have covered.

10.3. SAN Copy issues

What students will do — The students will identify potential AIX related problems with
using storage subsystem copy services and how to use these services with AIX.
How students will do it — Through lecture and checkpoint questions
What students will learn — Students will learn how to properly work with storage
subsystem copy services in an AIX environment.
How this will help students in their job — The SAN attached storage subsystem copy
services are fairly popular, but an administrator must understand the AIX issues involved
and how to correctly use these services.
Topic 3 objectives

Explain potential problems in using SAN Copy
Explain use of the JFS2 freeze and resume function
Explain methods for accessing SAN Copy target LUNs
Notes:

Instructor notes:
Purpose — Explain objectives of the topic.
Details —
Transition statement — Let’s start with a discussion of how file system caching can
create data consistency problems when working with SAN copy.
SAN Copy and file system cache

SAN Copy uses the storage subsystem contents
Even with a quiesced application, file system updates may be
cached in memory and unwritten to the storage subsystem
Need to stop file system activity and flush memory cache
Unmounting the file system would work, but an online alternative
is needed
              Kernel cache    Storage subsystem
Transaction   X0, Y0          X0, Y0
Write X1      X1, Y0          X1, Y0
Write Y1      X1, Y1          Update only in memory
Figure 10-26. SAN Copy and file system cache AN152.2
Notes:
Use of copy services provided by SAN attached storage subsystems is fairly common.
In this unit, we call these services SAN Copy. These copy services make a
point-in-time exact copy of the contents of a LUN as seen by the storage subsystem
controller. Not only can they provide a point in time copy of a LUN, but this activity does
not depend on any host system resources. On the other hand, there are potential
problems that result from only seeing the data as it resides in the storage subsystem.
Normally, when an application writes data, it receives confirmation of the write when
AIX has cached that data in memory. Later, various AIX mechanisms will flush that data
to disk storage. At the point in time that a SAN Copy is initiated, the transaction-related
updates may be either in AIX kernel memory or in the storage subsystem. As a result, the
SAN Copy may contain inconsistent data, even if the application was quiesced before the
copy was initiated.
To avoid this problem, you need to ensure that none of the related data updates are
cached in AIX memory at the time of the SAN Copy. Once again, unmounting the file
system is generally not an acceptable solution given the disruption to the application.

Instructor notes:
Purpose — Explain the issue of data consistency when using SAN Copy.
Details — The course materials use the term SAN Copy in order to be agnostic to the
various storage subsystem products. Most storage subsystems have this ability, but refer to
it using different terms. The most common example in IBM systems is the FlashCopy
capability of the DS8000. You might ask the students if they use SAN storage based copy
services, and if so, which ones.
Other common copy services that have similar abilities and issues are EMC TimeFinder
and Hitachi ShadowImage.
Transition statement — Let us look at the tools AIX provides to help manage this
situation.
Use of JFS2 freeze and thaw

1. JFS2 freeze will stop processing application write requests
and then will flush cached data to disk
# sync; chfs -a freeze=<timeout in sec> <FSname>
2. Use SAN Copy command for related LUNs
3. When SAN Copy completes, thawing the file system will
resume processing of application write requests
# chfs -a freeze=off <FSname>
JFS2 freeze is not needed if the quiesced application:
Uses Direct I/O or Concurrent I/O for the concerned files
Issues fsync() to flush the file data after quiescing
Figure 10-27. Use of JFS2 freeze and thaw AN152.2
Notes:
AIX provides a JFS2 file system freeze capability. It stops processing new file system
I/O requests and then flushes out all memory cached file system data to the physical
volume.
Once the application is quiesced and the file system frozen, use of a SAN Copy will
capture consistent data.
After the SAN Copy completes, we can then thaw the file system and resume
application processing.
This is only needed when the application allows AIX to cache writes and to decide when
to flush the cached data. There are two situations where the freeze mode is not needed.
- The application processes the file using Direct I/O (DIO). With DIO, writes are
synchronous and go directly to storage without any caching in kernel memory.
Concurrent I/O always uses DIO.

- The application issues the synchronous fsync() system call for its output files,
forcing AIX to flush all cached data for that file and returning control to the
application only when the flush has completed.
The chfs freeze attribute requires a value which specifies a timeout period. If the file
system is not explicitly thawed (again using the chfs command) within that timeout
period, the file system will be automatically thawed. This is intended to avoid permanent
file system freezes. Set the timeout to a period much longer than you expect your SAN
Copy to take.
The sync command is issued immediately before the freeze request because, for large
amounts of cached data, sync is much more efficient than the freeze function at finding
and flushing that data. The freeze function then only needs to handle data that was
cached after that flush, which should be a small amount.
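The freeze, copy, and thaw steps can be wrapped so that the file system is always thawed explicitly, even if the copy command fails, rather than relying on the timeout. This is a hypothetical sketch: frozen_copy and run are illustrative names. The SAN Copy command is a placeholder for whatever your storage subsystem provides; it is not an AIX command. With DRY_RUN=yes the script prints the commands without running them.

```sh
# run CMD [ARGS...]: print the command; execute it unless DRY_RUN=yes
run() {
    echo "+ $*"
    [ "${DRY_RUN:-no}" = yes ] || "$@"
}

# frozen_copy FS TIMEOUT COPY_CMD [ARGS...]
# Sketch of the freeze / SAN Copy / thaw sequence from the visual.
# COPY_CMD is a placeholder for your storage subsystem's copy command.
frozen_copy() {
    fc_fs=$1
    fc_timeout=$2
    shift 2
    run sync
    # Freeze: stop new writes and flush cached data (auto-thaw after TIMEOUT s)
    run chfs -a freeze="$fc_timeout" "$fc_fs" || return 1
    run "$@"
    fc_rc=$?
    # Thaw explicitly even if the copy failed; the timeout is only a safety net
    run chfs -a freeze=off "$fc_fs"
    return $fc_rc
}
```

Usage: DRY_RUN=yes frozen_copy /data 600 my_san_copy_cmd cg1 shows the full sequence. Here my_san_copy_cmd and cg1 are hypothetical placeholders.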
Instructor notes:
Purpose — Explain the use of the JFS2 freeze and thaw capability.
Details — Emphasize that this is only for JFS2 file systems.
Transition statement — While really a storage subsystem function rather than an AIX
function, consistency groups have important implications for AIX file system integrity when
later recovering using SAN Copy data. Let’s look at this.

Consistency groups
When multiple LUNs contain related information, sequential
SAN Copy of those LUNs can result in inconsistencies
The storage subsystem should define related LUNs in the
same consistency group
SAN Copy ensures a point in time copy across all LUNs in the
consistency group
A SAN Copy of a file system log that is not consistent with the
SAN Copy of the file system data results in metadata
corruption
All file systems using a given file system journal log must be
in the same consistency group as the journal log
JFS2 in-line logs, or
Dedicated logs for each file system
Figure 10-28. Consistency groups AN152.2
Notes:
While the previously discussed techniques can ensure the consistency of a
point-in-time copy of a single LUN, interrelated LUNs raise new issues. Normally,
each LUN would be SAN Copied separately, and each copy would reflect a different
point in time. Because the copies are from different points in time, related data
across them can be inconsistent.
When the storage subsystem defines LUNs as belonging to a common consistency
group, the entire consistency group is copied at the same point-in-time. This ensures
data consistency.
Of special concern is the relationship between a file system and its journal log. If these
are on different LUNs and we do not ensure consistency, then we essentially have
metadata corruption which can make that file system and log combination unusable.
If multiple file systems share the same log and some of those file systems are not
included in the consistency group, we again have a situation where later access of
the log will be incompatible with the state of those other file systems. Thus, it is
recommended that for file systems which are using SAN Copy, either each file system
has its own external journal log or they use JFS2 in-line journal logs.
If the LUN is one of many physical volumes in an entire volume group which is being
backed up, it is recommended that all of the LUNs in the volume group be included in
the same consistency group.
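Before defining consistency groups, it helps to know which file systems share a journal log. This is a minimal sketch. It assumes the usual /etc/filesystems stanza layout: a "mountpoint:" line followed by indented "attribute = value" lines, with comment lines starting with *. The function names fs_by_log and shared_logs are hypothetical.

```sh
# fs_by_log: read /etc/filesystems-style stanzas on stdin and print
# "LOG MOUNTPOINT" for every stanza that has a log attribute
# (INLINE for file systems that use a JFS2 in-line log).
fs_by_log() {
    awk '
        /^[^ \t*][^:]*:[ \t]*$/ { mp = $1; sub(/:$/, "", mp); next }  # stanza name
        $1 == "log" && $2 == "=" { print $3, mp }
    '
}

# shared_logs: print each external log device used by more than one file system
shared_logs() {
    fs_by_log | awk '$1 != "INLINE" { n[$1]++ } END { for (lg in n) if (n[lg] > 1) print lg }'
}
```

Usage: shared_logs < /etc/filesystems lists log devices whose file systems (all of them) must be in the same consistency group as the log.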

Instructor notes:
Purpose — Explain role of consistency groups.
Details —
Transition statement — In order to use the data on a SAN copy we need to access it.
That is true even if we only wish to create file system backups from that copy. Let us look at
the procedures for accessing a SAN Copy.
Accessing SAN Copy data

SAN Copy target LUN is an exact copy of original disk,
including VGDA, LVCB, VGID, and PVID
If using for rootvg recovery, boot from the target LUN
If accessing the entire user volume group on a different AIX
system:
Import using the importvg command
Run varyonvg, fsck and mount file systems
If accessing a user volume group on the same system:
Import using the recreatevg command
Run fsck and mount file systems
Figure 10-29. Accessing SAN Copy data AN152.2
Notes:
SAN Copy creates exact duplicates of the physical volumes, rather than a backup
image to be restored. For an AIX system to access the disk, it needs to be discovered
(zoned to that host and detected, by way of cfgmgr, by that host) and then imported into
the ODM.
If it is to act as the rootvg of that system, it must be designated as the boot device
before booting that host.
User volume groups may be accessed to either directly recover contents from the copy,
or to enable a backup utility to create a backup of the copied volume group. In either
case, the PVID on the disk (or disks) should be changed to avoid issues of duplicate
PVIDs.
If accessing the entire volume group from a system which is different from the original
system, use the importvg command on any disk in the consistency group for the
volume group, vary online, run a file system check, and mount the file systems of
interest. To avoid possible future PVID conflicts, you should consider changing the PVID
on the disks after importvg is completed. This can be accomplished using the chdev
command as follows:
# chdev -l hdisk# -a pv=clear
# chdev -l hdisk# -a pv=yes
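When several copied disks need new PVIDs, the two chdev commands above can be applied in a loop. This is a hypothetical sketch (reset_pvids and run are illustrative names). With DRY_RUN=yes it only prints the commands. That is a sensible first step, because running it against the wrong hdisk names would change the PVIDs of the original disks.

```sh
# run CMD [ARGS...]: print the command; execute it unless DRY_RUN=yes
run() {
    echo "+ $*"
    [ "${DRY_RUN:-no}" = yes ] || "$@"
}

# reset_pvids HDISK...: clear and regenerate the PVID on each copied disk,
# using the two chdev commands above. Double-check the hdisk names first.
reset_pvids() {
    for rp_disk in "$@"; do
        run chdev -l "$rp_disk" -a pv=clear || return 1
        run chdev -l "$rp_disk" -a pv=yes || return 1
    done
}
```

Usage: DRY_RUN=yes reset_pvids hdisk5 hdisk6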
When accessing from the same system (where the original volume group is assumed to
still exist) or accessing a subset of the physical volumes in the volume group, use the
recreatevg command, followed by a file system check and a mount of the file systems
of interest. The recreatevg command can selectively restore only the logical volumes
that reside on the specified disks. The recreatevg command also changes the PVIDs
automatically; if it did not, you would need to change the PVIDs (using chdev) before
running recreatevg.
Instructor notes:
Purpose — Discuss how to access a SAN Copy.
Details —
Transition statement — Let’s look at this recreatevg command in more detail.

The recreatevg command

recreatevg [-f][-y vgname] [-L label prefix] [-Y LV prefix] PVs

-y Specifies the name to use for the volume group
-L Parent directory for new file system mount points
-Y Prefix for new logical volume names
-f Force creation even with missing physical volumes
Handles conflicts with still active volume group
• recreatevg -y oldvg -L /old -Y XX hdisk5 hdisk6

Original names: /dev/myfs /myfs
New names: /dev/XXmyfs /old/myfs
Figure 10-30. The recreatevg command AN152.2
Notes:
The recreatevg command is specially designed to handle the import of volume group
copies to the same system from which they were copied.
One way in which it differs from importvg is that it creates a new VGID and
new PVIDs. Another major difference is that it allows you to specify prefixes to be used
when creating new file system names and logical volume names, which avoids conflicts
with the original names.
As seen in the visual, the -L option is used to create a prefix to the file system name,
which becomes a common parent directory to all of the file system mount points. The -Y
option is used to create a prefix for the logical volume names.
It is very important that you specify all disks that belong to the volume group, as
arguments to the command, when trying to access the entire volume group.
If you are trying to access a subset of the physical volumes in the volume group, you can
force it (-f) to create a new volume group that contains only the specified disks and
those logical volumes that are entirely contained on those disks.
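Before running recreatevg, it can be useful to preview the names it would produce. The sketch below applies the naming rule shown in the visual to lsvg -l output for the original volume group: the -Y prefix is prepended to each logical volume name, and the -L prefix becomes the parent directory of each mount point. It is hypothetical. predict_names is an illustrative name, and it assumes the default lsvg -l column layout with the mount point in the seventh field. It lists only logical volumes that have a mount point (the -Y prefix applies to other logical volumes as well), and it ignores any truncation recreatevg may apply to long names.

```sh
# predict_names LV_PREFIX LABEL_PREFIX  (reads lsvg -l output on stdin)
# Hypothetical preview of the recreatevg -Y / -L naming rule.
predict_names() {
    awk -v y="$1" -v l="$2" '
        # data lines with a mount point: LV name in $1, mount point in $7
        NF >= 7 && $7 ~ /^\// {
            printf "/dev/%s -> /dev/%s%s   %s -> %s%s\n", $1, y, $1, $7, l, $7
        }'
}
```

Usage: lsvg -l oldvg | predict_names XX /old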
Instructor notes:
Purpose — Explain detail in the use of the recreatevg command.
Details — There is no exercise for this topic.
Transition statement — Let us review some of what we have covered with some
checkpoint questions.

Checkpoint
1. True or false: The creation of a snapshot volume group marks
all copies in the snapshot as stale.
2. True or false: The creation of a JFS split copy marks all of the
split mirror copies as stale.
3. True or false: After the creation of a JFS split mirror copy, the
administrator needs to mount the new file system in order to
access the split copy.
4. To access a SAN Copy of an active volume group on the
source system, use the command:
a. joinvg
b. importvg
c. recreatevg
Notes:
Instructor notes:
Purpose — Review and test the students' understanding of this section.
1. True or false: The creation of a snapshot volume group marks all copies in the
snapshot as stale.
2. True or false: The creation of a JFS split copy marks all of the split mirror copies
as stale.
The answer is true.
3. True or false: After the creation of a JFS split mirror copy, the administrator needs
to mount the new file system in order to access the split copy.
4. To access a SAN Copy of an active volume group on the source system, use the
command:
a. joinvg
b. importvg
c. recreatevg
The answer is recreatevg.

Exercises: Advanced backup techniques

Use a snapshot volume group
Use JFS split copy (optional)
Use JFS2 snapshots
Use a file system as a recovery source
Figure 10-32. Exercises: Advanced backup techniques AN152.2
Notes:
Instructor notes:
Purpose — Introduce the lab exercise.
Details — If you had the students do the exercises immediately after each matching topic,
then there is no lab exercise at this point.

Unit summary

Explain factors related to online backup consistency
Use JFS split mirror to back up file system data
Use a snapshot volume group to back up file system data
Use JFS2 snapshot to back up file system data
Explain AIX considerations in using SAN Copy facilities
Notes:
Instructor notes:

Unit 11. Diagnostics
Estimated time
00:25

This unit is an overview of diagnostics available in AIX.

After completing this unit, you should be able to:
• Use the diag command to diagnose hardware
• List the different diagnostic program modes

Accountability:
• Activity
References
Online AIX Version 7.1 Understanding the Diagnostic
Subsystem for AIX
Note: References listed as “online” above are available at the
following address:
http://publib.boulder.ibm.com/eserver/
© Copyright IBM Corp. 2009, 2012 Unit 11. Diagnostics 11-1

Unit objectives
After completing this unit, you should be able to:

Use the diag command to diagnose hardware
List the different diagnostic program modes
Notes:


Purpose — Introduce the objectives of this unit.
Details —
Additional information — This section covers very important information for support staff
aiming for AIX Certification.
Transition statement — Tell the students when they need diagnostic programs.

When do I need diagnostics?

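(Visual: Diagnostics are available from a NIM master, from a CD-ROM, or from the installed
bos.diag package. They are needed when there is a hardware error in the error log, when the
machine does not boot, or when the system shows strange behavior.)
Figure 11-2. When do I need diagnostics? AN152.2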
Notes:
Introduction
The lifetime of hardware is limited. Broken hardware leads to hardware errors in the
error log, to systems that will not boot, or to very strange system behavior.
The diagnostic package helps you to analyze your system and discover hardware that
is broken. Additionally, the diagnostic package provides information to service
representatives that allows fast error analysis.
Sources for diagnostic programs

Diagnostics are available from different sources:
- A diagnostic package is shipped and installed with your AIX operating system.
Diagnostics are packaged into separate software packages and filesets. The base
diagnostics support is contained in the package bos.diag. The individual device
support is packaged in separate devices.[type].[deviceid] packages.

The bos.diag package is split into the following filesets:

- bos.diag.rte contains the Controller and other base diagnostic code
- bos.diag.util contains the Service Aids and Tasks
- bos.diag.com contains the diagnostic libraries, kernel extensions, and
development header files
- bos.diag.ecc contains the inventory scout ECC client
- Diagnostic CD-ROMs are available that allow you to diagnose a system that does
not have AIX installed. Normally, the diagnostic CD-ROM is not shipped with the
system.
- Diagnostic programs can be loaded from a NIM master. This master holds and
maintains different resources, for example, a diagnostic package. This package
can be loaded over the network onto a NIM client and used to diagnose the
client machine.
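The fileset list above can be checked with a small shell function. This is a minimal sketch, not an IBM tool: the function name missing_diag_filesets is hypothetical, and it assumes you can produce the installed fileset names one per line. The lslpp field used in the usage comment is an assumption to verify on your system.

```shell
# Hypothetical helper: reads installed fileset names (one per line) on
# standard input and prints each base diagnostic fileset named in the
# notes that does not appear in that list.
missing_diag_filesets() {
    awk '
        { installed[$1] = 1 }
        END {
            n = split("bos.diag.rte bos.diag.util bos.diag.com bos.diag.ecc", want, " ")
            for (i = 1; i <= n; i++)
                if (!(want[i] in installed))
                    print want[i]
        }'
}

# Example on an AIX host (assumption: field 2 of "lslpp -Lc" output is the
# fileset name; check the header line on your system first):
#   lslpp -Lc 'bos.diag*' | cut -d: -f2 | missing_diag_filesets
```

Empty output means all four base filesets were found in the list.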

Instructor notes:
Purpose — Give reasons for using diagnostics. Describe the different sources for
diagnostics.
Details —
Transition statement — Let’s discuss where to use diagnostics.

Where do I run diagnostics?

(Visual: No diag on virtual devices. VIOS CLI: diagmenu, or run AIX diag under
oem_setup_env. The diagram shows a Virtual I/O Server that owns the physical adapter (P)
and the physical storage (hdisk), with VSCSI server virtual adapters (S1, S2) and virtual
target devices (VTD). Through the hypervisor, using the VSCSI protocol, these connect to
VSCSI client virtual adapters (C) in the client partitions, which see virtual SCSI disks.
The physical adapter is circled to show where diagnostics must run.)
Figure 11-3. Where do I run diagnostics? AN152.2
Notes:
Diagnostics run against physical devices. It is fairly common to have totally virtualized
logical partitions which only see virtual devices: virtual Ethernet, virtual SCSI, virtual
Fibre Channel. The diag utilities will not diagnose virtual devices.
In a virtualized environment, the physical devices are allocated to the Virtual I/O Servers
(VIOS). If a client LPAR is having problems accessing a device, the administrator needs
to identify the VIOS providing access and run the diagnostics on that VIOS.
The VIOS command line interface (CLI) equivalent of the AIX diag command is the
diagmenu command. The alternative is to create a root AIX subshell with the
oem_setup_env command and run the AIX diag command in that shell.
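Before running diag inside a client LPAR, it can help to see which of its devices are virtual. The following is a heuristic sketch only: the function name is hypothetical, it treats any description containing the word Virtual as a virtual device, and the lsdev -F format string in the comment is an assumption to check against the lsdev documentation.

```shell
# Hypothetical helper: reads "name:description" lines and labels each
# device as virtual (description contains "Virtual") or possibly physical.
# This is a naming heuristic, not an authoritative test.
classify_devices() {
    awk -F: '{
        if ($2 ~ /Virtual/) print $1 " virtual"
        else                print $1 " physical?"
    }'
}

# Example on an AIX client LPAR (assumption: lsdev -F accepts ":" as the
# separator between column names):
#   lsdev -C -F 'name:description' | classify_devices
```

Anything reported as virtual has to be diagnosed on the VIOS that provides it.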

Instructor notes:
Purpose — Explain where to run diagnostics in a virtualized environment.
Details — Do not get into details of how virtualization of devices works or how to
implement them. The focus here is on where the physical devices are actually managed.
The bold circle in the visual is around the physical storage adapter to emphasize where the
diagnostic needs to be executed.
Generally, many of the techniques discussed in the course are applicable to problems with
a VIOS, since it is essentially AIX under the covers. If a VIOS is having problems
booting and one suspects a device problem, it can be booted to standalone diagnostic
mode, just like an AIX logical partition.
Transition statement — Let’s discuss how to use diagnostics.

The diag command

(Visual: diag auto-diagnoses entries in the AIX error log and reports test results.)
• diag allows testing of a device, if it is not busy
• diag allows analyzing the error log
Figure 11-4. The diag command AN152.2
Notes:
Overview of the diag command

Whenever you detect a hardware problem, for example, a communication adapter error
in the error log, use the diag command to diagnose the hardware.
The diag command can test a device only if the device is not busy. If any AIX process is
using a device, the diagnostic programs cannot test it; they must have exclusive use of
the device to be tested. Methods used to test devices that are busy are introduced later
in this unit.
When run in the correct mode, the diag command analyzes the error log to fully diagnose a
problem. It provides information that is very useful for the service representative,
for example, Service Request Numbers (SRNs) or probable causes.
There is a relationship between the AIX error log and diagnostics. When the errpt
command is used to display an error log entry, diagnostic results related to that entry
are also displayed.
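Finding which resources to diagnose often starts with the hardware entries in the error log. The following minimal sketch (hypothetical function name) reads errpt summary output and prints each resource that has a permanent hardware error, once. It assumes the default summary column order IDENTIFIER, TIMESTAMP, T, C, RESOURCE_NAME, DESCRIPTION; errpt -d H -T PERM performs the same class and type filtering on the AIX side.

```shell
# Hypothetical helper: reads errpt summary output (header line first) and
# prints each RESOURCE_NAME that has at least one permanent (T = P)
# hardware (C = H) error, once, in the order first seen.
perm_hw_resources() {
    awk 'NR > 1 && $3 == "P" && $4 == "H" && !seen[$5]++ { print $5 }'
}

# Example on an AIX host; each resource printed is a candidate for diag:
#   errpt | perm_hw_resources
#   errpt -d H -T PERM        # the same filtering done by errpt itself
```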

Instructor notes:
Purpose — Introduce the diag command.
Details —
Additional information — When the diagnostic tool runs, it automatically tries to diagnose
hardware errors it finds in the error log. The information generated by the diag command is
put back into the error log entry, so that it is easy to make the connection between the error
event and, for example, the FRU number required to repair failing hardware.
Transition statement — Let’s show how to work with the diag menus.

Working with diag (1 of 3)

# diag
FUNCTION SELECTION 801002
Move cursor to selection, then press Enter.

Diagnostic Routines
This selection will test the machine hardware. Wrap plugs and
other advanced functions will not be used.
Advanced Diagnostics Routines
This selection will test the machine hardware. Wrap plugs
and other advanced functions will be used.
Task Selection (Diagnostics, Advanced Diagnostics, Service Aids)
This selection will list the tasks supported by these
procedures. Once a task is selected, a resource menu may be
presented showing all resources supported by the task.
Resource Selection
This selection will list the resources in the system that
are supported by these procedures. Once a resource is
selected, a task menu will be presented showing all tasks
that can be run on the resource(s)..
Figure 11-5. Working with diag (1 of 3) AN152.2
Notes:
Introduction to diag menus

The diag command is menu-driven and offers different ways to test hardware devices
or the complete system. One method to test hardware devices with diag is:
Start the diag command. A welcome screen appears, which is not shown on the visual.
After you press Enter, the FUNCTION SELECTION menu is shown.
If Diagnostic Routines or Advanced Diagnostics Routines is selected, then the
Diagnostic Mode Selection menu is displayed, to determine if System Verification
or Problem Determination should be run. (More detail on these is on the next visual.)
If the Task Selection menu is selected, then the following happens:
- The Diagnostic Controller displays a list of Tasks that are available for the system.
- After a task has been selected, a Resource Selection menu appears if the
selected task supports a resource selection. After selection of a resource, the task is
called with the selected resource name as a command-line argument.

- If the selected task does not support resource selection, then the task is invoked.
If the Resource Selection menu is selected, then the following happens:
- The Diagnostic Controller displays a list of resources available on the system.
- After a resource has been selected, a Task Selection menu will appear containing
the commonly supported tasks for each selected resource. After selection of a task,
the task is invoked.


Purpose — Explain the diag main menu options.
Details — Explain each menu option at a high level. Point out that, for the last two menu
items:
- The task selected under Task Selection will often provide a list of resources that
qualify for that task
- The resource selected under Resource Selection will provide a list of tasks for which
that resource is qualified
These are just two different ways to identify which task to run against which resource.
The task list can include diagnostic routines, which overlap with the first two items in the
main menu. As the following visuals show, selecting Diagnostic Routines from the
main menu will eventually present you with a list of qualified resources similar to the
resource list that you would get from selecting a task under Task Selection.
Transition statement — Let’s take a closer look at what happens when you select the first
two menu options.

# diag

Diagnostic Routines
This selection will test the machine hardware. Wrap
plugs and other advanced functions will not be used.
...
DIAGNOSTIC MODE SELECTION 801003

System Verification
This selection will test the system, but will not analyze
the error log. Use this option to verify that the machine
is functioning correctly after completing a repair or an
upgrade.
Problem Determination
This selection tests the system and analyzes the error log
if one is available. Use this option when a problem is
suspected on the machine.
Notes:
Working with Diagnostic Routines

Selecting Diagnostic Routines or Advanced Diagnostics Routines allows you to test
hardware devices.
The next menu is DIAGNOSTIC MODE SELECTION. Here you have two selections:
• System Verification tests the hardware without analyzing the error log. This
option is used after a repair to test the new component. If a part is replaced due
to an error log analysis, the service provider must log a repair action to reset
error counters and prevent the problem from being reported again. Running
Advanced Diagnostics Routines (in the FUNCTION SELECTION menu) in
System Verification mode will log a repair action.
• Problem Determination tests hardware components and analyzes the error
log. Use this selection when you suspect a problem on a machine. Do not use
this selection after you have repaired a device, unless you remove the error log
entries of the broken device.

Instructor notes:
Purpose — Explain how to work with the diag command.
Details —
Additional information — The diagnostics version number appears on the first
diagnostics screen. The version of diagnostics may become an issue in rare cases.
Normally, diagnostic versions are backwards-compatible. However, diagnostic support for
older hardware may have been dropped from the CD for a particular version of diagnostics.
In this situation, students should contact support for more information.
If Problem Determination is chosen, then the Diagnostic Controller automatically scans the
error log for any PERMANENT HARDWARE errors that have been logged within the last 7
days to determine if any devices should be automatically tested. A problem report may be
generated. It then walks the configuration database to determine which resources in the
current configuration can be tested. This information is presented in the Resource
Selection Menu.
If Advanced Diagnostics Routines is chosen, and the system is in the Online Service mode
of operation, the Diagnostic Controller will display the Test Method menu to determine if
the tests should be repeated. It initializes the input parameters to the Diagnostic
Application (DA), which are contained in the Test Mode Input object class, and then runs the
DA of the resource to be tested.
Once the DA completes, the Diagnostic Controller then:
- Performs the isolation process
- Presents conclusions on the screen
If no trouble is found, diag exits with a return value of 0. If the hardware tested bad,
it returns a value of 1.
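A script that runs diagnostics can act on the return values described above. This is a minimal sketch with a hypothetical function name; the notes document only 0 and 1, so treating every other value as unexpected is an assumption. The diag flags in the comment are an example to verify in the diag command reference for your AIX level.

```shell
# Hypothetical helper: turns a diag exit status into a message.
# 0 and 1 are the values given in the notes; anything else is flagged.
describe_diag_rc() {
    case "$1" in
        0) echo "no trouble found" ;;
        1) echo "hardware tested bad" ;;
        *) echo "unexpected diag return value: $1" ;;
    esac
}

# Example on AIX (assumption: -c runs without console prompts and -d names
# the device; confirm both flags before relying on them):
#   diag -c -d hdisk2; describe_diag_rc $?
```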
Transition statement — Let’s show how to select hardware devices to test.


DIAGNOSTIC SELECTION 801006

From the list below, select any number of resources by moving the
cursor to the resource and pressing 'Enter'.
To cancel the selection, press 'Enter' again.
To list the supported tasks for the resource highlighted, press
'List'.
Once all selections have been made, press 'Commit'.
To avoid selecting a resource, press 'Previous Menu'.
All Resources
This selection will select all the resources currently displayed.
sysplanar0 System Planar
U7311.D20.107F67B-
sisscsia0 P1-C04 PCI-XDDR Dual Channel Ultra320 SCSI
Adapter
+ hdisk2 P1-C04-T2-L8-L0 16 Bit LVD SCSI Disk Drive (73400 MB)
hdisk3 P1-C04-T2-L9-L0 16 Bit LVD SCSI Disk Drive (73400 MB)
ses0 P1-C04-T2-L15-L0 SCSI Enclosure Services Device
L2cache0 L2 Cache
...
Notes:
Selecting a device to test

In the next diag menu, select the hardware devices that you want to test. If you want to
test the complete system, select All Resources. If you want to test selected devices,
press Enter to select each device, then press F7 (Commit) to confirm your selections. In our
example, we select one of the disk drives.
If you press F4 (List), diag presents the tasks that the highlighted device supports, for example:
- Run diagnostics
- Run error log analysis
- Change hardware vital product data
- Display hardware vital product data
- Display resource attributes
To start diagnostics, press F7 (Commit).

Instructor notes:
Purpose — Explain how to select hardware devices.
Details —
Transition statement — What happens if a device is busy?

What happens if a device is busy?

ADDITIONAL RESOURCES ARE REQUIRED FOR TESTING 801011
No trouble was found. However, the resource was not tested because
the device driver indicated that the resource was in use.
The resource needed is
- hdisk2 16 Bit LVD SCSI Disk Drive (73400 MB)
U7311.D20.107F67B-P1-C04-T2-L8-L0
To test this resource, you can do one of the following:
Free this resource and continue testing.
Shut down the system and reboot in Service mode.
Testing should stop.
The resource is now free and testing can continue.
Figure 11-8. What happens if a device is busy? AN152.2
Notes:
If the device is busy

If a device is busy (in use), the diagnostic programs do not permit testing the device or
analyzing the error log.
The example in the visual shows that the disk drive was selected for testing, but the
resource was not tested because the device was in use. To test the device, either the
resource must be freed, or another diagnostic mode (maintenance or service mode,
described next) must be used in which the resource is not in use.
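One common reason a disk is reported as in use is that it belongs to a varied-on volume group. The following sketch (hypothetical function name) checks lspv output for that case only. It assumes the usual lspv columns of disk name, PVID, volume group, and state, and it does not detect other uses such as paging or raw access.

```shell
# Hypothetical helper: reads lspv output on standard input; $1 is the disk.
# Reports whether lspv shows the disk in an active volume group.
disk_busy_reason() {
    awk -v d="$1" '$1 == d {
        found = 1
        if ($4 == "active") print d " is in active volume group " $3
        else                print d " is not in an active volume group"
    }
    END { if (!found) print d " not listed" }'
}

# Example on AIX:
#   lspv | disk_busy_reason hdisk2
```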

Instructor notes:
Purpose — Explain what happens if a device is busy.
Details —
Transition statement — Let’s describe the different diagnostic modes.

Diagnostic modes (1 of 3)
Concurrent mode:
- Execute diag during normal system operation
- Limited testing of components
# diag
Maintenance mode:
- Execute diag during single-user mode
- Extended testing of components
# shutdown -m
Password:
# diag
Figure 11-9. Diagnostic modes (1 of 3) AN152.2
Notes:
Diagnostic modes
Three different diagnostic modes are available:
- Concurrent mode
- Maintenance (single-user) mode
- Service (standalone) mode (covered on the next visual).
Concurrent mode
Concurrent mode provides a way to run online diagnostics on some of the system
resources while the system is running normal system activity. Certain devices can be
tested, for example, a tape device that is currently not in use, but the number of
resources that can be tested is very limited. Devices that are in use cannot be tested.

Maintenance (single-user) mode

To expand the list of devices that can be tested, one method is to take the system down
to maintenance mode by using the shutdown -m command.
Enter the root password when prompted, and execute the diag command in the shell.
All programs, except the operating system itself, are stopped. All user volume groups
are inactive, which extends the number of devices that can be tested in this mode.
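After shutdown -m, you can confirm that user volume groups really are inactive before starting diag. This is a minimal sketch with a hypothetical function name. It reads the list of varied-on volume groups, as printed by lsvg -o (one per line), prints any group other than rootvg, and returns a non-zero status if it finds one.

```shell
# Hypothetical helper: reads varied-on volume group names (one per line),
# prints every name other than rootvg, and exits 0 only if rootvg is the
# only active group.
only_rootvg_active() {
    awk '$1 != "" && $1 != "rootvg" { print $1; n++ }
         END { exit (n > 0) }'
}

# Example in maintenance mode on AIX:
#   lsvg -o | only_rootvg_active && diag
```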


Purpose — Describe diagnostic modes.
Details — In concurrent mode, because the system is running in normal operation, devices
such as the following may require additional actions by the user or diagnostic application
before testing can be done:
• SCSI adapters connected to paging devices
• Disk drives used for paging or that are part of rootvg
• LFT devices and graphic adapters if a windowing system is active
• Memory
• Processor
Transition statement — Let’s describe the standalone mode.

Service (standalone) mode using CD or hard drive:
1. Insert diagnostics CD-ROM, or ensure drive is empty
2. Shut down your system: # shutdown
3. Boot system in service mode (start LPAR with boot mode of: Diagnostic with default bootlist)
4. diag will be started automatically
Notes:
Standalone mode
But what do you do if your system does not boot or if you have to test a system without
AIX installed on the system? In this case, you must use the standalone mode.
Standalone mode offers the greatest flexibility. You can test systems that do not boot or
that have no operating system installed (the latter requires a diagnostic CD-ROM).
Starting standalone diagnostics

Follow these steps to start up diagnostics in standalone mode:
1. If you have a diagnostic CD-ROM, insert it into the system.
2. Shut down the AIX system. If this is a server in the manufacturing default
configuration (single partition) and you are not using an HMC, you would next
power down the server from the operator panel. If running in a partitioned
environment, the firmware should shut down the partition after AIX reaches a halt
state.
3. Boot your AIX system. If in manufacturing default configuration, you could power
on the server from the operator panel. If in a partitioned system, you would use
the HMC to start the LPAR.
4. If starting a partition with the HMC, you would specify a boot mode of Diagnostic
with Default Bootlist. If using the manufacturing default configuration with an
attached console, see the paragraph on using the console keyboard to control
the boot mode. Either method boots the machine in service mode.
5. If the CD drive has a diagnostic CD-ROM loaded, the system starts the diagnostic
program on that CD. If there is nothing in the CD drive, it boots from the hard drive
and runs the diagnostic program on that hard drive.
6. At this point, you can invoke one of the diagnostic routines.
Using keyboard to control boot mode

After the system discovers the keyboard (you will hear a beep) and before the system
begins to use a particular bootlist, you may press a key to control the mode and bootlist.
On older pSeries models, the attached graphic console used function keys (such as
F5 or F6) to signal the desired mode. In current models, the screen prompts you to use
the matching numeric keys (such as numeric 5 or 6) instead. The following text uses
the function key terminology to refer to these.
Both F5 and F6 will cause the system to execute a service mode boot.
F5 uses the system default (non-customizable) bootlist. It lists the diskette drive,
CD drive, hard drive, and network adapter (in that order).
F6 uses the customizable service bootlist, which can be set with the bootlist
command, SMS, or the diag utility.
If the first successfully bootable device in the selected bootlist (normal, F5 or F6) is a
CD drive with a diagnostic CD loaded, the system will boot into diagnostic mode.
If you are doing a service mode boot and the first successfully bootable device in the
selected bootlist (F5 or F6) is a hard drive, then the system will boot into diagnostic
mode from that hard drive.
If the first successfully bootable device in the selected bootlist is installation media (AIX
installation CD or mksysb tape/CD), then the system will boot into Installation and
Maintenance mode.
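The bootlist rules above can be summarized as a small decision function. This is an illustrative sketch only: the function name and the device-type labels (diag_cd, hard_drive, install_media) are hypothetical, and the real decision is made by the firmware, not by a script. On a running AIX system, bootlist -m service -o displays the customizable service bootlist used by F6.

```shell
# Illustrative model of the rules above (hypothetical labels).
# $1: "normal" or "service" (F5 or F6) boot
# $2: first successfully bootable device in the selected bootlist:
#     diag_cd, hard_drive, or install_media
boot_outcome() {
    case "$2" in
        diag_cd)       echo "diagnostic mode (from CD)" ;;
        install_media) echo "Installation and Maintenance mode" ;;
        hard_drive)
            if [ "$1" = "service" ]; then
                echo "diagnostic mode (from hard drive)"
            else
                echo "normal AIX boot"
            fi ;;
        *)             echo "unknown" ;;
    esac
}

# Display the service bootlist on AIX:
#   bootlist -m service -o
```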

Instructor notes:
Purpose — Describe how to start up the standalone mode.
Details —
Additional information — Standalone mode allows the greatest number of devices to be
tested. However, it does not have the ability to examine entries in the system error log.
Transition statement — Let us also see how you can boot to diagnostics mode using a
NIM server as the provider of the diagnostics routine.

Service (standalone) mode using NIM:
1. NIM master: assign SPOT and perform a NIM diag operation on the machine
2. Shut down your system: # shutdown
3. HMC: boot LPAR to SMS and configure for network boot
4. Network boot your LPAR
5. diag will be started automatically
Notes:
Using NIM to boot to standalone diagnostic mode

Assuming that the network adapter itself is not the problem, you can also boot to
standalone diagnostic mode doing a network boot using a NIM server.
The NIM master must first be set up with a SPOT resource allocated to your machine
object. You then need to perform a NIM diag operation on the machine object for your
machine, so that NIM serves a diagnostics program rather than a mksysb image or
BOS filesets for installation.
Next, you boot the machine to SMS, use SMS to set up the IP parameters (designating
the NIM server as the boot server) and then select the network adapter as the boot
device.
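On the NIM master, the diag operation described above is a single nim command. The following sketch only builds and prints that command so it can be reviewed before it is run. The function name and the resource names spot71 and lpar1 are hypothetical, and the syntax should be confirmed in the nim command reference for your AIX level. The SMS steps on the client remain manual.

```shell
# Hypothetical helper: prints (does not run) the NIM diag operation.
# $1: SPOT resource name, $2: NIM client (machine object) name.
nim_diag_cmd() {
    if [ $# -ne 2 ]; then
        echo "usage: nim_diag_cmd SPOT_NAME CLIENT_NAME" >&2
        return 2
    fi
    echo "nim -o diag -a spot=$1 $2"
}

# Example: review the printed command, then run it on the NIM master.
#   nim_diag_cmd spot71 lpar1
```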

Instructor notes:
Purpose — Explain how to use NIM to boot to diagnostics.
Details —
Transition statement — Let’s look at additional tasks diag provides.

diag: Using task selection

# diag

...
Task Selection (Diagnostics, Advanced Diagnostics, Service Aids, etc.)
This selection will list the tasks supported by these procedures.
Once a task is selected, a resource menu may be presented showing
all resources supported by the task.
...
Run Diagnostics
Run Error Log Analysis
Run Exercisers
Display or Change Diagnostic Run Time Options
Add Resource to Resource List
Automatic Error Log Analysis and Notification
Backup and Restore Media
Change Hardware Vital Product Data
Configure Platform Processor Diagnostics
Delete Resource from Resource List
Disk Maintenance
Display Configuration and Resource List
and more
Figure 11-12. diag: Using task selection AN152.2
Notes:
Additional tasks
The diag command offers a wide range of additional tasks that are hardware related.
All these tasks can be found after starting the diag main menu and selecting Task
Selection.
The tasks that are offered are hardware (or resource) related. For example, if your
system has a service processor, you will find service processor maintenance tasks,
which you do not find on machines without a service processor. On some systems, you
find tasks to maintain RAID and SSA storage systems.

Example list of tasks

Following is a list of tasks available on a Power p750 running AIX 7.1:
- Run Diagnostics
- Run Error Log Analysis
- Run Exercisers
- Display or Change Diagnostic Run Time Options
- Add Resource to Resource List
- Automatic Error Log Analysis and Notification
- Backup and Restore Media
- Change Hardware Vital Product Data
- Configure Platform Processor Diagnostics
- Delete Resource from Resource List
- Disk Maintenance
- Display Configuration and Resource List
- Display Firmware Device Node Information
- Display Hardware Error Report
- Display Hardware Vital Product Data
- Display Multipath I/O (MPIO) Device Configuration
- Display Previous Diagnostic Results
- Display Resource Attributes
- Display Service Hints
- Display Software Product Data
- Display or Change Bootlist
- Gather System Information
- Hot Plug Task
- Log Repair Action
- Microcode Tasks
- RAID Array Manager
- Update Disk Based Diagnostics


Purpose — Describe the additional tasks that diag offers.
Details — Explain some typical tasks that are offered. The notes have a much fuller list of
tasks. Point out that some of these (especially the display tasks) seem trivial since they
duplicate the capabilities of common AIX commands or SMIT panels. They are there mainly for
when you are forced to run standalone diagnostics because the AIX system cannot boot.
In that situation, you do not have a command prompt, but the diag tasks provide a
way to obtain that same information.
Additional information — All newer PCI models support the diag command.
Transition statement — The diagnostic output is saved to a binary file so it can be
referenced later. Let’s take a look at that.

Diagnostic log
# /usr/lpp/diagnostics/bin/diagrpt -r
ID DATE/TIME T RESOURCE_NAME DESCRIPTION
DC00 Mon Oct 08 16:13:06 I diag Diagnostic Session was started
DAE0 Mon Oct 08 16:10:38 N hdisk2 The device could not be tested
DA00 Mon Oct 08 16:05:11 N sysplanar0 No Trouble Found
DA00 Mon Oct 08 16:05:05 N sisscsia0 No Trouble Found
# /usr/lpp/diagnostics/bin/diagrpt -a
IDENTIFIER: DC00
Date/Time: Mon Oct 08 16:13:06
Sequence Number: 15
Event type: Informational Message
Resource Name: diag
Diag Session: 327726
Description: Diagnostic Session was started.
----------------------------------------------------------------------------
IDENTIFIER: DAE0
Date/Time: Mon Oct 08 16:10:38
Sequence Number: 14
Event type: Error Condition
Resource Name: hdisk2
Resource Description: 16 Bit LVD SCSI Disk Drive
Location: U7311.D20.107F67B-P1-C04-T2-L8-L0
Figure 11-13. Diagnostic log AN152.2
Notes:
Diagnostic log
When diagnostics are run in online or single-user mode, the information is stored in a
diagnostic log. The binary file is called /var/adm/ras/diag_log. The command
/usr/lpp/diagnostics/bin/diagrpt is used to read the content of this file.
Report fields
The ID column identifies the event that was logged. In the example in the visual, DC00,
DAE0, and DA00 are shown. DC00 indicates that the diagnostic session was started, DAE0
indicates that the device could not be tested, and DA00 indicates No Trouble Found (NTF).
The T column indicates the type of entry in the log. I is for informational messages. N is
for No Trouble Found. S shows the Service Request Number (SRN) for the error that
was found. E is for an Error Condition.
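To get a quick overview of a long diagnostic log, the diagrpt -r report can be summarized by its T column. This is a minimal sketch with a hypothetical function name. It assumes the layout shown in the visual, where the date and time take four whitespace-separated fields, so T is field 6.

```shell
# Hypothetical helper: reads "diagrpt -r" output (header line first) and
# prints how many entries of each type (I, N, S, E) it contains.
diag_log_summary() {
    awk 'NR > 1 { count[$6]++ }
         END {
             split("I N S E", t, " ")
             split("informational,no trouble found,SRN reported,error condition", label, ",")
             for (i = 1; i <= 4; i++)
                 printf "%s (%s): %d\n", t[i], label[i], count[t[i]] + 0
         }'
}

# Example on AIX:
#   /usr/lpp/diagnostics/bin/diagrpt -r | diag_log_summary
```

For the report in the visual, this counts one I entry and three N entries.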


Purpose — Show the contents of a diagnostics log.
Details — Review the visual content for the diagnostics log. The student notes explain the
ID and Types that are displayed.
Additional information — The IDs that currently exist are:
DC00 - Diagnostic controller session started
DCF0 - Diagnostic controller reported an SRN from missing options
DCF1 - Diagnostic controller reported an SRN from new resource
DCE1 - Diagnostic controller reported ERROR_OTHER
DA00 - Diagnostic application reported NTF (No Trouble Found)
DAF0 - Diagnostic application reported an SRN
DAFE - Diagnostic application reported an ELA (Error Log Analysis) SRN
DAE0 - Diagnostic application reported ERROR_OPEN
DAE1 - Diagnostic application reported ERROR_OTHER
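The ID list above maps directly to a lookup table, which is handy when annotating diagrpt -r output in a script. This is a minimal sketch; the function name is hypothetical and the meanings are copied from the list above.

```shell
# Hypothetical helper: prints the meaning of a diagnostic log ID.
diag_id_meaning() {
    case "$1" in
        DC00) echo "Diagnostic controller session started" ;;
        DCF0) echo "Diagnostic controller reported an SRN from missing options" ;;
        DCF1) echo "Diagnostic controller reported an SRN from new resource" ;;
        DCE1) echo "Diagnostic controller reported ERROR_OTHER" ;;
        DA00) echo "Diagnostic application reported NTF (No Trouble Found)" ;;
        DAF0) echo "Diagnostic application reported an SRN" ;;
        DAFE) echo "Diagnostic application reported an ELA (Error Log Analysis) SRN" ;;
        DAE0) echo "Diagnostic application reported ERROR_OPEN" ;;
        DAE1) echo "Diagnostic application reported ERROR_OTHER" ;;
        *)    echo "unknown ID: $1" ;;
    esac
}

# Example on AIX: annotate the first column of the report.
#   /usr/lpp/diagnostics/bin/diagrpt -r | while read id rest; do
#       echo "$id: $(diag_id_meaning "$id")"
#   done
```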
Transition statement — Let’s answer some checkpoint questions.

Checkpoint
1. What diagnostic modes are available?

a. Concurrent
b. Maintenance
c. Service (standalone)
d. All of the above
2. How can you diagnose a communication adapter that is
used during normal system operation?
Notes:


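Purpose — Review and test the students' understanding of this unit.
1. What diagnostic modes are available?
a. Concurrent
b. Maintenance
c. Service (standalone)
d. All of the above
The answer is all of the above.
2. How can you diagnose a communication adapter that is used during normal system
operation?
The answer is: use either maintenance or service mode.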

Exercise: Diagnostics
Run diagnostics in multi-user mode
Run diagnostics in single user mode
Run diagnostics in service mode from a disk
Boot to diagnostics using an external boot image (NIM server)
Figure 11-15. Exercise: Diagnostics AN152.2
Notes:


Purpose — Explain the goals of the lab.
Details — Clearly explain what students have to do.
Additional information — This exercise should be performed only by one person per
system.

Unit summary
Having completed this unit, you should be able to:

Use the diag command to diagnose hardware
List the different diagnostic program modes
Notes:

Unit 12. The AIX system dump facility
Estimated time
01:25
00:30 Wrap up / Evaluations

This unit explains how to maintain the AIX system dump facility and
how to obtain a system dump.

• Explain what is meant by a system dump
• Determine and change the primary and secondary dump devices
• Create a system dump
• Execute the snap command
• Use the kdb command to check a system dump

Accountability:
• Lab exercise
References
Online AIX Version 7.1 Kernel Extensions and Device Support
Programming Concepts (Chapter 16. Debug Facilities)
Online AIX Version 7.1 Operating system and device management
(section on System Startup)
Note: References listed as “online” above are available at the
following address:
http://publib.boulder.ibm.com/infocenter/systems
Unit objectives

Explain what is meant by a system dump
Determine and change the primary and secondary dump devices
Create a system dump
Execute the snap command
Use the kdb command to check a system dump
Notes:
Importance of this unit

If an AIX kernel (the major component of your operating system) crashes, routines used
to create a system dump are invoked. This dump can be used to analyze the cause of
the system crash.
As an administrator, you have to know what a dump is, how the AIX dump facility is
maintained, and how a dump can be obtained.
You also need to know how to use the snap command to package the dump before
sending it to IBM.


Purpose — Present the objectives for this unit.
Details — Use the information in the student notes to emphasize the importance of this
unit.
Also, be sure to set expectations regarding this unit: The purpose of this unit is to show the
students how to maintain/configure the system dump facility and obtain a system dump.
We are not trying to teach the students anything about analyzing a dump in this unit.
Transition statement — So, what is a system dump?
System dumps
What is a system dump?
What is a system dump used for?
Figure 12-2. System dumps AN152.2
Notes:
What is a system dump?

A system dump is a snapshot of the operating system state at the time of a crash or a
manually-initiated dump. When a manually-initiated or unexpected system halt occurs,
the system dump facility automatically copies selected areas of kernel data to the
primary (or secondary) dump device. These areas include kernel memory, as well as
other areas registered in a structure called the Master Dump Table by kernel modules or
kernel extensions.
What is a system dump used for?

The system dump facility provides a mechanism to capture sufficient information about
the AIX kernel for later expert analysis. Once the preserved image is written to disk, the
system will be booted and returned to production. The dump is then typically submitted
to IBM for analysis.

Instructor notes:
Purpose — Present an overview of system dumps.
Details — We will talk more about the primary and secondary dump devices (mentioned in
the student notes on the current page) later.
Transition statement — Let’s look at the different types of dumps which are available in
AIX.
Types of dumps
Traditional:
AIX generates dump prior to halt
Firmware assisted (fw-assist):
POWER firmware generates dump in parallel with AIX halt process
Defaults to same scope of memory as traditional
Can request a full system dump
Live dump facility:
Selective dump of registered components without need for a system
restart
Can be initiated by software or by operator
Controlled by livedumpstart and dumpctrl
Written to a file system rather than a dump device
Figure 12-3. Types of dumps AN152.2
Notes:
Overview
In addition to the traditional dump function, AIX 6 introduced two new types of dumps.
Traditional dumps
Traditionally, AIX alone handled system dump generation, and the only way to get a
dump was to halt the system, either due to a crash or through operator request. In a
logical partition, a traditional dump includes only the memory that is allocated to that partition.
Firmware assisted dumps (fw-assist)

With AIX 6.1 (or later) and POWER6 (or later) hardware, you can configure the dump
facility to have the firmware of the hardware platform handle the dump generation. The
main advantage to this is that the operating system can start its reboot while the
firmware handles the dumping of the memory contents.

In its default mode, it will capture the same scope of memory as the traditional dump,
but it can be configured for a full memory dump.
If, for some reason (such as memory restrictions), a configured or requested firmware
assisted dump is not possible, then the traditional dump facility will be invoked.
More details on the configuration and initiation of firmware assisted dumps will be
covered later in the context of the sysdumpdev and sysdumpstart commands.
Live dump facility

AIX 6.1 also introduced a new live dump capability. If a system component is designed
to use this facility, a restricted scope dump of the related memory can be captured
without the need to halt the system.
If an individual component is having problems (such as being hung), a
livedumpstart command may be run to dump the needed diagnostic information.
The management of live dumps (such as enabling a component or controlling the dump
directory) is handled with the dumpctrl command.
The use and management of live dumps require knowledge of system components
that is beyond the scope of this class. Only use these commands under the direction
of AIX Support Line personnel.
Instructor notes:
Purpose — Explain the different types of dumps
Details — This is only an overview of the dump types. Do not go into much detail here.
There are two main reasons for introducing these dump types. First, students will likely hear
them referred to, and this overview will help clarify what they are. Second, they will see
references to firmware assisted dumps when we look at the SMIT panels and line
commands for dump management later in the unit.
Note that a later visual discusses firmware assisted dumps in more detail.
Transition statement — Let’s look at ways a system dump might be created.

How a system dump is invoked
Visual: A system dump copies kernel data structures to a dump device. It can be invoked using the keyboard or reset button, at an unexpected system halt, using a command, using SMIT, using the remote reboot facility, or using the HMC (reset - dump).
Figure 12-4. How a system dump is invoked AN152.2
Notes:
Creating a system dump

A system dump might be created in one of several ways:
- An AIX system will generate a system dump automatically when a sufficiently severe
system error is encountered.
- A set of special keys on the Low Function Terminal (LFT) graphics console keyboard
can invoke a system dump when your machine's mode switch is set to the Service
position or the Always Allow System Dump option is set to true.
- On systems running versions of AIX 5L prior to AIX 5L V5.3, a dump can also be
invoked when the Reset button is pressed when your machine's mode switch is set
to the Service position or the Always Allow System Dump option is set to true. In
AIX 5L V5.3 and later, the system will always dump when the Reset button is
pressed, provided that the dump device is non-removable.
- For logical partitions running AIX, the HMC can issue a restart with dump request
which is the functional equivalent of the previously described reset-button-triggered
dump.
- The superuser can issue a command directly, or through SMIT, to invoke a system
dump.
- The remote reboot facility can also be used to create a system dump.
Analysis of system dump

Usually, for persistent problems, the raw dump data is placed on portable media, such
as tape, and sent to AIX Support for analysis.
The raw dump data can be formatted into readable output through the kdb command.
The sysdumpdev command

The default system dump configuration of the system can be altered with the
sysdumpdev command. For example, using this command, you can configure system
dumps to occur regardless of key mode switch position, which is handy for PCI-bus
systems, as they do not have a key mode switch.
System dumps in an LPAR environment

In an LPAR environment, a dump can be initiated from the Hardware Management
Console (HMC). We will discuss this point in more detail later in this unit.

Instructor notes:
Purpose — Explain ways to obtain a system dump.
Details — Your system generates a system dump when a severe error occurs. System
dumps can also be user-initiated by users with root authority. A system dump creates a
picture of your system's memory contents. System administrators and programmers can
generate a dump and analyze its contents when debugging new applications.
When the system halts with a flashing 888 followed by a 102, then you know that a system
dump has occurred. When this happens, you must turn your system off and back on in
order to get your system back again.
You might want to recap the different uses of the Reset button on the classical RS/6000
that we have considered so far in this course:
1. Reboot the system (in normal or service mode).
2. Formulate the Service Request Number (SRN) and the location code (in normal mode).
3. Cause a dump (in service mode).
Keep in mind that there is no system key on PCI-based systems, so (prior to AIX 5L V5.3)
you had to use the -K option of the sysdumpdev command (or use SMIT) to set Always
Allow System Dump to true in order to enable use of the Reset button to force a dump on
such systems.
Later in this unit, we will discuss how a dump can be initiated using the other methods listed.
Additional information — In an LPAR environment, a dump can be initiated from the
Hardware Management Console (HMC) by choosing Dump from the Restart Options
when using the Restart Partition menu selection in the Server Management application.
The Dump option is the equivalent of pressing the Reset button on an eServer non-LPAR
system. The partition will initiate a system dump to the primary dump device if configured to
do that. Otherwise, the partition will simply reboot.
Transition statement — Let’s see what happens when a dump occurs. It may be the
result of a system crash.
Crash code: 888
Visual: After a reset, the 888 code is followed by 102 (software) or 103 (hardware).
Software (102): reset for the crash code, then reset for the dump code.
Hardware (103): reset twice for the SRN (yyy-zzz), reset once for the FRU codes (optional codes for hardware failure), and reset eight times for the location code.
Figure 12-5. Crash code: 888 AN152.2
Notes:
What is the 888 code?

One type of error you may encounter is a crash progress code of 888. When displayed
on an LED of the physical operator panel display, the 888 will often be flashing on and
off. So you will hear this referred to as a flashing 888, even though an HMC does not
flash the number.
An 888 code indicates that you have lost your system and that additional information is
available as a series of display values. Either a hardware or software problem has been
detected and a diagnostic message is ready to be read. The 888 is only the first in a
series of codes. On some displays the entire series will be shown as a dash delimited
sequence. For a server running a single AIX OS, the operator panel LED may only
show one code at a time; a series of resets will walk through the sequence of code
values. Record, in sequence, every code displayed after the 888.

On systems with no HMC and a three-digit or a four-digit operator panel, you may need
to press the system’s reset button to view the additional digits after the 888. Once the
series cycles back to showing 888, the sequence is over.
102 code
A 102 code indicates that a system dump has occurred because your AIX kernel crashed.
You may need to press the reset button to obtain the crash code
and then the dump code.
103 code
A 103 code usually indicates a hardware error. In an HMC managed LPAR
environment, hardware errors are reported through the service focal point of the HMC;
thus, you should not expect to see an 888-103 sequence in an LPAR reference code
field on the HMC. Working with the HMC facilities is covered in the LPAR training.
If you do have an 888-103 sequence, pressing the reset button twice will get a Service
Request Number, which may be used by IBM support to analyze the problem.
In case of a hardware failure, additional resets would retrieve the sequence number of
the Field Replaceable Unit (FRU) and a location code. The location code identifies the
physical location of a device.
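On displays that show the whole series as a dash-delimited sequence, the code that follows 888 tells you which branch you are on. The sketch below illustrates that reasoning only; the function name classify_888 and its messages are hypothetical, not an AIX tool, and it assumes the sequence is passed as a single string such as 888-102.

```shell
# Hypothetical helper: classify a dash-delimited 888 sequence by the
# code that follows 888 (102 = software/system dump, 103 = hardware).
classify_888() {
    first=$(echo "$1" | cut -d- -f1)
    second=$(echo "$1" | cut -d- -f2)
    if [ "$first" != "888" ]; then
        echo "not an 888 sequence: $1" >&2
        return 1
    fi
    case "$second" in
        102) echo "software: system dump (read crash code, then dump code)" ;;
        103) echo "hardware: read SRN, FRU and location codes" ;;
        *)   echo "unrecognized sequence: $1" >&2; return 1 ;;
    esac
}
```

The remaining codes in the sequence still have to be recorded in order, as described above; this helper only looks at the second code.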
Instructor notes:
Purpose — Introduce what an 888 display code means.
Details — Describe what students have to do when an 888 display occurs. Emphasize
that, in an HMC managed LPAR environment, they should only see the 888-102 sequence.
The focus here is on crashes which result in dumps (the left side of the diagram).
Transition statement — Whether an unintended system crash or an administrator
requested dump, where is the dump stored and how do we access it?

When a dump occurs
Visual: After an AIX kernel crash, the dump is written to the primary dump device, /dev/hd6. At the next boot, the dump is copied into the copy directory as /var/adm/ras/vmcore.0.
Figure 12-6. When a dump occurs AN152.2
Notes:
Primary dump device

If an AIX kernel crash occurs (system-initiated or user-initiated), kernel data is written to
the primary dump device, which is, by default, /dev/hd6, the primary paging device.
Note that, after a kernel crash, AIX may need to be rebooted. If the autorestart
system attribute is set to TRUE, the system will automatically reboot after a crash.
The copy directory

During the next boot, the dump is copied (remember: rc.boot 2) into a dump directory;
the default is /var/adm/ras. The dump file name is vmcore.x, where x indicates the
number of the dump (for example, 0 indicates the first dump).
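Because each copied dump gets a higher number, finding the most recent copy is a matter of picking the largest number after vmcore. The sketch below assumes only the vmcore.x naming described above; the function name latest_vmcore is hypothetical, and any suffix after the number is ignored.

```shell
# Hypothetical helper: print the name of the highest-numbered vmcore.x
# file in a copy directory (default /var/adm/ras), or nothing if none.
latest_vmcore() {
    dir=${1:-/var/adm/ras}
    ls "$dir" 2>/dev/null |
        awk -F. '/^vmcore\.[0-9]+/ {
                     # $2 is the dump number following "vmcore."
                     if ($2 + 0 >= max) { max = $2 + 0; name = $0; found = 1 }
                 }
                 END { if (found) print name }'
}
```

Comparing the numbers numerically (rather than sorting names as text) keeps vmcore.10 ahead of vmcore.2.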
Instructor notes:
Purpose — Describe what happens if a dump occurs.
Details — Base your presentation on the material in the student notes.
Transition statement — Let’s find out where all this information is written and how you
can customize this.

The sysdumpdev command
# sysdumpdev -l                       List dump values
primary              /dev/hd6
secondary            /dev/sysdumpnull
copy directory       /var/adm/ras
forced copy flag     TRUE
always allow dump    FALSE
dump compression     ON
type of dump         traditional
# sysdumpdev -p /dev/sysdumpnull      Deactivate primary dump device (temporary)
# sysdumpdev -P -s /dev/rmt0          Change secondary dump device (Permanent)
# sysdumpdev -L                       Display information about last dump
Device name:          /dev/hd6
Major device number:  10
Minor device number:  2
Size:                 9507840 bytes
Date/Time:            Tue Oct 5 20:41:56 PDT 2007
Dump status:          0
Figure 12-7. The sysdumpdev command AN152.2
Notes:
Primary and secondary dump devices

There are two system dump devices:
- Primary - Usually used when you wish to save the dump data
- Secondary - An alternate dump device; often used to discard dump data (using
/dev/sysdumpnull)
Use the sysdumpdev command or SMIT to query or change the primary and secondary
dump devices.
Ensure you know your system and know what your primary and secondary dump
devices are set to. Your dump device can be a portable medium, such as a tape drive.
AIX will use /dev/hd6 (paging) as the default primary dump device, except when there
is more than 4 GB of allocated memory.
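When checking several systems, it can help to pull a single setting out of the sysdumpdev -l listing. The following is a minimal sketch, assuming the "attribute name, whitespace, value" layout shown on the visual; the function name dump_attr is hypothetical and must be given the full attribute name.

```shell
# Hypothetical helper: print the value of one attribute from
# "sysdumpdev -l" style output read on standard input.
# Attribute names may contain spaces (e.g. "copy directory"), so pass
# the full name; everything after it (minus leading blanks) is the value.
dump_attr() {
    awk -v key="$1" '
        index($0, key) == 1 && substr($0, length(key) + 1, 1) ~ /[ \t]/ {
            value = substr($0, length(key) + 1)
            sub(/^[ \t]+/, "", value)   # drop the separating whitespace
            print value
            exit
        }'
}

# Usage on AIX, as root:
#   sysdumpdev -l | dump_attr primary
#   sysdumpdev -l | dump_attr secondary
```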
Flags for sysdumpdev command

Flags for the sysdumpdev command include the following:
-l Lists current values of dump-related settings.
-e Estimates the size of a dump.
-p Specifies primary dump device.
-C Turns on compression (default in AIX 5L V5.3 and not an option in
AIX 6.1 or later, where dumps are always compressed).
-c Turns off compression (not an option in AIX 6.1 or later).
-s Specifies secondary dump device.
-P Makes the change of primary or secondary dump devices
permanent.
-d directory Specifies the directory the dump is copied to at system boot. If the
copy fails at boot time, the -d flag indicates that the system dump
should be ignored (force copy flag = FALSE).
-D directory Specifies the directory the dump is copied to at system boot. If the
copy fails at boot time, using the -D flag allows you to copy the
dump to external media (force copy flag = TRUE).
-K Sets the always allow dump attribute to TRUE. If your machine has a key mode
switch, the reset button or the dump key sequences will then force a dump with the
key in the normal position; on a machine without a key mode switch, they will also
force a dump. Note: On a machine without a key mode switch, a dump cannot be
forced with the key sequence without this value set. This is also true of the reset
button prior to AIX 5.3.
-f { disallow | allow | require }
Specifies whether the firmware-assisted full memory system dump
is allowed, required, or not allowed. The -f flag has the following
variables:
- The disallow variable specifies that the full memory system
dump mode is not allowed (it is the selective memory mode).
- The allow variable specifies that the full memory system dump
mode is allowed but is performed only when the operating
system cannot properly handle the dump request.
- The require variable specifies that the full memory system
dump mode is allowed and is always performed.
-t { traditional | fw-assisted }
Specifies the type of dump to perform. The -t flag has the following
variables:

- The traditional variable specifies performing a traditional
system dump. In this dump type, the dump data is saved before
system reboot.
- The fw-assisted variable specifies performing a
firmware-assisted system dump. In this dump type, the dump
data is saved in parallel with the system reboot.
You can use the firmware-assisted system dump only on PHYP
platforms with various restrictions on memory size. When the
fw-assisted system dump type is not allowed at configuration time,
or is not enforced at dump request time, a traditional system dump is
performed. In addition, because the scratch area is only reserved at
initialization, a configuration change from traditional system dump to
firmware-assisted system dump is not effective before the system is
rebooted.
-z Writes to standard output the string containing the size of the dump
in bytes and the name of the dump device, if a new dump is present.
Dump status values

Status values, as reported by sysdumpdev -L, correspond to dump LED codes (listed in
full later) as follows:
0 = 0c0 Dump completed
-1 = 0c8 No primary dump device
-2 = 0c4 Partial dump
-3 = 0c5 Dump failed to start
Note: If the value of Dump status is -3, Size usually shows as 0, even if some data
was written.
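The status-to-LED mapping above can be applied directly to sysdumpdev -L output. The following is a minimal sketch, assuming the "Dump status: N" line format shown on the visual; both function names are hypothetical.

```shell
# Hypothetical helper: read "sysdumpdev -L" style output on stdin and
# print the numeric value from the "Dump status:" line.
last_dump_status() {
    awk -F: '/^Dump status/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# Hypothetical helper: translate a dump status value into the dump LED
# code and meaning from the table above.
dump_status_led() {
    case "$1" in
        0)  echo "0c0 Dump completed" ;;
        -1) echo "0c8 No primary dump device" ;;
        -2) echo "0c4 Partial dump" ;;
        -3) echo "0c5 Dump failed to start" ;;
        *)  echo "unknown dump status: $1" >&2; return 1 ;;
    esac
}

# Usage on AIX, as root:
#   dump_status_led "$(sysdumpdev -L | last_dump_status)"
```

Remember that for status -3 the reported size is usually 0 even if some data was written, so the size field alone is not a reliable check.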
Examples on visual
The examples on the visual illustrate use of several of the sysdumpdev flags discussed
in the preceding material.
Dump information in the error log

System dumps are usually recorded in the error log with the DUMP_STATS label. Here,
the Detail Data section will contain the information that is normally given by the
sysdumpdev -L command: the major device number, minor device number, size of the
dump in bytes, the time at which the dump occurred, the dump type (primary or
secondary), and the dump status code.
DVD support for system dumps (AIX 5L V5.3 and later)

AIX 5L V5.3 added the ability to send the system dump to DVD media. The DVD device
could be used as a primary or secondary dump device. In order to get this functionality,
the target DVD device should be DVD-RAM or writable DVD. Remember to insert an
empty writable DVD in the drive when using the sysdumpdev command, or when you
require the dump to be copied to the DVD at boot time after a crash. If the DVD media is
not present, the commands will give error messages or will not recognize the device as
suitable for system dump copy.
Display of extra dump information on TTY (AIX 5L V5.3 and later)

During the creation of the system dump, AIX 5L V5.3 or later displays additional
information on the console TTY about the progress of the system dump, as illustrated in
the following sample output:
# sysdumpstart -p
Preparing for AIX System Dump . . .
Dump Started .. Please wait for completion message
AIX Dump .. 23330816 bytes written - time elapsed is 47 secs
Dump Complete .. type=4, status=0x0, dump size:23356416 bytes
Rebooting . . .
At this time, the kernel debugger and the 32-bit kernel need to be enabled to see this
function, and the functionality has been checked only on the S1 port. However, this
limitation may change in the future.
Verbose flag for sysdumpdev (AIX 5L V5.3 and later)

Following a system crash, scenarios exist where a system dump may crash or fail
without a single byte of data being written to the dump device (for example, because of
a power off or disk errors). For cases where a failed dump does not include the dump
minimal table, it is very useful to save some traceback information in the NVRAM. Starting with AIX 5L
V5.3, the dump procedure is enhanced to use the NVRAM to store minimal dump
information. In the case where the dump fails, we can use the sysdumpdev -vL
command (-v is the new verbose flag) to check the reason for the failure.

Instructor notes:
Purpose — Discuss the sysdumpdev command and its various options.
Details — When you install the operating system, the dump device is automatically
configured for you. By default the primary device is /dev/hd6, which is a paging logical
volume, and the secondary device is /dev/sysdumpnull.
If a dump occurs to paging, the system will automatically copy the dump when the system
is rebooted. By default, the dump gets copied to the /var/adm/ras directory. We will look at
this in detail later in this unit.
The recommended size for the dump device is at least a quarter of the size of real memory.
In problem situations where the current dump device does not meet this recommendation,
it is advisable to create a temporary dump logical volume of the size required and manually
recreate the environment in which a previous dump occurred. If the dump device is not
large enough, the system will produce a partial dump only. It is possible, but extremely
unlikely, that a support center can determine the cause of the crash from a partial dump.
The -e flag can be used as a starting point to determine how big the dump device should
be.
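The quarter-of-real-memory guideline is simple integer arithmetic, rounded up so the device is never undersized. A minimal sketch, assuming the memory size is given in MB; the function name min_dump_device_mb is hypothetical.

```shell
# Hypothetical helper: minimum dump device size in MB under the
# "at least a quarter of real memory" guideline, rounded up.
# Argument: real memory size in MB.
min_dump_device_mb() {
    mem_mb=$1
    echo $(( (mem_mb + 3) / 4 ))
}

# Example: a 16 GB (16384 MB) system suggests at least 4096 MB.
```

Compare the result with the sysdumpdev -e estimate and any previous dump size from sysdumpdev -L; use the larger figure.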
Discussion Items - What is the advantage of having two dump areas?
Answer: The secondary device acts as a backup if the primary dump device cannot be used.
Transition statement — Let's look at what the default dump method will be and how
you can use sysdumpdev to control it.
Firmware assisted dump
Improved reliability over traditional dump

Ability to capture is not dependent on the crashing OS
At reboot, new kernel dumps data from previous crashed kernel
Starting with AIX 7.1, firmware assisted dump (FWAD) is now the
default system dump type if:
Platform is POWER6 and above
Memory is at least 1.5 GB
Dump logical volume is in rootvg and not the hd6 logical volume
User can reconfigure to use traditional dump instead:
# sysdumpdev -t traditional
User can request a full memory dump in case the default (selective
memory) dump is not providing enough data:
# sysdumpdev -f require
Diskless thin servers can dump to remote iSCSI disks using FWAD
Figure 12-8. Firmware assisted dump AN152.2
Notes:
With POWER6 and later processor-based systems, system dumps can be assisted by
firmware. Firmware-assisted system dumps differ from traditional system dumps, which
are generated before a system partition is reinitialized: firmware-assisted dumps take
place while the partition is restarting.
In fw-assist mode, in order to improve fault tolerance and performance, disk writing
operations are done as much as possible during the AIX Boot phase in parallel with the
AIX initialization.
A firmware-assisted system dump takes place under the following conditions:
- The firmware assisted dump is supported only on POWER6-based servers and
later
- The memory size at system startup is equal to or greater than 4 GB
- The system has not been configured to do a traditional system dump
- Physical partition size of 16 MB

The system administrator can reconfigure to use traditional dump instead:
# sysdumpdev -t traditional
The system administrator can request a full memory dump in case the default (selective
memory) dump is not providing enough data:
# sysdumpdev -f require
Diskless thin servers can dump to remote iSCSI disks using FWAD.
Instructor notes:
Purpose — Point out the AIX 7.1 changes for firmware assisted dumps.
Details —
Additional information — The main steps of a firmware-assisted system dump are (from
the IBM AIX Version 6.1 Differences Guide):
1. When all conditions for a firmware-assisted dump are validated (at system initialization),
AIX reserves a dedicated memory scratch area.
2. This predefined scratch area is not released unless the system administrator explicitly
configures a legacy dump configuration.
3. The predefined scratch area size is relative to the memory size and ensures AIX will be
able to reboot while the firmware-assisted dump is in progress. Note: As dump data is
written at the next restart of the system, the AIX dump tables that are used to refer to the
data cannot be preserved.
4. System administrators must be aware that this dedicated scratch area is not adjusted
when a memory DR operation modifies the memory size. A verification can be run with
the sysdumpdev command by system administrators in order to be notified if the
firmware-assisted system dump is still supported.
5. AIX determines the memory blocks that contain dump data and notifies the dedicated
hypervisor to start a firmware-assisted dump with this information.
6. The hypervisor logically powers the partition off, but preserves partition memory
contents.
7. The hypervisor copies just enough memory to the predefined scratch area so that the
boot process can start without overwriting any dump data.
8. The AIX boot loader reads this dedicated area and copies it onto disk using dedicated
open firmware methods. The hypervisor has no authority and is unable by design to
write onto disk for security reasons.
9. AIX starts to boot and in parallel copies preserved memory blocks. The preserved
memory blocks are blocks that contain dump data not already copied by the AIX boot
loader. As with the traditional dump, a firmware-assisted dump uses only the first copy
of rootvg as the dump device; it does not support disk mirroring. [course developer’s
note: this last statement is in contradiction to other research on the disk mirroring
support]
10. The dump is complete when all dump data is copied onto disk. The preserved memory
then returns to AIX usage.
11. AIX waits until all the preserved memory is returned to its partition usage in order to
launch any user applications.
Transition statement — Let's look at situations where a dedicated logical volume will be
used as the dump device.

Dedicated dump device (1 of 3)
By default, servers with 4 GB or more of real memory will have
a dedicated dump device created at installation time
System memory size                        Dump device size
4 GB up to, but not including, 12 GB      1 GB
12 GB up to, but not including, 24 GB     2 GB
24 GB up to, but not including, 48 GB     3 GB
48 GB and up                              4 GB
Figure 12-9. Dedicated dump device (1 of 3) AN152.2
Notes:
Creation of dedicated dump device

Servers with more than 4 GB of real memory will have a dedicated dump device created
at installation time. This dedicated dump device is automatically created; no user
intervention is required.
The default size of the dedicated dump device depends on the amount of memory.
- From 4 GB up to, but not including, 12 GB, the dump device size will be 1 GB.
- From 12 GB up to, but not including, 24 GB, the dump device size will be 2 GB.
- From 24 GB up to, but not including, 48 GB, the dump device size will be 3 GB.
- Memory of 48 GB or more results in a dump device size of 4 GB.
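The sizing rule in the list above is a simple step function of memory size. The sketch below encodes it, assuming memory is given in whole GB; the function name default_dumplv_gb is hypothetical, and it prints 0 for systems below 4 GB, where no dedicated dump device is created.

```shell
# Hypothetical helper: default dedicated dump device size (GB) for a
# real memory size given in whole GB, following the table above.
# Prints 0 when no dedicated dump device would be created (< 4 GB).
default_dumplv_gb() {
    mem_gb=$1
    if   [ "$mem_gb" -lt 4 ];  then echo 0
    elif [ "$mem_gb" -lt 12 ]; then echo 1
    elif [ "$mem_gb" -lt 24 ]; then echo 2
    elif [ "$mem_gb" -lt 48 ]; then echo 3
    else                            echo 4
    fi
}
```

Each boundary uses a strict "less than" test, matching the "up to, but not including" wording of the table.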
Default name of dedicated dump device

The default name of the dump device logical volume is lg_dumplv.
Instructor notes:
Purpose — Explain that a dedicated dump device is created for systems with more than
4 GB of main memory.
Details — Point out that the size of the dedicated dump device depends on the amount of
physical memory on this system and mention the default name of the dedicated dump
device.
Transition statement — You can specify the name and size of the dedicated dump device
instead of using the defaults we have just discussed.

Dedicated dump device (2 of 3)
Specifying a dedicated dump device during installation:
/bosinst.data
...
control_flow:
    CONSOLE = /dev/vty0
    ...
large_dumplv:
    DUMPDEVICE = /dev/lg_dumplv
    SIZEGB = 1
Notes:
The bosinst.data file

The bosinst.data file contains stanzas which direct the actions of the Base Operating
System (BOS) install program. After an initial installation, you can change many
aspects of the default behavior of the BOS install program by editing the bosinst.data
file and using it (for example, on a supplementary diskette) with your installation media.
The large_dumplv stanza

The optional large_dumplv stanza in bosinst.data can be used to specify
characteristics to be used if a dedicated dump device is created. A dedicated dump
device is only created for systems with 4 GB or more of memory.
The following characteristics can be specified in the large_dumplv stanza:
- DUMPDEVICE: Specifies the name of the dedicated dump device
- SIZEGB: Specifies the size of the dedicated dump device in gigabytes
If the stanza is not present, the dedicated dump device is created when required, using
the default values previously discussed.
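To confirm what a customized bosinst.data will request before you use it, you can read back the large_dumplv stanza. The following is a minimal sketch, assuming the stanza layout shown on the visual (stanza names ending in ":" at the start of a line, attributes written as NAME = value); the function name large_dumplv_settings is hypothetical.

```shell
# Hypothetical helper: print DUMPDEVICE and SIZEGB from the large_dumplv
# stanza of a bosinst.data-style file read on standard input.
large_dumplv_settings() {
    awk '
        # A new stanza header: remember whether it is large_dumplv.
        /^[A-Za-z_]+:/ { in_stanza = ($0 ~ /^large_dumplv:/); next }
        in_stanza && /[=]/ {
            split($0, kv, "=")
            name = kv[1]; value = kv[2]
            gsub(/[ \t]/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            if (name == "DUMPDEVICE" || name == "SIZEGB") print name "=" value
        }'
}

# Usage:
#   large_dumplv_settings < /bosinst.data
```

No output means the stanza is absent, in which case the defaults previously discussed apply.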

Instructor notes:
Purpose — Describe how the bosinst.data file can be used to specify the name and size
of a dedicated dump device.
Details —
Transition statement — It is important to determine the estimated size of a system dump
for your machine.
Dedicated dump device (3 of 3)
Creating a dedicated dump device after installation:
Define a rootvg logical volume of type sysdump:
# mklv -y ded_dumplv -t sysdump rootvg 64
Change the primary dump device to be the new logical volume:
# sysdumpdev -P -p /dev/ded_dumplv
Validate that your new device is now the primary dump device:
# sysdumpdev -l
Note: Use of a mirrored dedicated dump logical volume is supported but not recommended due to performance issues
Notes:
The last method for using a dedicated dump logical volume is to configure it manually
after system installation. The procedure shown in the visual is fairly straightforward. The
main concern is the usual one: the allocation must be large enough to hold the
dump.
A common question concerns the use of the mirrorvg command. If you are at a
currently supported release of AIX (with current maintenance), dumping to an LVM
mirrored logical volume is supported, but the dump processing takes much longer when
using LVM mirroring. It is recommended that you do not mirror the dump logical volume.
The mirrorvg command will not mirror a dump logical volume in the rootvg unless it is
the paging space.
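The last argument to mklv is a count of logical partitions, so the dump size has to be converted using the volume group's physical partition (PP) size, which lsvg rootvg reports as PP SIZE. The following is a minimal sketch, assuming the dump size in bytes and the PP size in MB; the function name dump_lps_needed is hypothetical. For example, with a 16 MB PP size, the 64 LPs on the visual would give 1 GB.

```shell
# Hypothetical helper: number of logical partitions (LPs) to pass to
# mklv so a dump logical volume holds at least the given size.
# Arguments: dump size in bytes (e.g. from sysdumpdev -e) and the
# volume group's PP size in MB.
dump_lps_needed() {
    bytes=$1
    pp_mb=$2
    pp_bytes=$((pp_mb * 1024 * 1024))
    # Round up so a partially used partition still gets a whole LP.
    echo $(( (bytes + pp_bytes - 1) / pp_bytes ))
}
```

Because the estimate is only an estimate, consider adding a margin before passing the result to mklv.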

Instructor notes:
Purpose — Explain how to configure a dedicated dump logical volume after installation.
Details —
It is good to have a separate logical volume for the dump, instead of using the paging
space logical volume. By having a separate dump logical volume, it separates the dump
logical volume issues from the paging space issues. The bottom line is that while it is
definitely good to mirror the paging space logical volume, it is not recommended to do so
for the dump logical volume.
Prior to AIX 4.3.3, there was no support for mirroring of the dump logical volume. The
problem was that, while the dump was properly created on the primary copy, when reading
the dump logical volume, some logical partitions were read from the primary copy and
some read from the secondary copy, resulting in garbage.
In AIX 4.3.3, the intermediate command, readlvcopy, was provided that allowed one to
specify to only read from the primary copy even though the policy was parallel. Also, dump
processing (snap reading of dump logical volume) was recoded to use readlvcopy. At that
point mirroring of a dump logical volume could be supported.
But there was another problem: sometimes with a mirrored dump logical volume, the dump
would not be reported. This was fixed in AIX 5.2 TL08 (or later) and in AIX 5.3 TL04 (or
later).
Thus, a mirrored dump logical volume is currently supported. This is reflected in the use of
the paging space as the default dump device (unless the system has a large amount of memory) and in
the fact that the mirrorvg command automatically mirrors the paging space, even when it is also acting
as the dump logical volume.
On the other hand, mirroring the dump logical volume is not recommended, due to the
resulting performance impact when creating the dump and some complications in reading
the dump.
Because of this “recommendation”, the mirrorvg command will not mirror the dump logical
volume if there is a separate logical volume for the dump (not using the paging space).
If the mirroring of the dump logical volume is not recommended, then how should we
protect against the scenario of the disk holding the dump logical volume being unavailable
at the time of the dump? The recommendation is to define a secondary dump device on a
different disk than the primary dump device.
Transition statement — As stated, having the correct space allocation is important in dump
processing. Let us look at how to identify the correct amount of disk space.
Estimating dump size

# sysdumpdev -e      Estimate dump size
0453-041 estimated dump size in bytes: 52428800
# sysdumpdev -C      Turn on dump compression
                     (In AIX 6.1 and later, dumps are always compressed)
# sysdumpdev -e
0453-041 estimated dump size in bytes: 10485760
Use this information to size the /var file system.
Figure 12-12. Estimating dump size AN152.2
Notes:
Sizing the /var file system

You should size the /var file system so that there is enough free space to hold the dump
information should your machine ever crash.
Estimating the space needed to hold a system dump

The sysdumpdev -e command will provide an estimate of the amount of disk space
needed for system dump information. The size of the dump device and of the copy
directory you will require are directly related to the amount of RAM on your machine.
The more RAM on the machine, the more space that will be needed on the disk.
Machines with 16 GB of RAM may need 2 GB of dump space.
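Because sysdumpdev -e gives only an estimate, a script that sizes /var from it should leave some headroom. The sketch below is illustrative only. The names dump_estimate_bytes, bytes_to_kb and dump_fits are hypothetical helpers. The parsing assumes the 0453-041 message format shown on the visual. The 20 percent default margin is an arbitrary assumption, not an IBM recommendation.

```shell
# Hypothetical helper (not an AIX command): extract the byte count from
# a line such as "0453-041 estimated dump size in bytes: 52428800".
dump_estimate_bytes() {
    printf '%s\n' "$1" | awk '/estimated dump size in bytes/ { print $NF }'
}

# Convert bytes to KB, rounding up, so the result can be compared with
# free-space figures reported in KB.
bytes_to_kb() {
    echo $(( ($1 + 1023) / 1024 ))
}

# Succeed if free_kb can hold estimate_kb plus a safety margin given in
# percent (assumed default: 20), since the estimate is not exact.
dump_fits() {
    estimate_kb=$1
    free_kb=$2
    margin_pct=${3:-20}
    needed_kb=$(( estimate_kb + estimate_kb * margin_pct / 100 ))
    [ "$free_kb" -ge "$needed_kb" ]
}
```

On AIX, the helpers might be combined as follows. The free-space column is an assumption based on AIX df -k output, where the third column is Free:
# est=$(bytes_to_kb "$(dump_estimate_bytes "$(sysdumpdev -e)")")
# free=$(df -k /var/adm/ras | awk 'NR == 2 { print $3 }')
# dump_fits "$est" "$free" || echo "Copy directory may be too small for a dump"
With the figures from the dumpcheck example later in this unit (161996 KB estimated, 117824 KB free), dump_fits fails even with no margin.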

V7.0.1
Dump compression

In AIX V4.3.2, an option was added to compress the dump data before it is written.
Dump compression is on by default in AIX 5L V5.3 and later.
To turn on dump compression, enter sysdumpdev -C. This will significantly reduce the
amount of space needed for dump information.
To turn off compression, enter sysdumpdev -c.
Starting with AIX 6.1, dumps are always compressed; thus the -C and -c flags to
control compression are no longer valid options of the sysdumpdev command.
Instructor notes:
Purpose — Show how to estimate the disk space needed for a system dump.
Details — The command sysdumpdev -e will estimate the dump size. It is just an estimate.
To be safe, the disk space should be larger than the estimate. Also, if the system has
dumped in the past, looking at the size of the past dump can provide more guidance on
sizing the dump device. This can be seen using the command sysdumpdev -L (mentioned
earlier in the unit).
In AIX V4.3.2, the ability to compress the dump was introduced. Turning on dump
compression will reduce the space needed significantly. Dump compression is on by
default starting in AIX 5L V5.3. Dumps are always compressed in AIX 6.1 and later.
You should mention a few other points about dump devices:
• If a paging device (like hd6) is used for dumps, it must be part of rootvg.
• The primary dump device must always be in the rootvg.
• The secondary dump device may be outside rootvg as long as it is not a paging device.
• Prior to AIX V4.3.3, dump devices should not be mirrored. The dump information was written
to only one mirror copy, and the other copy was not marked stale. When rebooting, the dump
was copied from the dump device to the dump file using both mirror copies, even though only
one copy had the correct information. This created a corrupted dump file. In AIX V4.3.3, this
was corrected by allowing the dump to be read only from the good copy.
• AIX 5L V5.3 and later allow a DVD device to be used as a primary or secondary dump
device.
Transition statement — Let’s look at a new feature in AIX 5L that checks dump space
sizes.

The dumpcheck utility

The dumpcheck utility will do the following when enabled:
Estimate dump or compressed dump size using sysdumpdev -e
Find dump logical volumes and copy directory with sysdumpdev -l
Estimate the primary and secondary dump device sizes
Estimate the copy directory free space
Report any problems in the error log file
Figure 12-13. The dumpcheck utility AN152.2
Notes:
Function of the dumpcheck utility

AIX 5L V5.1 introduced the /usr/lib/ras/dumpcheck utility. This utility is used to check
the disk resources used by the system dump facility. The command logs an error if
either the largest dump device is too small to receive the dump, or there is insufficient
space in the copy directory when the dump device is a paging space.
If the dump device is a paging space, dumpcheck will verify if the free space in the copy
directory is large enough to copy the dump.
If the dump device is a logical volume, dumpcheck will verify it is large enough to contain
a dump.
If the dump device is a tape, dumpcheck will exit without a message.
Any time a problem is found, dumpcheck will (by default) log an entry in the error log. If
the -p flag is present, dumpcheck will display a message to stdout.
Example of dumpcheck use

The following example illustrates use of the dumpcheck utility and shows sample output
from this command:
# /usr/lib/ras/dumpcheck -p
There is not enough free space in the file system containing the copy
directory to accommodate the dump.
File system name                       /var/adm/ras
Current free space in kb               117824
Current estimated dump size in kb      161996
Note that, since the -p flag was used in this example, the output from dumpcheck was
written to stdout.
Enabling and disabling dumpcheck

In order to be effective, the dumpcheck utility must be enabled. Verify that dumpcheck
has been enabled by using the following command:
# crontab -l | grep dumpcheck
0 15 * * * /usr/lib/ras/dumpcheck >/dev/null 2>&1
By default, it is set to run at 3 p.m. each afternoon.
Enable the dumpcheck utility by using the -t flag. This will create an entry in the root
crontab if none exists. In this example, the dumpcheck utility is set to run at 2 p.m.:
# /usr/lib/ras/dumpcheck -t "0 14 * * *"
For the best results, set dumpcheck to run when the system is heavily loaded. This will
identify the maximum size the dump will take. As previously mentioned, the time is set
for 3 p.m. by default.
If you use the -p flag in the crontab entry, root will be sent a mail message with the
standard output of the dumpcheck command.
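The crontab check described above can also be scripted. This sketch assumes the standard five-field crontab format. dumpcheck_schedule is a hypothetical helper name. It prints the schedule of the first uncommented dumpcheck entry it finds, or nothing if dumpcheck is not scheduled.

```shell
# Hypothetical helper (not an AIX command): read crontab text on standard
# input and print the five time fields of the first active dumpcheck entry.
dumpcheck_schedule() {
    awk '/^[ \t]*#/ { next }
         $6 ~ /dumpcheck$/ { print $1, $2, $3, $4, $5; exit }'
}
```

With the default entry, # crontab -l | dumpcheck_schedule prints 0 15 * * *, that is, 3 p.m. every day.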

Instructor notes:
Purpose — Discuss the dumpcheck command.
Details — Emphasize that (by default) any problems found by dumpcheck will be written to
the error log. So, it is important to check the error log.
Additional information — If compression is turned off, dumpcheck does not work and
gives the error message: 0453-062 Could not change the user Set attributes.
Transition statement — We have mentioned several ways in which a system dump can
be initiated. Let’s discuss this subject in more detail.
Methods of starting a dump

Automatic invocation of dump routines by system
Using the sysdumpstart command or SMIT
- Option: -p (send to primary dump device)
- Option: -s (send to secondary dump device)
- Option: -t (use traditional dump)
- Option: -f (select scope of dump)
Using a special key sequence on the LFT
- <Ctrl-Alt-NUMPAD1> (to primary dump device)
- <Ctrl-Alt-NUMPAD2> (to secondary dump device)
Using the Reset button
Using the Hardware Management Console (HMC)
- Restart LPAR with the Dump option
Using the remote reboot facility
Figure 12-14. Methods of starting a dump AN152.2
Notes:
Ways to obtain a system dump

A system dump may be automatically created by the system. In addition, there are
several ways for a user to invoke a system dump. The most appropriate method to use
depends on the condition of the system.
Automatic invocation of dump routines

If there is a kernel panic, the system will automatically dump the contents of real
memory to the primary dump device.
Using the sysdumpstart command or SMIT

One method a superuser can use to invoke a dump is to run the sysdumpstart
command or invoke it through SMIT (fastpath smit dump).

The -p flag of sysdumpstart is used to specify a dump to the primary dump device.
The -s flag of sysdumpstart is used to specify a dump to the secondary dump device.
The -t flag of sysdumpstart is used to change the default type from fw_assist to
traditional.
The -f flag of sysdumpstart is used to change the scope of the dump (interacts with
the configuration set up with sysdumpdev):
- disallow - Do not allow a full memory dump
- require - Require a full memory dump
Using a special key sequence

If the system has halted, but the keyboard will still accept input, a dump to the primary
dump device can be forced by pressing the <Ctrl-Alt-NUMPAD1> key sequence on the
LFT keyboard. The key combination <Ctrl-Alt-NUMPAD2> on the LFT can be used to
initiate a system dump to the secondary dump device. This method can only be used
when your machine's mode switch (if your machine has such a switch) is set to the
Service position or the Always Allow System Dump option is set to true. The Always
Allow System Dump option can be set to true using SMIT or by using sysdumpdev -K.
Using the Reset button

On systems running versions of AIX 5L prior to AIX 5L V5.3, a dump can also be
invoked when the Reset button is pressed and when your machine's mode switch is set
to the Service position or the Always Allow System Dump option is set to true. In AIX
5L V5.3 and later, the system will always dump when the Reset button is pressed,
provided that the dump device is not removable. This method can be used if the keyboard
is no longer accepting input. Note that pressing the Reset button twice will cause the
system to reboot.
Using the Hardware Management Console

In an LPAR environment, a dump can be initiated from the Hardware Management
Console (HMC) by choosing Dump from the Restart Options (accessed through the
Restart Partition menu selection in the Server Management application). The Dump
option is the equivalent of pressing the physical Reset button on a non-LPAR system.
The partition will initiate a system dump to the primary dump device if configured to do
that. Otherwise, the partition will simply reboot.
Using the remote reboot facility

The remote reboot facility can also be used to obtain a system dump. This capability will
be further discussed shortly.
Obtaining a useful system dump

Bear in mind that if your system is still operational, a dump taken at this time will not
assist in problem determination. A relevant dump is one taken at the time of the system
halt.

Instructor notes:
Purpose — Describe the different methods that can be used to initiate a dump.
Details — Do not start a dump if a flashing 888 is shown on the LED display. This could
indicate that a dump has already occurred on your system. You can determine this by
checking the LED code that is displayed after the flashing 888: a 102 indicates that the
system has already created a system dump and has written it to the primary dump device.
If you start your own dump before copying the information in your dump device, your new
dump may overwrite the existing information. You are allowed two system dumps, named
vmcore.0 and vmcore.1. If another system dump occurs, the names will be vmcore.1 and
vmcore.2, and vmcore.0 is removed.
A user-initiated dump is different from a dump initiated by an unexpected system halt
because the user can designate which dump device to use. When the system halts
unexpectedly, a system dump is initiated automatically to the primary dump device.
Here is some additional information about some of the methods listed on the visual:
• Command Line
This method uses the sysdumpstart command. Note, however, this command is only
available if you install the Software Service Aids (bos.sysmgt.serv_aid) package.
You must have root authority to run this command. First, you might want to check the
current settings of your system dump devices by using the sysdumpdev -l command.
Then initiate the dump with sysdumpstart -p (for the primary device) or -s (for the
secondary device). Note that if the LED display is blank on the RS/6000 with an LED,
the dump was not started. Try again using a different method. There is no way to tell on
the RS/6000 system without an LED if a dump has started, is in process, or has
finished.
• Using SMIT
The SMIT screen that allows you to do this is shown on a subsequent visual.
• Using special key sequence
If you have an LFT, you can initiate a dump either to the primary or the secondary
device by using one of the key sequences specified. The NUMPAD, which is referred to
in the student notes, is the set of number keys on the right-hand side of the keyboard.
• Using the Reset button
This procedure works for all system configurations and will work in circumstances
where other methods for starting a dump will not. On systems running versions of AIX
prior to AIX 5L V5.3, ensure always allow dump is set to TRUE. The system writes the
dump information to the primary dump device. To set always allow dump to TRUE,
execute the sysdumpdev -K command (or use SMIT).
Transition statement — Let’s discuss the remote reboot facility in a little more detail.
Starting a dump from a TTY

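[Visual: an ASCII terminal on serial port S1; at the login prompt, typing #dump# and then 1 starts a dump, and the screen shows: login: #dump#>1]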
Add a TTY
...
REMOTE Reboot ENABLE: dump
REMOTE Reboot STRING: #dump#
...
Figure 12-15. Starting a dump from a TTY AN152.2
Notes:
The remote reboot facility

The remote reboot facility allows the system to be rebooted through a native
(integrated) serial port. The system is rebooted when the reboot_string is received at
the port. This facility is useful when the system does not otherwise respond but is
capable of servicing serial port interrupts. Remote reboot can be enabled on only one
native serial port at a time.
An important feature of the remote reboot facility is that it can be configured to obtain a
system dump prior to rebooting.

Configuring the remote reboot facility

Two native serial port attributes control the operation of remote reboot:
- reboot_enable
- reboot_string
Use of these attributes is discussed in the following paragraphs.
reboot_enable
The value of this attribute (referred to as REMOTE Reboot ENABLE in SMIT) indicates
whether this port is enabled to reboot the machine on receipt of the remote
reboot_string, and if so, whether to take a system dump prior to rebooting:
- no - Indicates remote reboot is disabled
- reboot - Indicates remote reboot is enabled
- dump - Indicates remote reboot is enabled, and, prior to rebooting, a system dump
will be taken on the primary dump device
reboot_string
This attribute (referred to as REMOTE Reboot STRING in SMIT) specifies the remote
reboot_string that the serial port will scan for when the remote reboot feature is
enabled. When the remote reboot feature is enabled, and the reboot_string is
received on the port, a '>' character is transmitted, and the system is ready to reboot. If
a '1' character is received, the system is rebooted (and a system dump may be started,
depending on the value of the reboot_enable attribute); any character other than '1'
aborts the reboot process. The reboot_string has a maximum length of 16 characters
and must not contain a space, colon, equal sign, null, new line, or Ctrl-\ character.
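The exchange on the serial line can be summarized as a small decision function. This is a hypothetical model written only to illustrate the rules described above. The function name and its output strings are invented for this example. It is not how AIX implements remote reboot, and it starts at the point where the reboot_string has already been matched and the '>' prompt sent.

```shell
# Hypothetical model (not AIX code) of the remote reboot decision.
#   $1  value of the reboot_enable attribute: no, reboot or dump
#   $2  the character received after the '>' prompt
remote_reboot_action() {
    case $1 in
        no)
            echo "remote reboot disabled"
            return 0 ;;
        reboot | dump) ;;
        *)
            echo "unknown reboot_enable value: $1" >&2
            return 1 ;;
    esac
    if [ "$2" != 1 ]; then
        # Any character other than '1' aborts the reboot process.
        echo "abort"
    elif [ "$1" = dump ]; then
        echo "dump to primary dump device, then reboot"
    else
        echo "reboot"
    fi
}
```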
Enabling remote reboot

Remote reboot can be enabled through SMIT or the command line. For SMIT, the path
System Environments -> Manage Remote Reboot Facility may be used for a
configured TTY. Alternatively, when configuring a new TTY, remote reboot may be
enabled from the Add a TTY or Change/Show Characteristics of a TTY menus.
These menus are accessed through the path Devices -> TTY.
From the command line, the mkdev or chdev command is used to enable remote reboot.
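Before setting reboot_string, you can check a candidate value against the restrictions listed earlier. The following sketch is an illustration that makes a few assumptions. valid_reboot_string is a hypothetical name. Rejecting an empty string is an added assumption not stated in the text. A null character is not tested because a shell variable cannot contain one.

```shell
# Hypothetical check (not an AIX command) of a remote reboot string.
valid_reboot_string() {
    s=$1
    # Assumption: an empty string is treated as invalid.
    [ -n "$s" ] || return 1
    # Maximum length is 16 characters.
    [ "${#s}" -le 16 ] || return 1
    nl='
'
    fs=$(printf '\034')    # Ctrl-\ is ASCII 28 (octal 034)
    # Reject space, colon, equal sign, newline and Ctrl-\.
    case $s in
        *" "* | *:* | *=* | *"$nl"* | *"$fs"*)
            return 1 ;;
    esac
    return 0
}
```

A value that passes the check could then be applied with chdev, for example (tty0 is an assumed device name):
# chdev -l tty0 -a reboot_enable=dump -a reboot_string='#dump#'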
Instructor notes:
Purpose — Explain how to start a dump from a TTY.
Details — Base your explanation on the material in the student notes.
Additional information — As mentioned in the student notes, the values for REMOTE
Reboot ENABLE are:
no Remote reboot is disabled
reboot Remote reboot is enabled
dump Remote reboot is enabled and a dump will occur prior to reboot
There is a good discussion of the remote reboot facility (starting on page 24) in the AIX 5L
Version 5.3 System Management Guide: Operating System and Devices.
Transition statement — Let’s look at the dump interface of SMIT.

Generating dumps with SMIT

# smit dump
System Dump
Move cursor to desired item and press Enter
Show Current Dump Devices
Show Information About the Previous System Dump
Show Estimated Dump Size
Change the Type of Dump
Change the Full Memory Dump Mode
Change the Primary Dump Device
Change the Secondary Dump Device
Change the Directory to which Dump is Copied on Boot
Start a Dump to the Primary Dump Device
Start a Traditional System Dump to the Secondary Dump Device
Copy a System Dump from a Dump Device to a File
Always ALLOW System Dump
Check Dump Resources Utility
Change/Show Global System Dump Properties
Change/Show Dump Attributes for a Component
Change Dump Attributes for multiple Components
Figure 12-16. Generating dumps with SMIT AN152.2
Notes:
Using the SMIT dump interface

You can use the SMIT dump interface to work with the dump facility.
The Always ALLOW System Dump option

A very important item on the menu shown on the visual is Always ALLOW System Dump.
If you set this option to yes, the CTRL-ALT-1 (numpad) and CTRL-ALT-2 (numpad)
key sequences will start a dump even when the key mode switch is in the Normal position.
On systems running versions of AIX prior to AIX 5L V5.3, setting this item to yes also
enables use of the Reset button to start a system dump.
The SMIT dump menu

The menu items that show or change the dump information use the sysdumpdev
command.

Instructor notes:
Purpose — Introduce the SMIT dump interface.
Details — Do not go into too much detail here. Just mention that SMIT uses the
sysdumpdev command for many of the items and they were covered earlier. Explain that
PCI machines should always allow a system dump. Historically, MCA machines could put
the physical key in the service mode to achieve this. This setting was created specifically
for PCI machines.
While there are three new items (added in AIX 6.1) at the bottom of the menu, they are for
the component live dump facility. If asked about them, be ready to place them in context,
but avoid getting into the details, which are outside the scope of this course.
On the other hand, the Change the Type of Dump and Change the Full Memory Dump Mode
items (new starting with AIX 6.1) relate to the firmware-assisted dump capabilities we
previously introduced.
The menu item Start a Dump to the Secondary Dump Device was renamed in AIX 6.1 to
Start a Traditional System Dump to the Secondary Dump Device in order to distinguish it
from the firmware-assisted dump.
Transition statement — Let’s see how a dump can be generated from the HMC.
Generating dumps with HMC

Figure 12-17. Generating dumps with HMC AN152.2
Notes:
If you use an HMC to manage the LPAR, you can use the HMC GUI (or the chsysstate
command) to trigger a dump of the operating system.
In the GUI, select the LPAR and then, from the tasks menu, select Operations > Restart.
The resulting window is shown in the visual. Selecting the Dump option signals the partition
as if its Reset button had been pressed, which initiates a dump.

Instructor notes:
Purpose — Show how to initiate an AIX dump from the HMC.
Details —
Transition statement — Regardless of how a dump is triggered, let’s look at the progress
codes that might be displayed.
Dump-related AIX progress codes

0c0  Dump completed successfully
0c1  I/O error occurred during the dump
0c2  Dump started by user
0c4  Dump completed unsuccessfully, not enough space on dump device, partial dump available
0c5  Dump failed to start, unexpected error occurred when attempting to write to dump
     device; for example, tape not loaded
0c6  Secondary dump started by user
0c8  Dump disabled, no dump device configured
0c9  System-initiated panic dump started
0cc  Failure writing to primary dump device, switched over to secondary
Figure 12-18. Dump-related AIX progress codes AN152.2
Notes:
System-initiated dumps
If a system dump is initiated through a kernel panic, the progress code 0c9 will be
displayed while the dump is in progress, and then either a flashing 888 or a steady 0c0.
For LPARs supported by an HMC, this will display in the reference code column. For a
server with no HMC and only one OS, it will display on the LED display of the system's
operator panel.
All of the LED codes following an 888 (remember: you must use the Reset button) should
be recorded and passed to IBM.
User-initiated dumps
For user-initiated system dumps to the primary dump device, the progress codes should
indicate 0c2 for a short period, followed by 0c0 upon completion.

Most common dump progress codes

The most common codes are the following:
0c0 Dump completed successfully
0c2 Dump started by user
0c9 System-initiated panic dump started
Additional dump progress code information:

0c1 An I/O error occurred during the dump.
0c4 Indicates that the dump routine ran out of space on the specified
device. It may still be possible to examine and use the data on the
dump device, but this tells you that you should increase the size of
your dump device.
0c5 Check the availability of the medium to which you are writing the
dump (for example, whether the tape is in the drive and write
enabled).
0c6 This is used to indicate a dump request to the secondary device.
0c7 A network dump is in progress, and the host is waiting for the server
to respond. The value in the three-digit display should alternate
between 0c7 and 0c2 or 0c9. If the value does not change, then the
dump did not complete due to an unexpected error.
0c8 You have not defined a primary or secondary dump device. The
system dump option is not available. Enter the sysdumpdev
command to configure the dump device.
0c9 A dump started by the system did not complete. Wait for one minute
for the dump to complete and for the three-digit display value to
change. If the three-digit display value changes, find the new value
on the list. If the value does not change, then the dump did not
complete due to an unexpected error.
0cc This code indicates that the dump could not be written to the primary
dump device. Therefore, the secondary dump device will be used.
This code was introduced quite some time ago (with AIX V4.2.1).
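When you are recording progress codes from an HMC reference code column or an operator panel, a lookup table can help. This sketch simply restates the codes listed in this unit. dump_code_meaning is a hypothetical helper, and the list is not exhaustive.

```shell
# Hypothetical lookup (not an AIX command) of dump-related progress codes.
# Returns status 1 for codes not in the table.
dump_code_meaning() {
    code=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case $code in
        0c0) echo "Dump completed successfully" ;;
        0c1) echo "I/O error occurred during the dump" ;;
        0c2) echo "Dump started by user" ;;
        0c4) echo "Dump completed unsuccessfully, not enough space on dump device, partial dump available" ;;
        0c5) echo "Dump failed to start, unexpected error writing to dump device" ;;
        0c6) echo "Secondary dump started by user" ;;
        0c7) echo "Network dump in progress, waiting for the server to respond" ;;
        0c8) echo "Dump disabled, no dump device configured" ;;
        0c9) echo "System-initiated panic dump started" ;;
        0cc) echo "Failure writing to primary dump device, switched over to secondary" ;;
        *)   echo "unknown dump progress code: $1"; return 1 ;;
    esac
}
```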
Instructor notes:
Purpose — List the different dump progress codes that will be seen under different
circumstances.
Details — Go through the list and highlight first the codes that will be seen if a
system-initiated dump occurs and then if a user-initiated dump occurs. Mention that dumps
can require a significant amount of time; on the course development system with 1.5 GB of
memory, it took 25 minutes to generate the dump.
Refer to the student notes for a detailed description of the commonly seen codes.
Additional information — While the dump is occurring, the 0c2 or 0c9 code is displayed.
How long the dump takes to complete is dependent on how large the dump is. Small
dumps should take less than 30 seconds; large dumps may take several minutes.
On machines with two line front panel displays (LEDs), the second line will display the
number of bytes written so far to the dump device. This provides an indication to you that
the dump is still proceeding well, and it also gives you an idea of how much more data has
to be written (if you have a record of a past sysdumpdev -e).
Transition statement — Having caused a dump, the next issue you have to consider is
how you are going to retrieve the dump from your system.

Copying a system dump

Is hd6 being used as the dump logical volume?
- no: the dump stays in the dump logical volume; use savecore to copy it
- yes: rc.boot 2 checks whether there is sufficient space in /var to copy the dump to
  - yes: the dump is copied to the /var/adm/ras copy directory
  - no: if forced copy flag = TRUE, display the copy dump to tape menu
Boot continues
Figure 12-19. Copying a system dump AN152.2
Notes:
Copying a dump to /var

After a crash, if the progress code displays 0c0, then you know that a dump occurred
and that it completed successfully. At this point, unless you have set the autorestart
system attribute to true, you have to reboot your system.
When paging space (hd6) is used as the dump logical volume, the rc.boot script attempts
to copy the dump to the copy directory specified with sysdumpdev.
When a dedicated dump logical volume is used, the dump remains in the dump logical
volume and is not automatically copied anywhere at reboot. The savecore command
can then be used to copy the dump at a later, convenient time.
Sufficient space in /var

If there is enough space to copy the dump from the paging space to the /var/adm/ras
directory, then it will be copied directly.
Insufficient space in /var/adm/ras

If, however, at bootup, the system determines that there is not enough space to copy
the dump to /var, the /sbin/rc.boot script (which is executed at bootup) will call the
/lib/boot/srvboot script. This script in turn calls on the copydumpmenu command,
which is responsible for displaying the following menu which can be used to copy the
dump to removable media:
Copy a System Dump to Removable Media
The system dump is 583973 bytes and will be copied from /dev/hd6
to media inserted into the device from the list below.
Please make sure that you have sufficient blank, formatted media
before proceeding.
Step One: Insert blank media into the chosen drive.
Step Two: Type the number for that device and press Enter.
      Device type          Path Name
>>> 1 tape/scsi/8mm        /dev/rmt0
    2 Diskette Drive       /dev/fd0
   88 Help?
   99 Exit
>>> Choice [1]
The copy dump menu will only be displayed if the sysdumpdev attribute of forced copy
flag has a value of TRUE.
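The decision flow in Figure 12-19 can be restated as a small function. This is a hypothetical model for teaching purposes, not the logic of rc.boot itself. The argument encoding (yes/no and TRUE/FALSE strings) and the output text are assumptions made for this example.

```shell
# Hypothetical model (not AIX code) of the boot-time dump copy decision.
#   $1  yes if hd6 (paging space) is the dump logical volume, else no
#   $2  yes if the copy directory has room for the dump, else no
#   $3  value of the sysdumpdev forced copy flag (TRUE or FALSE)
dump_copy_action() {
    if [ "$1" != yes ]; then
        # Dedicated dump LV: nothing is copied automatically at boot.
        echo "leave dump on dump LV; copy later with savecore"
    elif [ "$2" = yes ]; then
        echo "copy dump to copy directory"
    elif [ "$3" = TRUE ]; then
        echo "display copy dump to removable media menu"
    else
        echo "boot continues without copying the dump"
    fi
}
```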

Instructor notes:
Purpose — Explain how a dump gets copied to a directory in a file system.
Details — Cover the procedure as described.
Relate the copying of the dump to a copy directory (and what happens when there is not
enough space) back to the sysdumpdev attributes covered earlier in the unit, specifically the
-d and -D flags, which are used to specify the copy directory.
When using paging space (hd6) as the dump logical volume, the dump needs to be copied
out of the paging space before that paging space can be used for its primary purpose.
When using a dedicated dump logical volume, there is no conflict over the use of that
logical volume; thus, there is no automatic attempt to copy it into any file system or external
device. In the latter case, it is the job of the administrator to use the savecore command to
copy the dump from the dump logical volume to a file in a file system.
Transition statement — Let us look at what controls a reboot following a crash.
Automatically reboot after a crash

# smit chgsys
Change/Show Characteristics of Operating System
Maximum number of PROCESSES allowed per user           [128]
Maximum number of pages in block I/O BUFFER CACHE      [20]
Automatically REBOOT system after a crash              false
...
Enable full CORE dump                                  false
Use pre-430 style CORE dump                            false
F1=Help      F2=Refresh    F3=Cancel    F4=List
F5=Reset     F6=Command    F7=Edit      F8=Image
F9=Shell     F10=Exit      Enter=Do
Figure 12-20. Automatically reboot after a crash AN152.2
Notes:
Specifying automatic reboot using SMIT

If you want your system to reboot automatically after a dump, you must set the kernel
parameter autorestart to true. This can be easily done by the SMIT fastpath smit
chgsys. The corresponding menu item is Automatically REBOOT system after a
crash. Note that the default value is true in AIX 5L V5.2 and later.
Specifying automatic reboot using the chdev command

If you do not want to use SMIT to specify automatic reboot after a system dump,
execute the following command:
# chdev -l sys0 -a autorestart=true

Checking the size of /var

If you specify an automatic reboot, you should verify that the /var file system is large
enough to store a system dump.
Instructor notes:
Purpose — Describe how to set up an automatic reboot after a crash.
Details — Base your explanation on the material in the student notes.
Transition statement — Let’s discuss the snap command.

Sending a dump to IBM

Copy all system configuration data, including a dump, onto tape:
# snap -a -o /dev/rmt0
Label tape with:
- Problem Management Record (PMR) number
- Command used to create tape
- Block size of tape
Support Center uses kdb to examine the dump
Figure 12-21. Sending a dump to IBM AN152.2
Notes:
Collecting system data

Before sending a dump to the IBM Support Center, use the snap command to collect
system data. The command /usr/sbin/snap -a -o /dev/rmt0 will collect all the
necessary data.
In AIX 5L V5.2 and subsequent versions, pax is used to write the data to tape.
The Support Center will need the information collected by snap in addition to the dump
and kernel. Do not send just the dump file vmcore.x without the corresponding AIX
kernel. Without the corresponding kernel, analysis is not possible.
When sending a tape, label the tape with your assigned Problem Management Record
(PMR) number, the command used to create the tape, and the tape block size.
Use of the kdb command

The AIX Systems Support Center will analyze the contents of the dump using the kdb
command. The kdb command uses the kernel that was active on the system at the time
of the halt.
Purpose of snap command

The snap command was developed by IBM to simplify gathering configuration
information. It provides a convenient method of sending lslpp and errpt output to the
support centers. It gathers system configuration information and packages it into a pax
file (compressed when the -c flag is used). The file can then be written to disk or tape.
Flags for snap command

Some useful flags for the snap command are the following:
-a Copies all system configuration information to /tmp/ibmsupt
directory tree
-c Creates a compressed pax image (snap.pax.Z) of all files in the
/tmp/ibmsupt directory tree or other named output directory
-f Gathers file system information
-g Gathers general information
-k Gathers kernel information
-D Gathers dump and /unix
-t Creates tcpip.snap file; gathers TCP/IP information
AIX 5L V5.3 snap enhancements

AIX 5L V5.3 extended the functionality of snap: it can run external scripts to extend the
collected data, and it can split the output pax file into smaller pieces. The next few
paragraphs provide additional details regarding these new capabilities.
Extending snap to run external scripts

Scripts that the snap command is to run can be specified in three different ways:
- Specifying the name of a script in the /usr/lib/ras/snapscripts directory that snap
should call
- Specifying the all keyword, which indicates that snap should call all scripts in the
/usr/lib/ras/snapscripts directory

- Specifying the name of a file that contains the list of scripts (one per line) that snap
should call. The syntax file:<name of file containing list of scripts> is used in this
case.
The snapsplit command

The snapsplit command is introduced in AIX 5L V5.3. The snapsplit command is
used to split a snap output file into smaller files. This command is useful for dealing with
very large snap files. It breaks the file down into files of a specific size that are multiples
of 1 MB. Furthermore, it will combine these files into the original file when called with the
-u option. Refer to the man page for snapsplit (or the corresponding entry in the AIX
Commands Reference manual) for additional information regarding this command.
Splitting the snap output file from the snap command

AIX 5L V5.3 also introduced a new snap flag, -O megabytes, that enables you to split the
snap output file; the snap command calls the snapsplit command to do this. You can use
the flag as follows to split large snap output into smaller 4 MB files:
# snap -a -c -O 4
Instructor notes:
Purpose — Explain how the system dump should be prepared before it is sent to the IBM
Support Center.
Details — Provide the students with as much of the following information as you think is
appropriate:
The information gathered with the snap command can be used to identify and resolve
system problems. You must have root authority to execute this command.
If you use the -a flag, then you need approximately 8 MB of temporary disk space to collect
all the system information, including the contents of the error log (covered in a previous
unit).
The -g flag gathers the following information:
• Error report
• Copy of the customized ODM
• Trace file
• User environment
• Amount of physical memory and paging space
• Device and attribute information
• Security user information
The output from the -g flag is written to /tmp/ibmsupt/general/general.snapfile. However,
you can specify another directory using the -d flag.
The execution of snap appends information to the previously created files. Use the -r flag
to remove previously gathered and saved information.
Before you send your media to the Support Center, call them and obtain a Problem
Management Record (PMR) number, which is used to track the status of your problem. Label
the media with this number, and also with the other pieces of information listed, to help
the support team act quickly on your problem.
There is not much left for you to do after this, apart from waiting for a response from the
Support Center. However, you may want to look at the dump and try to analyze it yourself.
The Support Center analyzes dumps with the kdb command (crash prior to AIX 5L V5.1), which
is also available on your system. However, its output is hard to interpret without
in-depth knowledge of the kernel, so most administrators do not attempt this.
See the student notes for the AIX 5L V5.3 enhancements.
Additional information — In AIX 5L, the pax command was enhanced to allow archiving
of large files, such as dumps. The tar command, which was used prior to AIX 5L, does not
support files larger than 2 GB, so pax is the only choice for archiving files larger than
2 GB.
Transition statement — Let's take a brief look at kdb to see how it can be used.

Using kdb to analyze a dump

Visual: kdb takes two inputs, /unix (the kernel) and /var/adm/ras/vmcore.x (the dump file).
# uncompress /var/adm/ras/vmcore.x.Z
OR
# dmpuncompress /var/adm/ras/vmcore.x.BZ
# kdb /var/adm/ras/vmcore.x /unix
> status
> stat
(further subcommands for analyzing)
> quit
Note: The /unix kernel must be the same as on the failing machine.

Figure 12-22. Using kdb to analyze a dump AN152.2
Notes:
Function of the kdb command

The kdb command is an interactive tool used for operating system analysis. Typically,
kdb is used to examine kernel dumps in a system postmortem state. A live running system
can also be examined with kdb. However, because of the dynamic nature of the operating
system, tables and structures often change while they are being examined, which limits
how far such an analysis can go.
Examining an active system

To examine an active system, you would simply run the kdb command without any
arguments.
Analyzing a system dump

For a dead system, a dump is analyzed using the kdb command with file name
arguments, as illustrated on the visual.
To use kdb, the vmcore file must first be uncompressed. After a crash, the dump file
copied to /var/adm/ras is typically compressed. If the dump file has a .Z suffix (for
example, vmcore.x.Z), use the uncompress command. Starting in AIX 6.1, the dump file ends
in a .BZ suffix, and you must use the dmpuncompress command to process it. If you wish to
leave the original compressed file intact (rather than replacing it with the uncompressed
file), use the -p option of the dmpuncompress command.
# uncompress /var/adm/ras/vmcore.x.Z
or
# dmpuncompress /var/adm/ras/vmcore.x.BZ
Once the dump is uncompressed, you would analyze it with the kdb command.
# kdb /var/adm/ras/vmcore.x /unix
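The choice between uncompress and dmpuncompress depends only on the file suffix, so it can be sketched as a small POSIX shell function. This is an illustration, not an IBM-supplied tool: the function name dump_uncompress_cmd is hypothetical, and it only prints the command it would run (a dry run), so it is safe to try on any system.

```sh
# Hypothetical helper: print (do not run) the command that uncompresses
# a copied dump file, chosen only from the file name suffix.
dump_uncompress_cmd() {
    case "$1" in
        *.BZ) echo "dmpuncompress $1" ;;   # AIX 6.1 and later
        *.Z)  echo "uncompress $1" ;;      # earlier releases
        *)    echo "$1: no .Z or .BZ suffix, nothing to uncompress" >&2
              return 1 ;;
    esac
}

dump_uncompress_cmd /var/adm/ras/vmcore.0.BZ    # dmpuncompress /var/adm/ras/vmcore.0.BZ
dump_uncompress_cmd /var/adm/ras/vmcore.1.Z     # uncompress /var/adm/ras/vmcore.1.Z
```

After the printed command has been run, the uncompressed file is passed to kdb together with the matching /unix.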
Potential problems when using kdb

If the copy of /unix does not match the dump file, the following output will appear on the
screen:
WARNING: dumpfile does not appear to match namelist
If the dump itself is corrupted in some way, then the following will appear on the screen:
...
dump /var/adm/ras/vmcore.x corrupted
Useful subcommands
Examining a system dump requires an in-depth knowledge of the AIX kernel. However,
there are two subcommands that might be useful to you:
- The subcommand status displays the processes/threads that were active on the
CPUs when the crash occurred
- The subcommand stat shows the machine status when the dump occurred
To exit the kdb debug program, type quit at the > prompt.

Creating a sample system dump

The following example stops your running machine and creates a system dump:
# cat /unix > /dev/mem
Do not execute this command in your production environment.
The LEDs displayed are 888, 102, 300, 0C0:
- Refer to earlier material for discussion of the 888 code
- LED 102 indicates that “a dump has occurred”
- LED 300 stands for crash code “Data Storage Interrupt (DSI)”
- LED 0C0 means “Dump completed successfully”
Instructor notes:
Purpose — Introduce the kdb command.
Details — Cover the information in the student notes.
You might also want to make some of the points mentioned below:
kdb is an interactive utility for examining an operating system image, a core image, or the
running kernel. It also interprets and formats control structures in the system and provides
certain miscellaneous functions useful for examining a dump.
In order to analyze the dump, you must execute the kdb command against /unix, and it
must be the /unix of the system that had the problem. Fixing the underlying problem in the
code requires the AIX source code, which customers do not have, so there is not much more
that you can do. Generally speaking, it is best to leave the dump to the IBM Support Center.
The last thing you want to do is send a dump to the IBM Support Center and find out that
they cannot do anything about it because it is a partial dump. Get it right from the start.
Additional information — Prior to AIX 5L V5.1, the crash command was used instead of
the kdb command.
Transition statement — We have reached a checkpoint.

Checkpoint
1. If your system has less than 4 GB of main memory, what is the
default primary dump device? Where do you find the dump file after
reboot?
2. What command can be used to initiate a system dump?
3. If the copy directory is too small, will the dump, which is copied
during the reboot of the system, be lost?
4. Which command should you execute to collect system data before
sending a dump to IBM?
Notes:
Instructor notes:
Purpose — Present the checkpoint questions.
1. If your system has less than 4 GB of main memory, what is the default
primary dump device? Where do you find the dump file after reboot?
The answer is the default primary dump device is /dev/hd6. The default
dump file is /var/adm/ras/vmcore.x, where x indicates the number of the
dump.
2. What command can be used to initiate a system dump?
The answer is sysdumpstart.
3. If the copy directory is too small, will the dump, which is copied during the
reboot of the system, be lost?
The answer is no, provided the force copy flag is set to TRUE: a special menu is
shown during reboot, and from this menu you can copy the system dump to
portable media.
4. Which command should you execute to collect system data before sending
a dump to IBM?
The answer is snap.
Additional information — Here are a couple of points you might want to make when
going over the answers to the checkpoint:
• If there is 4 GB or more of memory, then a dedicated dump logical volume is created.
• Dump compression can be turned off with the -c flag of sysdumpdev.
Transition statement — Let’s switch over to the lab.

Exercise: System dump

- Work with the AIX dump facility
- Work with a dedicated dump logical volume
- Generate a firmware-assisted dump
- Initiate a dump from the HMC
Figure 12-24. Exercise: System dump AN152.2
Notes:
Instructor notes:
Purpose — Transition to the exercise for this unit.
Details —
Transition statement — Let’s recall some of the key points from this unit.

Unit summary
- Explain what is meant by a system dump
- Determine and change the primary and secondary dump devices
- Create a system dump
- Execute the snap command
- Use the kdb command to check a system dump
Notes:
Discussion
When a dump occurs, kernel and system data are copied to the primary dump device.
By default, the system has a primary dump device (/dev/hd6) and a secondary device
(/dev/sysdumpnull).
During reboot, the dump is copied to the copy directory (/var/adm/ras).
A system dump should be retrieved from the system using the snap command.
The Support Center uses the kdb debugger to examine the dump.
Instructor notes:
Additional information — You might want to note that, if the system has 4 GB or more of
main memory, then a dedicated dump logical volume is created. So, the default primary
dump device actually depends on the amount of physical memory installed in the system.
Transition statement — This brings us to the end of this course. Thank you.

Appendix A. Checkpoint solutions
Unit 1, "Advanced AIX administration overview"
Solutions for Figure 1-15, "Checkpoint," on page 1-41

The answer is identify the problem, talk to users (to further define the
problem), collect system data, and resolve the problem.

The answer is always talk to the users about such problems in order to
gather as much information as possible.

The answer is false: in most cases, it is only necessary to apply fixes or
upgrade microcode.
4. True or false: Documentation can be viewed or downloaded from the
IBM Web site.
The answer is true.
© Copyright IBM Corp. 2009, 2012 Appendix A. Checkpoint solutions A-1

Unit 2, "The Object Data Manager"
1. In which ODM class do you find the physical volume IDs of
your disks?
The answer is CuAt.
2. What is the difference between the states: defined and
available?
The answer is when a device is defined, there is an entry in
ODM class CuDv. When a device is available, the device
driver has been loaded. The device driver can be accessed
by the entries in the /dev directory.
Unit 3, "Error monitoring"
Solutions for Figure 3-20, "Checkpoint (1 of 2)," on page 3-61

The answer is the errpt command.

report?
The answer is errpt -a generates a detailed report.

The answer is DISK_ERR4.

The answer is it clears entries from the error log.


The answer is it is used by root to add entries into the error log.

*.debug errlog
The answer is all syslogd entries are directed to the error log.

The answer is it specifies a program or command to be run when
an error matching the selection criteria is logged.

Unit 4, "Network Installation Manager basics"

The answer is true, maint_boot.

Unit 5, "System initialization: Accessing a boot image"
The answer is false: SMS is part of the built-in firmware.

The answer is you need to boot the SMS programs and set the new
boot list to include hdisk1.

The answer is # bootlist -m normal -o.
The answer is # bootlist -m normal device1 device2.

Solutions for Figure 5-17, "Checkpoint (2 of 2)," on page 5-52
The answer is bosboot -ad /dev/hdiskx.

The answer is rc.boot.
The answer is false: the AIX kernel is loaded from hd5.

The answer is you need to boot from an AIX CD, mksysb, or NIM
server.

Unit 6, "System initialization: rc.boot and inittab"
Solutions for Figure 6-10, "Let’s review: rc.boot (1 of 3)," on page 6-29

[Diagram: rc.boot 1 phase, steps (1) to (5). Elements shown: /etc/init from RAMFS in the boot image; rc.boot 1; restbase; ODM files in RAM file system; cfgmgr -f; bootinfo -b.]

Solutions for Figure 6-11, "Let’s review: rc.boot (2 of 3)," on page 6-31
[Diagram: rc.boot 2 phase, steps (1) to (8). Elements shown: rc.boot 2; activate rootvg; mount /dev/hd4 on / in RAMFS (LED 557 shown with mount /dev/hd4); mount /var; copy dump; unmount /var; turn on paging; merge RAM /dev files; copy RAM ODM files; copy boot messages to alog.]

Solutions for Figure 6-12, "Let’s review: rc.boot (3 of 3)," on page 6-33

[Diagram: rc.boot 3 phase. Elements shown: /etc/inittab; /sbin/rc.boot 3; fsck -f /dev/hd3; mount /tmp; syncvg rootvg &; chgstatus=3 in CuDv?; cfgmgr -p2; cfgmgr -p3; Start Console: cfgcon; Start CDE: rc.dt boot; savebase; syncd 60; errdemon; Turn off LEDs; rm /etc/nologin.]

Solutions for Figure 6-18, "Let's review: /etc/inittab file," on page 6-53
Let's review solution: /etc/inittab file

init:2:initdefault:                         Determine initial run-level
brc::sysinit:/sbin/rc.boot 3                Startup last boot phase
rc:2:wait:/etc/rc                           Multiuser initialization
fbcheck:2:wait:/usr/sbin/fbcheck            Execute /etc/firstboot, if it exists
srcmstr:2:respawn:/usr/sbin/srcmstr         Start the System Resource Controller
cron:2:respawn:/usr/sbin/cron               Start the cron daemon
rctcpip:2:wait:/etc/rc.tcpip                Startup communication daemon processes
rcnfs:2:wait:/etc/rc.nfs                    (nfsd, biod, ypserv, and so forth)
qdaemon:2:wait:/usr/bin/startsrc -sqdaemon  Startup spooling subsystem
dt:2:wait:/etc/rc.dt                        Startup CDE desktop
tty0:2:off:/usr/sbin/getty /dev/tty1        Line ignored by init
myid:2:once:/usr/local/bin/errlog.check     Process started only one time
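Each entry above has the AIX /etc/inittab form identifier:runlevel:action:command. The following POSIX shell sketch splits one entry into those four fields. The function name show_inittab_entry is hypothetical, and the sketch does not read /etc/inittab itself; it only parses the string it is given. An empty run level (as in the brc entry) comes out as an empty field.

```sh
# Hypothetical helper: split one /etc/inittab entry of the form
# identifier:runlevel:action:command into its four fields.
show_inittab_entry() {
    entry=$1
    id=${entry%%:*};       rest=${entry#*:}    # field 1: identifier
    runlevel=${rest%%:*};  rest=${rest#*:}     # field 2: run level (may be empty)
    action=${rest%%:*}                         # field 3: action
    cmd=${rest#*:}                             # field 4: everything after the third colon
    printf 'id=%s runlevel=%s action=%s command=%s\n' \
        "$id" "$runlevel" "$action" "$cmd"
}

show_inittab_entry 'cron:2:respawn:/usr/sbin/cron'
show_inittab_entry 'brc::sysinit:/sbin/rc.boot 3'
```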


The answer is from the /etc/inittab file in rootvg.

The answer is rc.boot 2.

The answer is corrupted JFS log or damaged file system.
4. Which ODM file is used by the cfgmgr during boot to
configure the devices in the correct sequence?
The answer is Config_Rules.

The answer is there is a problem with processing /etc/inittab.

mean?
The answer is this line is used by the init process to determine the
initial run level (2 = multiuser).

Unit 7, "LVM metadata and related problems"
1. True or false: All LVM information is stored in the ODM.
The answer is false: Information is also stored in other AIX files and in
disk control blocks (like the VGDA and LVCB).
2. True or false: You detect that a physical volume hdisk1 that
is contained in your rootvg is missing in the ODM. This
problem can be fixed by exporting and importing rootvg.
The answer is false: Use the rvgrecover procedure instead. This
script creates a complete set of new rootvg ODM entries.

Unit 8, "Disk management procedures"
1. Although everything seems to be working fine, you detect error log
entries for disk hdisk0 in your rootvg. The disk is not mirrored to
another disk. You decide to replace this disk. Which procedure would
you use to migrate this disk?
The answer is procedure 2: Disk still working. There are some
additional steps necessary for hd5 and the primary dump device hd6.
2. You detect an unrecoverable disk failure in volume group datavg.
This volume group consists of two disks that are completely mirrored.
Because of the disk failure you are not able to vary on datavg. How do
you recover from this situation?
The answer is forced varyon: varyonvg -f datavg. Use procedure
1 for mirrored disks.
3. After disk replacement, you find that a disk has been removed from the
system but not from the volume group. How do you fix this?
The answer is repair the ODM, for example through exportvg and
importvg. Execute reducevg using the PVID instead of disk name.

Unit 9, "Install and cloning techniques"

The answer is installing a mksysb image on another disk and
cloning the current running rootvg to an alternate disk.

The answer is creates an online backup and allows maintenance
and updates to software on the alternate disk helping to minimize
down time.
3. Why should you not use exportvg with an alternate disk volume
group?
The answer is this will remove rootvg related entries from
/etc/filesystems.

4. True or false: multibos provides for booting between alternate
operating system environments within a single rootvg.
The answer is true.
5. True or false: A standby BOS can only be accessed by changing
the bootlist and then rebooting.
6. True or false: multibos requires cloning all of the logical volumes
in the active rootvg.

Unit 10, "Advanced backup techniques"
1. True or false: The creation of a snapshot volume group marks all copies in the
snapshot as stale.
2. True or false: The creation of a JFS split copy marks all of the split mirror copies
as stale.
The answer is true.
3. True or false: After the creation of a JFS split mirror copy, the administrator needs
to mount the new file system in order to access the split copy.
4. To access a SAN Copy of an active volume group on the source system, use the
command:
a. joinvg
b. importvg
c. recreatevg
The answer is recreatevg.

Unit 11, "Diagnostics"

a. Concurrent
b. Maintenance
d. All of the above
The answer is all of the above.

The answer is use either maintenance or service mode.

Unit 12, "The AIX system dump facility"
1. If your system has less than 4 GB of main memory, what is the default
primary dump device? Where do you find the dump file after reboot?
The answer is the default primary dump device is /dev/hd6. The default
dump file is /var/adm/ras/vmcore.x, where x indicates the number of the
dump.
2. What command can be used to initiate a system dump?
The answer is sysdumpstart.
3. If the copy directory is too small, will the dump, which is copied during the
reboot of the system, be lost?
The answer is no, provided the force copy flag is set to TRUE: a special menu is
shown during reboot, and from this menu you can copy the system dump to
portable media.
4. Which command should you execute to collect system data before sending
a dump to IBM?
The answer is snap.

Appendix B. Command summary
Startup, Logoff, and Shutdown

<Ctrl>d (exit) Log off the system (or the current shell).
shutdown Shuts down the system by disabling all processes. In single-user mode, you may
want to use the -F option for a fast shutdown. The -r option reboots the system. The
user must be root or a member of the shutdown group.
Directories
mkdir Make directory
cd Change the directory. The default is $HOME directory.
rmdir Remove a directory (beware of files starting with “.”).
rm Remove file; -r option removes directory and all files and
subdirectories recursively.
pwd Print working directory: shows name of current directory
ls List files
-a (all)
-l (long)
-d (directory information)
-r (reverse alphabetic)
-t (sort by time last modified)
-C (multi-column format)
-R (recursively)
-F (places / after each directory name & * after each exec file)
Files - Basic
cat List file contents (concatenate). With redirection, cat can also create a new file,
for example, cat > newfile. Use <Ctrl>d to end input.
chmod Change the permission mode for files or directories.
• Symbolic form: chmod who{+|-|=}permissions files or directories
• (r, w, x = permissions and u, g, o, a = who)
• Use + or - to grant or revoke specific permissions, or = to set them exactly
• Can also use numerics, 4 = read, 2 = write, 1 = execute
• Sum them for each digit: first is user, next is group, last is other
• For example, chmod 746 file1 is user = rwx, group = r,
other = rw
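The numeric form can be checked by decoding each octal digit bit by bit. The following POSIX shell function (the name octal_to_rwx is hypothetical) is a sketch that assumes a valid three-digit octal mode and prints the permission string in the style of ls -l, without the leading file-type character.

```sh
# Hypothetical helper: turn a three-digit octal mode (for example 746)
# into the rwx string shown by ls -l, without the file-type character.
# Assumes every digit is a valid octal digit (0 to 7).
octal_to_rwx() {
    out=
    for digit in $(echo "$1" | sed 's/./& /g'); do
        # 4 = read, 2 = write, 1 = execute: test each bit of the digit
        if [ $((digit & 4)) -ne 0 ]; then out=${out}r; else out=${out}-; fi
        if [ $((digit & 2)) -ne 0 ]; then out=${out}w; else out=${out}-; fi
        if [ $((digit & 1)) -ne 0 ]; then out=${out}x; else out=${out}-; fi
    done
    echo "$out"
}

octal_to_rwx 746    # rwxr--rw-  (user rwx, group r, other rw)
octal_to_rwx 750    # rwxr-x---
```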
chown Change owner of a file, for example, chown owner file
chgrp Change group of files
cp Copy file
mv Move or rename file
pg List file content by screen (page)
• h (help)
• q (quit)
• <cr> (next page)
• f (skip 1 page)
• l (next line)
• d (next 1/2 page)
• $ (last page)
• p (previous file)
• n (next file)
• . (redisplay current page)
• /string (find string forward)
• ?string (find string backward)
• -# (move backward # pages)
• +# (move forward # pages)
. Current directory
.. Parent directory
rm Remove (delete) files (-r option removes directory and all files
and subdirectories)
head Print first several lines of a file
tail Print last several lines of a file
wc Report number of lines (-l), words (-w), characters (-c) in files,
no options gives lines, words, and characters
su Switch user
id Displays your user identity: user name and ID, and group
names and IDs
tty Displays the name of the terminal device that is currently active. This is useful
under X Windows, where several pts devices can be created, to find out which one you
are using. The who am i command shows the same information.
Files - Advanced
awk Programmable text editor and report writer
banner Display banner (can redirect to another terminal nn with
> /dev/ttynn)
cal Calendar (cal month year)
cut Cut out specific fields from each line of a file.
diff Differences between two files
find Find files anywhere on disks. Specify location by path (will
search all subdirectories under specified directory).
• -name fl (file names matching fl criteria)
• -user ul (files owned by user ul)
• -size +n (or -n) (files larger (or smaller) than n blocks)
• -mtime +x (-x) (files modified more (less) than x days ago)
• -perm num (files whose access permissions match num)
• -exec (execute a command with results of find command)
• -ok (execute a command interactively with results of find
command)
• -o (logical or)
• -print (display results. Usually included.)
find syntax: find path expression action
For example:
• find / -name "*.txt" -print
• find / -name "*.txt" -exec li -l {} \;
(Executes li -l on each file found, substituting the file name for {}. The ;
marks the end of the command to be executed, and the \ stops the shell from
interpreting ; as its own command terminator.)
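The following sketch shows find -print, find -exec, and -o on a scratch directory, so it does not search the whole system. The directory name is an assumption (built from $TMPDIR and the shell's PID), and ls -l is used in place of li -l.

```sh
# Scratch directory (assumed path) with two .txt files and one .log file
demo=${TMPDIR:-/tmp}/find_demo.$$
mkdir -p "$demo/sub"
touch "$demo/a.txt" "$demo/sub/b.txt" "$demo/c.log"

# -print: list matching names; subdirectories are searched too
find "$demo" -name "*.txt" -print

# -exec: run ls -l once per match; {} is replaced by each file name,
# and \; (escaped from the shell) marks the end of the command
find "$demo" -name "*.txt" -exec ls -l {} \;

# -o: logical or between two tests (parentheses escaped from the shell)
find "$demo" \( -name "*.txt" -o -name "*.log" \) -print
```

Remove the scratch directory afterwards with rm -r "$demo".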
grep Search for pattern, for example, grep pattern files.
pattern can include regular expressions.
• -c (count lines with matches, but do not list them)
• -l (list only the names of files with matches, not the lines)
• -n (list line numbers with lines)
• -v (list lines that do not match the pattern)

Expression metacharacters:
• [ ] matches any one character inside.
• A - inside [ ] matches a range of characters, for example, [a-z].
• ^ matches the beginning of a line when ^ begins the pattern.
• $ matches the end of a line when $ ends the pattern.
• . matches any single character (same as ? in the shell).
• * matches 0 or more occurrences of the preceding
character. (Note: ".*" is the same as "*" in the shell.)
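The grep flags and regular-expression metacharacters above can be tried on a small scratch file. The file name and its contents are assumptions made up for this sketch; the comments show what the first commands print for that content.

```sh
# Scratch file (assumed path) with known content
f=${TMPDIR:-/tmp}/grep_demo.$$
printf 'error: disk\nok\nerror: tape\nwarning\n' > "$f"

grep -c error "$f"      # 2 (number of matching lines)
grep -l error "$f"      # prints only the file name
grep -n tape "$f"       # 3:error: tape
grep -v error "$f"      # ok and warning (lines that do not match)

grep '^err' "$f"        # ^: lines that begin with err
grep 'ing$' "$f"        # $: warning (line ends with ing)
grep 'e[rt]r' "$f"      # [ ]: one character from the list
grep 'er*o' "$f"        # *: zero or more r between e and o
```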
sed Stream (text) editor, used with editing flat files
sort Sort and merge files
-r (reverse order); -u (keep only unique lines)
Editors
ed Line editor
vi Screen editor
INed LPP editor
emacs Screen editor +
Shells, Redirection, and Pipelining

< (read) Redirect standard input, for example, command < file reads
input for command from file.
> (write) Redirect standard output, for example, command > file writes
output for command to file overwriting contents of file.
>> (append) Redirect standard output, for example, command >> file
appends output for command to the end of file.
2> Redirect standard error (to append standard error to a file, use
command 2>> file). Combined redirection examples:
• command < infile > outfile 2> errfile
• command >> appendfile 2>> errfile < infile
; Command terminator used to string commands on single line
| Pipe information from one command to the next command. For
example, ls | cpio -o > /dev/fd0 passes the results of the
ls command to the cpio command.
\ Continuation character to continue a command on a new line; the shell
prompts with > for the rest of the command
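The redirection operators can be combined on one command line as in the examples above. This sketch uses a scratch directory (an assumed path) and ordinary commands (sort, echo, ls) to show where each stream ends up.

```sh
d=${TMPDIR:-/tmp}/redir_demo.$$          # scratch directory (assumed path)
mkdir -p "$d"
printf 'pear\napple\n' > "$d/infile"     # > creates or overwrites a file

# Standard input from infile, output to outfile, errors to errfile
sort < "$d/infile" > "$d/outfile" 2> "$d/errfile"

echo cherry >> "$d/outfile"              # >> appends: apple, pear, cherry
ls "$d/missing" 2>> "$d/errfile" || true # ls fails; its message is appended to errfile

cat "$d/outfile"; echo "errfile has $(wc -l < "$d/errfile") line(s)"

echo this command \
continues here                           # \ continues the command on the next line
```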

tee Reads standard input and writes it to both standard output and a file,
for example, ls | tee ls.save | sort sends the ls output to ls.save and
also pipes it to the sort command
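The tee example above can be reproduced with a scratch directory containing three empty files; the directory name is an assumption for this sketch, and sort -r is used so that the sorted output visibly differs from the saved copy.

```sh
d=${TMPDIR:-/tmp}/tee_demo.$$            # scratch directory (assumed path)
mkdir -p "$d/files"
touch "$d/files/a" "$d/files/b" "$d/files/c"

# tee writes the ls output to ls.save and passes it on unchanged to sort
ls "$d/files" | tee "$d/ls.save" | sort -r    # c, b, a (reverse order)
cat "$d/ls.save"                              # a, b, c (original order)
```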
Metacharacters
* Any number of characters (0 or more)
? Any single character
[abc] Any one character from the list
[a-c] Any one character in the range
[!abc] Any one character not in the list
; Command terminator used to string commands on a single line
& Runs the preceding command in background mode
# Comment character
\ Removes special meaning (no interpretation) of the following
character
' Removes special meaning (no interpretation) of characters between
the single quotes
" Interprets only $, backquote, and \ characters between the
quotes
` Used to set a variable to the results of a command,
for example, now=`date` sets the value of now to the current
results of the date command
$ Preceding variable name indicates the value of the variable
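The quoting rules and filename metacharacters can be compared side by side. In this sketch the pattern matching is done with case, which uses the same patterns as filename expansion, so no files are needed; the function name match_name is hypothetical.

```sh
name=world
single='$name'          # single quotes: no interpretation -> $name
double="hello $name"    # double quotes: $ is interpreted -> hello world
escaped=\$name          # backslash: the $ loses its meaning -> $name
now=`echo hi`           # backquotes: replaced by the command output -> hi
echo "$single | $double | $escaped | $now"

# Filename-style patterns, matched here with case so no files are needed
match_name() {
    case "$1" in
        [!abc]*) echo "$1: does not start with a, b, or c" ;;
        [a-c]?)  echo "$1: a to c, then exactly one character" ;;
        *)       echo "$1: any other name" ;;
    esac
}

match_name x1      # x1: does not start with a, b, or c
match_name b7      # b7: a to c, then exactly one character
match_name abc     # abc: any other name
```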
Physical and Logical Storage

chfs Changes file system attributes such as mount point,
permissions, and size
compress Reduces the size of the specified file using the adaptive LZ
algorithm
crfs Creates a file system within a previously created logical volume
extendlv Extends the size of a logical volume
extendvg Extends a volume group by adding a physical volume

fsck Checks for file system consistency, and allows interactive repair
of file systems
fuser Lists the process numbers of local processes that use the files
specified
lsattr Lists the attributes of the devices known to the system
lscfg Gives detailed information about the AIX system hardware
configuration
lsdev Lists the devices known to the system
lsfs Displays characteristics of the specified file system such as
mount points, permissions, and file system size
lslv Shows you information about a logical volume
lspv Shows you information about a physical volume in a volume
group
lsvg Shows you information about the volume groups in your system
lvmstat Controls LVM statistic gathering
migratepv Used to move physical partitions from one physical volume to
another
migratelp Used to move logical partitions to other physical disks
mkdev Configures a device
mkfs Makes a new file system on the specified device
mklv Creates a logical volume
mkvg Creates a volume group
mount Instructs the operating system to make the specified file system
available for use from the specified point
quotaon Starts the disk quota monitor
rmdev Removes a device
rmlv Removes logical volumes from a volume group
rmlvcopy Removes copies from a logical volume
umount Unmounts a file system from its mount point
uncompress Restores files compressed by the compress command to their
original size
unmount Exactly the same function as the umount command
varyoffvg Deactivates a volume group so that it cannot be accessed
varyonvg Activates a volume group so that it can be accessed

Variables
= Set a variable (for example, d="day" sets the value of d to
"day"), can also set the variable to the results of a command by
the ` character, for example, now=`date` sets the value of
now to the current result of the date command.
HOME Home directory
PATH Path to be checked
SHELL Shell to be used
TERM Terminal being used
PS1 Primary prompt characters, usually $ or #
PS2 Secondary prompt characters, usually >
$? Return code of the last command executed
set Displays current local variable settings
export Exports variables so that they are inherited by child processes
env Displays inherited variables
echo Echo a message (for example, echo HI or echo $d);
\c at the end of the message suppresses the trailing newline, and
\n at the end of the message prints an extra blank line.
Tapes and Diskettes

dd Reads a file in, converts the data (if required), and copies the
file out
fdformat Formats diskettes or read/write optical media disks
flcopy Copies information to and from diskettes
format AIX command to format a diskette
backup Backs up individual files
• -i reads file names from standard input
• -v list files as backed up;
• For example, backup -iv -f/dev/rmt0 file1, file2
• -u backup file system at specified level; For example,
backup -level -u filesystem
Can pipe list of files to be backed up into command, for
example, find . -print | backup -ivf/dev/rmt0 where you
are in directory to be backed up.
mksysb Creates an installable image of the root volume group

restore Restores files from backup
• -x restores files created with backup -i
• -v lists files as they are restored
• -T lists files stored on tape or diskette
• -r restores a file system created with backup -level -u
For example, restore -xv -f/dev/rmt0
cpio Copies to and from an I/O device, destroys all data previously
on tape or diskette, for input, must be able to place files in the
same relative (or absolute) path name as when copied out (can
determine path names with -it option), for input, if file exists,
compares last modification date and keeps most recent (can
override with -u option).
• -o (output)
• -i (input)
• -t (table of contents)
• -v (verbose)
• -d (create needed directories for relative path names)
• -u (unconditional, to override last modification date)
for example, cpio -o > /dev/fd0 or
cpio -iv file1 < /dev/fd0
tapechk Performs simple consistency checking for streaming tape
drives
tcopy Copies information from one tape device to another
tctl Sends commands to a streaming tape device
tar Alternative utility to back up and restore files
pax Alternative utility to cpio and tar commands
Transmitting
mail Send and receive mail. With userID sends mail to userID.
Without userID, displays your mail. When processing your mail,
at the ? prompt for each mail item, you can:
• d - delete
• s - append
• q - quit
• enter - skip
• m - forward
mailx Upgrade of mail
uucp Copy file to other UNIX systems (UNIX to UNIX copy)

uuto/uupick Send files to, and retrieve files from, a public directory

uux Execute on remote system (UNIX to UNIX execute)
System administration
df Display file system usage
installp Install program
kill (pid) Send a termination signal to the process with the given process ID (PID)
(find it using ps); kill -9 PID kills the process unconditionally
mount Associate logical volume to a directory;
for example, mount device directory
ps -ef Shows process status (ps -ef)
umount Disassociate file system from directory
smit System management interface tool
Miscellaneous
banner Displays banner
date Displays current date and time
newgrp Change active groups
nice Assigns lower priority to following command (for example,
nice ps -f)
passwd Modifies current password
sleep n Sleep for n seconds
stty Show or set terminal settings
touch Create a zero-length file (or update the time stamp of an existing file)
xinit Initiate X-Windows
wall Sends message to all logged in users
who List users currently logged in (who am i identifies this user)
man,info Displays manual pages

System files
/etc/group List of groups
/etc/motd Message of the day, displayed at login
/etc/passwd List of users and signon information. Password shown as !,
can prevent password checking by editing to remove !
/etc/profile System wide user profile executed at login, can override
variables by resetting in the user's .profile file
/etc/security Directory not accessible to normal users
/etc/security/environ User environment settings
/etc/security/group Group attributes
/etc/security/limits User limits
/etc/security/login.cfg Login settings
/etc/security/passwd User passwords
/etc/security/user User attributes, password restrictions
Shell programming summary
Variables
var=string Set variable var to string (no spaces around =). A string that
contains spaces must be enclosed in double quotes. Special characters in string
must be enclosed in single quotes to prevent substitution. Within quotes, pipe (|),
redirection (<, >, >>), and & symbols are not interpreted.
$var Gives value of var in a command
echo Displays text; for example, echo $var displays the value of var
HOME = Home directory of user
MAIL = Mail file name
PS1 = Primary prompt characters, usually "$" or "#"
PS2 = Secondary prompt characters, usually ">"
PATH = Search path
TERM = Terminal type being used
export Exports variables to the environment
env Displays environment variable settings
${var:-string} Gives value of var in a command; if var is unset or null, uses
string instead
$1 $2 $3... Positional parameters: the arguments passed to the shell script
$* All arguments passed to the shell script
$# Number of arguments passed to the shell script
$0 Name of the shell script
$$ Process ID (PID) of the current shell
$? Return code of the last command
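The following sketch, assuming a POSIX-compatible sh such as the AIX Korn shell, exercises the variable rules above: quoting, ${var:-string}, the special parameters, and export. The helper show_args and all variable names other than EDITOR are hypothetical.

```shell
#!/bin/sh
# var=string: no spaces around "="; quote any string that contains spaces.
greeting="hello world"
# Single quotes prevent substitution: $HOME and | are stored literally.
literal='$HOME | not expanded'

# ${var:-string}: use string when var is unset or null; var itself is not changed.
unset editor_choice
editor=${editor_choice:-vi}

# show_args (hypothetical helper) reports its own positional parameters.
show_args() {
    echo "count=$# first=${1:-none} all=$*"
}

show_args alpha beta gamma    # count=3 first=alpha all=alpha beta gamma
show_args                     # count=0 first=none all=
false
echo "last return code: $?"   # last return code: 1
echo "script name: $0, PID: $$"

# export places the variable in the environment inherited by child processes.
EDITOR=$editor
export EDITOR
sh -c 'echo "child sees EDITOR=$EDITOR"'   # child sees EDITOR=vi
echo "$greeting / $literal"
```

Inside a function, $#, $1, and $* refer to the function's arguments rather than the script's, which is why show_args can report its own count.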
Commands
# Comment designator
&& Logical-and. Run the command following && only if the command
preceding && succeeds (return code = 0)
|| Logical-or. Run the command following || only if the command
preceding || fails (return code not equal to 0)
exit n Used to pass return code n from a shell script; the parent shell
receives it in the variable $?
expr Arithmetic expressions
Syntax: "expr expression1 operator expression2"
operators: + - \* (multiply) / (divide) % (remainder)
for loop for n (or: for variable in $*), in the form:
do
command
done
if-then-else if test expression
then command
elif test expression
then command
else command
fi
read Read from standard input
shift Shifts arguments 1-9 one position to the left and decrements
the number of arguments ($#)
test Used for conditional tests; has two formats:
if test expression (for example, if test $# -eq 2)
if [ expression ] (for example, if [ $# -eq 2 ]) (spaces are required
inside the brackets)
Integer operators:
-eq (=) -lt (<) -le (<=)
-ne (!=) -gt (>) -ge (>=)
String operators:
= (equal) != (not equal) -z (zero length)
File status (syntax: -option file; for example, -f file1)
• -f (ordinary file)
• -r (readable by this process)
• -w (writable by this process)
• -x (executable by this process)
• -s (non-zero length)
while loop while test expression
do
command
done
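The commands above can be combined as in the following sketch (POSIX sh assumed). The functions classify, sum_args, describe, and count_lines, and the temporary file name, are hypothetical examples, not standard AIX commands.

```shell
#!/bin/sh
# classify: if/elif/else with integer operators, using both test formats.
classify() {
    if [ "$1" -lt 0 ]; then
        echo negative
    elif test "$1" -eq 0; then
        echo zero
    else
        echo positive
    fi
}

# sum_args: while loop that adds each argument with expr; shift then
# discards $1 and decrements $#, so the loop ends when no arguments remain.
sum_args() {
    total=0
    while [ $# -gt 0 ]; do
        total=$(expr "$total" + "$1")
        shift
    done
    echo "$total"
}

# describe: for loop over all arguments, using file status operators.
describe() {
    for f in "$@"; do
        if [ -f "$f" ] && [ -s "$f" ]; then
            echo "$f: regular file, non-zero length"
        elif [ -f "$f" ]; then
            echo "$f: regular file, zero length"
        else
            echo "$f: not a regular file"
        fi
    done
}

# count_lines: each call to read consumes one line of standard input.
count_lines() {
    n=0
    while read line; do
        n=$(expr "$n" + 1)
    done
    echo "$n"
}

tmp=/tmp/cmd_demo.$$                      # $$ makes the name unique per shell
printf 'one\ntwo\nthree\n' > "$tmp"

classify -5; classify 0; classify 7        # negative, zero, positive
sum_args 1 2 3 4                           # 10
describe "$tmp" /nonexistent/file
count_lines < "$tmp"                       # 3

# && runs the second command only on success, || only on failure.
grep -q two "$tmp" && echo "found two"
grep -q four "$tmp" || echo "no four"

# exit n sets the return code that the parent shell sees in $?.
(exit 3)
echo "subshell returned $?"                # subshell returned 3
rm -f "$tmp"
```

Note that expr itself returns exit status 1 when its result is 0, so scripts should use the value expr prints, not its return code, when doing arithmetic.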
Miscellaneous
sh Execute shell script in the sh shell
-x (print each command before executing it; used for debugging shell scripts)
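A minimal sketch of debugging with sh -x, assuming a POSIX sh: it writes a short script to a temporary file (hypothetical name) and runs it once normally and once with tracing. The script also uses set +x, which turns tracing off from that point on; set -x turns it on for a section only.

```shell
#!/bin/sh
# Create a small script in a temporary file ($$ keeps the name unique).
script=/tmp/trace_demo.$$.sh
cat > "$script" <<'EOF'
name=world
echo "hello $name"
set +x
echo "not traced"
EOF

# Normal run: only the script's own output appears.
sh "$script"

# Traced run: before executing each command, the shell writes it to standard
# error, prefixed by "+ " (the default PS4 prompt), until set +x is reached.
sh -x "$script"

rm -f "$script"
```

The exact quoting shown in the trace lines differs slightly between shells, so treat the trace as a reading aid rather than parseable output.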
vi Editor
Entering vi
vi file Edits the file named file
vi file file2 Edit files consecutively (through :n)
.exrc File that contains the vi profile
wm=nn Sets wrap margin to nn
vi + file, vi +n file, vi +/pattern file Open file at the last line, at line n,
or at the first occurrence of pattern instead of at the first line
vi -r Lists saved files
vi -r file Recover file named file from crash
:n Edit the next file in the argument list
:set all Show all options
:set nu Display line numbers (turn off with :set nonu)
:set list Display control characters in file
:set wm=n Set wrap margin to n
:set showmode Sets display of "INPUT" when in input mode
Read, write, exit
:w Write buffer contents
:w file2 Write buffer contents to file2
:w >> file2 Write buffer contents to end of file2
:q Quit editing session
:q! Quit editing session and discard any changes
:r file2 Read file2 contents into buffer following current cursor
:r !com Read results of shell command com following current cursor
:!com Execute shell command com; with a line range, filters those lines
through com (for example, :%!sort)
:wq or ZZ Write and quit edit session
Units of measure
h, l Character left, character right
k or <Ctrl>p Move cursor to character above cursor
j or <Ctrl>n Move cursor to character below cursor
w, b Word right, word left
^, $ Beginning, end of current line
<CR> or + Beginning of next line
- Beginning of previous line
G Last line of buffer
Cursor movements
You can precede cursor movement commands (including the cursor arrow keys) with a
repeat count; for example, 9 followed by the right arrow key moves right nine characters.
0 Move to first character in line
$ Move to last character in line
^ Move to first nonblank character in line
fx Move right to character x
Fx Move left to character x
tx Move right to the character preceding character x
Tx Move left to the character following character x
; Find next occurrence of x in same direction
, Find next occurrence of x in opposite direction
w Move to start of next word (nw = n words) (punctuation is a word)
W Move to start of next word (nW = n words) (ignore punctuation)
b Move to start of previous word (punctuation is a word)
B Move to start of previous word (ignore punctuation)
e Move to ending character of word (punctuation is a word)
E Move to ending character of word (ignore punctuation)
( Move to beginning of current sentence
) Move to beginning of next sentence
{ Move to beginning of current paragraph
} Move to beginning of next paragraph
H Move to first line on screen
M Move to middle line on screen
L Move to last line on screen
<Ctrl>f Scroll forward 1 screen (3 lines overlap)
<Ctrl>d Scroll forward 1/2 screen
<Ctrl>b Scroll backward 1 screen (0 line overlap)
<Ctrl>u Scroll backward 1/2 screen
G Go to last line in file
nG Go to line n
<Ctrl>g Display current line number
Search and replace

/pattern Search forward for pattern
?pattern Search backward for pattern
n Repeat find in the same direction
N Repeat find in the opposite direction
Adding text
a Add text after the cursor (end with <esc>)
A Add text at end of current line (end with <esc>)
i Add text before the cursor (end with <esc>)
I Add text before first nonblank character in current line
o Add line following current line
O Add line before current line
<esc> Return to command mode
Deleting text
<Ctrl>w Undo entry of current word
@ Kill the insert on this line
x Delete current character
dw Delete to end of current word (observe punctuation)
dW Delete to end of current word (ignore punctuation)
dd Delete current line
D Delete to end of line (same as d$)
d) Delete to end of current sentence
d} Delete to end of current paragraph
dG Delete current line through end of buffer
d^ Delete to the beginning of line
u Undo last change command
U Restore current line to original state before modification
Replacing text
ra Replace current character with a
R Replace all characters overtyped until <esc> is entered
s Delete current character and append text until <esc>
:s/s1/s2/ Replace the first s1 with s2 (in the current line only)
S Delete all characters in the line and append text
cc Replace all characters in the line (same as S)
ncx Delete n text objects of type x (w, b = words, ) = sentences, } =
paragraphs, $ = end-of-line, ^ = beginning of line) and enter
input mode
C Replace all characters from cursor to end-of-line
Moving text
p Paste last text deleted after cursor (xp will transpose 2
characters)
P Paste last text deleted before cursor
nyx Yank n text objects of type x (w, b = words, ) = sentences, } =
paragraphs, $ = end-of-line); nY or nyy yanks n lines. You can
then paste them with the p command. Yank does not delete the
original.
"ayy Yank the current line into named register a (registers a-z are
available) for moving, copying, or cut and paste; paste it with the
"ap command.
Miscellaneous
. Repeat last command
J Join current line with next line

Appendix C. AIX dump code and progress codes

This appendix is extracted from the AIX 4.3 Messages Guide and Reference.
0c0 - 0cc
0c0 A user-requested dump completed successfully.
0c1 An I/O error occurred during the dump.
0c2 A user-requested dump is in progress. Wait at least one minute for the
dump to complete.
0c4 The dump ran out of space. Partial dump is available.
0c5 The dump failed due to an internal failure. A partial dump may exist.
0c7 Progress indicator. Remote dump is in progress.
0c8 The dump device is disabled. No dump device configured.
0c9 A system-initiated dump has started. Wait at least one minute for the
dump to complete.
0cc (AIX 4.2.1 and later) An error occurred writing to the primary dump
device. It switched over to the secondary.
100 - 195
100 Progress indicator. BIST completed successfully.
101 Progress indicator. Initial BIST started following system reset.
102 Progress indicator. BIST started following power on reset.
103 BIST could not determine the system model number.
104 BIST could not find the common on-chip processor bus address.
105 BIST could not read from the on-chip sequencer EPROM.
106 BIST detected a module failure.
111 On-chip sequencer stopped. BIST detected a module error.
112 Checkstop occurred during BIST and checkstop results could not be
logged out.
113 The BIST checkstop count equals 3, meaning three unsuccessful
system restarts. System halts.
120 Progress indicator. BIST started CRC check on the EPROM.
121 BIST detected a bad CRC on the on-chip sequencer EPROM.
© Copyright IBM Corp. 2009, 2012 Appendix C. AIX dump code and progress codes C-1
122 Progress indicator. BIST started a CRC check on the EPROM.
123 BIST detected a bad CRC on the on-chip sequencer NVRAM.
124 Progress indicator. BIST started a CRC check on the on-chip
sequencer NVRAM.
125 BIST detected a bad CRC on the time-of-day NVRAM.
126 Progress indicator. BIST started a CRC check on the time-of-day
NVRAM.
127 BIST detected a bad CRC on the EPROM.
130 Progress indicator. BIST presence test has started.
140 BIST was unsuccessful. The system halts.
143 Invalid memory configuration
151 Progress indicator. BIST has started.
152 Progress indicator. BIST has started direct-current logic self-test
(DCLST) code.
153 Progress indicator. BIST has started.
154 Progress indicator. BIST has started array self-test (AST) test code.
160 BIST detected a missing early power-off warning (EPOW) connector.
161 The Bump quick I/O tests failed.
162 The JTAG tests failed.
164 BIST encountered an error while reading low NVRAM.
165 BIST encountered an error while writing low NVRAM.
166 BIST encountered an error while reading high NVRAM.
167 BIST encountered an error while writing high NVRAM.
168 BIST encountered an error while reading the serial input/output
register.
169 BIST encountered an error while writing the serial input/output register.
180 Progress indicator. The BIST checkstop logout is in progress.
182 BIST COP bus is not responding.
185 Checkstop occurred during BIST.
186 System logic-generated checkstop (Model 250 only).
187 BIST was unable to identify the chip release level in the checkstop
logout data.
195 Progress indicator. The BIST checkstop logout completed.
200 - 299, 2e6-2e7

200 Key mode switch is in the secure position.
201 Checkstop occurred during system restart. If a 299 LED was shown
before, recreate the boot logical volume (bosboot).
202 Unexpected machine check interrupt, system halts
203 Unexpected data storage interrupt, system halts
204 Unexpected instruction storage interrupt, system halts
205 Unexpected external interrupt, system halts
206 Unexpected alignment interrupt, system halts
207 Unexpected program interrupt, system halts
208 Machine check due to an L2 uncorrectable ECC, system halts
209 Reserved, system halts
210 Unexpected switched virtual circuit (SVC) 1000 interrupt, system halts
211 IPL ROM CRC miscompare occurred during system restart, system
halts
212 POST found processor to be bad, system halts
213 POST failed. No good memory could be detected, the system halts.
214 An I/O planar failure has been detected. The power status register, the
time-of-day clock, or NVRAM on the I/O planar failed. The system halts
215 Progress indicator. The level of voltage supplied to the system is too
low to continue a system restart.
216 Progress indicator. The IPL ROM code is being uncompressed into
memory for execution.
217 Progress indicator. The system has encountered the end of the boot
devices list. The system continues to loop through the boot devices list.
218 Progress indicator. POST is testing for 1MB of good memory.
219 Progress indicator. POST bit map is being generated.
21c L2 cache not detected as part of systems configuration (when LED
persists for 2 seconds).
220 Progress indicator. IPL control block is being initialized.
221 An NVRAM CRC miscompare occurred while loading the operating
system with the key mode switch in Normal position. System halts.
222 Progress indicator. Attempting a Normal-mode system restart from the
standard I/O planar-attached devices. System retries.
SCSI-attached devices specified in the NVRAM list.
9333 High-Performance Disk Drive Subsystem.
bus-attached internal disk.
226 Progress indicator. Attempting a Normal-mode system restart from
Ethernet.
token ring.
228 Progress indicator. Attempting a Normal-mode system restart using the
expansion code devices list, but cannot restart from any of the devices
in the list.
devices in NVRAM boot devices list, but cannot restart from any of the
devices in the list. System retries.
22c Progress indicator. Attempting a Normal-mode IPL from FDDI specified
in the NVRAM device list.
Family 2 Feature ROM specified in the IPL ROM default devices list.
Ethernet specified by selection from ROM menus.
standard I/O planar-attached devices specified in the IPL ROM default
device list.
SCSI-attached devices specified in the IPL ROM default device list.
9333 High-Performance Disk Drive Subsystem specified in the IPL
ROM default device list.
9333 High-Performance Disk Drive Subsystem specified in the IPL
ROM default device list.
235 Progress indicator. Attempting a Normal-mode system restart from the
bus-attached internal disk specified in the IPL ROM default device list.
Ethernet specified in the IPL ROM default device list.
token ring specified in the IPL ROM default device list.
token-ring specified by selection from ROM menus.
239 Progress indicator. A Normal-mode menu selection failed to boot.
23c Progress indicator. Attempting a Normal-mode IPL from FDDI in IPL
ROM device list.
240 Progress indicator. Attempting a Service-mode system restart from the
Family 2 Feature ROM specified in the NVRAM boot devices list.
241 Attempting a Normal-mode system restart from devices specified in
NVRAM bootlist.
standard I/O planar-attached devices specified in the NVRAM boot
devices list.
SCSI-attached devices specified in the NVRAM boot devices list.
9333 High-Performance Disk Drive Subsystem specified in the
NVRAM boot devices list.
bus-attached internal disk specified in the NVRAM boot devices list.
Ethernet specified in the NVRAM boot devices list.
Token-Ring specified in the NVRAM boot devices list.
248 Progress indicator. Attempting a Service-mode system restart using
the expansion code specified in the NVRAM boot devices list.
249 Progress indicator. Attempting a Service-mode system restart from
devices in NVRAM boot devices list, but cannot restart from any of the
devices in the list.
Family 2 Feature ROM specified in the IPL ROM default devices list.
Ethernet by selection from ROM menus.
standard I/O planar-attached devices specified in the IPL ROM default
devices list.
SCSI-attached devices specified in the IPL ROM default devices list.
9333 High-Performance Subsystem devices specified in the IPL ROM
default devices list.
bus-attached internal disk specified in the IPL ROM default devices list.
Ethernet specified in the IPL ROM default devices list.
token ring specified in the IPL ROM default devices list.
token ring specified by selection from ROM menus.
FDDI specified by the operator.
260 Progress indicator. Menus are being displayed on the local display or
terminal connected to your system. The system waits for input from the
terminal.
261 No supported local system display adapter was found. The system
waits for a response from an asynchronous terminal on serial port 1.
262 No local system keyboard was found.
Family 2 Feature ROM specified in the NVRAM boot devices list.
269 Progress indicator. Cannot boot system, end of bootlist reached.
270 Progress indicator. Ethernet/FDX 10 Mbps MC adapter POST is
running.
271 Progress indicator. Mouse and mouse port POST are running.
272 Progress indicator. Tablet port POST is running.
276 Progress indicator. A 10/100 Mbps Ethernet MC adapter POST is
running.
277 Progress indicator. Auto Token Ring LAN streamer MC 32 adapter
POST is running.
278 Progress indicator. Video ROM scan POST is running.
279 Progress indicator. FDDI POST is running
280 Progress indicator. 3Com Ethernet POST is running.
281 Progress indicator. Keyboard POST is running.
282 Progress indicator. Parallel port POST is running.
283 Progress indicator. Serial port POST is running.
284 Progress indicator. POWER Gt1 graphics adapter POST is running.
286 Progress indicator. Token Ring adapter POST is running.
287 Progress indicator. Ethernet adapter POST is running.
288 Progress indicator. Adapter slot cards are being queried.
290 Progress indicator. I/O planar test started.
291 Progress indicator. Standard I/O planar POST is running.
292 Progress indicator. SCSI POST is running.
293 Progress indicator. Bus-attached internal disk POST is running.
294 Progress indicator. TCW SIMM in slot J is bad.
295 Progress indicator. Color Graphics Display POST is running.
296 Progress indicator. Family 2 Feature ROM POST is running.
297 Progress indicator. System model number could not be determined.
System halts.
298 Progress indicator. Attempting a warm system restart.
299 Progress indicator. IPL ROM passed control to loaded code.
2e6 Progress indicator. A PCI Ultra/Wide differential SCSI adapter is being
configured.
2e7 An undetermined PCI SCSI adapter is being configured.
500 - 599, 5c0 - 5c6

500 Progress indicator. Querying standard I/O slot.
501 Progress indicator. Querying card in slot 1.
510 Progress indicator. Starting device configuration.
511 Progress indicator. Device configuration completed.
512 Progress indicator. Restoring device configuration from media.
513 Progress indicator. Restoring BOS installation files from media.
516 Progress indicator. Contacting server during network boot.
517 Progress indicator. The / (root) and /usr file systems are being
mounted.
518 Mount of the /usr file system was not successful. System halts.
520 Progress indicator. BOS configuration is running.
521 The /etc/inittab file has been incorrectly modified or is damaged.
The configuration manager was started from the /etc/inittab file
with invalid options. System halts.
The configuration manager was started from the /etc/inittab file with
conflicting options. System halts.
523 The /etc/objrepos file is missing or inaccessible.
524 The /etc/objrepos/Config_Rules file is missing or inaccessible.
525 The /etc/objrepos/CuDv file is missing or inaccessible.
526 The /etc/objrepos/CuDvDr file is missing or inaccessible.
527 You cannot run Phase 1 at this point. The /sbin/rc.boot file has
probably been incorrectly modified or is damaged.
528 The /etc/objrepos/Config_Rules file has been incorrectly
modified or is damaged, or a program specified in the file is missing.
529 There is a problem with the device containing the ODM database or
the root file system is full.
530 The savebase command was unable to save information about the
base customized devices onto the boot device during Phase 1 of
system boot. System halts.
531 The /usr/lib/objrepos/PdAt file is missing or inaccessible.
System halts.
532 There is not enough memory for the configuration manager to
continue. System halts.
533 The /usr/lib/objrepos/PdDv file has been incorrectly modified or
is damaged, or a program specified in the file is missing.
534 The configuration manager is unable to acquire a database lock.
System halts.
535 A HIPPI diagnostics interface driver is being configured.
modified or is damaged. System halts.
modified or is damaged. System halts.
538 Progress indicator. The configuration manager is passing control to a
configuration method.
539 Progress indicator. The configuration method has ended and control
has returned to the configuration manager.
540 Progress indicator. Configuring child of IEEE-1284 parallel port.
544 Progress indicator. An ECP peripheral configure method is executing.
545 Progress indicator. A parallel port ECP device driver is being
configured.
546 IPL cannot continue due to an error in the customized database.
547 Rebooting after error recovery (LED 546 precedes this LED).
548 Restbase failure.
549 Console could not be configured for the “Copy a System Dump” menu.
550 Progress indicator. ATM LAN emulation device driver is being
configured.
551 Progress indicator. A varyon operation of the rootvg is in progress.
552 The ipl_varyon command failed with a return code not equal to 4, 7,
8 or 9 (ODM or malloc failure). System is unable to vary on the rootvg.
Phase 1 boot is completed and the init command started.
554 The IPL device could not be opened or a read failed (hardware not
configured or missing).
555 The fsck -fp /dev/hd4 command on the root file system failed
with a non-zero return code.
556 LVM subroutine error from ipl_varyon.
557 The root file system could not be mounted. The problem is usually caused
by bad information on the log logical volume (/dev/hd8) or by damage to the
boot logical volume (hd5).
558 Not enough memory is available to continue system restart.
559 Less than 2 MB of good memory are left for loading the AIX kernel.
System halts.
560 Unsupported monitor is attached to the display adapter.
561 Progress indicator. The TMSSA device is being identified or
configured.
565 Configuring the MWAVE subsystem.
566 Progress indicator. Configuring Namkan twinaxx common card.
567 Progress indicator. Configuring High-Performance Parallel Interface
(HIPPI) device driver (fpdev).
568 Progress indicator. Configuring High-Performance Parallel Interface
(HIPPI) device driver (fphip).
569 Progress indicator. FCS SCSI protocol device is being configured.
570 Progress indicator. A SCSI protocol device is being configured.
571 HIPPI common functions driver is being configured.
572 HIPPI IPI-3 master mode driver is being configured.
573 HIPPI IPI-3 slave mode driver is being configured.
574 HIPPI IPI-3 user-level interface is being configured.
575 A 9570 disk-array driver is being configured.
576 Generic async device driver is being configured.
577 Generic SCSI device driver is being configured.
578 Generic common device driver is being configured.
579 Device driver is being configured for a generic device.
580 Progress indicator. A HIPPI-LE interface (IP) layer is being configured.
581 Progress indicator. TCP/IP is being configured. The configuration
method for TCP/IP is being run.
582 Progress indicator. Token ring data link control (DLC) is being
configured.
583 Progress indicator. Ethernet data link control (DLC) is being
configured.
584 Progress indicator. IEEE Ethernet (802.3) data link control (DLC) is
being configured.
585 Progress indicator. SDLC data link control (DLC) is being configured.
586 Progress indicator. X.25 data link control (DLC) is being configured.
587 Progress indicator. Netbios is being configured.
588 Progress indicator. Bisync read-write (BSCRW) is being configured.
589 Progress indicator. SCSI target mode device is being configured.
590 Progress indicator. Diskless remote paging device is being configured.
591 Progress indicator. Logical Volume Manager device driver is being
configured.
592 Progress indicator. An HFT device is being configured.
593 Progress indicator. SNA device driver is being configured.
594 Progress indicator. Asynchronous I/O is being defined or configured.
595 Progress indicator. X.31 pseudo device is being configured.
596 Progress indicator. SNA DLC/LAPE pseudo device is being configured.
597 Progress indicator. Outboard communication server (OCS) is being
configured.
598 Progress indicator. OCS hosts are being configured during system
reboot.
599 Progress indicator. FDDI data link control (DLC) is being configured.
5c0 Progress indicator. Streams-based hardware driver being configured.
5c1 Progress indicator. Streams-based X.25 protocol stack being
configured.
5c2 Progress indicator. Streams-based X.25 COMIO emulator driver being
configured.
5c3 Progress indicator. Streams-based X.25 TCP/IP interface driver being
configured.
5c4 Progress indicator. FCS adapter device driver being configured.
5c5 Progress indicator. SCB network device driver for FCS is being
configured.
5c6 Progress indicator. AIX SNA channel being configured.
c00 - c99
c00 AIX Install/Maintenance loaded successfully.
c01 Insert the AIX Install/Maintenance diskette.
c02 Diskettes inserted out of sequence.
c03 Wrong diskette inserted.
c04 Irrecoverable error occurred.
c05 Diskette error occurred.
c06 The rc.boot script is unable to determine the type of boot.
c07 Insert next diskette.
c08 RAM file system started incorrectly.
c09 Progress indicator. Writing to or reading from diskette.
c10 Platform-specific bootinfo is not in boot image.
c20 Unexpected system halt occurred. System is configured to enter the
kernel debug program instead of performing a system dump. Enter
bosboot -D for information about kernel debugger enablement.
c21 The ifconfig command was unable to configure the network for the
client network host.
c25 Client did not mount remote mini root during network install.
c26 Client did not mount the /usr file system during the network boot.
c29 System was unable to configure the network device.
c31 If a console has not been configured, the system pauses with this
value and then displays instructions for choosing a console.
c32 Progress indicator. Console is a high-function terminal.
c33 Progress indicator. Console is a tty.
c34 Progress indicator. Console is a file.
c40 Extracting data files from media.
c41 Could not determine the boot type or device.
c42 Extracting data files from diskette.
c43 Could not access the boot or installation tape.
c44 Initializing installation database with target disk information.
c45 Cannot configure the console. The cfgcon command failed.
c46 Normal installation processing.
c47 Could not create a PVID on a disk. The chgdisk command failed.
c48 Prompting you for input. BosMenus is being run.
c49 Could not create or form the JFS log.
c50 Creating rootvg on target disk.
c51 No paging devices were found.
c52 Changing from RAM environment to disk environment.
c53 Not enough space in /tmp to do a preservation installation. Make /tmp
larger.
c54 Installing either BOS or additional packages.
c55 Could not remove the specified logical volume in a preservation
installation.
c56 Running user-defined customization.
c57 Failure to restore BOS.
c58 Displaying message to turn the key.
c59 Could not copy either device special files, device ODM, or volume
group information from RAM to disk.
c61 Failed to create the boot image.
c70 Problem mounting diagnostic CD-ROM disk in stand-alone mode.
c99 Progress indicator. The diagnostic programs have completed.
