Instructor Guide
ERC 2.2
Trademarks
The reader should recognize that the following terms, which appear in the content of this
training document, are official trademarks of IBM or other companies:
IBM® is a registered trademark of International Business Machines Corporation.
The following are trademarks of International Business Machines Corporation in the United
States, or other countries, or both:
AIX 5L™ AIX 6™ AIX®
AS/400® DB2® DS8000®
HACMP™ Initiate® MWAVE®
Power Systems™ Power® PowerVM®
POWER6® POWER7® pSeries®
Redbooks® RS/6000® System p®
Tivoli®
Intel is a trademark or registered trademark of Intel Corporation or its subsidiaries in the
United States and other countries.
Windows is a trademark of Microsoft Corporation in the United States, other countries, or
both.
UNIX is a registered trademark of The Open Group in the United States and other
countries.
VMware and the VMware "boxes" logo and design, Virtual SMP and VMotion are registered
trademarks or trademarks (the "Marks") of VMware, Inc. in the United States and/or other
jurisdictions.
Other product and service names might be trademarks of IBM or other companies.
Contents
Trademarks
Course description
Agenda
rc.boot 3 (1 of 2)
rc.boot 3 (2 of 2)
rc.boot summary
Fixing corrupted file systems and logs
Let’s review: rc.boot (1 of 3)
Let’s review: rc.boot (2 of 3)
Let’s review: rc.boot (3 of 3)
6.2. AIX initialization part 2
Configuration manager
Config_Rules object class
cfgmgr output in the boot log using alog
/etc/inittab file
Boot problem management
Let's review: /etc/inittab file
Checkpoint (1 of 2)
Checkpoint (2 of 2)
Exercise: System initialization: rc.boot and inittab
Unit summary
Duration: 5 days
Purpose
This course provides advanced AIX system administrator skills with a
focus on availability and problem determination. It provides detailed
knowledge of the ODM database where AIX maintains so much
configuration information. It shows how to monitor for and deal with
AIX problems. There is special focus on dealing with Logical Volume
Manager problems, including procedures for replacing disks. Several
techniques for minimizing the system maintenance window are
covered. While the course includes some AIX 7.1 enhancements,
most of the material is applicable to prior releases of AIX.
Audience
This is an advanced course for AIX system administrators, system
support, and contract support individuals with at least six months of
experience in AIX.
Prerequisites
You should have basic AIX System Administration skills. These skills
include:
• Use of the Hardware Management Console (HMC) to activate a
logical partition running AIX and to access the AIX system console
• Install an AIX operating system from an already configured NIM
server
• Implementation of AIX backup and recovery
• Manage additional software and base operating system updates
• Familiarity with management tools such as SMIT
• Understand how to manage file systems, logical volumes, and
volume groups
• Mastery of the UNIX user interface including use of the vi editor,
command execution, input and output redirection, and the use of
utilities such as grep
Objectives
On completion of this course, students should be able to:
• Perform system problem determination and reporting procedures
including analyzing error logs, creating dumps of the system, and
providing needed data to the AIX Support personnel
• Examine and manipulate Object Data Manager databases
• Identify and resolve conflicts between the Logical Volume Manager
(LVM) disk structures and the Object Data Manager (ODM)
• Complete a very basic configuration of Network Installation
Manager to provide network boot support for either system
installation or booting to maintenance mode
• Identify various types of boot and disk failures and perform the
matching recovery procedures
• Implement advanced methods such as alternate disk install,
multibos, and JFS2 snapshots to use a smaller maintenance
window
Contents
• Advanced AIX administration overview
• The Object Data Manager
• Error monitoring
• Network Installation Manager basics
• System initialization: Accessing a boot image
• System initialization: rc.boot and inittab
• LVM metadata and related problems
Agenda
The estimated timings provided here are for content only. They assume that the remainder of
the day is consumed with hourly breaks and a one hour lunch break. Most days are
timed to allow class dismissal between 4 p.m. and 4:30 p.m., assuming a 9 a.m. to 5
p.m. class day. If the class runs quicker than expected, some days have an optional lab
for the students to work through, which will help fill in the time.
Estimated time
00:55
References
SG24-7910 IBM AIX Version 7.1 Differences Guide (Redbook)
SG24-7559 IBM AIX Version 6.1 Differences Guide (Redbook)
SG24-5496 Problem Solving and Troubleshooting in AIX 5L
(Redbook)
© Copyright IBM Corp. 2009, 2012 Unit 1. Advanced AIX administration overview 1-1
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Instructor Guide
Unit objectives
IBM Power Systems
Notes:
Application outages
Functional or performance
Avoid unplanned outages with best practices
Change control
Data security
Capacity planning
High availability design
Avoid planned outages
Fall-over to backup server
Relocate application (LPAR or WPAR mobility)
Use maintenance windows
Application stopped versus slow activity
Plan enough time for back-out or recovery
Minimize time needed
Effective problem determination and recovery
© Copyright IBM Corporation 2012
Notes:
Introduction
Providing system availability is a major responsibility of any system administrator. An
outage may be caused by a functional problem (such as an application or system crash)
or a server performance problem (business is seriously impacted due to poor response
times or late jobs). There are many approaches to dealing with this.
Unplanned outages
When most of us think of availability, we think of unplanned outages. Regular hardware
and software maintenance can often avoid these outages. Designing the computing
facility to have redundant components (power, network adapters, network switches,
storage, and more) can make the overall system resilient to the failure of individual
components. Performance problems are often the result of failing to do proper capacity
planning, resulting in not enough resources (memory, processors, network bandwidth,
or disk I/O bandwidth) to handle the increased workload. If there is no change control to
manage what work is placed on a system, capacity planning is even more challenging.
Furthermore, uncontrolled changes to a system result in uncontrolled exposure to
possible outages created by those changes, and thus unplanned outages. Computer
viruses and other malicious attacks by computer hackers can also reduce system
availability (in addition to the exposure of losing proprietary information). Good data
security policies are essential.
Even when implementing good policies in these areas, some unplanned outages will
still happen. In these situations, the system administrator needs to have a plan for
minimizing the impact and recovering as quickly as possible. One common approach is
to have an alternate system that can take over the work of the failed system. High
Availability Cluster Multi-Processing (HACMP) provides a system for either concurrent
processing by multiple systems, or an automated fall-over to a backup system, thus
minimizing the impact of a server failure. Such server redundancy can be designed to
work within a single facility or be divided between different geographical locations.
Obviously, rapid notification of a problem, effective and prompt diagnosis of the cause,
and being able to quickly implement an effective solution will all contribute to a smaller
mean time to recovery.
Planned outages
By using change control, the risk associated with certain categories of potential
unplanned outages can be managed by implementing the changes during planned
windows of time when the impact of any unexpected problem (resulting from the
change) is minimized. In addition, there are certain types of changes for which an
outage is unavoidable.
Some facilities will implement multiple types of maintenance windows. One type would
be frequent short maintenance windows for any administrative work that will compete
with applications for resources (performance impact) or have a small chance of having
a functional disruption. Another type would be a less frequent window in which any
reboot of the system or any major change to the level of the operating system or major
subsystems, such as database software, would be allowed.
Sometimes, the amount of time in a maintenance window is relatively small and the
work has to be carefully planned. You also need to allow time to recover if anything
goes wrong due to the maintenance. Any needed resources that can be pre-staged will
help expedite the work. Any approach that can speed recovery after a problem occurs is
also useful.
For systems that need to be up 24 hours a day, seven days a week, every day of
the year (24x7x365), even a short outage cannot be tolerated. In those situations, a
method to non-disruptively move the applications to another system can be invaluable.
If an HACMP cluster solution is already in place to handle unplanned outages, then this
can be used to manually fall-over the services to another system while maintenance is
being done. Other solutions are to use Live Partition Mobility or Live Application
Mobility.
Instructor notes:
Purpose — Provide an overview of issues related to application availability.
Details —
Additional information —
Transition statement — Let’s look at some factors that affect the amount of time needed
in a maintenance window.
System backups
Minimizing rootvg size
Snapshot techniques for user file systems
Notes:
An important technique that we will cover is the use of alternate storage as the
target of the software update. What we mean is that the updates are not made to the
rootvg, but rather to a copy of the rootvg. This has two advantages. First, there is no
change being made to the active rootvg. For locations that make a distinction between
changing the level of the operating system and simply doing work that has a
performance impact, the actual time consuming update activity can be done in a more
frequently available window. Then when a major maintenance window arrives, you only
need to reboot to make it effective. The second advantage, and to some the more
important advantage, is the ease of recovery. If you find that there are serious problems
with running under the new level of code, you only need to reboot back to the earlier
code level, rather than recover from a mksysb or reject the entire update. Of course, the
down side is that you will need to reboot to make the update effective; but, this is
something a major maintenance window should expect.
There are two techniques that we will cover. One technique is creating an alternate set
of logical volumes that are copies of the rootvg BOS logical volumes. This is called
multibos. The other technique is creating an alternate volume group which is a clone of
the rootvg. In each case, you would apply the maintenance to the copy and then later
reboot to make it effective.
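The two techniques can be sketched as shell functions. This is a minimal dry-run sketch, not a procedure from the course: the RUN variable, the function names, the target disk hdisk1, and the image directory /updates are assumptions made for illustration. With RUN unset, the commands are only printed; on a real AIX system you would set RUN to an empty string to execute them.

```shell
#!/bin/sh
# Dry-run helper (assumption of this sketch): prints commands unless RUN="" is set.
RUN=${RUN-echo}

# Technique 1: multibos, a standby copy of the BOS logical volumes inside rootvg.
update_with_multibos() {
    images=$1                        # hypothetical directory holding the update images
    $RUN multibos -Xsp               # preview the standby BOS setup (-p = preview only)
    $RUN multibos -Xs                # create the standby BOS, expanding file systems if needed
    $RUN multibos -Xac -l "$images"  # apply all updates to the standby BOS, not the active one
}

# Technique 2: alternate disk copy, a clone of rootvg on another disk.
update_with_alt_disk() {
    disk=$1                          # hypothetical free disk, for example hdisk1
    images=$2
    $RUN alt_disk_copy -d "$disk" -b update_all -l "$images"
}
```

For example, update_with_alt_disk hdisk1 /updates prints the single alt_disk_copy command it would run. In both cases the active rootvg is left unchanged until the later reboot.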
System backups
Another common maintenance activity is backing up the system. Unless you have an
application that is designed to manage a recovery process using fuzzy backups, you
will need to quiesce the application activity long enough to be sure that there are no
inconsistencies in the backup. The term fuzzy backup refers to a backup in which the
application was making changes during the backup. For a given transaction, multiple
data changes are made. Some of these transaction related changes are made before
that data was backed up, while other changes were made after that data was backed
up. Thus the backup has one piece of data which reflects the transaction and another
piece of data that does not reflect the transaction. The two pieces of data are
inconsistent and such a backup is referred to as fuzzy.
For the rootvg itself, the size of the rootvg should be minimized. It should only contain
what is needed for the OS. All user data and other non-essential files should be backed
up and restored separately. An example would be the standard location of a software
repository: /usr/sys/inst.images. The software repository can be very large and yet
this common path resides in the /usr file system, which is in the rootvg. Placing the
software repository in a separate file system with its own recovery plan (could be using
the original media as the backup) can help reduce backup and recovery time. Another
common example is the /home file system. If users have vast amounts of data stored
there, then over mounting with a separate file system can again speed up working with
the rootvg. There are other file systems, such as /tmp, whose contents could be
left out of the system backup. The trick is that these would need to be excluded
(not mounted or identified in /etc/exclude.rootvg) from the backup during mksysb
execution, and then separately recovered from their own backup. Other user data will
be in separate user volume groups.
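As an illustration of the exclusion approach, the following sketch writes an exclude list and shows the matching mksysb invocation. The function names, the RUN dry-run variable, and the backup target path are assumptions; the real exclude file is /etc/exclude.rootvg, but the sketch takes the file name as an argument so that it can be tried harmlessly against a scratch file.

```shell
#!/bin/sh
# Dry-run helper (assumption of this sketch): prints commands unless RUN="" is set.
RUN=${RUN-echo}

# Write grep-style exclude patterns, one per line.
# "^./tmp/" matches only the contents of /tmp, not other paths containing "tmp".
write_exclude_list() {
    exclude_file=$1                  # /etc/exclude.rootvg on a real system
    cat > "$exclude_file" <<'EOF'
^./tmp/
EOF
}

# Back up rootvg, honoring the exclude file (-e) and regenerating image.data (-i).
backup_rootvg() {
    target=$1                        # hypothetical backup file or tape device
    $RUN mksysb -e -i "$target"
}
```

Anything excluded this way is absent from the mksysb image, so it needs its own backup if it has to be recoverable.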
With the emphasis on separate backups for non-BOS data, there comes a need to
minimize how long the applications need to be quiesced and still have data consistency.
One technique that AIX provides is JFS2 snapshots, which will allow us to only very
briefly quiesce the application and still have a consistent picture of the data at a single
point in time. Then we can either use that snapshot of the data as its own backup, or
base an actual backup upon that snapshot (in order to have off-site storage of the
backup). There are other facilities for doing snapshot captures of data. Some are part of the
storage subsystems and some are part of total storage solutions such as Tivoli Storage
Manager. Our focus will be on the facility that is provided with AIX, JFS2 snapshot.
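A rough sketch of the JFS2 snapshot flow follows. The file system /home/data, the snapshot size, the logical volume name, and the mount point are all hypothetical, as are the function names and the RUN dry-run variable. How the application is quiesced is site-specific and is not shown.

```shell
#!/bin/sh
# Dry-run helper (assumption of this sketch): prints commands unless RUN="" is set.
RUN=${RUN-echo}

# Create an external snapshot of a JFS2 file system in a new logical volume.
# The application only needs to be quiesced for the moment this command runs.
create_snapshot() {
    fs=$1                            # hypothetical file system, for example /home/data
    size=$2                          # space reserved for changed blocks, for example 512M
    $RUN snapshot -o snapfrom="$fs" -o size="$size"
}

# List the snapshots that exist for a file system (shows the snapshot LV name).
list_snapshots() {
    $RUN snapshot -q "$1"
}

# Mount the point-in-time image read-only so that a backup can be taken from it.
mount_snapshot() {
    snaplv=$1                        # snapshot logical volume reported by the query
    mnt=$2                           # hypothetical mount point
    $RUN mount -v jfs2 -o snapshot "$snaplv" "$mnt"
}
```

A backup taken from the mounted snapshot reflects a single point in time, even though the application has resumed work on the original file system.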
Instructor notes:
Purpose — Cover approaches that reduce maintenance time.
Details —
Additional information —
Transition statement — Actual hardware or software problems are also a concern for
application availability. What do we need to do to better manage problems when they
occur?
If an AIX bug:
Collect problem information
Open problem report with AIX Support
Provide snap with information
Notes:
System maintenance
Sometimes code works well under normal testing or production circumstances, but
has poor logic that is exposed only when faced with an unanticipated situation. Alternatively, it
could be some non-central aspect of the code that is not noticed normally. The number
of facilities using this code is large enough that there is a good chance that one of the
facilities will detect and report the problem not long after release of the new code level.
The fix for the code defect will usually come out in the next released fix pack. On the
other hand, many facilities may not be affected by or be concerned about the code
defect problem for months, until the circumstances arise in which it represents a
problem. By installing newer service packs, a facility can benefit from the experience of
others and avoid being impacted by known problems.
Obviously there is always the possible exposure that a new fix pack will introduce new
problems, while solving many old problems.
This course will cover some techniques to use in applying fix packs.
Problem determination
Once you find yourself impacted by what you believe to be a product defect, you will
need to obtain prompt resolution. While there is no substitute for experience (the ability
to recognize a situation and remember the details of how you dealt with it the last time a
similar problem occurred), many problems will be most effectively solved by following a
well developed problem determination methodology. This course will cover a basic
problem determination methodology.
Problem reporting
When you find yourself impacted by what you believe to be a product defect, you will
need to contact AIX Support. Before contacting AIX Support, you should write up a
description of the problem and the surrounding circumstances. When you open a new
Problem Management Report (PMR) with AIX Support, you will be expected to provide
them with a wealth of information to assist them in determining the cause of the
problem. The snap command is a common tool to assist in collecting a vast amount of
information about the environment surrounding the problem. The course materials will
cover these problem reporting procedures.
Instructor notes:
Purpose — Provide guidance on what needs to be known before things go wrong.
Details — This visual introduces one of the most important points that you as an instructor
should make regarding “what it takes” to be a successful system administrator: In order to
be successful at determining what has gone wrong (and how to respond) when there are
system problems, the administrator must be extremely familiar with the characteristics of
his or her system when it is functioning normally. Be sure to make this point!
Use the student notes to guide the rest of your presentation.
Additional information —
Transition statement — Some of the commands you might want to start out with are
discussed next.
Instructor notes:
Purpose — Point out just a few of the commands that are helpful in learning about the
system and its configuration.
Details — Present the information in the student notes. Provide board work to show some
of the commands and the options that can be used with them.
Additional information — The bootinfo command is not officially supported; but, in case
students ask, here is some information regarding some of the most commonly used flags of
this command:
-r Displays real memory in KB
-p Displays hardware platform (rs6k, rspc, chrp)
-y Displays 32 if hardware is 32-bit or 64 if hardware is 64-bit
-K Displays 32 if kernel is 32-bit or 64 if kernel is 64-bit
-z Displays processor type (0=uniprocessor, 1=multiprocessor)
-s Displays size of disk (provided as an argument)
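For board work, the flag output can be post-processed with small helpers such as the ones below. This is only a sketch: the helper names are made up for illustration, and the commented usage lines assume that bootinfo is present on the system (it is not officially supported).

```shell
#!/bin/sh
# Convert the KB figure reported by "bootinfo -r" to whole megabytes.
kb_to_mb() {
    echo $(( $1 / 1024 ))
}

# Summarize the values reported by "bootinfo -y" (hardware) and "bootinfo -K" (kernel).
describe_bits() {
    echo "hardware: ${1}-bit, kernel: ${2}-bit"
}

# On an AIX system the helpers would be fed from bootinfo, for example:
#   kb_to_mb "$(bootinfo -r)"
#   describe_bits "$(bootinfo -y)" "$(bootinfo -K)"
```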
Transition statement — Let’s talk about what to do when things go wrong.
1. Identify the problem
2. Talk to users to define the problem
3. Collect system data
4. Resolve the problem
Notes:
Suggested questions
- What is the problem?
- What is the system doing (or not doing)?
- How did you first notice the problem?
- When did it happen?
- Have any changes been made recently?
Keep them talking until the picture is clear. Ask as many questions as you need to in
order to get the entire history of the problem.
SMIT logs
If SMIT has been used, there will be additional logs that could provide further
information. The SMIT log file is normally contained in the home directory of the root
user and is named smit.log, by default.
Information Center
The IBM Power Systems Information Center can be found at the following link:
http://publib.boulder.ibm.com/eserver
Instructor notes:
Purpose — Provide the big picture for problem resolution.
Details —
Additional information —
Transition statement — An important part of the problem description is the collection of
generated messages or codes. Let’s look at some of the different types of codes and where
we can look them up.
Progress codes
Checkpoint during a process such as boot, shutdown, or dump
Obtained from:
Front panel of system enclosure
HMC or IVM (for logically partitioned systems)
Operator console message or diagnostics (diag utility)
Notes:
Introduction
AIX provides progress and error indicators (display codes) during the boot process.
These display codes can be very useful in resolving startup problems. Depending on
the hardware platform, the codes are displayed on the console and the operator panel.
Operator panel
For non-LPAR systems, the operator panel is an LED display on the front panel.
Beginning with the early POWER4 models, the POWER-based systems have had the
ability to be divided into multiple Logical Partitions (LPARs). In this case, a system-wide
LED display still exists on the front panel. However, the operator panel for each LPAR is
displayed on the screen of the Hardware Management Console (HMC). The HMC is a
separate system which is required when running multiple LPARs. Regardless of where
they are displayed, they are sometimes referred to as LED Display Codes.
http://publib.boulder.ibm.com/eserver
Select Hardware Information Center > Systems Hardware
information
Notes:
Documentation
Note: all information on Web sites and their design is based upon what is available at
the time of this course revision. Web site URLs and the design of the related Web
pages often change.
Reference codes and their meanings can be found at:
http://publib.boulder.ibm.com/eserver under the particular server with which you are
working (though most codes are the same, regardless of the server model).
Notes:
If you believe that your problem is the result of a system defect, you can call AIX
Support to request assistance. Before you call 1-800-IBM-SERV, it is a good idea to
have certain information ready. They will want to verify your name against a list of
names associated with your customer number, and validate that your customer number
has support for the product in question. They will also need to know some details about
the hardware and software environment in which the problem is occurring - such as
your MTMS (machine type, model, serial), your AIX OS level, and the level of any other
relevant software. Of course, you need to explain your problem, providing as much
detail as possible, especially any error messages or codes.
The Level 1 Support personnel will ask you for the priority of your problem.
- Severity level 1 (critical) indicates that the function does not work, your business is
severely impacted, there is no work around, and that there needs to be an
immediate solution. Be aware that, for severity level 1, you will be expected to be
available 24x7 until the problem is resolved.
- Severity level 2 (significant impact) indicates that the function is usable but is limited
in a way that your business is severely impacted.
- Severity level 3 (some impact) indicates that the program is usable with less
significant features (not critical to operations) unavailable.
- Severity level 4 (minimal impact) indicates that the problem causes little impact on
operations, or a reasonable circumvention to the problem has been implemented.
Level 1 Support will assign you a PMR number (actually a PMR and branch number
combination) for tracking purposes. In the future, each time you call about this problem,
you should have the PMR and branch numbers at hand.
Once the basic information has been collected, you are passed to Level 2 Support for
the product area for which you are having a problem. They will work with you in
investigating the nature and cause of your problem. They will search the support
database to see if it is a known problem that is either already being worked on or has a
solution already developed. In many cases, they will request that you update to a
specific technology level (TL) and service pack (SP) that already includes the fix.
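To compare your system against a requested level, it helps to read the current technology level and service pack out of the oslevel -s output. The helper below is a small sketch; the function name and the sample level string are illustrative, and the meaning given to the last field (a build date code) is stated as an assumption.

```shell
#!/bin/sh
# Split an "oslevel -s" style string into its parts.
# Expected form: <release>-<TL>-<SP>-<build date code>, for example 7100-01-03-1207.
parse_oslevel() {
    old_ifs=$IFS
    IFS=-
    set -- $1                        # split the argument on "-" into $1..$4
    IFS=$old_ifs
    echo "release=$1 TL=$2 SP=$3 build=$4"
}

# On an AIX system:
#   parse_oslevel "$(oslevel -s)"
```

With the sample string 7100-01-03-1207, the helper reports technology level 01 and service pack 03.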
If they do not have a fix, they may still ask you to update your system and determine if
the problem still exists. If the problem still exists, they now have a known software
environment to work with. At this point they will often ask for a complete set of
information from your system to be collected and uploaded to their server, to support
their investigation. The basic tool for collecting your system information is the snap
command.
Instructor notes:
Purpose — Introduce the procedure for working with AIX Support.
Details —
Additional information —
Transition statement — Let’s look at how we work with the snap command.
# snap -a
Notes:
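A typical collection sequence might look like the following sketch. The function name and the RUN dry-run variable are assumptions; the three snap invocations are shown in the order they would usually be run, and with RUN unset they are only printed.

```shell
#!/bin/sh
# Dry-run helper (assumption of this sketch): prints commands unless RUN="" is set.
RUN=${RUN-echo}

collect_snap() {
    $RUN snap -r                     # remove output left over from an earlier snap run
    $RUN snap -a                     # gather all system configuration information
    $RUN snap -c                     # package the collected data as a compressed pax archive
}
```

By default the collected data and the resulting snap.pax.Z archive are placed under /tmp/ibmsupt, so that file system needs enough free space.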
# ftp testcase.software.ibm.com
User: anonymous
Password: <your email address>
ftp> cd /toibm/aix
ftp> bin
ftp> put PMR#.b<branch#>.c<country#>.snap.pax.Z
ftp> quit
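The file name used in the put step follows the pattern shown above. A small helper can build it from the PMR, branch, and country numbers; the function name and the sample numbers are hypothetical.

```shell
#!/bin/sh
# Build the upload name PMR#.b<branch#>.c<country#>.snap.pax.Z from its three parts.
snap_upload_name() {
    pmr=$1
    branch=$2
    country=$3
    echo "${pmr}.b${branch}.c${country}.snap.pax.Z"
}

# Example: rename the snap archive before uploading it (numbers are made up):
#   mv /tmp/ibmsupt/snap.pax.Z "/tmp/ibmsupt/$(snap_upload_name 12345 678 000)"
```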
Notes:
Interim fixes
On rare occasions, a customer has an urgent situation which needs fixes for a problem
so quickly that they cannot wait for the formal PTF to be released. In those situations, a
developer may place one or more individual file replacements on an FTP server and
allow the system administrator to download and install them. Originally, this would
simply involve manually copying the new files over the old files. But this created
problems, especially in identifying the state of a system which later experienced other
(possibly related) problems or in backing out the changes.
Today, there is a better methodology for managing these interim fixes (also called efixes) using the emgr
command. Security alerts will often provide interim fixes for the identified security
exposure. Depending upon your own risk analysis, you might immediately use the
interim fix, or wait for the next service pack (which will include these security fixes).
The syntax and use of the emgr command were covered in the prerequisite course.
Instructor notes:
Purpose — Explain standard terminology for software updates
Details —
Additional information —
Transition statement — Let’s look at how we obtain these updates.
Relevant documentation
Notes:
IBM Redbooks
Redbooks can be viewed, downloaded, or ordered from the IBM Redbooks Web site:
http://www.redbooks.ibm.com
Instructor notes:
Purpose — Identify URLs for hardware and software documentation.
Details — Let students know that hard copy versions of the manuals can be ordered from
their IBM marketing representative.
Additional information —
Transition statement — Let’s review what we have covered with some checkpoint
questions.
Checkpoint
Notes:
Instructor notes:
Purpose — Discuss the first group of checkpoint questions.
Details — A checkpoint solution is provided below:
Checkpoint solutions
Additional information —
Transition statement — Let’s take a look at what we have in the class lab environment.
Notes:
Instructor notes:
Purpose — Introduce the exercise for this unit.
Details —
Additional information —
Transition statement — Let’s summarize what we have covered in this unit.
Unit summary
Notes:
Instructor notes:
Purpose — Remind the students of some of the key points in this unit.
Details — Before continuing to the next unit stop and ask the students if there are any
additional questions.
Additional information —
Transition statement — That is the end of this unit.
Estimated time
01:10
References
Online AIX Version 7.1 Command Reference volumes 1-6
Online AIX Version 7.1 General Programming Concepts:
Writing and Debugging Programs
Online AIX Version 7.1 Technical Reference: Kernel and
Subsystems
Note: References listed as “online” above are available through the
IBM Systems Information Center at the following address:
http://publib.boulder.ibm.com/eserver
© Copyright IBM Corp. 2009, 2012 Unit 2. The Object Data Manager 2-1
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Instructor Guide
Unit objectives
Notes:
[Visual: data managed by the ODM, including devices, software, the System Resource Controller, and SMIT menus and panels]
Notes:
Instructor notes:
Purpose — Provide an overview of what data is stored in the ODM.
Details — Go quickly through the list and mention that the main emphasis in this unit is on
devices and software vital product data. You might want to point out that the two “hands” on
the visual “point” to the types of data that will be emphasized. Later on, you supply the
corresponding ODM database files where the data is stored.
Additional information — You might mention that TCP/IP configuration can still be set up
without using ODM. In this case, traditional ASCII files are used for storing TCP/IP data. To
determine whether ODM is used for TCP/IP, use the following command:
# lsattr -El inet0
If the attribute bootup_option is set to no, ODM files are used. If it is set to yes, ODM will
not be used.
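The check described above can be scripted. The following is a minimal sketch and not part of the course material: the helper name tcpip_config_source is hypothetical, and the -a and -F value flags of lsattr are used on the assumption that they are available on your AIX level to print only the attribute value.

```shell
# Hypothetical helper: interpret the bootup_option value of inet0.
#   "no"  -> TCP/IP configuration is read from the ODM
#   "yes" -> traditional ASCII configuration files are used instead
tcpip_config_source() {
    case "$1" in
        no)  echo "ODM" ;;
        yes) echo "ASCII files" ;;
        *)   echo "unknown" ;;
    esac
}

# Query the live value only where lsattr exists (that is, on AIX).
if command -v lsattr >/dev/null 2>&1; then
    tcpip_config_source "$(lsattr -El inet0 -a bootup_option -F value)"
fi
```

On a system without lsattr the block only defines the helper, so it can be pasted into a scratch shell safely.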
Transition statement — Let’s define some key terminology we will need for our discussion
of the ODM.
ODM components
Notes:
Instructor notes:
Purpose — Define the basic components of ODM.
Details — Complete the visual during the lesson. ODM components are:
• Object classes
The ODM consists of many database files, where each file is called an object class.
• Objects
Each object class consists of objects. Each object is one record in an object class.
• Descriptors
The descriptors describe the layout of the objects. They determine the name and
datatype of the fields that are part of the object class.
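To make the three terms concrete, the sketch below counts objects and descriptor values in stanza text of the kind odmget prints. The sample text and the helper name summarize_stanzas are illustrative assumptions; the sketch parses text only and does not read a real ODM.

```shell
# Sample stanza text in the format printed by odmget (values are illustrative).
sample='PdAt:
        uniquetype = "tape/scsi/scsd"
        attribute = "block_size"

PdAt:
        uniquetype = "tty/rs232/tty"
        attribute = "term"
'

# A line ending in ":" names the object class and starts a new object;
# every "name = value" line below it is one descriptor value of that object.
summarize_stanzas() {
    awk '
        /^[A-Za-z_]+:$/ { objects++; sub(/:$/, ""); class = $0; next }
        / = /           { descriptors++ }
        END { printf "%s: %d objects, %d descriptor values\n", class, objects, descriptors }
    '
}

printf '%s' "$sample" | summarize_stanzas
```

For the sample above this prints "PdAt: 2 objects, 4 descriptor values".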
Additional information — This visual shows an extraction out of the ODM class PdAt. Do
not explain the meaning of PdAt or the different fields on this page. Concentrate on the
components of the ODM.
Transition statement — It is also important to understand how the terms predefined
device information and customized device information are used when discussing the ODM.
Notes:
Current focus
In this unit, we will concentrate on ODM classes that are used to store device
information and software product data. At this point, we will narrow our focus even
further and confine our discussion to ODM classes that store device information.
[Visual: predefined databases (PdDv, PdCn, PdAt), the Configuration Manager (cfgmgr) with Config_Rules, and customized databases (CuDvDr, CuVPD)]
Notes:
Configuration manager
[Visual: cfgmgr with Config_Rules and the predefined classes (PdAt, PdCn); the customized classes (CuDv, CuAt, CuDep, CuDvDr, CuVPD); the device methods (Define, Configure, Change, Unconfigure, Undefine); and device driver load and unload]
Notes:
[Visual: location of ODM object classes
First group: CuDv, CuAt, CuDep, CuDvDr, CuVPD, Config_Rules, nim_*, SWservAt, SRC*, plus history, inventory, lpp, product
Constant for machines of same architecture: PdDv, PdAt, PdCn, sm_*, plus history, inventory, lpp, product
Constant for all machines: history, inventory, lpp, product]
Notes:
Introduction
Originally, the three parts of the ODM were designed to support diskless, dataless and
other workstations. The ODM object classes are held in three repositories. Each of
these repositories is described in the material that follows.
/etc/objrepos
The purpose of this location is to hold information that is expected to vary from machine
to machine and cannot be shared with other machines. It contains the part of the
product that cannot be shared among machines. Each client must have its own copy.
Most of this software requiring a separate copy for each machine is associated with the
configuration of the machine or product.
One example is the customized device information. For example, the location of a
device or the overrides to the default attributes can be expected to vary.
This repository contains the customized devices object classes and the four object
classes used by the Software Vital Product Database (SWVPD) for the / (root) part of
the installable software product. The root part of the software contains files that must
be installed on the target system. For example, any configuration files used by the
programs would be in the root part.
To access information in the other directories, this directory contains symbolic links to
the predefined devices object classes. The links are needed because the ODMDIR
variable points to only /etc/objrepos.
/usr/lib/objrepos
This repository contains the predefined devices object classes, SMIT menu object
classes, and the four object classes used by the SWVPD for the /usr part of the
installable software product. The object classes in this repository can be shared across
the network by /usr clients, dataless and diskless workstations. Software installed in the
/usr part can be shared among several machines with compatible hardware
architectures.
/usr/share/lib/objrepos
Contains the four object classes used by the SWVPD for the /usr/share part of the
installable software product. The /usr/share part of a software product contains files
that are not hardware dependent. They can be shared among several machines, even if
the machines have a different hardware architecture. One example is the set of terminfo
files that describe terminal capabilities. As terminfo is used on many UNIX systems,
terminfo files are part of the /usr/share part of a system product.
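The three repositories can be summarized in a small lookup. This is a sketch under stated assumptions: the helper name repo_for_part is hypothetical, and the odmget call is only attempted where the command is installed (that is, on AIX). It relies on ODMDIR being overridden for a single command.

```shell
# Hypothetical helper: map a software part to the repository that holds
# its SWVPD object classes, as described in the notes above.
repo_for_part() {
    case "$1" in
        root)  echo /etc/objrepos ;;
        usr)   echo /usr/lib/objrepos ;;
        share) echo /usr/share/lib/objrepos ;;
        *)     echo "unknown part: $1" >&2; return 1 ;;
    esac
}

# Show each repository and, on AIX only, query the lpp class there by
# setting ODMDIR for that one command.
for part in root usr share; do
    dir=$(repo_for_part "$part")
    echo "$part -> $dir"
    if command -v odmget >/dev/null 2>&1; then
        ODMDIR=$dir odmget -q "name=bos.rte.printers" lpp
    fi
done
```

Setting ODMDIR on the command line this way leaves the default value of the variable untouched for later commands.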
lslpp options
The lslpp command can list the software recorded in the ODM. When run with the -l
(lower case L) flag, it lists each of the locations (/, /usr/lib, /usr/share/lib) where it finds
the fileset recorded. This can be distracting if you are not concerned with these
distinctions. Alternatively, you can run lslpp -L, which reports each fileset only once,
without making distinctions between the root, usr, and share portions.
root portion of the software in its private file systems. The other object repositories
for the software would be maintained in global environment file systems, which would
be shared among all WPARs, using read-only mounts. For details, attend the course
that teaches AIX Workload Partitions.
# cfgmgr
PdDv:
        type = "14106902"
        class = "adapter"
        subclass = "pci"
        prefix = "ent"
        ...
        DvDr = "pci/goentdd"
        Define = "/usr/lib/methods/define_rspc"
        Configure = "/usr/lib/methods/cfggoent"
        ...
        uniquetype = "adapter/pci/14106902"

CuDv:
        name = "ent1"
        status = 1
        chgstatus = 2
        ddins = "pci/goentdd"
        location = "02-08"
        parent = "pci2"
        connwhere = "8"
        PdDvLn = "adapter/pci/14106902"

PdAt:
        uniquetype = "adapter/pci/14106902"
        attribute = "jumbo_frames"
        deflt = "no"
        values = "yes,no"

CuAt:
        name = "ent1"
        attribute = "jumbo_frames"
        value = "yes"
        type = "R"
Notes:
File system information?
User/security information?
Queues and queue devices?
Notes:
[Visual: review diagram with numbered blanks (1, 2, 3) and boxes labeled AIX kernel and Applications]
Figure 2-11. Let’s review: Device configuration and the ODM AN152.2
Notes:
Instructions
Please answer the following questions by writing the answers on the picture above. If you are
unsure about a question, leave it out.
1. Which command configures devices in an AIX system? Note: This is not an ODM
command.
2. Which ODM class contains all devices that your system supports?
3. Which ODM class contains all devices that are configured in your system?
4. Which programs are loaded into the AIX kernel to control access to the devices?
5. If you have a configured tape drive rmt1, which special file do applications access to
work with this device?
ODM commands
Descriptors: odmshow
Notes:
Introduction
Different commands are available for working with each of the ODM components:
object classes, descriptors, and objects.
Instructor notes:
Purpose — Introduce the ODM command line interface.
Details — Explain briefly the different ODM commands. Introduce the ODMDIR variable that
is used for all ODM commands.
Additional information — Tell the students that for system developers, an ODM API is
available.
Transition statement — The commands for working with ODM objects are the commands
system administrators use most often, so let’s spend a little more time talking about how
these commands work.
# vi file
PdAt:
uniquetype = "tape/scsi/scsd"
attribute = "block_size"
deflt = "512" Modify deflt to 512
values = "0-2147483648,1"
width = ""
type = "R"
generic = "DU"
rep = "nr"
nls_index = 6
# odmadd file
Notes:
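The edit step on the visual can be rehearsed on a scratch file before touching a real object class. The sketch below is an illustration only: it uses sed as a non-interactive stand-in for vi, works on a temporary file, and never calls an ODM command. The odmget and odmdelete lines in the closing comment are the usual companions of odmadd and are an assumption about the parts of the procedure that are not visible in this excerpt.

```shell
# Work on a scratch stanza file; nothing here touches a real ODM.
file=$(mktemp)
cat > "$file" <<'EOF'
PdAt:
        uniquetype = "tape/scsi/scsd"
        attribute = "block_size"
        deflt = ""
        values = "0-2147483648,1"
EOF

# Non-interactive stand-in for the vi step: set deflt to 512.
sed 's/^\([[:space:]]*deflt = \)".*"/\1"512"/' "$file" > "$file.new"
mv "$file.new" "$file"
grep 'deflt' "$file"

# On AIX the surrounding steps would look like this (assumed, not run here):
#   odmget -q "uniquetype=tape/scsi/scsd and attribute=block_size" PdAt > file
#   odmdelete -o PdAt -q "uniquetype=tape/scsi/scsd and attribute=block_size"
#   odmadd file
```

Keeping a copy of the original stanza file before deleting the object gives a simple way back if the edited file turns out to be wrong.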
Possible queries
As with any database, you can perform queries for records matching certain criteria.
The tests are on the values of the descriptors of the objects. A number of tests can be
performed:
= Equal
!= Not equal
> Greater
>= Greater than or equal to
< Less than
<= Less than or equal to
like Similar to; finds patterns in character string data
For example, to search for records where the value of the lpp_name attribute begins
with bosext1., you would use the syntax lpp_name like bosext1.*
Tests can be linked together using normal boolean operations, as shown in the
following example:
uniquetype=tape/scsi/scsd and attribute=block_size
In addition to the * wildcard, a ? can be used as a wildcard character.
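The behavior of the like test can be tried out away from an AIX system with a rough emulation. This sketch assumes that, for simple patterns such as the one above, * and ? behave like shell wildcards; the helper name odm_like and the two bosext1 fileset names are made up for illustration. The real odmget invocations are shown only as comments.

```shell
# Rough local emulation of the "like" test using shell pattern matching.
# Assumption: for simple patterns, * and ? behave like shell wildcards.
odm_like() {
    # $1 = descriptor value, $2 = pattern
    case "$1" in
        $2) return 0 ;;
        *)  return 1 ;;
    esac
}

# Illustrative names only; the first two are hypothetical filesets.
for name in bosext1.example.a bosext1.example.b bos.rte.printers; do
    if odm_like "$name" 'bosext1.*'; then
        echo "$name matches"
    fi
done

# The corresponding queries on AIX need quoting so that the shell passes
# the whole criteria string, wildcards included, to odmget:
#   odmget -q "lpp_name like bosext1.*" product
#   odmget -q "uniquetype=tape/scsi/scsd and attribute=block_size" PdAt
```

Quoting the criteria is the part most easily forgotten: without it the shell may expand the wildcard itself or split the string at the spaces.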
# vi file
PdAt:
uniquetype = "tape/scsi/scsd"
attribute = "block_size"
deflt = "512" Modify deflt to 512
values = "0-2147483648,1"
width = ""
type = "R"
generic = "DU"
rep = "nr"
nls_index = 6
Notes:
product:
        lpp_name = "bos.rte.printers"
        comp_id = "5765-G6200"
        update = 0
        cp_flag = 2359571
        fesn = "0000"
        name = "bos"
        state = 5
        ver = 7
        rel = 1
        mod = 0
        fix = 0
        ptf = ""
        media = 0
        sceded_by = ""
        fixinfo = ""
        prereq = "*coreq bos.rte 7.1.0.0"
        description = ""
        supersedes = ""

lpp:
        name = "bos.rte.printers"
        size = 0
        state = 5
        cp_flag = 2359571
        group = ""
        magic_letter = "I"
        ver = 7
        rel = 1
        mod = 0
        fix = 0
        description = "Front End Printer Support"
        lpp_id = 38

inventory:
        lpp_id = 38
        private = 0
        file_type = 0
        format = 1
        loc0 = "/etc/qconfig"
        loc1 = ""
        loc2 = ""
        size = 0
        checksum = 0
        ...

history:
        lpp_id = 38
        event = 2
        ver = 7
        rel = 1
        mod = 0
        fix = 0
        ptf = ""
        corr_svn = ""
        cp_mod = ""
        cp_fix = ""
        login_name = "root"
        state = 1
        time = 1310159341
        comment = ""
Notes:
SWVPD classes
The Software Vital Product Data is stored in the following ODM classes:
lpp The lpp object class contains information about the installed
software products, including the current software product state
and description.
inventory The inventory object class contains information about the files
associated with a software product.
product The product object class contains product information about
the installation and updates of software products and their
prerequisites.
history The history object class contains historical information about
the installation and updates of software products.
Instructor notes:
Purpose — Introduce the software vital product database.
Details — Explain what kind of data is stored in the ODM classes (version, release, and so
forth) and the meaning of the shown ODM classes. Identify how the classes are linked
together by the lpp_id descriptor. Note that the list of descriptors is not complete and that
the slide only lists selected descriptors for teaching purposes.
Additional information — At this point, you might introduce the lslpp command, which
has options like -l, -h, -f and -w. This command queries the software vital product
database. We can see most of this information with the high-level lslpp command. The
flags (and the related object classes) are:
-L : Lists the filesets (lpp object class)
-d : Lists the fileset dependencies (product object class)
-p : Lists the fileset prerequisites (product object class)
-w : Lists the fileset for a given file (inventory object class)
-f : Lists the files for a given fileset (inventory object class)
-h : Lists the maintenance history for a fileset (history object class)
The commands used to produce the output on the visual are:
• lpp:
# odmget -q name=bos.rte.printers lpp
• product:
# odmget -q lpp_name=bos.rte.printers product
• inventory:
# odmget -q lpp_id=38 inventory | pg
Since there are a number of files in the root file system for this fileset, there are a
number of objects that match this query (hence the pg command). Note that there are
also files in this fileset in the /usr file system.
To display these, set ODMDIR=/usr/lib/objrepos and rerun the last odmget command.
(Note: ODMDIR defaults to /etc/objrepos.)
• history:
# odmget -q lpp_id=38 history
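The link through lpp_id can be followed mechanically. The sketch below works on illustrative stanza text rather than a live ODM: the helper name files_for_id is hypothetical, and the second inventory object (lpp_id 99) is invented so that the filter has something to reject.

```shell
# Illustrative stanza text (shortened from the visual).
lpp_out='lpp:
        name = "bos.rte.printers"
        lpp_id = 38
'
inventory_out='inventory:
        lpp_id = 38
        loc0 = "/etc/qconfig"

inventory:
        lpp_id = 99
        loc0 = "/etc/other"
'

# Step 1: read lpp_id from the lpp object.
id=$(printf '%s' "$lpp_out" | awk '$1 == "lpp_id" { print $3 }')

# Step 2: print loc0 of every inventory object carrying the same lpp_id.
files_for_id() {
    awk -v id="$1" '
        $1 == "lpp_id" { match_id = ($3 == id) }
        $1 == "loc0" && match_id { gsub(/"/, "", $3); print $3 }
    '
}

printf '%s' "$inventory_out" | files_for_id "$id"
```

On AIX the two inputs would come from the odmget commands listed above instead of from shell variables.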
Transition statement — Let’s introduce the most important software states.
[Visual: software states
Applying, Committing, Rejecting, Deinstalling: if installation was not successful, a) installp -C or b) smit maintain_software
Broken: cleanup failed; remove software and reinstall]
Notes:
Introduction
The AIX software vital product database uses software states that describe the status of
an install or update package.
Once a product is committed, if you would like to return to the old version, you must
remove the current version and reinstall the old version.
Predefined devices
PdDv:
type = "scsd"
class = "tape"
subclass = "scsi"
prefix = "rmt"
...
base = 0
...
detectable = 1
...
led = 2418
setno = 54
msgno = 0
catalog = "devices.cat"
DvDr = "tape"
Define = "/etc/methods/define"
Configure = "/etc/methods/cfgsctape"
Change = "/etc/methods/chggen"
Unconfigure = "/etc/methods/ucfgdevice"
Undefine = "/etc/methods/undefine"
Start = ""
Stop = ""
...
uniquetype = "tape/scsi/scsd"
Notes:
type
This specifies the product name or model number, for example, 8 mm (tape).
class
Specifies the functional class name. A functional class is a group of device instances
sharing the same high-level function. For example, tape is a functional class name
representing all tape devices.
subclass
Device classes are grouped into subclasses. The subclass scsi specifies all tape
devices that may be attached to a SCSI interface.
prefix
This specifies the Assigned Prefix in the customized database, which is used to derive
the device instance name and /dev name. For example, rmt is the prefix name
assigned to tape devices. Names of tape devices would then look like rmt0, rmt1, or
rmt2.
base
This descriptor specifies whether a device is a base device or not. A base device is any
device that forms part of a minimal base system. During system boot, a minimal base
system is configured to permit access to the root volume group (rootvg) and hence to
the root file system. This minimal base system can include, for example, the standard
I/O diskette adapter and a SCSI hard drive. The device shown on the visual is not a
base device.
This flag is also used by the bosboot and savebase commands, which are introduced
later in this course.
detectable
This specifies whether the device instance is detectable or undetectable. A device
whose presence and type can be determined by the cfgmgr, once it is actually powered
on and attached to the system, is said to be detectable. A value of 1 means that the
device is detectable, and a value of 0 that it is not (for example, a printer or tty).
led
This indicates the value displayed on the LEDs when the configure method begins to
run. The value stored is decimal, but the value shown on the LEDs is hexadecimal
(2418 is 972 in hex).
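The conversion is easy to verify in any shell. The helper name led_hex is hypothetical.

```shell
# Convert the decimal led value stored in PdDv to the hexadecimal
# form shown on the LED display.
led_hex() {
    printf '%X\n' "$1"
}

led_hex 2418    # prints 972
```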
setno, msgno
Each device has a specific description (for example, SCSI Tape Drive) that is shown
when the device attributes are listed by the lsdev command. These two descriptors are
used to look up the description in a message catalog.
catalog
This identifies the filename of the national language support (NLS) catalog. The LANG
variable on a system controls which catalog file is used to show a message. For
example, if LANG is set to en_US, the catalog file /usr/lib/nls/msg/en_US/devices.cat is
used. If LANG is de_DE, catalog /usr/lib/nls/msg/de_DE/devices.cat is used.
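The two examples follow one pattern, sketched below. The helper name catalog_path is hypothetical, and the sketch only builds the path string; it does not check that the catalog file exists.

```shell
# Hypothetical helper: build the catalog path for a given LANG value
# and catalog name, following the two examples above.
catalog_path() {
    # $1 = LANG value, $2 = catalog file name from the PdDv object
    echo "/usr/lib/nls/msg/$1/$2"
}

catalog_path en_US devices.cat
catalog_path de_DE devices.cat
```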
DvDr
This identifies the name of the device driver associated with the device (for example,
tape). Usually, device drivers are stored in directory /usr/lib/drivers. Device drivers are
loaded into the AIX kernel when a device is made available.
Define
This names the define method associated with the device type. This program is called
when a device is brought into the defined state.
Configure
This names the configure method associated with the device type. This program is
called when a device is brought into the available state.
Change
This names the change method associated with the device type. This program is called
when a device attribute is changed through the chdev command.
Unconfigure
This names the unconfigure method associated with the device type. This program is
called when a device is unconfigured by rmdev -l.
Undefine
This names the undefine method associated with the device type. This program is
called when a device is undefined by rmdev -l -d.
Start, stop
Few devices support a stopped state (only logical devices). A stopped state means that
the device driver is loaded, but no application can access the device. These two
attributes name the methods to start or stop a device.
uniquetype
This is a key that is referenced by other object classes. Objects use this descriptor as a
pointer back to the device description in PdDv. The key is a concatenation of the class,
subclass, and type values.
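The concatenation can be written out as follows. The helper name uniquetype is hypothetical; the values are taken from the PdDv objects shown in this unit.

```shell
# Build the uniquetype key from the class, subclass, and type values.
uniquetype() {
    # $1 = class, $2 = subclass, $3 = type
    printf '%s/%s/%s\n' "$1" "$2" "$3"
}

uniquetype tape scsi scsd           # prints tape/scsi/scsd
uniquetype adapter pci 14106902     # prints adapter/pci/14106902
```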
Instructor notes:
Purpose — Introduce object class PdDv.
Details — Explain the different descriptors.
Additional information — If you want, you can mention there is an additional method for
starting and stopping a device. To stop a device issue the following command:
# rmdev -l <device_name> -S
Be happy if you found a device that supports the stopped state. Remember physical
devices do not support a stopped state.
You can list the devices in the Predefined Devices object class using the following
command:
# lsdev -P
Transition statement — Next class is PdAt.
Predefined attributes
PdAt:
uniquetype = "tape/scsi/scsd"
attribute = "block_size"
deflt = ""
values = "0-2147483648,1"
...
PdAt:
uniquetype = "disk/scsi/osdisk"
attribute = "pvid"
deflt = "none"
values = ""
...
PdAt:
uniquetype = "tty/rs232/tty"
attribute = "term"
deflt = "dumb"
values = ""
...
Notes:
uniquetype
This descriptor is used as a pointer back to the device defined in the PdDv object class.
attribute
This identifies the name of the attribute. This is the name that can be passed to the
mkdev or chdev command. For example, to change the term attribute of tty0 from its
default value of dumb to ibm3151, you can issue the following command:
# chdev -l tty0 -a term=ibm3151
deflt
This identifies the default value for an attribute. Nondefault values are stored in CuAt.
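The relationship between deflt and CuAt can be pictured as a simple fallback. This is a sketch of the idea, not of how the AIX commands are implemented: the helper name effective_value is hypothetical, and it treats an empty first argument as "no CuAt object exists", which is a simplification.

```shell
# Hypothetical helper: pick the effective attribute value.
#   $1 = value found in CuAt (empty if there is no CuAt object)
#   $2 = deflt from PdAt
effective_value() {
    if [ -n "$1" ]; then
        echo "$1"
    else
        echo "$2"
    fi
}

effective_value "yes" "no"      # jumbo_frames on ent1: customized value wins
effective_value ""    "dumb"    # term with no CuAt object: default applies
```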
values
This identifies the possible values that can be associated with the attribute name. For
example, allowed values for the block_size attribute range from 0 to 2147483648, with
an increment of 1.
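A range specification of this form can be checked with shell arithmetic. The sketch below is an assumption about how to read the "low-high,increment" notation, based only on the description above; the helper name in_range_spec is hypothetical, and list-style specifications such as "yes,no" are not handled.

```shell
# Check a candidate value against a "low-high,increment" range spec.
# Assumption: a value is valid when it lies in the range and is a whole
# number of increments above the low bound.
in_range_spec() {
    # $1 = candidate value, $2 = spec such as 0-2147483648,1
    range=${2%,*}
    inc=${2#*,}
    low=${range%-*}
    high=${range#*-}
    [ "$1" -ge "$low" ] && [ "$1" -le "$high" ] &&
        [ $(( ($1 - low) % inc )) -eq 0 ]
}

if in_range_spec 512 "0-2147483648,1"; then
    echo "512 is an allowed block_size"
fi
```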
Customized devices
CuDv:
name = "ent1"
status = 1
chgstatus = 2
ddins = "pci/goentdd"
location = "02-08"
parent = "pci2"
connwhere = "8"
PdDvLn = "adapter/pci/14106902"
CuDv:
name = "hdisk2"
status = 1
chgstatus = 2
ddins = "scdisk"
location = "01-08-01-8,0"
parent = "scsi1"
connwhere = "8,0"
PdDvLn = "disk/scsi/scsd"
Notes:
name
A customized device object for a device instance is assigned a unique logical name to
distinguish the device from other devices. The visual shows two devices, an Ethernet
adapter ent1 and a disk drive hdisk2.
status
This identifies the current status of the device instance. Possible values are:
- status = 0 - Defined
- status = 1 - Available
- status = 2 - Stopped
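The numeric status values can be decoded with a small lookup. The helper name device_state is hypothetical; on AIX the input would come from the status descriptor of a CuDv object, for example from the output of odmget -q name=ent1 CuDv.

```shell
# Translate the CuDv status descriptor into the state name listed above.
device_state() {
    case "$1" in
        0) echo Defined ;;
        1) echo Available ;;
        2) echo Stopped ;;
        *) echo "unknown status $1" ;;
    esac
}

device_state 1    # ent1 and hdisk2 on the visual have status = 1
```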
chgstatus
This flag tells whether the device instance has been altered since the last system boot.
The diagnostics facility uses this flag to validate system configuration. The flag can take
these values:
- chgstatus = 0 - New device
- chgstatus = 1 - Don't care
- chgstatus = 2 - Same
- chgstatus = 3 - Device is missing
ddins
This descriptor typically contains the same value as the Device Driver Name descriptor
in the Predefined Devices (PdDv) object class. It specifies the name of the device
driver that is loaded into the AIX kernel.
location
Identifies the AIX location of a device. The location code is a path from the system unit
through the adapter to the device. In case of a hardware problem, the location code is
used by technical support to identify a failing device.
parent
Identifies the logical name of the parent device. For example, the parent device of
hdisk2 is scsi1.
connwhere
Identifies the specific location on the parent device where the device is connected. For
example, the device hdisk2 uses the SCSI address 8,0.
PdDvLn
Provides a link to the device instance's predefined information through the uniquetype
descriptor in the PdDv object class.
Customized attributes
CuAt:
name = "ent1"
attribute = "jumbo_frames"
value = "yes"
...
CuAt:
name = "hdisk2"
attribute = "pvid"
value = "00c35ba0816eafe50000000000000000"
...
Notes:
PdCn:
        uniquetype = "adapter/pci/sym875"
        connkey = "scsi"
        connwhere = "1,0"

PdCn:
        uniquetype = "adapter/pci/sym875"
        connkey = "scsi"
        connwhere = "2,0"

CuDvDr:
        resource = "devno"
        value1 = "36"
        value2 = "0"
        value3 = "hdisk3"

CuDvDr:
        resource = "devno"
        value1 = "36"
        value2 = "1"
        value3 = "hdisk2"

CuDep:
        name = "rootvg"
        dependency = "hd6"

CuDep:
        name = "datavg"
        dependency = "lv01"

CuVPD:
        name = "hdisk2"
        vpd_type = 0
        vpd = "*MFIBM *TM\n\
HUS151473VL3800 *F03N5280
*RL53343341*SN009DAFDF*ECH17923D
*P26K5531 *Z0\n\
000004029F00013A*ZVMPSS43A
*Z20068*Z307220"
Notes:
PdCn
The Predefined Connection (PdCn) object class contains connection information for
adapters (sometimes called intermediate devices). This object class also includes
predefined dependency information. For each connection location, there are one or
more objects describing the subclasses of devices that can be connected.
The sample PdCn objects on the visual indicate that, at the given locations, all devices
belonging to subclass scsi could be attached.
CuDep
The Customized Dependency (CuDep) object class describes device instances that
depend on other device instances. This object class describes the dependence links
between logical devices and physical devices as well as dependence links between
logical devices, exclusively. Physical dependencies of one device on another device are
recorded in the Customized Devices (CuDv) object class.
The sample CuDep objects on the visual show the dependencies between logical
volumes and the volume groups they belong to.
CuDvDr
The Customized Device Driver (CuDvDr) object class is used to create the entries in
the /dev directory. These special files are used by applications to access a device
driver that is part of the AIX kernel. The attribute value1 is called the major number and
is a unique key for a device driver. The attribute value2 specifies a certain operating
mode of a device driver.
The sample CuDvDr objects on the visual reflect the device driver for disk drives
hdisk2 and hdisk3. The major number 36 specifies the driver in the kernel. In our
example, the minor numbers 0 and 1 specify two different instances of disk drives, both
using the same device driver. For other devices, the minor number may represent
different modes in which the device can be used. For example, if we were looking at a
tape drive, the operating mode 0 would specify a rewind on close for the tape drive, the
operating mode 1 would specify no rewind on close for a tape drive.
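The major and minor numbers recorded in CuDvDr are the same pair that a long listing of the special file prints. As a minimal sketch, the helper below pulls that pair out of ls -l output. The name dev_numbers is made up for this example, and the parsing assumes the usual "major, minor" layout that ls -l uses for special files.

```shell
# dev_numbers: read "ls -l" lines for device special files on stdin and
# print "major minor" for each line.
# Assumes field 5 is the major number followed by a comma and field 6 is
# the minor number.
dev_numbers() {
    awk '{ sub(/,/, "", $5); print $5, $6 }'
}

# Example on AIX:
#   ls -l /dev/hdisk2 /dev/hdisk3 | dev_numbers
# The matching ODM objects can be listed with:
#   odmget -q "value1=36" CuDvDr
```

The value 36 in the odmget query is the major number from the sample objects on the visual; on another system the number will differ.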
CuVPD
The Customized Vital Product Data (CuVPD) object class contains vital product data
(manufacturer of device, engineering level, part number, and so forth) that is useful for
technical support. When an error occurs with a specific device, the vital product data is
shown in the error log.
© Copyright IBM Corp. 2009, 2012 Unit 2. The Object Data Manager 2-61
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Instructor notes:
Purpose — Explain briefly the function of some additional ODM classes.
Details — Describe the ODM classes shown using the explanations in the student notes.
Avoid going into too much detail; these are mostly used, under the covers, by the operating
system. Try to summarize the object classes in simple to understand terms. For example:
• PdCn - Identifies the family of possible connections. If a SCSI adapter only supported 8
possible SCSI addresses, there would be 8 PdCn objects, one for each possible
address. Remember that only a few of these addresses might actually be in use.
• CuDep - Identifies dependencies. In the example shown, the system would use this
dependency to prevent removal of the datavg volume group until the logical
volume lv01 was removed.
• CuDvDr - Identifies the device driver for each device using major and minor numbers. This
is the same information you see when you execute a long listing of the files in the /dev
directory.
• CuVPD - Vital product information. Basically manufacturer and product information.
Additional information — None
Transition statement — Let us look at how these device object classes relate to the high
level commands that we will more often use to examine and change this information.
Notes:
Most of the time the information in the ODM device database is accessed and managed
using high-level commands. Understanding the object classes and their roles helps in
using these commands.
The lsdev command has options which control which ODM object class you list.
To see the objects in the Predefined Device (PdDv) object class, use the -P flag. If you
want to control the output, you can optionally qualify the command with any
combination of the three key descriptors: class, subclass, and type.
To see objects in the Customized Device (CuDv) object class, use the -C flag. To
control the output, you can either specify a particular device (using its logical device
name) or you can use any combination of the PdDv object class key descriptors.
Here is an example of specifying a particular device:
# lsdev -l hdisk0
The most common PdDv descriptor qualification is the class. Thus, it is common to
enter commands such as:
# lsdev -Cc disk
# lsdev -Cc adapter
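The same class qualifier works against both device object classes. The following is a minimal sketch rather than a course example: the function name list_class and the RUN variable (set RUN=echo to print the commands instead of running them) are conventions invented here, and the lsdev calls themselves need an AIX system.

```shell
# list_class CLASS: list the predefined devices of one class, then the
# customized (configured) devices of the same class.
# RUN is a dry-run hook assumed for this sketch: RUN=echo previews the
# commands, an empty or unset RUN executes them.
list_class() {
    class=$1
    ${RUN:-} lsdev -P -c "$class"    # supported devices (PdDv)
    ${RUN:-} lsdev -C -c "$class"    # configured devices (CuDv)
}

# Example on AIX: list_class disk
```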
The lsattr command also has options that control which ODM object classes it
uses.
To see the default attribute values, which are stored in the Predefined Attributes (PdAt)
object class, use the -D flag. You must uniquely identify the object by either:
• Specifying the class, subclass, and type for the object
• Specifying the logical device name of a customized device which is related to the
PdAt object
To see the effective attribute values, use the -E flag. The effective attributes are either
the attributes in the Customized Attributes (CuAt) object class for the specified device, or
(if there is no value specified in CuAt) the default attribute value from the related PdAt
object. You must specify a particular device by providing the logical device name of that
device.
When using the chdev command to modify an attribute value, the command logic will
not allow you to enter what it considers unacceptable values. It knows what is allowed
by examining the value descriptor for the attribute in the PdAt object class. If you get an
exception message attempting to set an attribute value, it is useful to know what is
acceptable. This information is displayed by the lsattr command when using the -R
(range) flag. The -R option requires that the attribute name be identified in addition to
the logical name of the device for which you are attempting to modify that attribute.
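The three lsattr views described above can be compared side by side for a single attribute. This is a sketch under stated assumptions: the function name show_attr and the RUN dry-run variable are invented for illustration, and the device and attribute in the usage comment are examples, not values taken from this unit.

```shell
# show_attr DEVICE ATTRIBUTE: print the default value, the effective value,
# and the allowed values of one attribute of a configured device.
# RUN is a dry-run hook assumed for this sketch: RUN=echo previews the
# commands, an empty or unset RUN executes them (AIX only).
show_attr() {
    dev=$1
    attr=$2
    ${RUN:-} lsattr -D -l "$dev" -a "$attr"    # default value (PdAt)
    ${RUN:-} lsattr -E -l "$dev" -a "$attr"    # effective value (CuAt, else PdAt)
    ${RUN:-} lsattr -R -l "$dev" -a "$attr"    # values that chdev will accept
}

# Example on AIX (hdisk0 and queue_depth are illustrative names):
#   show_attr hdisk0 queue_depth
```

Checking the -R output first is a cheap way to avoid a rejected chdev call.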
Checkpoint
IBM Power Systems
1. In which ODM class do you find the physical volume IDs of your
disks?
Notes:
Checkpoint solutions
Additional information —
Transition statement — Let’s look at reinforcing what we have covered by playing with the
ODM in the lab.
Unit summary
Notes:
The ODM is made from object classes, which are broken into individual objects and
descriptors.
AIX offers a command line interface to work with the ODM files.
The device information is held in the customized and the predefined databases
(Cu*, Pd*).
Estimated time
01:10
References
Online AIX Version 7.1 General Programming Concepts:
Writing and Debugging Programs (Chapter 5.
Error-Logging Overview)
Online AIX Version 7.1 Command Reference volumes 1-6
Note: References listed as “online” above are available through the
IBM Systems Information Center at the following address:
http://publib.boulder.ibm.com/eserver
Unit objectives
Notes:
For an ILO (Instructor Led Online) class: You should play file AN152U03F02 in the
multimedia library of Elluminate in place of the next visual. You can then continue your
lecture normally to reinforce the topics if desired.
a. To access the multimedia library click on the CD button along the toolbar in
Elluminate.
b. Once the multimedia library window is open, select AN152U03F02 and click on play.
c. Ask the students to indicate via green checkmark when they have finished the file.
For a FTF (Face to Face) class: You should play file AN152U03F02 on your instructor PC
in place of the next visual.
Note that you can also use this activity as a review for the information covered in the
visual as well.
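[Visual: error logging components. A kernel module calls errsave(), or an application
calls errlog(); the entry passes through /dev/error (timestamp) to the error daemon
(/usr/lib/errdemon), which writes the error log (/var/adm/ras/errlog) and consults the
errnotify class in the ODM to run an error notify method. Other labels on the visual:
smit, diagnostics, e-mail, console, errpt, formatted output, errclear, errstop, errlogger,
User, Kernel.]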
© Copyright IBM Corporation 2012
Notes:
Detection of an error
The error logging process begins when an operating system module detects an error.
The error detecting segment of code then sends error information to either the
errsave() kernel service or the errlog() application subroutine, where the information
is in turn written to the /dev/error special file. This process then adds a timestamp to
the collected data. The errdemon daemon constantly checks the /dev/error file for new
entries, and when new data is written, the daemon conducts a series of operations.
When you access the error log with the errpt command (from the command line or by
way of a SMIT panel), the error log is formatted according to the matching template in the
error record template repository and presented in either a summary or detailed report. Most
entries in the error log are attributable to hardware and software problems, but
informational messages can also be logged, for example, by the system administrator,
using the errlogger command.
Instructor notes:
Purpose — Define the components of the error logging facility.
Details — Cover the diagram on the visual starting from the bottom, with the error being
detected by errlog() or errsave() and an entry being made in /dev/error, up to the point
where a user can look at the records of the error log either by going through SMIT or by
executing the errpt command. An optional flow is shown in the upper left of the visual.
Briefly mention that a given error record could be defined to trigger an automatic action
(called a method). This error notification mechanism will be covered in more detail later in the
unit.
Additional information — .
The following is a list of terms that you may refer to:
error ID This is a 32-bit hexadecimal code used to identify a particular
failure. Each error record template has a unique error ID.
error label This is the mnemonic name for an error ID.
error log This is the file that stores instances of errors and failures
encountered by the system.
error log entry A record in the system error log that describes a failure.
Contains captured failure data.
error record template A description of what will be displayed when the error log is
formatted for a report, including information on the type and
class of error, probable causes and recommended actions.
Collectively, the templates comprise the Error Record Template
Repository.
The errpt command can be run from the shell or SMIT to format records in the errlog into
readable reports. The ODM classes CuDv, CuAt, and CuVPD provide information for the
detailed error reporting.
# smit errpt
Generate an Error Report
...
CONCURRENT error reporting? no
Type of Report summary +
Error CLASSES (default is all) [] +
Error TYPES (default is all) [] +
Error LABELS (default is all) [] +
Error ID's (default is all) [] +
Resource CLASSES (default is all) []
Resource TYPES (default is all) []
Resource NAMES (default is all) []
SEQUENCE numbers (default is all) []
STARTING time interval []
ENDING time interval []
Show only Duplicated Errors [no]
Consolidate Duplicated Errors [no]
LOGFILE [/var/adm/ras/errlog]
TEMPLATE file [/var/adm/ras/errtmplt]
MESSAGE file []
FILENAME to send report to (default is stdout)[]
...
Notes:
Overview
The SMIT fastpath smit errpt takes you to the screen used to generate an error
report. Any user can use this screen. As shown on the visual, the screen includes a
number of fields that can be used for report specifications. Some of these fields are
described in more detail below.
Type of report
Summary, intermediate, and detailed reports are available. Detailed reports give
comprehensive information. Intermediate reports display most of the error information.
Summary reports contain concise descriptions of errors.
Error classes
Values are H (hardware), S (software), and O (operator messages created with
errlogger). You can specify more than one error class.
Error types
Valid error types include the following:
- PEND - The loss of availability of a device or component is imminent.
- PERF - The performance of the device or component has degraded to below an
acceptable level.
- TEMP - Recovered from condition after several attempts.
- PERM - Unable to recover from error condition. Error types with this value are usually
the most severe errors and imply that you have a hardware or software defect. Error
types other than PERM usually do not indicate a defect, but they are recorded so that
they can be analyzed by the diagnostic programs.
- UNKN - Severity of the error cannot be determined.
- INFO - This error type is used to record informational entries.
Error labels
An error label is the mnemonic name used for an error ID.
Error IDs
An error ID is a 32-bit hexadecimal code used to identify a particular failure.
Resource classes
Specifies the device class for hardware errors (for example, disk).
Resource types
Indicates device type for hardware (for example, 355 MB).
Instructor notes:
Purpose — Explain how an error report can be generated through SMIT.
Details — Explain the options in generating an error report. The main option is the type of
report: summary versus detailed versus intermediate. Also explain the concurrent option.
Point out that the rest of the fields are for identifying criteria for selectively reporting. Ask
the students if they have previously worked with the AIX error log and (if they have) what
their experiences are regarding these options.
Additional information — This option will allow you to produce a detailed or summary
report. Examples of both will be given.
Mention all the different fields that can be used to generate specific searches and reports.
Note that the report can be sent to a file - which is defined by the last option.
The Show only Duplicated Errors option in the Generate an Error Report screen was
introduced in AIX 5L V5.1. Examples of duplicate errors might include floppy drive not
ready, external drive not ready, or Ethernet card unplugged.
Transition statement — Instead of using SMIT, you can also generate a report from the
command line. Let's see how this can be done.
Summary report:
# errpt
Intermediate report:
# errpt -A
Detailed report:
# errpt -a
Summary report of all hardware errors:
# errpt -d H
Detailed report of all software errors:
# errpt -a -d S
Concurrent error logging ("Real-time" error logging):
# errpt -c > /dev/console
Notes:
The -d option
The -d option (flag) can be used to limit the report to a particular class of errors. Two
examples illustrating use of this flag are shown on the visual:
- The command errpt -d H specifies a summary report of all hardware (-d H) errors.
- The command errpt -a -d S specifies a detailed report (-a) of all software (-d S)
errors.
The -c option
If you want to display the error entries concurrently, that is, at the time they are logged,
you must execute errpt -c. In the example on the visual, we direct the output to the
system console.
The -D flag
Duplicate errors can be consolidated using errpt -D. When used with the -a option,
errpt -D reports only the number of duplicate errors and the timestamp for the first and
last occurrence of the identical error.
The -P flag
Shows only errors which are duplicates of the previous error. The -P flag applies only to
duplicate errors generated by the error log device driver.
Additional information
The errpt command has many options. Refer to your AIX Commands Reference (or
the man page for errpt) for a complete description.
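One option not shown on the visual is -s, which limits a report to entries logged on or after a start time given in mmddhhmmyy form. As a hedged sketch (the helper name start_of_today is invented for this example), the function below builds that value for midnight of the current day so it can be combined with the flags described above.

```shell
# start_of_today: print midnight of the current day in the mmddhhmmyy
# form that errpt accepts as a start time.
start_of_today() {
    date +%m%d0000%y
}

# Example on AIX: detailed report of hardware errors logged since midnight.
#   errpt -a -d H -s "$(start_of_today)"
```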
# errpt
Notes:
LABEL: LVM_SA_PVMISS
IDENTIFIER: F7DDA124
Description
PHYSICAL VOLUME DECLARED MISSING
Probable Causes
POWER, DRIVE, ADAPTER, OR CABLE FAILURE
Detail Data
MAJOR/MINOR DEVICE NUMBER
8000 0011 0000 0001
SENSE DATA
00C3 5BA0 0000 4C00 0000 0115 7F54 BF78 00C3 5BA0 7FCF 6B93 0000 0000 0000 0000
Error Label            Error Type   Recommendations
DISK_ERR1              P            Failure of physical volume media
                                    Action: Replace device as soon as possible
DISK_ERR2, DISK_ERR3   P            Device does not respond
                                    Action: Check power supply
DISK_ERR4              T            Error caused by bad block or occurrence of a
                                    recovered error
                                    Rule of thumb: If disk produces more than one
                                    DISK_ERR4 per week, replace the disk
SCSI_ERR*              P            SCSI communication problem
(SCSI_ERR10)                        Action: Check cable, SCSI addresses,
                                    terminator
Error types: P = Permanent, T = Temporary
Notes:
- Sometimes SCSI errors are logged, mostly with the LABEL SCSI_ERR10. They
indicate that the SCSI controller is not able to communicate with an attached device.
In this case, check the cable (and the cable length), the SCSI addresses, and the
terminator.
DISK_ERR5 errors
A very infrequent error is DISK_ERR5. It is the catch-all (that is, the problem does not
match any of the above DISK_ERRx symptoms). You need to investigate further by
running the diagnostic programs which can detect and produce more information about
the problem.
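The DISK_ERR4 rule of thumb from the visual (more than one per week on the same disk) is easier to apply if the entries are counted per disk. The following is a sketch, not a course-supplied tool: the helper name count_by_resource is invented, and it assumes the standard errpt summary layout in which the resource name is the fifth column.

```shell
# count_by_resource: read an errpt summary report on stdin and print
# "count resource" for every resource name, sorted by resource name.
# Assumes the summary columns are:
#   IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
# and that the first line is the column header.
count_by_resource() {
    awk 'NR > 1 { n[$5]++ } END { for (r in n) print n[r], r }' | sort -k2
}

# Example on AIX, with $start set to a start time in mmddhhmmyy form
# (for instance one week ago):
#   errpt -J DISK_ERR4 -s "$start" | count_by_resource
```

Any disk that shows a count above one for a one-week window is a candidate for replacement under the rule of thumb.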
Instructor notes:
Purpose — Define the different types of disk errors.
Details — Explain using the information in the student notes.
Additional information — Explain each type of error in turn:
Disk errors 1, 2, and 4 return sense data which can be analyzed by the diagnostic
programs to provide extra information regarding the nature of the error, and its severity.
DISK_ERR4 is by far the most common error generated, and it is the least severe. It
indicates that a bad block has been detected during a read or write request to the disk
drive.
Bad block relocation and mirroring
When a disk drive is formatted for the first time, a portion of the drive (about 5% in the case
of IBM drives) is set aside for bad block relocation. The format itself also masks and
readdresses existing bad blocks so that the medium is clean and ready for use. During use,
however, any disk drive can develop bad blocks that can be attributed to deterioration
caused by the setting and resetting of magnetic charges on the medium.
Bad blocks may be discovered during any read or write operation, triggering DISK_ERR4 entries,
but they can only be actually relocated during a write operation.
At the software level, if your hardware does not support bad block relocation, you can set
logical volume bad block relocation. If a bad block is detected during a read or write
operation, its physical location is recorded in the logical volume device driver (LVDD)
defects directory. This directory is reviewed during each read or write request. Most
hardware does support bad block relocation and so the logical volume attribute is
irrelevant.
Bad blocks are never a problem when mirrored logical volumes are used. Either a read or
write request is completed on the mirror copy that is undamaged, and the damaged block is
always relocated. When a read requests a damaged block, the logical volume manager
converts the request to a write request and relocates the block with values derived from the
good copy. All this occurs without intervention or special configuration.
Transition statement — Let’s show the most important error entries the logical volume
manager creates.
Error Label                   Class and Type   Recommendations
LVM_BBEPOOL, LVM_BBERELMAX,   S,P              No more bad block relocation
LVM_HWFAIL                                     Action: Replace disk as soon as possible
LVM_SA_STALEPP                S,P              Stale physical partition
                                               Action: Check disk, synchronize data
                                               (syncvg)
LVM_SA_QUORCLOSE              H,P              Quorum lost, volume group closing
                                               Action: Check disk, consider working
                                               without quorum
Error Classes: H = Hardware, S = Software
Error Types: P = Permanent, T = Temporary
Notes:
LVM_SA_STALEPP
Stale physical partition
Action: Check disk, synchronize data (syncvg).
LVM_SA_QUORCLOSE
Quorum lost, volume group closing
Action: Check disk, consider working without quorum.
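For the LVM_SA_STALEPP case, the "check, then synchronize" action might look like the sketch below. The function name resync_vg and the RUN dry-run variable are assumptions made for this example, and datavg in the usage comment is only a sample volume group name; the commands need an AIX system.

```shell
# resync_vg VG: show the logical volume states of a volume group, then
# resynchronize its stale partitions.
# RUN is a dry-run hook assumed for this sketch: RUN=echo previews the
# commands, an empty or unset RUN executes them.
resync_vg() {
    vg=$1
    ${RUN:-} lsvg -l "$vg"      # the LV STATE column shows stale logical volumes
    ${RUN:-} syncvg -v "$vg"    # synchronize the stale partitions in the volume group
}

# Example on AIX: resync_vg datavg
```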
LOGFILE [/var/adm/ras/errlog]
*Maximum LOGSIZE [1048576] #
Memory Buffer Size [32768] #
...
# smit errclear
Clean the Error Log
Notes:
Instructor notes:
Purpose — Introduce the errdemon and errclear commands.
Details — Explain using the information in the student notes.
Additional information — The Change / Show Characteristics of the Error Log
screen also contains duplicate error options. If Duplicate Error Detection is set to true,
Duplicate Time Interval in milliseconds is used to set a threshold during which
identical error log entries are removed. The Duplicate error maximum sets the point at
which an additional identical error will be considered a new error. For more information, see
the AIX Commands Reference entry for errdemon.
Transition statement — Let’s switch over to an exercise. This exercise has three parts,
but you should only do the first part now. There will be time to do the other parts of the
exercise later.
Notes:
Instructor notes:
Purpose — Introduce the next exercise.
Details — Be sure to mention that students should only do “Part 1" of the exercise at this
time. They will do the rest of the exercise later. Provide the goals of this part of the exercise
as given in the student notes.
Additional information — None
Transition statement — Let’s switch over to the next topic, “Error Notification and
syslogd.” We will start by discussing the different ways that error notification can be
implemented.
ODM-Based:
/etc/objrepos/errnotify
Error notification
Notes:
3. ODM-based error notification: The errdemon program uses the ODM class errnotify
for error notification. How to work with errnotify is discussed later in this topic.
Instructor notes:
Purpose — Provide different ways to implement error notification.
Details — Explain using the information in the notes. The two methods shown are covered
in the visuals that follow, so there is no need to “pre-teach” them in detail now.
Additional information — Earlier versions of the course discussed concurrent error
logging (errpt -c). Periodic Diagnostics using diagela is not used on p5 and p6
platforms. By default, periodic diagnostics sends mail notifications. It
can be customized to take other actions, such as interfacing to other applications. To
specify a customized action, one would create a PDiagAtt ODM class object with a value
descriptor set to the full path to a script. To see more details about this, refer to the
document AIX 5L Version 5.3 Understanding the Diagnostic Subsystem for AIX
(SC23-4919).
Periodic Diagnostics: The diagnostics package (diag command) contains a periodic
diagnostic procedure (diagela). Whenever a hardware error is posted to the log, all
members of the system group get a mail message. Additionally, a message is sent to the
system console. The diagela program has disadvantages:
• Since it executes many times a day, the program might slow down your system.
• Only hardware errors are analyzed.
• Since AIX 5.2, diagela has only supported analyzing processor errors and no other
hardware.
• In POWER5 and POWER6 hardware, diagela does not even support processor
diagnostics. Instead, the platform firmware (service processor) handles this and reports
hardware errors to the managing HMC.
Transition statement — Let’s provide an example to show how you might implement
self-made error notification.
#!/usr/bin/ksh
while true
do
sleep 60 # Let's sleep one minute
done
Notes:
- The two files are compared using the command cmp -s (silent compare, which means
no output is reported). If the files are identical, we jump back to the
beginning of the loop (continue), and the process sleeps again.
- If there is a difference, a new error entry has been posted to the error log. In this
case, we inform the operator that a new entry is in the error log. Instead of print
you could use the mail command to inform another person.
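Only the outer loop of the script survives on the visual above. The following is a minimal sketch of the comparison logic that the notes describe, not the course's own script: the helper name check_new_entries and the snapshot file names are hypothetical, and the errpt loop is left in comments so the sketch can be read and tried without an AIX error log.

```shell
#!/usr/bin/ksh
# check_new_entries OLD NEW
# Compare two error report snapshots. Returns 1 when they are identical
# (nothing new was logged). Returns 0 when they differ, after copying NEW
# over OLD so the next pass compares against the current state.
check_new_entries() {
    old=$1
    new=$2
    if cmp -s "$old" "$new"; then
        return 1
    fi
    cp "$new" "$old"
    return 0
}

# Intended use on AIX (file names are examples):
#   errpt > /tmp/errlog.1
#   while true
#   do
#       sleep 60                    # Let's sleep one minute
#       errpt > /tmp/errlog.2
#       check_new_entries /tmp/errlog.1 /tmp/errlog.2 || continue
#       print "Operator: check the error log" > /dev/console
#   done
```

Keeping the comparison in a function makes it possible to test the logic with two ordinary text files before wiring it to errpt.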
errnotify:
en_pid = 0
en_name = "sample"
en_persistenceflg = 1
en_label = ""
en_crcid = 0
en_class = "H"
en_type = "PERM"
en_alertflg = ""
en_resource = ""
en_rtype = ""
en_rclass = "disk"
en_method = "errpt -a -l $1 | mail -s DiskError root"
Notes:
List of descriptors
Here is a list of all descriptors for the errnotify object class:
en_alertflg Identifies whether the error is alertable. This descriptor is
provided for use by alert agents with network management
applications. The values are TRUE (alertable) or FALSE (not
alertable).
en_class Identifies the class of error log entries to match. Valid values are
H (hardware errors), S (software errors), O (operator messages),
and U (undetermined).
en_crcid Specifies the error identifier associated with a particular error.
en_label Specifies the label associated with a particular error identifier as
defined in the output of errpt -t (show templates).
en_method Specifies a user-programmable action, such as a shell script or a
command string, to be run when an error matching the selection
criteria of this Error Notification object is logged. The error
notification daemon uses the sh -c command to execute the
notify method.
The following keywords are passed to the method as arguments:
$1 Sequence number from the error log entry
$2 Error ID from the error log entry
$3 Class from the error log entry
$4 Type from the error log entry
$5 Alert flags from the error log entry
$6 Resource name from the error log entry
$7 Resource type from the error log entry
$8 Resource class from the error log entry
$9 Error label from the error log entry
en_name Uniquely identifies the object
en_persistenceflg Designates whether the Error Notification object should be
removed when the system is restarted. 0 means removed at boot
time; 1 means persists through boot.
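A notify method can also be a script that receives some of these arguments. The sketch below is hypothetical: the script path /usr/local/bin/disk_notify, the function name format_notice, and the choice of $1, $6, and $9 are assumptions for illustration and are not part of the sample object on the visual.

```shell
#!/usr/bin/ksh
# Hypothetical method script, referenced from an errnotify object as:
#   en_method = "/usr/local/bin/disk_notify $1 $6 $9"
# so the script receives: sequence number, resource name, error label.

# format_notice SEQUENCE RESOURCE LABEL: print a one-line summary.
format_notice() {
    seqno=$1
    resource=$2
    label=$3
    printf 'Error log entry %s: %s on %s\n' "$seqno" "$label" "$resource"
}

# In the real script, mail the summary line followed by the full entry:
#   { format_notice "$@"; errpt -a -l "$1"; } | mail -s DiskError root
```

An errnotify stanza saved in a file can be loaded with odmadd, checked with odmget -q "en_name=sample" errnotify, and removed again with odmdelete -q "en_name=sample" -o errnotify.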
syslogd daemon
/etc/syslog.conf:
daemon.debug /tmp/syslog.debug
/tmp/syslog.debug (written by syslogd):
inetd[16634]: A connection requires tn service
inetd[16634]: Child process 17212 has ended
Provide debug information:
# stopsrc -s inetd
# startsrc -s inetd -a "-d"
Notes:
Function of syslogd
The syslogd daemon logs system messages from different software components
(kernel, daemon processes, system applications).
Instructor notes:
Purpose — Describe how the syslogd daemon works.
Details — Explain using the information in the student notes.
Additional information — None
Transition statement — Let’s provide some other syslogd configuration examples.
/etc/syslog.conf:
auth.debug /dev/console    (all security messages to the system console)
Notes:
- The following line specifies that all messages, except messages from the mail
subsystem, are to be sent to the syslogd daemon on the host server:
*.debug;mail.none @server
Note that, if this example and the preceding example appear in the same
/etc/syslog.conf file, messages sent to /tmp/daemon.debug will also be sent to
the host server.
Facilities
Use the following system facility names in the selector field:
kern Kernel
user User level
mail Mail subsystem
daemon System daemons
auth Security or authorization
syslog syslogd messages
lpr Line-printer subsystem
news News subsystem
uucp uucp subsystem
* All facilities
Priority levels
Use the following levels in the selector field. Messages of the specified level and all
levels above it are sent as directed.
emerg Specifies emergency messages. These messages are not distributed to all
users.
alert Specifies important messages such as serious hardware errors. These
messages are distributed to all users.
crit Specifies critical messages, not classified as errors, such as improper login
attempts. These messages are sent to the system console.
err Specifies messages that represent error conditions.
warning Specifies messages for abnormal, but recoverable conditions.
notice Specifies important informational messages.
info Specifies information messages that are useful in analyzing the system.
debug Specifies debugging messages. If you are interested in all messages of a
certain facility, use this level.
none Excludes the selected facility.
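The rule that a selector matches "the specified level and all levels above it" can be sketched as a numeric comparison. The function names level_rank and selected are invented for this example, and the numbers follow the conventional syslog ordering from 0 (emerg, most severe) to 7 (debug, least severe). The special level none is deliberately not handled.

```shell
# level_rank LEVEL: map a priority level name to its severity number.
# Returns non-zero for an unknown name.
level_rank() {
    case $1 in
        emerg)   echo 0 ;;
        alert)   echo 1 ;;
        crit)    echo 2 ;;
        err)     echo 3 ;;
        warning) echo 4 ;;
        notice)  echo 5 ;;
        info)    echo 6 ;;
        debug)   echo 7 ;;
        *)       return 1 ;;
    esac
}

# selected SELECTOR_LEVEL MESSAGE_LEVEL: succeed when a message logged at
# MESSAGE_LEVEL is at least as severe as the level named in the selector.
selected() {
    sel=$(level_rank "$1") || return 2
    msg=$(level_rank "$2") || return 2
    [ "$msg" -le "$sel" ]
}

# Example: "selected debug err" succeeds, because a daemon.debug selector
# also receives err messages; "selected err info" fails.
```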
Instructor notes:
Purpose — Provide some syslogd configuration examples.
Details — Explain using the information in the student notes.
Additional information — Do not explain all facilities and levels. Just explain the
examples.
Transition statement — Let’s explain how to redirect syslogd messages to the error log.
/etc/syslog.conf:
# errpt
Notes:
Instructor notes:
Purpose — Explain how to redirect syslog messages to the AIX error log.
Details — Explain using the information in the student notes.
Additional information — None
Transition statement — What about the other way round?
errnotify:
en_name = "syslog1"
en_persistenceflg = 1
en_method = "logger Error Log: `errpt -l $1 | grep -v TIMESTAMP`"
errnotify:
en_name = "syslog1"
en_persistenceflg = 1
en_method = "logger Error Log: $(errpt -l $1 | grep -v TIMESTAMP)"
errnotify:
en_name = "syslog1"
en_persistenceflg = 1
en_method = "errpt -l $1 | tail -1 | logger -t errpt -p daemon.notice"
Notes:
Command substitution
You will need to use command substitution (or pipes) before calling the logger
command. The first two examples on the visual illustrate the two ways to do command
substitution in a Korn shell environment:
- Using the `UNIX command` syntax (with backquotes) - shown in the first example on
the visual
- Using the newer $(UNIX command) syntax - shown in the second example on the
visual
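The two substitution styles can be compared without an error log. In this sketch the report function and its two lines of text are invented stand-ins for the output of errpt -l with a sequence number; only the substitution syntax is the point.

```shell
# report: print a header line and one sample entry, standing in for the
# summary output of "errpt -l <sequence number>". The entry is made up.
report() {
    printf '%s\n' \
        'IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION' \
        'SAMPLE01   0101120012 P H hdisk1         SAMPLE ENTRY'
}

old_style=`report | grep -v TIMESTAMP`     # backquote syntax
new_style=$(report | grep -v TIMESTAMP)    # $(...) syntax

# Both variables hold the entry line without the header, so either could
# be handed to logger:
#   logger "Error Log: $new_style"
```

The $(...) form is usually easier to read and to nest, which is why it is generally preferred in new scripts.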
Instructor notes:
Purpose — Provide information on how to direct error log entries to the syslogd.
Details — Explain using the information in the student notes.
Point out that the visual just shows three ways to accomplish the same thing. The first two
examples use two different formats to invoke command substitution, which will place the
report text on the line before execution of the logger command. The last example feeds
the report text through a pipe to the logger command.
Additional information — None
Transition statement — Describe the basic features of system hang detection.
System hangs:
High priority process
Other
What does shdaemon do?
Monitors the system's ability to run processes
Takes specified action if threshold is crossed
Actions:
Logs error in the error log
Displays a warning message on the console
Launches recovery login on a console
Launches a command
Automatically reboots the system
Notes:
Actions
If lower priority processes are not being scheduled, shdaemon will perform the specified
action. Each action can be individually enabled and has its own configurable priority
and time-out values. There are five actions available:
- Log error in the error log
- Display a warning message on a console
- Launch a recovery login on a console
- Launch a command
- Automatically REBOOT the system
Configuring shdaemon
# shconf -E -l prio
sh_pp disable Enable Process Priority Problem
Notes:
Introduction
shdaemon configuration information is stored as attributes in the SWservAt ODM object
class. Configuration changes take effect immediately and survive across reboots.
Use shconf (or smit shd) to configure or display the current configuration of shdaemon.
The values shown in the visual are the default values.
Enabling shdaemon
At least two parameters must be modified to enable shdaemon:
- Enable priority monitoring (sh_pp)
- Enable one or more actions (pp_errlog, pp_warning, and so forth)
Action attributes
Each action has its own attributes, which set the priority and timeout thresholds and
define the action to be taken. The timeout attributes are measured in minutes.
Example
By changing the shconf attributes, we can enable, disable, and modify the behavior of
the facility. In the following example, shdaemon is enabled to monitor process priority
(sh_pp=enable), and the following actions are enabled:
- Enable process priority monitoring:
# shconf -l prio -a sh_pp=enable
- Log an error in the error log:
# shconf -l prio -a pp_errlog=enable
Every two minutes (pp_eto=2), shdaemon will check to see if any process has been
run with a process priority number greater than 60 (pp_eprio=60). If not, shdaemon
logs an error to the error log.
- Display a warning message on a console:
# shconf -l prio -a pp_warning=enable (default value)
Every two minutes (pp_wto=2), shdaemon will check to see if any process has been
run with a process priority number greater than 60 (pp_wprio=60). If not, shdaemon
sends a warning message to the console specified by pp_wterm.
- Launch a command:
# shconf -l prio -a pp_cmd=enable -a pp_cto=5
Every five minutes (pp_cto=5), shdaemon will check to see if any process has been run
with a process priority number greater than 60 (pp_cprio=60). If not, shdaemon runs the
command specified by pp_cpath (in this case, /home/unhang).
Instructor notes:
Purpose — Describe how shdaemon is configured.
Details —
Additional information — shdaemon also supports lost I/O detection.
Transition statement — For an ILO (Instructor Lead On-line) class: In place of this
checkpoint visual you should play file AN152U03F21.
- a. To access the multimedia library, click the CD button on the toolbar in
Elluminate.
- b. Once the multimedia library window is open, select AN152U03F21 and click
play.
- c. Ask the students to indicate via green checkmark when they have finished the file.
For a FTF (Face to Face) class: Play file AN152U03F21 and ask the students to call out
answers to the questions on the screen.
Checkpoint (1 of 2)
IBM Power Systems
Notes:
Instructor notes:
Purpose — Present the checkpoint questions.
Details — A “Checkpoint Solution” is given below:
Checkpoint solutions (1 of 2)
IBM Power Systems
Additional information —
Transition statement —
Checkpoint (2 of 2)
IBM Power Systems
Notes:
Instructor notes:
Purpose —
Details —
Checkpoint solutions (2 of 2)
IBM Power Systems
Unit summary
IBM Power Systems
Notes:
Instructor notes:
Purpose — Summarize the unit.
Details — Present the highlights from the unit.
Before continuing to the next unit, stop and ask the students whether they have any
additional questions.
Additional information — None
Transition statement — Let’s continue with the next unit.
Estimated time
01:00
References
SC23-6616 AIX Version 7.1 Installation and migration
SG24-7296 NIM from A to Z in AIX 5L (Redbook)
http://www.redbooks.ibm.com IBM Redbooks
© Copyright IBM Corp. 2009, 2012 Unit 4. Network Installation Manager basics 4-1
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Unit objectives
IBM Power Systems
Notes:
NIM overview
IBM Power Systems
Notes:
Purpose of NIM
NIM provides centralized AIX software administration for multiple machines over the
network. NIM supports full AIX operating system installation as well as installing or
updating individual packages and performing software maintenance.
Advantages
NIM provides several advantages:
- Provides one central point for AIX software administration for all the NIM clients
- Eliminates the need to carry a CD-ROM or tape to each system, and the need for a tape
drive or CD-ROM drive at every system
- Allows installations to be initiated from the master machine (push) or from the client (pull)
- Allows the installation load to be distributed. Most simply, the NIM master machine is
configured as the server for all the filesets to be installed. However, you can also
configure one or more client machines to act as servers to distribute the load if you
have many clients.
Instructor notes:
Purpose — Provide an overview of NIM function.
Details —
Additional information —
Transition statement — Let’s take a closer look at the three roles illustrated in the
overview.
Machine roles
IBM Power Systems
Master
File sets:
bos.sysmgt.nim.master
bos.sysmgt.nim.client
Stores NIM database
NIM administration
Can initiate push installations to NIM clients
AIX version >= all other NIM machines
Client
File sets:
bos.sysmgt.nim.client
Can initiate pull installations from a server
Server
Any machine, master or client
Serves NIM resources to clients, thus requires adequate disk space and
throughput
© Copyright IBM Corporation 2012
Notes:
There are three basic roles that a machine can assume in the NIM environment: master,
client, and resource server. There can be only one master machine in a NIM
environment; all other machines are clients. Any machine, master or client, can be a
resource server.
NIM software
All machines in the NIM environment must install bos.sysmgt.nim.client. The master
machine must also install bos.sysmgt.nim.master and bos.sysmgt.nim.spot.
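The fileset requirements above can be written down as a small lookup. This is a sketch only: nim_filesets is a hypothetical helper name, and the fileset lists are taken directly from the notes above.

```shell
#!/bin/sh
# Sketch: list the NIM filesets a machine needs for a given role,
# as stated in the notes above. nim_filesets is a hypothetical helper.
nim_filesets() {
    case "$1" in
        master)
            printf '%s\n' bos.sysmgt.nim.client \
                          bos.sysmgt.nim.master \
                          bos.sysmgt.nim.spot
            ;;
        client)
            printf '%s\n' bos.sysmgt.nim.client
            ;;
        *)
            echo "usage: nim_filesets master|client" >&2
            return 1
            ;;
    esac
}

# On an AIX system, each fileset could then be checked with lslpp, for example:
#   for f in $(nim_filesets master); do lslpp -l "$f"; done
nim_filesets master
```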
Master
The NIM master manages all other machines that participate in the NIM environment.
The NIM database is stored on the NIM master. The NIM master is fundamental for all
of the operations in the NIM environment and must be set up and operational before
performing any NIM operations. The master can initiate a software installation to a
client, which is called a push installation.
Also, the NIM master is the only machine that is given the permissions and ability to
execute NIM operations on other machines within the NIM environment. Either rsh or
nimsh is used to remotely execute commands on clients, which allows the
NIM master to install to a number of clients with one NIM operation.
The master requires the bos.sysmgt.nim.master and bos.sysmgt.nim.client
filesets. Its AIX operating system software must also be at
a level that is equal to or higher than that of any client that it serves.
Client
All other machines in a NIM environment are clients. Clients can request a software
installation from a server machine (pull installation). The client requires the
bos.sysmgt.nim.client fileset.
Server
Any machine, the master or a client, can be configured by the master as a server for a
particular software resource. Most often, the master is also the server. However, if your
environment has many nodes or consists of a complex network environment, you may
want to configure some nodes to act as servers to improve installation performance.
Servers must have adequate disk space for the resources they will be providing. They
also need network connections to the client machines they serve and sufficient
bandwidth to respond to the expected volume.
[Figure: AIX installation flow from removable media. Steps shown: (1) load boot image (boot image is on removable media); (3) configure devices (using programs on removable media); (4) install system files (backup archive is on removable media).]
Notes:
To understand how NIM works, we need to understand what happens when we install
AIX on a system. We start by reviewing what happens when we boot from CD or tape to
install AIX. Note how all of the programs and information are obtained from the
removable media.
Configuring devices
In order to keep the boot image small, not all of the software needed to configure
devices is included in the boot image. These additional files are contained in a small
usr directory tree called a Shared Product Object Tree or SPOT. The boot script mounts
this usr directory tree on /SPOT in the memory file system. The SPOT is mounted
directly from the CDROM.
Note: Since tape devices do not support file system operations, the SPOT files are
included in the boot image in the case of booting from a tape drive.
Install script
Once the devices have been configured, rc.boot invokes the BOS installation program
(bi_main), and installs AIX from the installation images on the tape or CD.
Instructor notes:
Purpose — Review the flow and components of an AIX installation from tape or optical
media.
Details —
Additional information —
Transition statement — If we next look at how a network install is handled, we will see
that there are many similarities with a regular installation, of course with some significant
variations.
Figure 4-5. Boot process for AIX installation with NIM (1 of 2) AN152.2
Notes:
Booting over the network, using NIM, is essentially the same as booting from CD or
tape, except that the boot file (SPOT file) and installation images come from the server
system over the network.
Instructor notes:
Purpose — Provide a description of the components and flow of a network installation
using NIM.
Details —
Additional information —
Transition statement — Let us continue with our comparison of using removable media
versus using a NIM server.
[Figure: steps shown: (3) configure devices (using programs on NIM server); (4) install system files (backup archive is on NIM server).]
Figure 4-6. Boot process for AIX installation with NIM (2 of 2) AN152.2
Notes:
Invoke the boot script and configure devices needed for installation
When booting over the network, the SPOT is mounted from the NIM server using the
Network File System (NFS).
Instructor notes:
Purpose — Continue description of installing AIX from a NIM server.
Details —
Additional information —
Transition statement — In order for NIM to manage this install process, it needs to have
objects that describe the machines and resources involved. Let’s take a high level look at
what these are.
NIM objects
IBM Power Systems
Object classes
Networks
Machines
Resources
Group objects
mac_group
res_group
Notes:
NIM is made up of various components, called objects. There are three classes of
objects: machines, networks, and resources.
All information about the NIM environment is stored in Object Data Manager (ODM)
databases on the NIM master system.
Network objects
Network objects are objects in the NIM database that represent information about each
Local Area Network (LAN) that is part of the NIM environment. These objects and some
of their attributes reflect the physical characteristics of the network. NIM network objects
are not used to perform management tasks in the overall network environment; they are
only used to represent the physical network topology of the NIM environment. In other
words, if something changes in the physical network environment, you must remember
to make the change in the NIM database as well.
There are five types of networks supported by NIM: Token-Ring, Ethernet, ATM, FDDI,
and generic. These network types are represented as network objects in the NIM
environment.
Machine objects
Machines in the NIM environment are simply the machines that will be managed by
NIM.
Resource objects
All operations on clients in the NIM environment require one or more NIM resources.
NIM resource objects represent the files, directories, and devices that are used in order
to support each type of NIM operation. Some resources are AIX filesets (or devices
which contain filesets) that can be installed on a client machine. Other resources are
scripts or configuration files that are used in the installation process.
The location and other attributes for these resources are stored as resource objects in
the NIM database.
Group objects
NIM supports two types of group objects:
- mac_group
A machine group is a group of machine objects. You can use a machine group to
simplify performing a NIM operation on multiple machines.
- res_group
A resource group is a group of resource objects. If you have a set of resources that
you typically want to use at the same time, you can create a resource group to
simplify allocating those resources.
For an ILO (Instructor Lead On-line) class: You should play file AN152U04F17 in the
multimedia library of Elluminate in place of the next visual. You can then continue your
lecture normally to reinforce the topics if desired.
- a. To access the multimedia library, click the CD button on the toolbar in
Elluminate.
- b. Once the multimedia library window is open, select AN152U04F17 and click
play.
- c. Ask the students to indicate via green checkmark when they have finished the file.
For a FTF (Face to Face) class: You should play file AN152U04F17 on your instructor PC
in place of the next visual.
Note that you can also use this activity as a review for the information covered in the
visual as well.
# lsnim -l ent0
ent0:
class = networks
type = ent
Nstate = ready for use
prev_state = information is missing from this object's definition
net_addr = 10.31.192.0
snm = 255.255.240.0
routing1 = default 10.31.192.1
Notes:
The lsnim command is used to list various types of NIM information. You have the
opportunity to experiment with lsnim in the exercise.
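Because lsnim -l prints one "name = value" pair per line, individual attributes are easy to extract in a script. The following sketch assumes that output layout (as shown in the visual); nim_attr is a hypothetical helper, and the sample text is abridged from the visual so that the example runs without a NIM master.

```shell
#!/bin/sh
# Sketch: pull one attribute value out of "lsnim -l <object>" style output.
# nim_attr is a hypothetical helper; it reads the listing on standard input.
nim_attr() {
    # $1 = attribute name, for example net_addr
    awk -v key="$1" '
        {
            # Split each line at the first "=" into name and value.
            pos = index($0, "=")
            if (pos == 0) next
            name = substr($0, 1, pos - 1)
            value = substr($0, pos + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            if (name == key) { print value; exit }
        }'
}

# Sample listing abridged from the visual; on a NIM master you would use:
#   lsnim -l ent0 | nim_attr net_addr
sample='ent0:
   class      = networks
   type       = ent
   Nstate     = ready for use
   net_addr   = 10.31.192.0
   snm        = 255.255.240.0
   routing1   = default 10.31.192.1'

printf '%s\n' "$sample" | nim_attr net_addr
printf '%s\n' "$sample" | nim_attr routing1
```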
NIM configuration
IBM Power Systems
Configure master
Install master NIM file sets
Run nimconfig
Define resources
Create real resource with full path
Create resource object to represent
Define networks
How do clients on networks access the master
Define clients
Able to relate network address of the client with object name
Allocate resources to clients
Different operations need different resources
NIM operations on clients
Setting up for operation
Initiating operation
Notes:
Installing NIM
The NIM filesets that need to be installed on a machine designated to act as NIM
master are:
- bos.sysmgt.nim.client
- bos.sysmgt.nim.master
- bos.sysmgt.nim.spot
Configure master
Configuring the master machine consists of installing the master filesets and running
nimconfig. You must specify the primary network interface and a NIM network name
for the network which is attached to the primary interface. There are several optional
attributes which can be specified.
nimconfig creates the NIM database and the /etc/niminfo configuration file. It also
starts the NIM daemon (nimesis) and creates an entry in /etc/inittab so that nimesis is
started on every boot of the master machine.
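As an illustration of the two required values, the sketch below composes a nimconfig call. The attribute names pif_name and netname are the ones we understand nimconfig to use for the primary interface and the NIM network name; confirm them against the nimconfig documentation for your AIX level. The values en0 and net1 are examples, and nimconfig_cmd is a hypothetical helper that only prints the command.

```shell
#!/bin/sh
# Sketch: compose the nimconfig call for initializing a NIM master.
# The interface (en0) and network name (net1) are example values, and
# nimconfig_cmd is a hypothetical helper that only prints the command.
nimconfig_cmd() {
    pif=$1      # primary network interface of the master
    netname=$2  # NIM network object name for the attached network
    printf 'nimconfig -a pif_name=%s -a netname=%s\n' "$pif" "$netname"
}

nimconfig_cmd en0 net1

# After running the printed command on the master, the results described
# above could be checked with standard AIX commands, for example:
#   cat /etc/niminfo          # configuration file created by nimconfig
#   lssrc -s nimesis          # NIM daemon should be active
#   grep nim /etc/inittab     # entry that starts nimesis at boot
```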
Allocate resources
Once the resource and machine objects are defined, you need to decide what operation
you want to perform on your client machine. For each operation, there are different
resources needed.
Next, you need to allocate the resource to your client. This identifies which resource
object will be used to implement the client operation. There are two ways in which this is
done:
- Use the nim -o allocate operation (or equivalent SMIT dialog) to relate the
resource to the machine
- Use a SMIT dialog which prompts for the resources to allocate as part of the
machine operation definition
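The first of the two methods above, the allocate operation, might look like the following on the command line. This is a sketch: nim_allocate_cmd is a hypothetical helper that only prints the command, and the client and resource names are example values.

```shell
#!/bin/sh
# Sketch: compose a "nim -o allocate" call that relates resources to a client.
# nim_allocate_cmd is a hypothetical helper that only prints the command.
nim_allocate_cmd() {
    client=$1; shift
    cmd="nim -o allocate"
    for res in "$@"; do        # each argument is type=resource_name
        cmd="$cmd -a $res"
    done
    printf '%s %s\n' "$cmd" "$client"
}

# Example names: lpar1 is a client; the resource names are made up.
nim_allocate_cmd lpar1 spot=spot61-00-00 lpp_source=aix61-00-00
```

Several resources can be allocated in one call because each one is passed as its own -a type=name pair.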
The task can be initiated from the client; or, provided that the client machine has already
been configured as a NIM client, the NIM master can initiate the task.
Resources objects
IBM Power Systems
Object types
boot            Represents the network boot image resource
nim_script      Directory for customization scripts created by NIM
spot            Shared Product Object Tree - equivalent to /usr file system
lpp_source      Source device for software product images
bosinst_data    Config file used during base system installation
image_data      Config file used during base system installation
mksysb          A mksysb image
script          A user created script which is executed on a client to perform
                customization
resolv_conf     Configuration file for name-server information
...             (additional resource types)
Attributes
location        Directory path
server          Machine which serves this resource
Rstate,
prev_state      Status attributes
...             (additional attributes)
Notes:
Resources are the files and directories that NIM uses to install software on the clients.
Resource types
Resource types identify the different types of files used by NIM. For example:
- An lpp_source resource is a directory containing product images to be installed
- A spot resource contains the files used during the boot operation
- A script resource is a user definable script which can be used to perform
customization on a newly installed client
- A mksysb resource is a backup image that can be used to install a client
Instructor notes:
Purpose — Cover resources objects and their attributes.
Details — Note the variety of resources and that the attributes basically map between the
resource name and the location of the file or directory that contains that resource. Be
careful not to pre-teach the details on resources covered on later visuals, such as
lpp_source, spot, or mksysb. These are covered after the discussion of operations, so
they can be discussed in the context of those operations (in particular, the bos_inst
operation).
Additional information —
Transition statement — Let’s take a closer look at the resource types that we will need to
define to support a NIM installation of an AIX operating system, starting with the
lpp_source.
For an ILO (Instructor Lead On-line) class: You should play file AN152U04F17 in the
multimedia library of Elluminate in place of the next visual. You can then continue your
lecture normally to reinforce the topics if desired.
- a. To access the multimedia library, click the CD button on the toolbar in
Elluminate.
- b. Once the multimedia library window is open, select AN152U04F17 and click
play.
- c. Ask the students to indicate via green checkmark when they have finished the file.
For a FTF (Face to Face) class: You should play file AN152U04F17 on your instructor PC
in place of the next visual.
Note that you can also use this activity as a review for the information covered in the
visual as well.
lpp_source
Directory containing software product images
Supports NIM install operations (bos_inst and cust)
Also used for creation of spot resource
Defining an lpp_source:
# nim -o define -t lpp_source
    -a server=<machine>
    -a location=<directory>
    [ optional attributes ]
    <lppsource_name>
# smit nim_mkres
[Figure: lppsource directory tree showing aix61-00-00 and aix61-01-00, with bos filesets]
Notes:
lpp_source
When a resource of this type is defined, it represents a directory in which software
product images are stored. lpp_source resources are used to support NIM install
operations. An lpp_source can also be used as the source for the creation of a SPOT.
When you perform a NIM install operation and have allocated an lpp_source resource
to the client, NIM NFS mounts the lpp_source directory on the client, and then invokes
the installp command on the client to install from the directory. When installp
finishes, NIM automatically unmounts the resource.
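To make the definition template from the visual concrete, the sketch below fills it in with example values. The helper define_lpp_source_cmd is hypothetical and only prints the command; the resource name, server name, and directory are assumptions chosen for illustration. The full-path check reflects the point that NIM resources are defined with a full path.

```shell
#!/bin/sh
# Sketch: fill in the lpp_source definition template from the visual.
# define_lpp_source_cmd is a hypothetical helper that only prints the command.
define_lpp_source_cmd() {
    name=$1      # NIM object name for the new resource
    server=$2    # NIM machine that serves the directory
    location=$3  # full path of the directory holding the images
    case "$location" in
        /*) ;;   # NIM resource locations are given as full paths
        *)  echo "location must be a full path: $location" >&2; return 1 ;;
    esac
    printf 'nim -o define -t lpp_source -a server=%s -a location=%s %s\n' \
        "$server" "$location" "$name"
}

# Example values only: the directory and names are assumptions.
define_lpp_source_cmd aix61-00-00 master /export/lpp_source/aix61-00-00
```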
simages attribute
This attribute is used to indicate that an lpp_source resource contains the set of
installable images to which NIM requires access to perform its basic functionality. This
- Starting at AIX 5L Version 5.3, there is an update operation, which allows you to
update an lpp_source resource by adding and removing packages. Previously, you
could copy packages into an lpp_source directory or remove packages from an
lpp_source directory and run nim -o check to update the lpp_source attributes.
Previously, SMIT allowed you to add packages to an lpp_source through the
smit nim_bffcreate fast path. However, this SMIT function does not check to see
if the lpp_source is allocated or locked, nor does it update the simages attribute
when finished. The update operation has been created to address this situation.
Instructor notes:
Purpose — Cover the definition of the lpp_source.
Details —
Additional information —
Transition statement — Once we have an lpp_source, we next need to use the
lpp_source to generate a matching SPOT. Let’s look at how that is done.
For an ILO (Instructor Lead On-line) class: You should play file AN152U04F17 in the
multimedia library of Elluminate in place of the next visual. You can then continue your
lecture normally to reinforce the topics if desired.
- a. To access the multimedia library, click the CD button on the toolbar in
Elluminate.
- b. Once the multimedia library window is open, select AN152U04F17 and click on
play.
- c. Ask the students to indicate via green checkmark when they have finished the file.
For a FTF (Face to Face) class: You should play file AN152U04F17 on your instructor PC
in place of the next visual.
Note that you can also use this activity as a review for the information covered in the
visual as well.
spot
/usr directory tree used during network boot
Matching network boot images generated:
- /tftpboot/<spot_name>.<Platform>.<Kernel>.<Network>
Defining a SPOT
# nim -o define -t spot
    -a server=<machine>
    -a location=<directory>
    -a source=<lpp_source_name>
    [ optional attributes ]
    <spot_name>
# smit nim_mkres
[Figure: lppsource as the source for spot resources spot61-00-00 and spot61-01-00; each spot holds a usr tree with bin, include, lib, and etc]
Notes:
Components
• A /usr file system
A Shared Product Object Tree (SPOT) is a directory containing AIX code that is
equivalent in content to the code that resides in a /usr file system on a system running
AIX. The NIM SPOT creation process restores files from AIX filesets into the directory in
which the SPOT resides.
The SPOT is NFS-mounted on a booting client to provide necessary device support for
the boot process.
• Boot image
As part of the creation of a SPOT resource, NIM also creates network boot images. The
network boot images are constructed in /tftpboot on the same machine in which the
SPOT is created. The boot images are constructed with code from the newly created
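SPOT. The boot images are also sometimes called spot files. The client learns its
network settings and the name of its boot image file through the BOOTP protocol; the
boot image file itself is then transferred to the client system using TFTP.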
Since one SPOT can potentially support several types of machines, several boot image
files may be created. The naming convention identifies each boot image as:
<spot_name>.<Platform>.<Kernel>.<Network>, where:
- <Platform> identifies which architecture this boot image supports: chrp, rspc, and
so forth
- <Kernel> specifies whether this boot image contains a multi-processor (mp) or
uni-processor (up) kernel.
- <Network> identifies the network type: ent, tok, and so forth
These days, the only combination most of us work with is: chrp.mp.ent.
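The naming convention can be exercised with a few lines of shell. This sketch builds a boot image path from its four parts and splits an existing name back into them; both helper names are hypothetical, and the spot name is an example.

```shell
#!/bin/sh
# Sketch: build and split network boot image names of the form
# <spot_name>.<Platform>.<Kernel>.<Network>. Both helpers are hypothetical.
boot_image_path() {
    # $1=spot name  $2=platform  $3=kernel  $4=network type
    printf '/tftpboot/%s.%s.%s.%s\n' "$1" "$2" "$3" "$4"
}

boot_image_fields() {
    # Print "spot platform kernel network" for a boot image file name.
    # The last three dot-separated fields are fixed; the spot name is the rest,
    # so spot names that themselves contain dots are still handled.
    file=${1##*/}
    network=${file##*.}; rest=${file%.*}
    kernel=${rest##*.};  rest=${rest%.*}
    platform=${rest##*.}; spot=${rest%.*}
    printf '%s %s %s %s\n' "$spot" "$platform" "$kernel" "$network"
}

boot_image_path spot61-00-00 chrp mp ent
boot_image_fields /tftpboot/spot61-00-00.chrp.mp.ent
```

Splitting from the right is a deliberate choice: the platform, kernel, and network fields never contain dots, while a spot name might.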
During a network boot, the boot image is transferred over the network and loaded into
the client’s memory.
• /tftpboot
It is good practice to make /tftpboot a separate file system. This removes the
risk of filling the root file system. If you are supporting multiple AIX versions on
multiple machine types or multiple network types, this directory can get quite large.
Instructor notes:
Purpose — Cover how to define a SPOT.
Details —
Additional information —
Transition statement — While we can use an lpp_source and matching SPOT to install a
new operating system, quite often the network installs are actually recoveries of mksysb
images. This is either to recover a lost rootvg or to clone an AIX image to other machines
or LPARs. Let’s see how we define a mksysb resource.
mksysb
Identifies a mksysb system backup image file
Used for bos_inst operations
Defining a mksysb
# nim -o define -t mksysb
    -a server=<machine>
    -a location=<mksysb_path>
    [ optional attributes ]
    <mksysb_name>
# smit nim_mkres
Notes:
mksysb
A mksysb resource represents a system backup image file created using the mksysb
command. A mksysb resource can be used as the source of the BOS run-time files
when a bos_inst is performed.
- location=<mksysb_path>
If the system backup image already exists, enter the name of the file where the
image resides. If you are creating the system backup image as part of this operation,
enter the name of the file where you want the image placed after it is created.
There are a number of optional attributes, including:
- mk_image={yes|no}
If the backup file already exists, specify no (the default). If you want nim to create a
new backup file, specify yes.
- source=<machine_name>
If you want nim to create a backup image for you, specify the NIM name of the
machine you want to back up.
- mksysb_flags=<value>
You can use this attribute to specify optional flags for the mksysb command, if
needed.
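The two ways of defining a mksysb resource, pointing at an existing backup file or asking NIM to create one, differ only in the optional attributes listed above. The sketch below shows both; define_mksysb_cmd is a hypothetical helper that only prints the command, and every name and path is an example value.

```shell
#!/bin/sh
# Sketch: compose a mksysb resource definition, with or without creating
# the backup image. define_mksysb_cmd is a hypothetical helper that only
# prints the command; all names and paths below are example values.
define_mksysb_cmd() {
    name=$1; server=$2; location=$3
    source_machine=${4:-}   # optional: NIM client to back up
    cmd="nim -o define -t mksysb -a server=$server -a location=$location"
    if [ -n "$source_machine" ]; then
        # Ask NIM to create the image from the named client first.
        cmd="$cmd -a mk_image=yes -a source=$source_machine"
    fi
    printf '%s %s\n' "$cmd" "$name"
}

# Existing backup file (mk_image defaults to no):
define_mksysb_cmd lpar1_mksysb master /export/mksysb/lpar1.mksysb
# Create a new backup of client lpar1 while defining the resource:
define_mksysb_cmd lpar1_mksysb master /export/mksysb/lpar1.mksysb lpar1
```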
Networks objects
IBM Power Systems
Object types
ent             Ethernet network
fddi            FDDI network
tok             Token ring network
atm             ATM network (no network boot capability)
generic         Generic network (no network boot capability)
Attributes
net_addr        Network address for a network
snm             Subnet mask for a network
routing<X>      Routing information for a network
Nstate,
prev_state      Status attributes
...             (Additional attributes)
Notes:
In order to perform certain NIM operations, the NIM master must be able to supply
information necessary to configure client network interfaces. The NIM master must also
be able to verify that client machines can access all the resources provided by the NIM
server. To avoid the overhead of repeatedly specifying network information for each
individual client, NIM network objects are used to represent the networks in a NIM
environment.
Network types
NIM supports the four network types shown in the visual, plus a generic type. Network
boot support is provided for Ethernet, Token-Ring, and FDDI. Network boot operations
are not supported on ATM or generic networks. NIM supports both standard Ethernet
and IEEE 802.3 Ethernet networks.
Routing
NIM routing information represents standard TCP/IP routing information for the
networks that are part of a NIM environment. This information defines the gateways that
are used to establish communication between the master machine and the clients.
The routing<X> attribute defines a route and includes:
- A destination (default or a NIM network name)
- A gateway address
If needed, multiple routes can be created and are numbered routing1, routing2, and
so forth.
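The net_addr and snm attributes together identify which NIM network a client belongs to: the network address is the client IP address ANDed with the subnet mask. The sketch below works this out in shell. The helper name net_addr and the client address 10.31.200.5 are assumptions for illustration; the mask is the one shown for the ent0 network object earlier in this unit.

```shell
#!/bin/sh
# Sketch: derive the net_addr value for a client from its IP address and
# subnet mask by ANDing the two, octet by octet. net_addr is a hypothetical
# helper; the client address below is an example value.
net_addr() {
    ip=$1; mask=$2
    old_ifs=$IFS; IFS=.
    set -- $ip $mask          # $1..$4 = IP octets, $5..$8 = mask octets
    IFS=$old_ifs
    printf '%d.%d.%d.%d\n' \
        $(( $1 & $5 )) $(( $2 & $6 )) $(( $3 & $7 )) $(( $4 & $8 ))
}

net_addr 10.31.200.5 255.255.240.0
```

For the third octet, 200 AND 240 is 192, so this example client falls in the 10.31.192.0 network.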
Additional attributes
There are a number of other attributes for each network object. lsnim is probably the
easiest way to get information about NIM attributes.
Instructor notes:
Purpose — Cover networks objects and their attributes.
Details — Point out that we do not usually define a network object directly. Instead, the
information provided when defining a machine is used to either match to an existing
network object or to create a new network object. The most important point to make is that
the networking information is from the perspective of the machine being defined. In the
network diagram shown in the visual, when defining the client, it is the router interface
which is in the network to the right that needs to be defined as the gateway. The network
option is defining how the client would network boot in order to send a bootp request to the
NIM server.
Additional information — Unlike other network adapters, ATM adapters cannot be used
to boot a machine. This means that installing a machine over an ATM network requires
special processing (refer to the AIX Installation Guide and Reference, Chapter 20. Basic
NIM Operations and Configuration for instructions). The generic network type is used to
represent all other network types where network boot support is not available. For clients
on generic networks, NIM operations that require a network boot, such as bos_inst and
diag, are not supported. However, non-booting operations, such as cust and maint, are
allowed.
Transition statement — Next, let’s look at the machines object.
Machines objects
IBM Power Systems
Object types
master
standalone
diskless
dataless
Attributes
platform        Architecture
netboot_kernel  up or mp
if<X>           Network interface information
serves          Resource served by this machine
Cstate,
prev_state,
Mstate          Status attributes
...             (additional attributes)
[Figure: a master machine and three client types: standalone, diskless, and dataless]
Notes:
NIM supports four types of machines: the master type and three types of clients:
standalone, diskless, and dataless.
Master
The master machine is defined by installing the master fileset, and then performing
some quick configuration. There can only be one master in the NIM environment. Once
a machine is defined as the master, it can participate in NIM operations.
Standalone clients
Standalone clients have local disk resources. They are installed from the NIM server,
but once installed, they boot and operate from their local disks.
Diskless clients
Diskless clients have no disks of their own. They run entirely using resources from the
NIM server.
Dataless clients
Dataless machines can only use a local disk for paging space and the /tmp and /home
file systems. All of the other storage is provided over the network by the NIM server.
Machine attributes
Each machine object is one of the four machine object types. Additionally,
machine objects store other attributes about the machine. The visual shows a few of
them:
- The platform attribute describes the machine architecture (chrp, rspc, and so
forth).
- netboot_kernel indicates which type of kernel is required, uni-processor (up) or
multi-processor (mp).
- if<X> is used to provide information about a machine’s network interfaces. If there
are multiple interfaces, they are numbered: if1, if2, and so forth. This attribute
includes the NIM network this interface connects to, the host name, the MAC
address, and, optionally, the logical device name of the network adapter (for
example, ent0).
- The serves attribute identifies resources that are served by this machine. If the
machine serves several resources, there will be a serves attribute for each
resource.
- Cstate indicates the NIM operation that is currently being performed on a machine
or that no NIM operations are currently being performed.
- prev_state shows the previous Cstate.
- Mstate shows the execution state for a machine.
Note: NIM attempts to keep the value of this attribute synchronized with the
machine's execution state, but NIM does not guarantee its accuracy. Perform the
check operation on the machine for NIM to attempt to determine the machine's
execution state.
Additional attributes
There are a number of other attributes for each machine object. lsnim is probably the
easiest way to get information about NIM attributes.
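As a minimal sketch of reading machine attributes, the following shell function extracts one attribute value from lsnim -l style output. The function name get_attr is hypothetical (it is not part of NIM), and the sample text only approximates the "name = value" layout; real output contains more attributes and may differ in detail.

```shell
#!/bin/sh
# Hypothetical helper (not a NIM command): print the value of one
# attribute from "lsnim -l <machine>" style output on standard input.
# Assumes attribute lines of the form "   name = value".
get_attr() {
    awk -v name="$1" '
        {
            eq = index($0, "=")
            if (eq == 0) next                 # skip lines without "="
            key = substr($0, 1, eq - 1)
            val = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", key)  # trim surrounding blanks
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            if (key == name) print val
        }'
}

# Illustrative sample only; the object name lpar1 is a placeholder.
sample='lpar1:
   class          = machines
   type           = standalone
   platform       = chrp
   netboot_kernel = mp
   Cstate         = ready for a NIM operation
   prev_state     = ready for a NIM operation
   Mstate         = currently running'

printf '%s\n' "$sample" | get_attr Cstate
```

On a NIM master, the same function could be fed from the real command, for example lsnim -l lpar1 | get_attr Cstate.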
Examples:
# nim -o define -t standalone -a if1="network1 lpar1 0 ent0" \
    -a cable_type1="N/A" -a connect=nimsh \
    -a platform=chrp -a netboot_kernel=mp lpar1
# smit nim
    Perform NIM Administrative Tasks
        Manage Machines
            Define a Machine
                <provide hostname of client>
Notes:
Follow these steps to add a client with the network information using SMIT:
1. On the NIM master, add a standalone client to the NIM environment by using
SMIT (nim_mkmac is the fast path).
2. Specify the host name of the client.
This is the name that resolves to the IP address of this machine's install
adapter. By default, it also becomes the host name of the client once the
client is installed. If using DNS, enter the fully qualified host name here, for
example, lpar1.my.company.com.
3. The next SMIT screen displayed depends on whether NIM already has
information about the client's network. Supply the values for the required fields or
accept the defaults. Use the help information and the LIST option to help you
specify the correct values to add the client machine.
For example, using nim, the command line might look like:
# nim -o define -t standalone -a if1="net1 lpar1 0 ent0" \
    -a cable_type1="N/A" -a connect=nimsh \
    -a platform=chrp -a netboot_kernel=mp lpar1
The quoted if1 value in the example has multiple space-delimited fields, as follows:
• net1 is the network object name
• lpar1 is the host name
• 0 is the placeholder for the MAC address
• ent0 is the physical adapter used by the client to reach the master
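The field layout of the if1 value can be illustrated with a small shell sketch. The function name split_if1 is hypothetical and not part of NIM; it assumes the four-field form used in the example and simply labels each space-delimited field.

```shell
#!/bin/sh
# Hypothetical helper (not a NIM command): label the four
# space-delimited fields of an if1 attribute value.
split_if1() {
    # Word splitting on $1 is intentional: one field per word.
    set -- $1
    if [ "$#" -ne 4 ]; then
        echo "expected 4 fields, got $#" >&2
        return 1
    fi
    echo "network=$1"    # NIM network object name
    echo "hostname=$2"   # client host name
    echo "mac=$3"        # MAC address, or 0 as a placeholder
    echo "adapter=$4"    # adapter used to reach the master
}

split_if1 "net1 lpar1 0 ent0"
```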
If using SMIT, the sequence of menu items to the matching dialog panel would be:
# smit nim
Perform NIM Administrative Tasks
Manage Machines
Define a Machine
<provide hostname of client>
The resulting dialog panel is shown in the next visual.
Instructor notes:
Purpose — Cover how to define standalone machines.
Details —
Additional information —
Transition statement — An easy way to define a machine is to use SMIT. The visual
shows the SMIT menu path to use, but let’s look at the resulting dialog panel.
For an ILO (Instructor-Led Online) class: Play file AN152U04F17 from the
multimedia library of Elluminate in place of the next visual. You can then continue your
lecture normally to reinforce the topics if desired.
- a. To access the multimedia library, click the CD button on the toolbar in
Elluminate.
- b. Once the multimedia library window is open, select AN152U04F17 and click
play.
- c. Ask the students to indicate with a green checkmark when they have finished the file.
For an FTF (Face to Face) class: Play file AN152U04F17 on your instructor PC
in place of the next visual.
Note that you can also use this activity as a review of the information covered in the
visual.
Define a Machine
Notes:
Machine type
Only one client machine type is still in use: standalone.
Kernel type
If a client machine is running the 64-bit kernel, then mp should be chosen. However, if
the client is running the 32-bit kernel, either the up or mp kernel may be chosen. To
determine which kernel the client is currently running, run the ls -l /usr/lib/boot/unix
command and note whether it is linked to the 64-bit, up, or mp kernel in that same
directory. The getconf -a command can also be run to determine whether the machine is
capable of running an mp kernel. An MP_CAPABLE setting of 1 means yes. On older
releases, run the bootinfo -z command to find out whether the machine can handle mp.
A setting of 1 again means yes. Starting with version 6.1, AIX only uses a 64-bit kernel.
Communication protocol
Either the less secure remote shell protocol (rsh) or the newer NIM service handler
protocol (nimsh) can be used. nimsh is available in AIX 5L 5.3 and later versions of AIX.
Note: Each client can have a different setting.
Cable type
Most configurations today are set to N/A (not applicable), as modern adapters either
automatically sense the connection type or support only a single type (such as twisted
pair or fiber). To double-check, run the lsattr -El entX command and see whether a
cable_type attribute is listed. If not, then setting N/A should work. If
running twisted pair cable, then setting it to tp should work.
Network speed/duplex
These settings are only used when performing a push boot operation on the client. If not
set, the current SMS speed/duplex settings for your install adapter are used.
NIM network
This is the NIM network to which the client is assigned.
CPU_ID
This is the machine ID retrieved from running the uname command on the client. It will
be used to uniquely identify this client in the future. You do not have to set this; NIM
configures it.
Machine group
You can assign a client to a machine group.
Command line
The equivalent NIM command for the above operation is:
# nim -o define -t standalone -a if1="network1 lpar1 0 ent0" \
    -a cable_type1="N/A" -a connect=nimsh \
    -a platform=chrp -a netboot_kernel=mp lpar1
Use the lsnim -q define -t standalone command for more information or see
your nim man page.
Instructor notes:
Purpose — Cover how to use the SMIT dialog panel to define a machine.
Details —
Additional information —
Transition statement — Now that we have all of our objects defined, we only need to
relate what resources are used with a machine, and then set up NIM support for a
particular operation. Let’s look at the various NIM operations and how they relate to the
resource allocations.
NIM operations
Operations on clients
bos_inst
• rte
• mksysb
cust
maint
diag
maint_boot
Procedure
Allocate resources to clients (for intended operation)
Perform operation
Unallocate resources
Notes:
Operations on clients
NIM supports several different types of operations to install and manage software on
NIM clients. In addition, there are operations to manage the NIM objects themselves.
For the purposes of this class, we are primarily interested in the following client operations:
- bos_inst
Allows you to install AIX on a client
- cust and maint
Allow you to update and maintain AIX software
- maint_boot
Allows you to boot a client to maintenance mode over the network
bos_inst
A bos_inst operation is used to perform a Basic Operating System (BOS) installation
on a client. There are two types of bos_inst operations: rte and mksysb.
bos_inst customization
The NIM installation process provides the ability to invoke a customization script after
AIX is installed on the system. This is done by allocating a script resource to the client
before performing the bos_inst. That script could be used to perform such
customization as setting passwords, changing network addresses, and so forth.
cust
This NIM operation performs software customization on a running NIM client. You can
use the cust operation to:
- Update existing software
- Install additional software
- Run a customization script
maint
This NIM operation performs software maintenance operations on clients, such as
committing applied software, removing software, and so forth.
diag
This NIM operation enables the client to boot to diagnostics over the network.
maint_boot
This operation enables the client to boot to maintenance mode over the network.
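The allocate, perform, unallocate procedure from the visual can be sketched as a dry run. The function below is hypothetical: it only prints the three nim commands rather than running them, and the object names (spot1, lpar1) are placeholders. Whether the final deallocate succeeds immediately on a real system depends on the client's state at that point.

```shell
#!/bin/sh
# Dry run: print the three-step NIM procedure instead of running it.
# Usage: nim_procedure <operation> <client> <resource_type=name>...
nim_procedure() {
    operation=$1; client=$2; shift 2
    attrs=""
    for res in "$@"; do
        attrs="$attrs -a $res"
    done
    echo "nim -o allocate$attrs $client"     # 1. allocate resources
    echo "nim -o $operation $client"         # 2. perform the operation
    echo "nim -o deallocate$attrs $client"   # 3. unallocate resources
}

nim_procedure maint_boot lpar1 spot=spot1
```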
bos_inst operation
Command line
# nim -o bos_inst \
    -a lpp_source=<lpp_res_name> \
    -a spot=<spot_name> \
    -a source={rte|mksysb} \
    -a mksysb=<mksysb_name> \
    -a boot_client={yes|no} \
    [optional attributes] \
    <client_name>
• # smit nim_bosinst
Notes:
bos_inst
Configuring NIM to perform a bos_inst can be done from the command line or through
SMIT. There are two steps: allocating resources to the client and enabling the
bos_inst. It is also possible to combine these steps into one command:
# nim -o bos_inst -a lpp_source=<lpp_res_name> -a spot=<spot_name> \
    [additional resources] [-a source={rte|mksysb}] [additional attributes] \
    <client_name>
If you use SMIT to enable a bos_inst, SMIT opens a series of windows to prompt you
for the required information and then displays a window where you can set additional
optional attributes.
Optional information
Optional attributes include:
- source={rte|mksysb}
mksysb=<mksysb_name>
If you do not specify the source attribute, nim performs an rte bos_inst. If you set
source=mksysb, then you must use the mksysb attribute to specify the name of the
mksysb resource you wish to use.
Note: In most cases, you must still include an lpp_source resource, even if you are
doing a mksysb install. With AIX 5L and later, if you have created a mksysb that
includes all devices, you do not need to specify an lpp_source.
- boot_client={yes|no}
When set to yes, the master attempts to reboot the client machine automatically for
reinstallation. For this option to succeed, the client must be running and initialized as
a NIM client or have rhosts permissions granted to the master. If set to no, the
server is only configured to support the network boot; the boot itself must be
initiated separately later.
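The relationship between the source and mksysb attributes can be sketched as a small command builder. The function name build_bos_inst and the resource names (lpp_71, spot_71, lpar1_mksysb) are hypothetical; the function only prints a command line and does not run nim.

```shell
#!/bin/sh
# Hypothetical helper: assemble (but do not run) a bos_inst command,
# applying the source/mksysb rule described in the notes.
# Usage: build_bos_inst <client> <lpp_source> <spot> <rte|mksysb> [mksysb_res] [yes|no]
build_bos_inst() {
    client=$1; lpp=$2; spot=$3; src=$4; image=$5; boot=${6:-no}
    cmd="nim -o bos_inst -a lpp_source=$lpp -a spot=$spot -a source=$src"
    case "$src" in
        rte)
            ;;
        mksysb)
            # source=mksysb requires naming the mksysb resource.
            if [ -z "$image" ]; then
                echo "source=mksysb requires a mksysb resource" >&2
                return 1
            fi
            cmd="$cmd -a mksysb=$image"
            ;;
        *)
            echo "source must be rte or mksysb" >&2
            return 1
            ;;
    esac
    echo "$cmd -a boot_client=$boot $client"
}

build_bos_inst lpar1 lpp_71 spot_71 rte
build_bos_inst lpar1 lpp_71 spot_71 mksysb lpar1_mksysb yes
```

The sketch always passes an lpp_source, which matches the common case described in the note above; it does not model the case where a mksysb containing all devices makes the lpp_source unnecessary.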
Instructor notes:
Purpose — Cover how to set up for installing an operating system on a machine.
Details —
Additional information — Note: In the CSM environment, boot_client is normally set to
no and the client is rebooted using the netboot script from the management server.
Transition statement — Having run a bos_inst operation on a machine object, NIM is
now prepared to respond to a network boot request from that machine. Network booting an
LPAR is something that was covered in previous courses, so we will not repeat that
discussion here (though the later lab exercises will have you practice this).
This unit was just a high level introduction to NIM. To properly use NIM, there is much more
you will need to understand. Let’s look at how you can build your skills beyond what has
been taught here.
Documentation
NIM from A to Z in AIX 5L
(http://www.redbooks.ibm.com/)
AIX Version 7.1 Installation Guide and Reference
EZ NIM
nim_master_setup
nim_client_setup
Notes:
Classes
You should also consider the following class.
- AN220 - AIX Network Installation Management (NIM)
(IBM Learning Services training course:
http://www.ibm.com/services/learning/index.html)
Checkpoint
Notes:
Checkpoint solutions
Additional information —
Transition statement — Having explained how to do a basic installation and configuration
of NIM, let’s actually implement this in the lab exercise.
Unit summary
Estimated time
01:30
References
Online AIX Version 7.1 Operating system and device
management
Note: References listed as “online” above are available through the
IBM Systems Information Center at the following address:
http://publib.boulder.ibm.com/eserver
© Copyright IBM Corp. 2009, 2012 Unit 5. System initialization: Accessing a boot image 5-1
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Instructor Guide
Unit objectives
Possible failures
Notes:
To use an alternate boot location, you must invoke the appropriate bootlist by pressing
function keys during the boot process. There is more information on bootlists later in
the unit.
Last steps
Passing control to the operating system means that the AIX kernel (which has just been
loaded from the boot image) takes over from the system firmware that was used to find
and load the boot image. The operating system is then responsible for completing the
boot sequence. The components of the boot image are discussed later in this unit.
All devices are configured during the boot process. This is performed in different
phases of the boot by the cfgmgr utility.
Towards the end of the boot sequence, the init process is started and processes the
/etc/inittab file.
Instructor notes:
Purpose — Introduce the AIX boot process. Keep this at the overview level.
Details —
Additional information — You might mention at this point that logical key switches are
used to determine which bootlist is used. If you press F5 or numeric 5, the system tries to
boot from a default bootlist that contains the diskette, CD-ROM, hard drive, and network. If
it boots from the hard drive, it will load AIX diagnostics rather than perform a normal boot.
Transition statement — Let’s show how the boot image is loaded from the boot logical
volume when booting from disk.
[Visual labels: Firmware; Boot devices: (1) Diskette, (2) CD-ROM, (3) Internal disk, (4) Network; Bootstrap code; RAM; Boot Logical Volume (hd5); hdisk0; Boot controller]
Notes:
Introduction
This visual shows how the boot logical volume is found during the AIX boot process.
Machines use one or more bootlists to identify a boot device. The bootlist is part of the
firmware.
Bootstrap code
System p and pSeries systems can manage several different operating systems. The
hardware is not bound to the software. The first block of the boot disk contains
bootstrap code that is loaded into RAM during the boot process. This part is sometimes
referred to as System Read Only Storage (ROS). The bootstrap code gets control. The
task of this code is to locate the boot logical volume on the disk, and load the boot
image. In some technical manuals, this second part is called the Software ROS. In the
case of AIX, the boot image is loaded.
Notes:
AIX kernel
The AIX kernel is the core of the operating system and provides basic services like
process, memory, and device management. The AIX kernel is always loaded from the
boot logical volume. There is a copy of the AIX kernel in the hd4 file system (under the
name /unix), but this program has no role in system initialization. Never remove /unix,
because it is used for rebuilding the kernel in the boot logical volume.
RAMFS
This RAMFS is a reduced or miniature root file system which is loaded into memory
and used as if it were a disk-based file system. The contents of the RAMFS are slightly
different depending on the type of system boot:
Reduced ODM
The boot logical volume contains a reduced copy of the ODM. During the boot process,
many devices are configured before hd4 is available. For these devices, the
corresponding ODM files must be stored in the boot logical volume.
Instructor notes:
Purpose — Describe the components of the BLV.
Details — Introduce the different components as described in the student material.
Describe that the AIX kernel from the BLV is used during the boot process.
Additional information — Describe what the term reduced ODM means. Explain that
device support is available only for devices that are marked as base devices in PdDv.
The protofiles (in /usr/lib/boot and /usr/lib/boot/protoext) are used by the bosboot
command to determine which files should be put into the RAMFS image that is included in
the boot image.
Transition statement — Many system boot problems involve being unable to locate a
good boot image. In order to fix these problems, we often need to boot into special modes.
Let’s look at what determines which boot device is used.
Normal bootlist:
# bootlist -m normal hdisk0 hdisk1
# bootlist -m normal -o
hdisk0 blv=hd5
hdisk1 blv=hd5
Notes:
Introduction
You can use the command bootlist or diag from the command line to change or
display the bootlists. You can also use the System Management Services (SMS)
programs. SMS is covered on the next visual.
bootlist command
The bootlist command is the easiest way to change the bootlist. The first example
shows how to change the bootlist for a normal boot. In this example, we boot either from
hdisk0 or hdisk1. To query the bootlist, you can use the bootlist -o option.
The blv=hd5 part of the bootlist entry is to identify which boot logical volume to use on
that listed disk. This is related to the AIX multibos capability that is covered later in this
course.
The second example shows how to display the customizable service bootlist.
The bootlist command also allows you to specify IP parameters to use when
specifying a network adapter. For example:
# bootlist -m service ent0 gateway=192.168.1.1 bserver=192.168.10.3 \
    client=192.168.1.57
Using the service bootlist in this way can allow you to boot to maintenance or diagnostics
using a NIM server without having to use SMS to specify the network adapter as the
boot device.
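As a rough sketch of working with the bootlist display, the following shell function summarizes bootlist -o style output as position, device, and boot logical volume. The function name bootlist_summary is hypothetical, and the sample text is an illustration modeled on the entries shown in this unit rather than captured output.

```shell
#!/bin/sh
# Hypothetical helper: summarize "bootlist -m <mode> -o" style output
# read on standard input as "<position> <device> <blv>".
# A "-" is printed when an entry carries no blv= field.
bootlist_summary() {
    awk 'NF > 0 {
        n++
        blv = "-"
        for (i = 2; i <= NF; i++)
            if ($i ~ /^blv=/) blv = substr($i, 5)
        print n, $1, blv
    }'
}

# Illustrative sample only.
sample='cd0
hdisk0 blv=hd5
ent0'

printf '%s\n' "$sample" | bootlist_summary
```

On an AIX system, the function could be fed directly, for example bootlist -m normal -o | bootlist_summary.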
Types of bootlists
The normal bootlist is used during a normal boot.
The default bootlist (hard coded in the firmware) is called when numeric 5 is pressed
during the boot sequence.
Most machines, in addition to the default bootlist and the customized normal bootlist,
allow for a customized service bootlist. This is set using mode service with the
bootlist command. The service bootlist is called when the numeric 6 key is pressed
during boot.
For machines which are partitioned into logical partitions, the HMC is used to boot the
partitions and it provides for specifying boot modes, thus eliminating the need to time
the pressing of special keys. Since pressing either numeric 5 or numeric 6 keys causes
a service mode boot and since a service mode boot using a boot logical volume will
result in booting to diagnostics, these options are referred to in the HMC as booting to
diagnostic either with the default bootlist or the stored (customizable) bootlist.
Here is a list summarizing the boot modes and the manual keys associated with them
(this can vary depending on the model of your machine):
• Numeric 1: Start an SMS (System Management Services) mode boot
• Numeric 5: Start a service mode boot using the default service bootlist
The default service bootlist is:
cd0
hdisk0 blv=hd5
ent0
• Numeric 6: Start a service mode boot using the customized service bootlist.
You can find variations on the different models of AIX systems. Refer to your specific
model at: http://publib.boulder.ibm.com/eserver. Look for your model under IBM
Systems Hardware.
Instructor notes:
Purpose — Describe how to work with the bootlists.
Details —
Additional information — The bootlist command will accept one more mode called
both. As you might suspect, the both mode sets the service and normal bootlists at the
same time to the same value.
Transition statement — Let’s continue with discussion of the bootlist command but
looking at an AIX 7 enhancement that helps us manage multi-path I/O situations.
The pathid argument may be repeated for multiple paths in the desired
order:
# bootlist -m normal hdisk0 blv=hd5 pathid=0 pathid=1
or
# bootlist -m normal hdisk0 blv=hd5 pathid=0,1
The bootlist command will now also display the pathid with the
device:
# bootlist -m normal -o
hdisk0 blv=hd5 pathid=0
hdisk0 blv=hd5 pathid=1
Notes:
The benefit to the user with regard to the pathid command modifications is the ability
to operate at a pathid level. This is very important with the bootlist command where
users used to have to selectively delete and reconfigure device paths to generate
bootlists on systems with MPIO disks. The operation can now be performed with a
single command.
There have also been situations where the bootlist was too long. When the bootlist
specifies disks without any pathid restriction, it adds all paths, with each path taking an
entry in the bootlist. The bootlist has a limited capacity. This could result in being unable to
use an alternate disk. Use of the pathid specification can avoid this type of problem.
It is important to remember that ordering of paths will be maintained with the bootlist
command. If a user wishes the bootlist to be set to boot from paths 1, 0, and 2
respectively, using the pathid=1,0,2 argument will perform this operation for them.
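The ordering rule can be sketched with a small command builder that keeps the path IDs in the order given. The function name build_pathid_bootlist is hypothetical; it only prints the bootlist command, using the comma-delimited pathid form shown in the visual.

```shell
#!/bin/sh
# Hypothetical helper: print a bootlist command that restricts one
# disk to the given path IDs, preserving the order they were given in.
# Usage: build_pathid_bootlist <mode> <disk> <blv> <pathid>...
build_pathid_bootlist() {
    mode=$1; disk=$2; blv=$3; shift 3
    if [ "$#" -eq 0 ]; then
        echo "at least one pathid is required" >&2
        return 1
    fi
    ids=""
    for id in "$@"; do
        ids="${ids:+$ids,}$id"   # append with a comma separator
    done
    echo "bootlist -m $mode $disk blv=$blv pathid=$ids"
}

build_pathid_bootlist normal hdisk0 hd5 1 0 2
```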
Instructor notes:
Purpose — Discuss AIX 7 bootlist pathid enhancements.
Details — Once upon a time, the only information about a disk in the bootlist was the name
of the disk.
Then with the advent of multibos, we could have two boot logical volumes in the same
rootvg, each at a different fix pack level. As a result, the bootlist needed to identify both the
disk name and the BLV that should be used.
We have also discovered that using disks which are defined in SAN attached storage
subsystems provides multiple paths to that boot disk. To address this situation, AIX can
now include the pathid as part of the information in the bootlist.
The examples which are shown assume that there is only one BLV on the specified boot
disk, which is indicated with blv=hd5.
The first example shows setting the bootlist to a single disk and restricting it to only one
path. This can be useful when you need to restrict the number of entries in the bootlist. By
default, if the system administrator only identifies the logical name of the disk in specifying
the bootlist, the bootlist automatically includes one entry for each pathid. There is a
maximum size to a bootlist and having all possible paths can fill up the bootlist fairly quickly.
The second example shows how the bootlist command can be used to specify multiple
paths to a given boot disk. Notice that there are two different syntaxes, both valid. The first
syntax has the pathid=# specified multiple times. The second syntax shows that you can
specify the pathid attribute a single time with the assignment of a comma-delimited list of
pathids.
As explained in the student notes, the order in which the pathids are specified will
determine the order in which these paths are tried in accessing the boot image.
The last bullet illustrates how the bootlist display (-o for output) will list each unique
combination that is defined in the bootlist.
Additional information —
Transition statement — The SMS programs provide another method to set a bootlist.
Let’s take a look at how to start SMS.
Notes:
Booting to SMS
If you cannot boot AIX because the bootlist needs correcting, then you will need to use
the System Management Services (SMS) to modify the bootlist. The SMS programs are
integrated into the hardware (they reside in NVRAM).
The visual shows how to start the System Management Services. There is an
equivalent graphic menu seen on older systems. During system boot, shortly before the
firmware looks for a boot image, it discovers some basic hardware on the system. At
this point the LED usually will display a value of E1F1. As the devices are discovered,
either a text name or graphic icon for the resource will display on the screen. The
second device discovered is usually the keyboard. When the keyboard is discovered, a
unique double beep tone is usually sounded. Having discovered the keyboard, the
system is ready to accept input that will override the default behavior of conducting a
normal boot. But once the last icon or name is displayed, the system starts to use the
bootlist to find the boot image and it is too late to change it. One of the keyboard actions
you can do during this brief period of time is to press the numeric 1 key to request that
the system boot using SMS firmware code.
For an ILO (Instructor-Led Online) class: Play file AN152U05F08 from the
multimedia library of Elluminate in place of the next two visuals. You can then continue your
lecture normally to reinforce the topics if desired.
- a. To access the multimedia library, click the CD button on the toolbar in
Elluminate.
- b. Once the multimedia library window is open, select AN152U05F08 and click
play.
- c. Ask the students to indicate with a green checkmark when they have finished the file.
For an FTF (Face to Face) class: Play file AN152U05F08 on your instructor PC
in place of the next two visuals.
Note that you can also use this activity as a review of the information covered in the
next two visuals.
Notes:
It allows you to either specify a specific device to boot with right now, modify the
customized bootlists (with the intent of booting using one of them), or to request that
you be prompted at each boot for the device to boot from (multiboot option).
The focus here is the second option, used to modify the customized bootlist. The
Configure Bootlist Device Order panel lists:
1. Select 1st Boot Device
2. Select 2nd Boot Device
3. Select 3rd Boot Device
4. Select 4th Boot Device
5. Select 5th Boot Device
6. Display Current Setting
7. Restore Default Setting
It allows us to either list or modify the bootlist. Select the position in the bootlist
that you wish to modify. SMS then lists the possible device types; selecting a type
displays the devices of that type to choose from:
1. Diskette
2. Tape
3. CD/DVD
4. IDE
5. Hard Drive
6. Network
7. None
8. List All Devices
Select the device type. If you do not have many bootable devices it is sometimes easier
to use the List All Devices option.
Finally, you would select a specific device to place in that position of the bootlist, as
illustrated on the next visual.
It is important to understand that when SMS is used to modify the bootlist, both the
normal bootlist and the service bootlist are modified. If you want them to be different,
you will need to recustomize them later, when you have a command prompt (such as in
multiuser mode).
Instructor notes:
Purpose — Show how to change the bootlist in SMS
Details — When you use SMS to change the bootlist, you are changing both the normal
and service customizable bootlists. After fixing the problem at hand, you may wish to use
the bootlist command to recustomize them if you want them to be different.
Additional information — The following keys are used (follow with the HMC identifying
text):
- F1 or numeric 1: Start System Management Services
- F5 or numeric 5: Boot in diagnostic mode, use default bootlist
- F6 or numeric 6: Boot in diagnostic mode, use nondefault bootlist
The default bootlist is set to diskette, CD-ROM, internal disk and any communication
adapter.
To boot diagnostics from disk, do not insert a CD and request to use the default bootlist
(press the appropriate key (F5/numeric 5) or specify with HMC).
The other options:
Boot versus Multiboot
Under Select Boot Options, there is a multiboot mode item. This is a toggle that turns
multiboot mode either on or off. If you turn it on, the system will boot to an SMS menu every
time you boot the system in normal mode. This is to allow you to choose where to boot from
each time. For example, you might have different versions of AIX on different hard disks
and want to alternate between them at boot time. If an SMS menu is displayed when performing a
normal boot, this might be the reason.
Transition statement — Once we have selected the category of boot device, we need to
select the particular device we wish to use in the identified position in the bootlist. Let’s see
how we do this.
Select Device
Device  Current   Device
Number  Position  Name
1.      -         IBM 10/100/1000 Base-TX PCI-X Adapter
                  ( loc=U789D.001.DQDWAYT-P1-C5-T1 )
2.      -         SAS 73407 MB Harddisk, part=2 (AIX 7.1.0)
                  ( loc=U789D.001.DQDWAYT-P3-D1 )
3.      1         SATA CD-ROM
                  ( loc=U789D.001.DQDWAYT-P1-T3-L8-L0 )
4.      None
===> 2

Select Task
SAS 73407 MB Harddisk, part=2 (AIX 7.1.0)
( loc=U789D.001.DQDWAYT-P3-D1 )
1. Information
2. Set Boot Sequence: Configure as 1st Boot Device
===> 2

Current Boot Sequence
1. SAS 73407 MB Harddisk, part=2 (AIX 7.1.0)
   ( loc=U789D.001.DQDWAYT-P3-D1 )
2. None
3. None
4. None
© Copyright IBM Corporation 2012
Notes:
Next you are presented with a Select Task panel which provides the following options:
1. Information
2. Set Boot Sequence: Configure as 1st Boot Device
Once you have selected a device, you need to set that selection.
You can repeat this for each position in the bootlist. The other option is to clear a device
by specifying none as an option for that position.
Exiting out of SMS will always trigger a boot attempt. If you have not specified a
particular device for this boot, it will use the bootlist you have set in SMS.
Notes:
Boot alternatives
The system boots from the first bootable device it finds in the designated bootlist.
Whenever the effective boot device is bootable media, such as a mksysb tape/CD/DVD
or installation media, the system will boot to the Install and Maintenance menu.
If the booting device is a network adapter, the mode of boot depends on the
configuration of the NIM server which services the network boot request. If the NIM
server is configured to support an AIX installation or a mksysb recovery, then the system
will boot to Install and Maintenance. If the NIM server is configured to serve out a
maintenance image, then the system boots to a Maintenance menu (a sub-menu of
Install and Maintenance). If the NIM server is configured to serve out a diagnostic
image, then we boot to a diagnostic mode.
There are other ways to boot to a diagnostic utility. If the booting device is a CD drive
with a diagnostic CD in it, we boot into that diagnostic utility. If a service mode boot is
requested and the booting device is a hard drive with a boot logical volume, then the
system boots into the diagnostic utilities.
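The rules above can be summarized as a small lookup. The following is only a sketch for discussion: the function name boot_outcome and its argument keywords (install_media, diag_cd, network, disk, and the NIM settings install, mksysb, maint, diag) are hypothetical labels invented for this example, not AIX or NIM terms.

```shell
#!/bin/sh
# Sketch only: maps a class of boot device (hypothetical keywords) to the
# boot result described in the notes. Nothing here queries a real system.
# Usage: boot_outcome DEVICE_CLASS [DETAIL]
boot_outcome() {
    case "$1" in
        install_media)            # mksysb tape/CD/DVD or installation media
            echo "Install and Maintenance menu" ;;
        diag_cd)                  # diagnostic CD in the CD drive
            echo "diagnostics" ;;
        network)                  # result depends on the NIM server setup
            case "$2" in
                install|mksysb) echo "Install and Maintenance menu" ;;
                maint)          echo "Maintenance menu" ;;
                diag)           echo "diagnostics" ;;
                *) echo "unknown NIM configuration" >&2; return 1 ;;
            esac ;;
        disk)                     # hard drive with a boot logical volume
            if [ "$2" = "service" ]; then
                echo "diagnostics"
            else
                echo "normal multiuser boot"
            fi ;;
        *) echo "unknown device class" >&2; return 1 ;;
    esac
}

boot_outcome network maint
boot_outcome disk service
```

The nested case for network boots reflects the point made in the notes: for a network adapter, the client does not decide the boot mode; the NIM server configuration does.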
The system can be signaled which bootlist to use during the boot process. The default
is to use the normal bootlist and boot in a normal mode. This can be changed during a
window of opportunity between when the system discovers the keyboard and before it
commits to the default boot mode. The signal may be generated from the system
console (this may be an HMC provided virtual terminal) or from a service processor
attached workstation (such as an HMC) which can simulate a keyboard signal at the
right moment.
The keyboard signal that is used can vary from firmware to firmware, but the most
common is a numeric 5 to indicate that the firmware should use the default bootlist and
a numeric 6 to indicate that the firmware should use the customizable service bootlist.
Either of these special keyboard signals will result in a service mode boot, which as we
stated can cause a boot to diagnostic mode when booting off a boot logical volume on
your hard drive.
With an HMC, you can specify which signal to send as part of the LPAR activation. Even
if you forget to override the default boot mode (usually normal to multiuser), you can still
use the virtual console keyboard as described to override, once the keyboard has been
discovered.
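As a memory aid, the key assignments can be written as a lookup. This is a sketch only: the function name boot_choice is hypothetical, the mapping follows the common key assignments given in this unit (1, 5, 6), and, as noted above, the actual keys can vary from firmware to firmware.

```shell
#!/bin/sh
# Sketch only: console key pressed during the boot window -> resulting
# boot mode and bootlist, per the common assignments in these notes.
boot_choice() {
    case "$1" in
        1) echo "SMS menus" ;;
        5) echo "service mode, default bootlist" ;;
        6) echo "service mode, customized service bootlist" ;;
        *) echo "normal mode, normal bootlist" ;;   # no key: default boot
    esac
}

for key in 1 5 6 none; do
    printf '%-5s -> %s\n' "$key" "$(boot_choice "$key")"
done
```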
Instructor notes:
Purpose — Explain how the boot mode is controlled.
Details —
Additional information —
Transition statement — Let’s continue to look at the factors that affect boot behavior.
Notes:
Instructor notes:
Purpose — Continue covering the factors that affect boot behavior.
Details —
Additional information —
Transition statement — Let’s use what we have just learned to effect a boot to
maintenance mode.
HMC Advanced Activate options: Default bootlist
Boot the system from the BOS CD-ROM, tape, or network device (NIM)
Notes:
Introduction
The visual shows an overview of how we access a system that will not boot normally.
The maintenance mode can be started from an AIX CD, an AIX bootable tape (like a
mksysb), or a network device that has been prepared to access a NIM master. The
devices that contain the boot media must be stored in the bootlists.
profile to allocate that device. If the device is currently allocated to another LPAR,
then you will need to first deallocate it from that other LPAR. Use a dynamic LPAR
operation on the HMC to allocate that slot.
- If using the default bootlist, the sequence is fixed and the CD drive is the first
practical device.
- If using a tape drive or a network adapter as your boot device and not selecting a
boot device through SMS for this particular boot, then you will need to use one of the
customizable bootlists, usually the service bootlist.
Verify your bootlist, but do not forget that some machines do not have a service
bootlist. Check that your boot device is part of the bootlist:
# bootlist -m service -o
- If you want to boot from your internal tape device, you need to change the bootlist
because the tape device by default is not part of the bootlist. For example:
# bootlist -m service rmt0 hdisk0
- Whichever bootlist you are using, insert the boot media (either tape or CD) into the
drive.
- Power on the system (or activate the LPAR). The system begins booting from the
installation media. After several minutes, c31 is displayed in the LED/LCD panel (or
as the reference code on the HMC display) which means that the software is
prompting on the console for input (normally to select the console device and then
select the language). For an LPAR, you will need to have the virtual console started
to interact with the prompts.
- Normally, you are prompted to select the console device and then select the
language. After making these selections, you see the Installation and
Maintenance menu.
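The bootlist check in the steps above can be scripted. The sketch below assumes that bootlist -m service -o prints one device per line with the device name as the first field (for example, hdisk0 blv=hd5); verify this on your own system. The helper name in_bootlist and the sample data are hypothetical and used only for illustration.

```shell
#!/bin/sh
# Sketch only: succeed if DEVICE appears as the first field of any line
# read from standard input (assumed "bootlist -o" output format).
in_bootlist() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit found ? 0 : 1 }'
}

# Sample data standing in for real command output.
sample='rmt0
hdisk0 blv=hd5'

for dev in rmt0 cd0; do
    if printf '%s\n' "$sample" | in_bootlist "$dev"; then
        echo "$dev is in the bootlist"
    else
        echo "$dev is NOT in the bootlist"
    fi
done

# On an AIX system the same check would read the live bootlist:
#   bootlist -m service -o | in_bootlist rmt0
```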
For partitioned systems with an HMC, you would normally use the HMC to access SMS
and then select the bootable device, which would bypass the use of a bootlist.
You can also use a NIM server to boot to maintenance. For this, you would need to
place your system’s network adapter in your customized service bootlist before any
other bootable devices, or use SMS to specifically request boot over that adapter (the
latter option is most common). Here is an example of setting the service boot list:
# bootlist -m service ent0 gateway=192.168.1.1
bserver=192.168.10.3 client=192.168.1.57
You would also need to set up the NIM server to provide a boot image for doing a
maintenance boot. For example, at the NIM server:
# nim -o maint_boot -a spot=<spotname> <client machine object name>
Instructor notes:
Purpose — Identify how to access a system that does not boot.
Details — Emphasize that what causes us to boot into the Installation and Maintenance
menu is the fact that we booted off of installation media. It does not matter if we boot in
normal mode (using the normal bootlist) or service mode (using the default or customizable
service bootlists). It is only important that we find bootable installation media (tape, CD, or
network server) in the bootlist before anything else (such as a BLV or a Diagnostic CD).
With some SMS facilities, we can specify a particular device to use and bypass the
bootlists entirely.
Additional information —
Transition statement — Let’s show the maintenance mode menus that are available.
For an ILO (Instructor Led Online) class: You should play file AN152U05F13 in the
multimedia library of Elluminate in place of the next visual. You can then continue your
lecture normally to reinforce the topics if desired.
- a. To access the multimedia library click on the CD button along the toolbar in
Elluminate.
- b. Once the multimedia library window is open, select AN152U05F13 and click on
play.
- c. Ask the students to indicate via green checkmark when they have finished the file.
For a FTF (Face to Face) class: You should play file AN152U05F13 on your instructor PC
in place of the next visual.
Note that you can also use this activity as a review for the information covered in the
next visual as well.
Maintenance
Type the number of your choice and press Enter.
Choice [1]: 1
Notes:
First steps
When booting in maintenance mode, you first have to identify the system console that
will be used, for example your virtual console (vty), graphic console (lft), or serial
attached console (tty that is attached to the S1 port).
After selecting the console, the Installation and Maintenance menu is shown:
1 Start Install Now with Default Settings
2 Change/Show Installation Settings and Install
3 Start Maintenance Mode for System Recovery
4 Configure Network Disks (iSCSI)
In a network boot using NIM, the console goes straight to the maintenance menu.
From this point, we access our rootvg to execute any system recovery steps that may
be necessary.
Choice: 1
Choice [99]: 1
Notes:
Access this volume group and start a shell before mounting file systems
When you choose this selection, the rootvg will be activated, but the file systems
belonging to the rootvg will not be mounted.
A typical scenario where this selection is chosen is when a corrupted file system needs
to be repaired by the fsck command. Repairing a corrupted file system is only possible
if the file system is not mounted.
Another scenario might be a corrupted hd8 transaction log. Any changes that take place
in the superblock or i-nodes are stored in the log logical volume. When these changes
are written to disk, the corresponding transaction logs are removed from the log logical
volume.
A corrupted transaction log must be reinitialized by the logform command, which is
only possible when no file system is mounted. After initializing the log device, you need
to do a file system repair for all file systems that use this transaction log. Beginning with
AIX 5L V5.1, you have to explicitly specify the file system type, JFS or JFS2:
# logform -V jfs2 /dev/hd8
# fsck -y -V jfs2 /dev/hd1
# fsck -y -V jfs2 /dev/hd2
# fsck -y -V jfs2 /dev/hd3
# fsck -y -V jfs2 /dev/hd4
# fsck -y -V jfs2 /dev/hd9var
# fsck -y -V jfs2 /dev/hd10opt
# exit
Keep in mind that the US keyboard layout is in effect. To enable command recall and
editing in the maintenance shell, use set -o emacs or set -o vi.
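When several file systems share the log, the repair commands follow a simple pattern: one logform, then one fsck per logical volume. The sketch below only prints the commands so that they can be reviewed first; it does not run them. The function name repair_cmds is hypothetical, and the log device /dev/hd8 and the logical volume names are taken from the example above.

```shell
#!/bin/sh
# Sketch only: print (do not execute) the log reinitialization and the
# file system checks for every logical volume that uses the hd8 log.
# Usage: repair_cmds FSTYPE LV...
repair_cmds() {
    fstype=$1
    shift
    echo "logform -V $fstype /dev/hd8"
    for lv in "$@"; do
        echo "fsck -y -V $fstype /dev/$lv"
    done
}

repair_cmds jfs2 hd1 hd2 hd3 hd4 hd9var hd10opt
```

Printing first is a deliberate choice here, because reinitializing the log with logform can result in data loss.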
Instructor notes:
Purpose — Describe how to access the rootvg.
Details —
Additional information — Describe that the logform command can result in data loss.
Transition statement — Let’s check where to find information about boot errors.
Maintenance
Rebuild BLV
# bosboot -ad /dev/hdisk0
# sync
# sync
# reboot
Notes:
Maintenance mode
If the boot logical volume is corrupted (for example, bad blocks on a disk might cause a
corrupted BLV), the machine will not boot.
To fix this situation, you must boot your machine in maintenance mode, from a CD or
tape. If NIM has been set up for a machine, you can also boot the machine from a NIM
master in maintenance mode. NIM is actually a common way to do special boots in a
logical partition environment.
accessing the rootvg, you can repair the boot logical volume with the bosboot
command. You need to specify the corresponding disk device, for example hdisk0:
# bosboot -ad /dev/hdisk0
# sync
# sync
# reboot
The sync commands will flush any file data in memory cache to disk. While you would
normally use a shutdown command, in maintenance mode it is appropriate to use the
reboot command.
The bosboot command requires that the boot logical volume (hd5) exists. If you ever
need to recreate the BLV from scratch, for example because it was deleted by mistake or
the LVCB of hd5 has been damaged, follow these steps:
1. Boot your machine in maintenance mode, either from CD or tape (numeric 5) or by
using numeric 1 to access System Management Services (SMS) and select the boot
device.
2. Remove the old hd5 logical volume.
# rmlv hd5
3. Clear the boot record at the beginning of the disk.
# chpv -c hdisk0
4. Create a new hd5 logical volume. It must be one physical partition in size, reside in
rootvg, and use outer edge as its intrapolicy. Specify boot as the logical volume type.
# mklv -y hd5 -t boot -a e rootvg 1
5. Run the bosboot command as described on the visual.
# bosboot -ad /dev/hdisk0
6. Check the actual bootlist.
# bootlist -m normal -o
7. Write data immediately to disk.
# sync
# sync
8. Reboot the system.
# reboot
By using the internal command ipl_varyon -i, you can check the state of the boot
record.
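Because each step depends on the one before it, a scripted version should stop at the first failure. The following sketch wraps steps 2 to 7 in a helper that either prints each command (the default) or runs it and aborts on error. The names run, recreate_blv, and DRY_RUN are hypothetical; the commands themselves are exactly those listed above, and the final reboot is deliberately left to the administrator.

```shell
#!/bin/sh
# Sketch only. DRY_RUN=1 (the default here) prints each command instead
# of executing it, so the script is safe to read through on any system.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode; otherwise run it and stop on failure.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@" || { echo "step failed: $*" >&2; exit 1; }
    fi
}

# Usage: recreate_blv DISK   (for example: recreate_blv hdisk0)
recreate_blv() {
    disk=$1
    run rmlv hd5                            # step 2: remove the old hd5
    run chpv -c "$disk"                     # step 3: clear the boot record
    run mklv -y hd5 -t boot -a e rootvg 1   # step 4: create the new hd5
    run bosboot -ad "/dev/$disk"            # step 5: write the boot image
    run bootlist -m normal -o               # step 6: check the bootlist
    run sync                                # step 7: flush data to disk
    run sync
}

recreate_blv hdisk0
```

Only after reviewing the printed commands would you set DRY_RUN=0, from the maintenance shell with rootvg accessed, to execute them.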
Checkpoint (1 of 2)
IBM Power Systems
1. True or false: You must have AIX loaded on your system to use
the System Management Services programs.
Notes:
Checkpoint solutions (1 of 2)
IBM Power Systems
1. True or false: You must have AIX loaded on your system to use
the System Management Services programs.
The answer is false: SMS is part of the built-in firmware.
Additional information —
Transition statement — Let’s continue with more Checkpoint questions.
Checkpoint (2 of 2)
IBM Power Systems
4. What command is used to build a new boot image and write it to the
boot logical volume?
6. True or false: During the AIX boot process, the AIX kernel is loaded
from the root file system.
Notes:
Checkpoint solutions (2 of 2)
IBM Power Systems
4. What command is used to build a new boot image and write it to the
boot logical volume?
The answer is bosboot -ad /dev/hdiskx.
6. True or false: During the AIX boot process, the AIX kernel is loaded
from the root file system.
The answer is false: the AIX kernel is loaded from hd5.
Additional information —
Transition statement — Now, let’s do an exercise.
Notes:
Unit summary
IBM Power Systems
Notes:
Estimated time
01:25
References
Online AIX Version 7.1 Operating system and device
management
Note: References listed as “online” above are available through the
IBM Systems Information Center at the following address:
http://publib.boulder.ibm.com/eserver
© Copyright IBM Corp. 2009, 2012 Unit 6. System initialization: rc.boot and inittab 6-1
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Instructor Guide
Unit objectives
IBM Power Systems
Notes:
There are many reasons for boot failures. The hardware might be damaged or, due to
user errors, the operating system might not be able to complete the boot process.
A good knowledge of the AIX boot process is a prerequisite for all AIX system
administrators.
For an ILO (Instructor Led Online) class: You should play file AN152U06F02 in the
multimedia library of Elluminate in place of visuals 6-2 to 6-7. You can then continue your
lecture normally to reinforce the topics.
- a. To access the multimedia library click on the CD button along the toolbar in
Elluminate.
- b. Once the multimedia library window is open, select AN152U06F02 and click on
play.
- c. Ask the students to indicate via green checkmark when they have finished the file.
For a FTF (Face to Face) class: You should play file AN152U06F02 on your instructor PC
in place of visuals 6-2 to 6-7.
Note that you can also use this activity as a review for the information covered in
visuals 6-2 to 6-7 as well.
Restore RAM file system from boot image (/ with etc, dev, mnt, usr)
rc.boot 2: Activate rootvg
Start "real" init process (from rootvg)
/etc/inittab
rc.boot 3: Configure remaining devices
Notes:
Boot sequence
The visual shows the boot sequence after loading the AIX kernel from the boot image.
The AIX kernel gets control and executes the following steps:
1. The kernel restores a RAM file system into memory by using information
provided in the boot image. At this stage the rootvg is not available, so the
kernel needs to work with commands provided in the RAM file system. You can
consider this RAM file system as a small AIX operating system.
2. The kernel starts the init process which was provided in the RAM file system
(not from the root file system). This init process executes a boot script
rc.boot.
3. rc.boot controls the boot process. In the first phase (it is called by init with
rc.boot 1), the base devices are configured. In the second phase (rc.boot 2),
the rootvg is activated (or varied on).
4. After activating the rootvg at the end of rc.boot 2, the kernel overmounts the
RAM file system with the file systems from rootvg. The init from the boot
image is replaced by the init from the root file system, hd4.
5. This init processes the /etc/inittab file. From this file, rc.boot is called a third
time (rc.boot 3) and all remaining devices are configured.
Instructor notes:
Purpose — Introduce the AIX software boot process. Keep this on the overview level.
Details — Explain as described in the student notes.
Additional information — Underline that at the beginning of the boot process, no rootvg
is available. Before activating the rootvg, all devices that are needed to varyon the rootvg
must be configured.
Transition statement — Let’s look at what rc.boot does.
rc.boot 1
IBM Power Systems
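[Visual: rootvg is not active. Process 1 (init) runs rc.boot 1. restbase copies the ODM
from the boot image into the ODM in the RAM file system; cfgmgr -f then runs the
Config_Rules methods for phase=1. LED codes shown on the visual: F05, c06, 548, 510.]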
Notes:
executes all methods that are stored under phase=1. Phase 1 configuration
methods result in the configuration of base devices into the system, so that the
rootvg can be activated in the next rc.boot phase.
3. Base devices are all devices that are necessary to access the rootvg. If the
rootvg is stored on hdisk0, all devices from the motherboard to the disk itself
must be configured in order to be able to access the rootvg.
4. At the end of rc.boot 1, the system determines the last boot device (used to
establish the /dev/ipldevice link) by calling bootinfo -b. The LED shows 511
(DEV CFG 1 END), followed by 553 (PHASE 1 COMPLETE).
rc.boot 2 (1 of 2)
IBM Power Systems
Failure LED rc.boot 2
551
fsck -f /dev/hd9var
517 mount /var /
518 copycore RAM File system
umount /var
556
swapon /dev/hd6
Notes:
This improves the boot performance. If the check fails, LED 555 (FSCK ERROR) is
shown.
3. Afterwards, /dev/hd4 is mounted directly onto the root (/) in the RAM file system.
If the mount fails, for example due to a corrupted JFS log, the LED 557 (ROOT
MNT FAILED) is shown and the boot process stops.
4. Next, /dev/hd2 is checked and mounted (again with option -f, it is checked only
if the file system wasn't unmounted cleanly). If the mount fails, LED 518 (/USR
MOUNT FAILED) is displayed and the boot stops.
5. Then, the /var file system is checked and mounted. This is necessary at this
stage, because the copycore command checks if a dump occurred. If a dump
exists in a paging space device, it will be copied from the dump device,
/dev/hd6, to the copy directory which is by default the directory /var/adm/ras.
/var is unmounted afterwards. If the /var mount fails, LED 518 (/VAR MOUNT
FAILED) is displayed and the boot stops.
6. The primary paging space /dev/hd6 is made available.
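For quick reference during troubleshooting, the failure codes named in these steps can be kept in a small lookup. This is a sketch that covers only the three codes described above (555, 557, 518); the function name led_meaning is hypothetical and the wording of each message is a paraphrase of the notes, not firmware output.

```shell
#!/bin/sh
# Sketch only: rc.boot 2 failure LEDs mentioned in the notes above.
led_meaning() {
    case "$1" in
        555) echo "file system check failed (FSCK ERROR)" ;;
        557) echo "mount of /dev/hd4 onto / failed; check for a corrupted JFS log" ;;
        518) echo "mount of /usr or /var failed" ;;
        *)   echo "LED $1 is not covered in these notes" >&2; return 1 ;;
    esac
}

for code in 555 557 518; do
    printf '%s: %s\n' "$code" "$(led_meaning "$code")"
done
```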
Instructor notes:
Purpose — Describe the first part of rc.boot 2.
Details — Introduce this boot phase as described in the student material. There are two
categories of status codes: SHOWLED codes and loopled codes.
The SHOWLED codes are inside the graphic boxes and denote the progress of the script.
If one of these codes stays on the display, the specified operation is hung.
The loopled codes are shown outside of the graphic boxes and denote specific failures that
will stop the boot process.
The text in capitals and in parentheses in the student notes is part of the LED message that
you would see on the server's physical LED display if the server were in the manufacturing
default configuration with only a single operating system. With a logically partitioned
system, the HMC will only show the numeric code.
Additional information — Beginning with AIX 5L V5.1, the rootvg file system is mounted
directly over the root directory in the RAMFS. This simplifies several steps during phase 2
and eliminates the need to remount the rootvg file systems at the end of phase 2.
In many reference documents, LED 518 is defined as indicating that the /usr file system
could not mount using the network. This is incorrect. LED 518 will display anytime /usr
cannot be mounted.
Transition statement — Let’s describe the second part of rc.boot 2.
rc.boot 2 (2 of 2)
IBM Power Systems
mount /var
dev etc mnt usr var
ODM
Copy boot messages to
alog /
RAM file system
Notes:
Final stage
At this stage, the AIX kernel removes the RAM file system (returns the memory to the
free memory pool) and starts the init process from the root (/) file system in rootvg.
rc.boot 3 (1 of 2)
IBM Power Systems
Process 1 /etc/inittab:
init /sbin/rc.boot 3 553
fsck -f /dev/hd3
Here, we work with mount /tmp 517 518
rootvg
savebase hd5:
ODM
Notes:
3. The configuration manager is called again. If the key switch or boot mode is
normal, the cfgmgr is called with option -p2 (phase 2). If the key switch or boot
mode is service, the cfgmgr is called with option -p3 (phase 3).
4. The configuration manager reads the ODM class Config_Rules and executes
either all methods for phase=2 or phase=3. All remaining devices that are not
base devices are configured in this step.
5. The console will be configured by cfgcon. The numbers c31, c32, c33 or c34
are displayed depending on the type of console:
- c31: Console not yet configured. Provides instruction to select a console.
- c32: Console is a lft (graphic display) terminal.
- c33: Console is a tty.
- c34: Console is a file on the disk.
If CDE is specified in /etc/inittab, the CDE will be started and you get a graphical
boot on the console.
6. To synchronize the ODM in the boot logical volume with the ODM from the root
(/) file system, savebase is called.
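The choice between the two cfgmgr invocations in step 3 can be expressed as a short selection. This is a sketch under the assumption that the boot mode is already known as the word normal or service; the function name cfgmgr_phase_option is hypothetical, and the sketch prints the command rather than running it.

```shell
#!/bin/sh
# Sketch only: pick the cfgmgr phase option used in rc.boot 3 from the
# boot mode, as described in step 3 above.
cfgmgr_phase_option() {
    case "$1" in
        normal)  echo "-p2" ;;   # phase=2 methods in Config_Rules
        service) echo "-p3" ;;   # phase=3 methods in Config_Rules
        *) echo "unknown boot mode: $1" >&2; return 1 ;;
    esac
}

for mode in normal service; do
    echo "$mode boot: cfgmgr $(cfgmgr_phase_option "$mode")"
done
```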
Instructor notes:
Purpose — Describe the first part of rc.boot 3.
Details — Describe as explained in the student notes.
Additional information — The savebase command is necessary to synchronize the
ODMs from hd4 (rootvg ODM repository) and hd5 (reduced ODM in the BLV).
Transition statement — Let’s describe the second part of rc.boot 3.
rc.boot 3 (2 of 2)
IBM Power Systems
savebase: /etc/objrepos: ODM -> hd5: ODM
syncd 60
errdemon
Turn off LEDs
rm /etc/nologin
chgstatus=3 in CuDv ?
Yes: A device that was previously detected could not be found. Run "diag -a".
System initialization is completed.
Notes:
rc.boot summary
IBM Power Systems
Command     Executed From      Primary Actions                     Phase (Config_Rules)
rc.boot 1   RAM file system    restbase                            1
            (/dev/ram0)        cfgmgr -f
rc.boot 2   RAM file system    ipl_varyon
            (/dev/ram0)        Mount /, /usr, /var file systems
                               mergedev
                               Copy ODM files
rc.boot 3   rootvg             mount /tmp                          2=normal
                               cfgmgr -p2 or cfgmgr -p3            or 3=service
                               savebase
Notes:
Summary
During rc.boot 1, all base devices are configured. This is done by cfgmgr -f which
executes all phase 1 methods from Config_Rules.
During rc.boot 2, the rootvg is varied on. All /dev files and the customized ODM files
from the RAM file system are merged to disk.
During rc.boot 3, all remaining devices are configured by cfgmgr -p2 (normal mode) or
cfgmgr -p3 (service mode). The configuration manager reads the Config_Rules class and
executes the corresponding methods. To synchronize the ODMs, savebase is then called to
write the ODM from the disk back to the boot logical volume.
Notes:
Instructor notes:
Purpose — Explain how to fix a corrupted file system.
Details — Point out that a common cause of this type of corruption is the use of the HMC
shutdown immediate option for an LPAR with a running operating system. This is the
equivalent of cutting power to a computer while the operating system is running, which
does not allow for a proper shutdown. An administrator should always use (when possible)
the HMC OS shutdown option or issue the shutdown command from the LPAR command
prompt.
Additional information —
Transition statement — Let’s review the phases of rc.boot.
(1)
rc.boot 1
(2)
(4)
(3)
(5)
Notes:
Instructions
Using the following questions, put the solutions into the visual.
1. What calls rc.boot 1? Is it:
• /etc/init from hd4
• /etc/init from the RAMFS in the boot image
2. Which command copies the ODM files from the boot image into the RAM file
system?
3. Which command triggers the execution of all phase 1 methods in Config_Rules?
4. Which ODM files contain the devices that have been configured in rc.boot 1?
• ODM files in hd4
• ODM files in RAM file system
5. How can you determine the last boot device?
Instructor notes:
Purpose — Review and test the students' understanding of rc.boot phase 1.
Details — This is the first of three reviews. You can review each one separately, or have
the students do all three, then review them all.
(1)
(2)
restbase
(4)
(3)
ODM files in RAM cfgmgr -f
file system
(5)
bootinfo -b
Additional information —
Transition statement — Now, let’s review rc.boot phase 2.
(5)
rc.boot 2
(1) (6)
(2) (7)
(3)
(8)
557
(4)
Notes:
Instructions
Please order the following eight expressions in the correct sequence.
1. Turn on paging
2. Merge RAM /dev files.
3. Copy boot messages to alog
4. Activate rootvg
5. Mount /var; copy dump; unmount /var
6. Mount /dev/hd4 onto / in RAMFS
7. Copy RAM ODM files
8. Finally, answer the following question. Put the answer in box 8:
Your system stops booting with an LED 557. Which command failed?
Instructor notes:
Purpose — Review and test the students' understanding of rc.boot phase 2.
Details — This is the second of three reviews. You can review each one separately, or
have the students do all three, then review them all.
rc.boot 2
(1) Activate rootvg
(4) Turn on paging
(5) Merge RAM /dev files
(6) Copy RAM ODM files
Additional information — Question 8 is important for the lab. The command that failed is
the mount of /dev/hd4. One reason for this might be a damaged log logical volume.
Transition statement — Now, let’s review rc.boot phase 3.
sy____ ___
/sbin/rc.boot 3 err_______
rm _________
s_______ ________&
Missing devices ?
_________=3
________ -p2 in ______ ?
________ -p3
Notes:
Instructions
Please complete the missing information in the picture.
Your instructor will review the activity with you.
Instructor notes:
Purpose — Review and test the students' understanding of rc.boot phase 3.
Details — This is the last of three reviews. You can review each one separately, or have
the students do all three, then review them all.
savebase
/etc/inittab
syncd 60
/sbin/rc.boot 3 errdemon
rm /etc/nologin
syncvg rootvg &
chgstatus=3
cfgmgr -p2 in CuDv ?
cfgmgr -p3
Additional information —
Transition statement — Now, let’s switch over to the next topic.
Configuration manager
IBM Power Systems
Predefined: PdDv, PdAt, PdCn
cfgmgr and Config_Rules
Customized: CuDv, CuAt, CuDep, CuDvDr, CuVPD
Methods: Define, Configure, Change, Unconfigure, Undefine
Device driver: load, unload
© Copyright IBM Corporation 2012
Notes:
Automatic configuration
Many devices are automatically detected by the configuration manager. For this to
occur, device entries must exist in the predefined device object classes. The
configuration manager uses the methods from PdDv to manage the device state, for
example, to bring a device into the defined or available state.
Define method
When a device is defined through its define method, the information from the predefined
database for that type of device is used to create the information describing the specific
device instance. This device-specific information is then stored in the customized
database.
Configuration order
The configuration process requires that a device be defined or configured before a
device attached to it can be defined or configured. At system boot time, the
configuration manager configures the system in a hierarchical fashion. First the
motherboard is configured, then the buses, then the adapters that are attached, and
finally the devices that are connected to the adapters. The configuration manager then
configures any pseudodevices (volume groups, logical volumes, and so forth) that need
to be configured.
Instructor notes:
Purpose — Summarize how cfgmgr works.
Details — Explain that cfgmgr can detect devices automatically. The devices must be
defined in the predefined ODM classes. When they get defined, they are stored in the
customized ODM classes.
The cfgmgr is method or rule driven. It just uses methods to define or configure a device.
These methods are device specific and are listed in PdDv.
During the boot process, cfgmgr uses the Config_Rules class to configure the devices in
the correct sequence.
Note that the actual Config_Rules object class has more objects in each phase than are
listed in the visual.
Additional information — The output from the configuration manager is viewable in the
boot alog. During run-time, cfgmgr can be started with the flag -v, to get more information
about the devices that are configured.
Transition statement — Let’s have a look in the Config_Rules ODM class.
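phase  seq  boot_mask  method                     called by
1      10   0          /etc/methods/defsys        cfgmgr -f
1      12   0          /usr/lib/methods/deflvm    cfgmgr -f
2      10   0          /etc/methods/defsys        cfgmgr -p2 (Normal boot)
2      12   0          /usr/lib/methods/deflvm    cfgmgr -p2 (Normal boot)
2      19   0          /etc/methods/ptynode       cfgmgr -p2 (Normal boot)
2      20   0          /etc/methods/startlft      cfgmgr -p2 (Normal boot)
3      10   0          /etc/methods/defsys        cfgmgr -p3 (Service boot)
3      12   0          /usr/lib/methods/deflvm    cfgmgr -p3 (Service boot)
3      19   0          /etc/methods/ptynode       cfgmgr -p3 (Service boot)
3      20   0          /etc/methods/startlft      cfgmgr -p3 (Service boot)
3      25   0          /etc/methods/starttty      cfgmgr -p3 (Service boot)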
Notes:
Introduction
The Config_Rules ODM object class is used by cfgmgr during the boot process. The
phase attribute determines when the respective method is called.
Phase 1
All methods with phase=1 are executed when cfgmgr -f is called. The first method that
is started is /etc/methods/defsys, which is responsible for the configuration of all
base devices. The second method /usr/lib/methods/deflvm loads the logical volume
device driver (LVDD) into the AIX kernel.
If you have devices that must be configured in rc.boot 1, that is, before the
rootvg is active, you need to place phase 1 configuration methods into Config_Rules.
A bosboot is required afterwards.
Phase 2
All methods with phase=2 are executed when cfgmgr -p2 is called. This takes place in
the third rc.boot phase, when the key switch is in normal position or for a normal boot
on a PCI machine. The seq attribute controls the sequence of the execution: The lower
the value, the higher the priority.
Phase 3
All methods with phase=3 are executed when cfgmgr -p3 is called. This takes place in
the third rc.boot phase, when the key switch is in service position, or a service boot
has been issued on a PCI system.
Sequence number
Each configuration method has an associated sequence number. When executing the
methods for a particular phase, cfgmgr sorts the methods based on the sequence
number. The methods are then invoked, one by one, starting with the smallest
sequence number. Methods with a sequence number of zero are invoked last, after
those with non-zero sequence numbers.
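The ordering rule above can be sketched in a few lines of shell. This is only an illustration under stated assumptions: the rows are a hand-typed sample in the column order phase, seq, boot_mask, method; the row for /usr/local/methods/example_last is hypothetical and exists only to show the zero case; and the order_phase function is not part of AIX. On a real system the rows would come from the Config_Rules object class.

```sh
#!/bin/sh
# Sample rows: phase seq boot_mask method.
# The example_last row is hypothetical (added to show seq = 0).
rules='2 20 0 /etc/methods/startlft
2 0 0 /usr/local/methods/example_last
2 10 0 /etc/methods/defsys
3 25 0 /etc/methods/starttty
2 12 0 /usr/lib/methods/deflvm'

# Print the methods of one phase in invocation order:
# ascending sequence number, with sequence number zero placed last.
order_phase() {
    printf '%s\n' "$rules" |
    awk -v p="$1" '$1 == p { key = ($2 == 0) ? 1 : 0; print key, $2, $4 }' |
    sort -k1,1n -k2,2n |
    awk '{ print $3 }'
}

order_phase 2
```

For phase 2 the sketch lists defsys first and the hypothetical zero-sequence method last.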
Boot mask
Each configuration method has an associated boot mask:
- If the boot_mask is zero, the rule applies to all types of boot.
- If the boot_mask is non-zero, the rule then only applies to the boot type specified.
For example, if boot_mask = DISK_BOOT, the rule would only be used for boots from
disk versus NETWORK_BOOT which only applies when booting through the network.
# alog -t boot -o
-------------------------------------------------------
attempting to configure device 'sys0'
invoking /usr/lib/methods/cfgsys_rspc -l sys0
return code = 0
******* stdout *******
bus0
******* no stderr *****
-------------------------------------------------------
attempting to configure device 'bus0'
invoking /usr/lib/methods/cfgbus_pci bus0
return code = 0
******** stdout *******
bus1, scsi0
****** no stderr ******
-------------------------------------------------------
attempting to configure device 'bus1'
invoking /usr/lib/methods/cfgbus_isa bus1
return code = 0
******** stdout ******
fda0, ppa0, sa0, sioka0, kbd0
****** no stderr *****
Figure 6-15. cfgmgr output in the boot log using alog, AN152.2
Notes:
If you have boot problems, it is always a good idea to check the boot alog file for
potential boot error messages. All output from cfgmgr is shown in the boot log, as well
as other information that is produced in the rc.boot script.
The default boot log file size in AIX 5L V5.1 (8 KB) was too small to capture the entire
output of a system boot in AIX 5L. The default boot log size in AIX 5L V5.2 is 32 KB and
in AIX 5L V5.3 (and later) it is 128 KB. If you want to increase the size of the boot log,
for example to 256 KB, issue the following command:
# print "Resizing boot log" | alog -C -t boot -s 262144
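The size value passed with -s in the command above is a byte count. A minimal sketch of the conversion, assuming a helper name (kb_to_bytes) that is purely illustrative and not an AIX command:

```sh
#!/bin/sh
# Convert a size in KB to the byte count used in the alog example.
kb_to_bytes() {
    echo $(( $1 * 1024 ))
}

kb_to_bytes 256   # prints 262144
```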
Instructor notes:
Purpose — Describe the alog command to identify boot messages.
Details — Describe how boot messages produced during the boot process are written to
an alog file. Show how the alog command can be used. The boot log shows more than
the output of cfgmgr -v during rc.boot execution: the various rc.boot steps that
we have covered also write messages to it.
Additional information — Describe how the boot log, /var/adm/ras/bootlog, can be
increased to a bigger size. This often had to be done prior to AIX 5L V5.2, as the default
size of 8 KB was very small. To display the size of the log, run: alog -t boot -L
The alog is circular, meaning that the oldest information is automatically overwritten by the
newest information.
Transition statement — Let’s review the /etc/inittab file.
/etc/inittab file
IBM Power Systems
init:2:initdefault:
brc::sysinit:/sbin/rc.boot 3 >/dev/console 2>&1 # Phase 3 of system boot
powerfail::powerfail:/etc/rc.powerfail 2>&1 | alog -tboot > /dev/console #
mkatmpvc:2:once:/usr/sbin/mkatmpvc >/dev/console 2>&1
atmsvcd:2:once:/usr/sbin/atmsvcd >/dev/console 2>&1
tunables:23456789:wait:/usr/sbin/tunrestore -R > /dev/console 2>&1 # Set tunab
securityboot:2:bootwait:/etc/rc.security.boot > /dev/console 2>&1
rc:23456789:wait:/etc/rc 2>&1 | alog -tboot > /dev/console # Multi-User checks
rcemgr:23456789:once:/usr/sbin/emgr -B > /dev/null 2>&1
fbcheck:23456789:wait:/usr/sbin/fbcheck 2>&1 | alog -tboot > /dev/console # ru
srcmstr:23456789:respawn:/usr/sbin/srcmstr # System Resource Controller
rctcpip:23456789:wait:/etc/rc.tcpip > /dev/console 2>&1 # Start TCP/IP daemons
mkcifs_fs:2:wait:/etc/mkcifs_fs > /dev/console 2>&1
sniinst:2:wait:/var/adm/sni/sniprei > /dev/console 2>&1
rcnfs:23456789:wait:/etc/rc.nfs > /dev/console 2>&1 # Start NFS Daemons
cron:23456789:respawn:/usr/sbin/cron
piobe:2:wait:/usr/lib/lpd/pioinit_cp >/dev/null 2>&1 # pb cleanup
cons:0123456789:respawn:/usr/sbin/getty /dev/console
qdaemon:23456789:wait:/usr/bin/startsrc -sqdaemon
writesrv:23456789:wait:/usr/bin/startsrc -swritesrv
uprintfd:23456789:respawn:/usr/sbin/uprintfd
shdaemon:2:off:/usr/sbin/shdaemon >/dev/console 2>&1 # High availability
Notes:
Purpose of /etc/inittab
The /etc/inittab file supplies information for the init process. Note how the rc.boot
script is executed out of the inittab file to configure all remaining devices in the boot
process.
Modifying /etc/inittab
Do not use an editor to change the /etc/inittab file. One small mistake in /etc/inittab,
and your machine will not boot. Instead use the commands mkitab, chitab, and
rmitab to edit /etc/inittab. The advantage of these commands is that they always
guarantee a non-corrupted /etc/inittab file. If your machine stops booting with an LED
553, this indicates a bad /etc/inittab file in most cases.
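To make the record format concrete, here is a rough sketch of a format check for a single entry of the form identifier:runlevel:action:command. It is not how mkitab or chitab work internally. The function name check_itab_line is hypothetical, the list of actions is limited to those that appear in the visual and is not complete, and only numeric run levels are accepted.

```sh
#!/bin/sh
# Return 0 if the argument looks like identifier:runlevel:action:command.
# Simplified: only actions seen in the visual, only numeric run levels.
check_itab_line() {
    # Split on the first three colons; the rest of the line is the command.
    IFS=: read -r id level action cmd <<EOF
$1
EOF
    # Identifier must be non-empty and contain only letters, digits, or _
    case $id in ''|*[!A-Za-z0-9_]*) return 1 ;; esac
    # Run level field may be empty or a string of digits
    case $level in *[!0-9]*) return 1 ;; esac
    case $action in
        initdefault|sysinit|powerfail|once|wait|bootwait|respawn|off) ;;
        *) return 1 ;;
    esac
    # Every action except initdefault needs a command
    [ "$action" = initdefault ] || [ -n "$cmd" ]
}

check_itab_line 'rc:2:wait:/etc/rc' && echo "rc entry looks well formed"
check_itab_line 'rc:2:/etc/rc' || echo "entry with a missing field is rejected"
```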
Viewing /etc/inittab
The lsitab command can be used to view the /etc/inittab file. For example:
# lsitab dt
dt:2:wait:/etc/rc.dt
If you issue lsitab -a, the complete /etc/inittab file is shown.
Instructor notes:
Purpose — Describe the /etc/inittab file and some important commands to view and
manipulate this file.
Details — Show that rc.boot is executed out of /etc/inittab. Describe that it is risky to edit
the /etc/inittab file. It is always better to use the commands described in the notes.
Additional information — Point out that a corrupted /etc/inittab file is indicated by LED
553. The students will see this in their exercise.
The mkitab, chitab, and rmitab commands provide automatic syntax checking. The line
must match the proper format for /etc/inittab.
There is a -i option with mkitab to insert the new line anywhere in the /etc/inittab file.
Without the -i, the line will be appended to the end of the file.
Transition statement — Let’s describe the basics for system hang detection.
Notes:
Introduction
The visual shows some common boot errors that might happen during the AIX software
boot process.
Bootlist wrong?
If the bootlist is wrong, the system cannot boot. This is easy to fix. Boot in SMS and
select the correct boot device. Keep in mind that only hard disks with boot records are
shown as selectable boot devices.
mode and check both files. Consider using a mksysb to retrieve these files from a
backup tape.
Superblock corrupt?
Another thing you can try is to check the superblocks of your rootvg file systems. If you
boot in maintenance mode and you get error messages like Not an AIX file system
or Not a recognized file system type, it is probably due to a corrupt superblock in
the file system.
Each file system has two superblocks. Executing fsck should automatically recover
the primary superblock by copying from the backup superblock. The following is
provided in case you need to do this manually.
For JFS, the primary superblock is in logical block 1 and a copy is in logical block 31. To
manually copy the superblock from block 31 to block 1 for the root file system (in this
example), issue the following command:
# dd count=1 bs=4k skip=31 seek=1 if=/dev/hd4 of=/dev/hd4
For JFS2, the locations are different. To manually recover the primary superblock from
the backup superblock for the root file system (in this example), issue the following
command:
# dd count=1 bs=4k skip=15 seek=8 if=/dev/hd4 of=/dev/hd4
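The block-copy technique can be rehearsed safely on a scratch file before it is ever used on a logical volume. The sketch below is an assumption-laden illustration, not a recovery procedure: the scratch file stands in for a logical volume, the BACKUP-SB marker stands in for a real superblock, and the block numbers (31 copied to 1) are the JFS values from the text. Because the target here is a regular file rather than a device, conv=notrunc is added so that dd does not truncate it.

```sh
#!/bin/sh
# Scratch file standing in for a logical volume (never a real device).
img=$(mktemp)

# 32 blocks of 4 KB, all zeros.
dd if=/dev/zero of="$img" bs=4k count=32 2>/dev/null

# Place a marker at the start of block 31 (stand-in for the backup copy).
printf 'BACKUP-SB' | dd of="$img" bs=4k seek=31 conv=notrunc 2>/dev/null

# Same shape as the JFS command in the text: copy block 31 onto block 1.
dd count=1 bs=4k skip=31 seek=1 if="$img" of="$img" conv=notrunc 2>/dev/null

# Read back the first 9 bytes of block 1 (byte offset 4096).
restored=$(dd if="$img" bs=1 skip=4096 count=9 2>/dev/null)
echo "$restored"   # prints BACKUP-SB

rm -f "$img"
```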
Instructor notes:
Purpose — Describe some common causes of boot problems.
Details — Describe as explained in the student notes. Describe the meaning of 553 and
557 as they are part of the exercise.
Additional information —
Transition statement — Let’s review the /etc/inittab file which was described in the basic
administration course.
init:2:initdefault:
brc::sysinit:/sbin/rc.boot 3
rc:2:wait:/etc/rc
fbcheck:2:wait:/usr/sbin/fbcheck
srcmstr:2:respawn:/usr/sbin/srcmstr
cron:2:respawn:/usr/sbin/cron
rctcpip:2:wait:/etc/rc.tcpip
rcnfs:2:wait:/etc/rc.nfs
qdaemon:2:wait:/usr/bin/startsrc -sqdaemon
dt:2:wait:/etc/rc.dt
tty0:2:off:/usr/sbin/getty /dev/tty1
myid:2:once:/usr/local/bin/errlog.check
Notes:
Instructions
Answer the following questions as they relate to the /etc/inittab file shown in the visual:
1. Which process is started by the init process only one time?
The init process does not wait for the initialization of this process.
4. Which line determines that multiuser mode is the initial run level of the system?
11. Which line takes care of varying on the volume groups, activating paging spaces,
and mounting file systems that are to be activated during boot?
Additional information —
1. The myid line is started only one time
The action once tells the init process to start the process and not to wait for its
initialization. When the process ends, it will not be restarted.
2. The qdaemon line
The qdaemon controls the queueing subsystem in AIX. It manages jobs in queues and
their assignment to the different queues in the system.
Checkpoint (1 of 2)
IBM Power Systems
Notes:
Checkpoint solutions (1 of 2)
IBM Power Systems
Additional information —
Transition statement —
Checkpoint (2 of 2)
IBM Power Systems
5. What is the likely cause if your system stops booting with LED 553?
Notes:
Checkpoint solutions (2 of 2)
IBM Power Systems
5. What is the likely cause if your system stops booting with LED 553?
The answer is there is a problem with processing /etc/inittab.
Additional information —
Transition statement — Now, let’s do an exercise.
Notes:
Unit summary
IBM Power Systems
Notes:
Highlights
- After the boot image is loaded into RAM, the rc.boot script is executed three times
to configure the system.
- During rc.boot 1, the devices needed to vary on the rootvg are configured.
- During rc.boot 2, the rootvg is varied on.
- In rc.boot 3, the remaining devices are configured.
- Processes defined in the /etc/inittab file are initiated by the init process.
Estimated time
01:25
References
Online AIX Version 7.1 Command Reference volumes 1-6
Online AIX Version 7.1 Operating system and device
management
Note: References listed as “online” above are available through the
IBM Systems Information Center at the following address:
http://publib.boulder.ibm.com/eserver
© Copyright IBM Corp. 2009, 2012 Unit 7. LVM metadata and related problems 7-1
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Instructor Guide
Unit objectives
IBM Power Systems
Notes:
[Visual: LVM terms: physical partitions, logical partitions, physical volumes, logical volume, volume group]
Notes:
Introduction
This visual and the associated student notes will provide a review of basic LVM terms.
value for scalable volume groups (introduced in AIX 5L V5.3) will be the lowest value
that can be used to accommodate 2040 physical partitions per physical volume.
For scalable volume groups, the maximum number of physical partitions is no longer
defined on a per disk basis but applies to the entire volume group. The scalable volume
group can hold up to 2097152 (2048 K) physical partitions.
Instructor notes:
Purpose — Introduce some basic LVM terms.
Details — Use the student notes to guide your presentation.
Additional information — If no physical partition size is specified when creating the
volume group, the mkvg command attempts to figure out an appropriate physical partition
size based on the disks in the volume group.
Transition statement — Let’s look at the unique identifiers used by LVM for the volume
groups, logical volumes, and physical volumes.
LVM identifiers
IBM Power Systems
Notes:
Use of identifiers
The LVM uses identifiers for disks, volume groups, and logical volumes. Because volume
groups can be exported and imported between systems, these identifiers must be
unique worldwide.
AIX generated identifiers are based on the CPU ID of the creating host and a
timestamp.
Disk identifiers
Disk identifiers have a length of 32 bytes, but currently the last 16 bytes are unused and
are all set to 0 in the ODM. Notice that, as shown on the visual, only the first 16 bytes of
this identifier are displayed in the output of the lspv command.
In a SAN environment, path management needs a method for determining that a disk
discovered over two different paths is actually the same disk. Some storage solutions
in an AIX environment use the PVID for this purpose. Other storage solutions use an
IEEE volume identifier (ieee_volname) or a UDID (unique_id) for this
purpose. Each of these would be attributes of the disk in the ODM.
The PVID attribute is set the first time a disk is assigned to a volume group.
If you ever have to manually update the disk identifiers in the ODM, do not forget to add
16 zeros to the physical volume ID.
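The padding rule can be illustrated with a short shell sketch. The pad_pvid function is a hypothetical helper, not an AIX command, and the identifier used is only a sample value in the 16-digit form that lspv displays.

```sh
#!/bin/sh
# Append 16 zeros to a 16-digit physical volume ID, giving the
# 32-digit form described in the notes above.
pad_pvid() {
    pvid=$1
    if [ "${#pvid}" -ne 16 ]; then
        echo "expected a 16-digit PVID" >&2
        return 1
    fi
    printf '%s0000000000000000\n' "$pvid"
}

pad_pvid 00c35ba07fcf6b93   # prints 00c35ba07fcf6b930000000000000000
```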
Notes:
instead is kept in the same reserved disk area as the VGDA. Also, the administrator
of a big volume group can use the -T O option of the mklv command to request that
the LVCB not be stored at the beginning of the logical volume, but instead as part of the
VGDA.
LVCB-related considerations
For normal volume groups, the LVCB resides in the first block of the user data within the
logical volume. Big volume groups keep additional LVCB information in the VGDA. The
LVCB structure on the first logical volume user block and the LVCB structure within the
VGDA are similar but not identical. If a logical volume in a big volume group was created
with the -T O option of the mklv command, no LVCB will occupy the first block of the logical volume.
With scalable volume groups, logical volume control information is no longer stored on
the first user block of any logical volume. Therefore, no precautions have to be taken
when using raw logical volumes, because there is no longer a need to preserve the
information held by the first 512 bytes of the logical device.
Instructor notes:
Purpose — Introduce the disk control blocks.
Details — Explain using the information in the student notes.
Additional information — None
Transition statement — Let’s see which other locations are used to store LVM data.
AIX files
/etc/vg/vgVGID     Handle to the VGDA copy in memory
/dev/hdiskX        Special file for a disk
/dev/VGname        Special file for administrative access to a volume group
/dev/LVname        Special file for a logical volume
/etc/filesystems   Used by the mount command to associate logical volume name,
                   file system log, and mount point
Notes:
Instructor notes:
Purpose — Describe where LVM data is stored.
Details — Explain using the information in the student notes. Keep this on an overview
level.
Additional information — None
Transition statement — Let's look (at a high level) at which ODM classes hold LVM
metadata.
Notes:
Overview
The LVM metadata maintained in the ODM database has a large overlap with the
information maintained in the VGDA and LVCB control blocks. Yet, there is information
in the control blocks (such as the mapping of logical partitions) that is not kept in the
ODM, and there is information (such as device drivers and logical names) that is not
kept in the control blocks. Each metadata location plays a special role. For the
information they have in common, there are mechanisms to ensure that they do not
conflict.
Instructor notes:
Purpose — Provide an overview of the ODM object classes that hold LVM metadata
information.
Details —
Additional information —
Transition statement — Let’s look at how the importvg and exportvg commands relate
to these two LVM metadata locations.
System: moon
Disk: hdisk9 (lv10, lv11, loglv01)
To export a volume group:
1. Unmount all file systems from the volume group:
# umount /dev/lv10
# umount /dev/lv11
Notes:
The scenario
The exportvg and importvg commands can be used to fix ODM problems. These
commands also provide a way to transfer data between different AIX systems. This
visual provides an example of how to export a volume group.
The disk, hdisk9, is connected to the system moon. This disk belongs to the myvg
volume group. This volume group needs to be transferred to another system.
2. When all logical volumes are closed, use the varyoffvg command to vary off the
volume group.
3. Finally, export the volume group, using the exportvg command. After this point,
the complete volume group (including all file systems and logical volumes) is
removed from the ODM.
After exporting the volume group, the disks in the volume group can be
transferred to another system.
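The three export steps can be summarized as a dry-run sketch that only prints the commands in order and executes nothing. The export_plan function is a hypothetical helper; the volume group and logical volume names (myvg, lv10, lv11) are the ones used in the scenario.

```sh
#!/bin/sh
# Print, without executing, the command sequence for exporting a
# volume group: unmount, vary off, then export.
export_plan() {
    vg=$1
    shift
    for lv in "$@"; do
        echo "umount /dev/$lv"
    done
    echo "varyoffvg $vg"
    echo "exportvg $vg"
}

export_plan myvg lv10 lv11
```

Printing the plan first makes it easy to review the order before typing the commands on the system that owns the volume group.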
Instructor notes:
Purpose — Explain how to export a volume group.
Details —
Additional information —
Transition statement — Let’s describe how to import a volume group.
myvg
Notes:
In AIX V4.3 and subsequent releases, the volume group is automatically varied
on.
3. Finally, mount the file systems.
Notes:
Instructor notes:
Purpose — Explain the problems and solutions related to duplicate names when using
importvg.
Details —
Additional information —
Transition statement — What happens if logical volumes already exist during the
importvg?
System: mars
hdisk3 (myvg): lv10, lv11, loglv01
hdisk2 (datavg): lv10, lv11, loglv01
# importvg -y myvg hdisk3
importvg: changing LV name lv10 to fslv00
importvg: changing LV name lv11 to fslv01
importvg can also accept the PVID in place of the hdisk name
© Copyright IBM Corporation 2012
Notes:
Instructor notes:
Purpose — Explain what happens if a logical volume already exists on a system during the
import.
Details —
Additional information —
Transition statement — Let’s describe what happens if a file system already exists during
an import.
# umount /home/michael
# mount -o log=/dev/loglv01 /dev/lv24 /home/michael
Notes:
If the file system type is jfs2, you have to specify this as well
(-V jfs2). You can get this information by running the command
getlvcb lv24 -At
Another method is to add a new stanza to the /etc/filesystems file. This is covered in
the next visual.
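As a small illustration of the point about the file system type, the following dry-run sketch builds the mount command line and adds -V jfs2 only when the type is jfs2. The mount_cmd function is a hypothetical helper that prints the command rather than running it; lv24, loglv01, and /home/michael are the names from the visual.

```sh
#!/bin/sh
# Print the mount command for a file system whose log device must be
# named explicitly. Arguments: type, log LV, data LV, mount point.
mount_cmd() {
    vfs=$1
    log=$2
    lv=$3
    mnt=$4
    if [ "$vfs" = jfs2 ]; then
        echo "mount -V jfs2 -o log=/dev/$log /dev/$lv $mnt"
    else
        echo "mount -o log=/dev/$log /dev/$lv $mnt"
    fi
}

mount_cmd jfs  loglv01 lv24 /home/michael
mount_cmd jfs2 loglv01 lv24 /home/michael
```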
Notes:
- account specifies whether the file system should be processed by the accounting
system. A value of false indicates no accounting.
Before mounting the file system /home/michael_moon, the corresponding mount point
must be created.
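A stanza of the kind described here might be generated as in the sketch below. Everything specific in it is an assumption made for illustration: the make_stanza helper is hypothetical, the logical volume and log names (lv24, loglv01) are carried over from the earlier mount example, and the vfs and mount values are placeholders that must match the real file system. The sketch only prints text; it does not modify /etc/filesystems.

```sh
#!/bin/sh
# Print an /etc/filesystems style stanza.
# Arguments: mount point, logical volume, file system type, log LV.
make_stanza() {
    cat <<EOF
$1:
        dev             = /dev/$2
        vfs             = $3
        log             = /dev/$4
        mount           = false
        account         = false
EOF
}

make_stanza /home/michael_moon lv24 jfs loglv01

# The mount point must exist before mounting, for example:
#   mkdir -p /home/michael_moon
```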
Instructor notes:
Purpose — Describe how to add a stanza to /etc/filesystems.
Details — This might be a good place to stop and have the students execute the first two
parts of the matching lab exercise, focusing on using exportvg and importvg. The other
option is to do all of the exercises at the end of the unit.
Additional information — To discover the information contained in the newly imported
volume group, use the standard LVM tools:
To see the logical volume names:
# lsvg -l myvg
To see details of the logical volumes:
# lslv lvname
These commands will assist in creating the new stanza in /etc/filesystems.
Transition statement — Let’s next look at the details of the metadata which is stored in
the VGDA, LVCB, and the ODM object classes.
Notes:
Introduction
The table in the visual shows the contents of the VGDA. The individual items listed are
discussed in the paragraphs that follow.
Time stamps
The time stamps are used to check if a VGDA is valid. If the system crashes while
changing the VGDA, the time stamps will differ. The next time the volume group is
varied on, this VGDA is marked as invalid. The latest intact VGDA will then be used to
overwrite the other VGDAs in the volume group.
Instructor notes:
Purpose — Describe the contents of the VGDA.
Details — Use the student notes to guide your explanation. The students do not need to
know the detailed structure of the VGDA; this is just to reinforce the concepts of the type of
information maintained in the VGDA, and that the time stamps help identify a VGDA copy
that is out of date.
Additional information — The -d flag of the mkvg command is ignored in AIX 5L V5.2,
AIX 5L V5.3, and AIX 6.1.
Transition statement — Let’s have a look into the VGDA.
VGDA example
IBM Power Systems
5: ____________
Logical:
00c35ba000004c00000001157fcf6bdf.1 lv00 1
00c35ba000004c00000001157fcf6bdf.2 lv01 1
00c35ba000004c00000001157fcf6bdf.3 lv02 1
Physical: 00c35ba07fcf6b93 2 0
6: ____________ 7: ____________
© Copyright IBM Corporation 2012
Notes:
Instructor notes:
Purpose — Examine a VGDA.
Details — Implement this page as a sort of activity. Give the students 10 minutes to order
the expressions. Then review the page:
1. PP size = 2^20 (2 to the 20th power) bytes, or 1 MB (for this volume group)
2. 3 logical volumes in volume group
3. 1 physical volume in volume group
4. 2 VGDAs in volume group
5. LVIDs (VGID.minor_number)
6. PVIDs
7. VGDA count on this disk
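If students ask how an LVID breaks down, the following minimal sketch may help. It assumes a POSIX shell and uses the first LVID from the visual; the variable names are illustrative only.

```sh
# Split an LVID of the form VGID.minor_number (value copied from the visual).
lvid="00c35ba000004c00000001157fcf6bdf.1"
vgid="${lvid%.*}"     # everything before the last dot: the VGID
minor="${lvid##*.}"   # everything after the last dot: the minor number
echo "VGID=$vgid minor=$minor"
```

The same parameter expansions work for any of the LVIDs listed under "Logical:" on the visual.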
Additional information — The lqueryvg command displays the PP size as the value of
the exponent in the power of 2 expression specifying the number of bytes in a PP. In the
example on the visual, the value of 20 given for PP Size means that the PP size is 2^20
bytes, which is the same as saying 1 MB. In the AIX 7.1 example in the student notes, the
value of 24 shown for PP Size means that the PP size is 2^24 bytes, which is 16 MB.
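The conversion from the reported exponent to a size can be checked with shell arithmetic. This is a small sketch, assuming a POSIX shell; the function names pp_size_bytes and pp_size_mb are hypothetical helpers, not LVM commands.

```sh
# pp_size_bytes EXP: print 2^EXP, the PP size in bytes for the exponent
# that lqueryvg reports as "PP Size".
pp_size_bytes() {
    echo $(( 1 << $1 ))
}

# pp_size_mb EXP: the same size expressed in MB (1 MB = 1048576 bytes).
pp_size_mb() {
    echo $(( (1 << $1) / 1048576 ))
}

pp_size_mb 20   # exponent shown on the visual
pp_size_mb 24   # exponent shown in the AIX 7.1 example
```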
The best resource for information about intermediate-level LVM commands such as
lqueryvg, lvm_query, and getlvcb is the IBM Redbook AIX Logical Volume Manager from
A to Z: Troubleshooting and Commands (SG24-5433-00).
The output of lqueryvg might vary a bit, depending on the version of AIX. The notes
include an example of output from this command from an AIX 7.1 system.
Another command that can be used to examine the VGDA is readvgda, or readvgda_svg if
you want to read the VGDA of a scalable volume group.
You might mention that this VGDA seems to belong to a scalable volume group (1024 MAX
PVs) and not a normal volume group (MAX LVs: 256, MAX PVs: 32) or a big volume group
(MAX LVs: 512, etc.)
Transition statement — Let’s look at the LVCB.
Notes:
The logical volume control block (LVCB) and the getlvcb command
The LVCB stores attributes of a logical volume. The getlvcb command queries an
LVCB.
Example report:
# getlvcb -AT hd2
AIX LVCB
intrapolicy = c
copies = 1
interpolicy = m
lvid = 00c35ba000004c00000001157f54bf78.5
lvname = hd2
label = /usr
machine id = 35BA04C00
For an ILO (Instructor-Led Online) class: Play file AN152U07F16 from the
multimedia library of Elluminate in place of the next visual. You can then continue your
lecture normally to reinforce the topics if desired.
- a. To access the multimedia library, click the CD button on the toolbar in
Elluminate.
- b. Once the multimedia library window is open, select AN152U07F16 and click
Play.
- c. Ask the students to indicate with a green check mark when they have finished the file.
For an FTF (Face-to-Face) class: Play file AN152U07F16 on your instructor PC
in place of the next visual.
Note that you can also use this activity as a review of the information covered in the
next visual.
[Visual: high-level commands (mkvg, extendvg, mklv, crfs, chfs, rmlv, reducevg, ...) match IDs by name in the ODM, change the VGDA and LVCB using low-level commands, and update the ODM and /etc/filesystems. The visual also shows importvg and exportvg.]
Figure 7-16. How LVM interacts with the ODM and the VGDA AN152.2
Notes:
High-level commands
Most of the LVM commands that are used when working with volume groups, physical,
or logical volumes are high-level commands. These high-level commands (like mkvg,
extendvg, mklv, and others listed on the visual) are implemented as shell scripts and
use names to reference a certain LVM object. The ODM is consulted to match a name,
for example, rootvg or hdisk0, to an identifier.
end up in a situation where the VGDA/LVCB and the ODM are not in sync. The same
situation may occur when low-level commands are used incorrectly.
Instructor notes:
Purpose — Explain how LVM interacts with ODM and VGDA/LVCB.
Details — Use the student notes to guide your explanation.
Additional information — The commands exportvg/importvg are covered later in this
course. Therefore, just mention briefly what these commands do.
Transition statement — Let’s see how the LVM-related device ODM objects look. This is
important, because you will have to repair ODM entries in the next part of the exercise we
started earlier. We will start with the entries that store information about physical volumes.
Notes:
Example report:
# odmget -q "name like hdisk[02]" CuDv
CuDv:
name = "hdisk0"
status = 1
chgstatus = 2
ddins = "scsidisk"
location = ""
parent = "vscsi0"
connwhere = "810000000000"
PdDvLn = "disk/vscsi/vdisk"
CuDv:
name = "hdisk2"
status = 1
chgstatus = 0
ddins = "scdisk"
location = "01-08-01-8,0"
parent = "scsi1"
connwhere = "8,0"
PdDvLn = "disk/scsi/scsd"
Key attributes
Remember the most important attributes:
- status = 1 means the disk is available
- chgstatus = 2 means the status has not changed since last reboot
- location specifies the location code of the device
- parent specifies the parent device
Notes:
CuAt:
name = "hdisk1"
attribute = "unique_id"
value = "3321360050768019102C0F000000000006E2A04214503IBMfcp"
type = "R"
generic = "D"
rep = "nl"
nls_index = 79
CuAt:
name = "hdisk1"
attribute = "pvid"
value = "00f606036452e56a0000000000000000"
type = "R"
generic = "D"
rep = "s"
nls_index = 2
Instructor notes:
Purpose — Explain that the PVID is stored in CuAt.
Details — Use the student notes to guide your explanation.
Additional information — A previous version of the course specified to use the
chdev command with pv=yes, in order to create a missing physical volume. If a physical
volume in a volume group is missing, the actual recommended method of recovery is to:
1. Vary off the volume group with varyoffvg
2. Export the volume group with exportvg
3. Remove the disk with rmdev
4. Run cfgmgr
5. Import the volume group with importvg
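The recovery steps above can be sketched as a dry-run helper that prints the commands for review instead of running them. This is only a sketch: the function name print_missing_pv_recovery and the homevg, hdisk1, and hdisk2 arguments are hypothetical, and importing from a separate intact disk is an assumption, not something the procedure above specifies.

```sh
# print_missing_pv_recovery VG MISSING_DISK IMPORT_DISK
# Prints the five recovery steps in order; nothing is executed.
# All three arguments are placeholders supplied by the caller.
print_missing_pv_recovery() {
    vg=$1
    missing_disk=$2
    import_disk=$3
    echo "varyoffvg $vg"                  # 1. vary off the volume group
    echo "exportvg $vg"                   # 2. remove its ODM definition
    echo "rmdev -dl $missing_disk"        # 3. delete the disk device definition
    echo "cfgmgr"                         # 4. rediscover and configure devices
    echo "importvg -y $vg $import_disk"   # 5. rebuild the ODM from the VGDA
}

print_missing_pv_recovery homevg hdisk1 hdisk2
```

Printing first gives the class a chance to discuss each command before anyone runs it on a lab system.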
Transition statement — Let’s look at the ODM information for a Fibre Channel attached
LUN.
Notes:
Discussion:
For LUNs accessed over Fibre Channel, the location field identifies the parent FC
adapter; connwhere holds a placeholder value of W_0, which indicates that the
disk identity is stored in the ww_name attribute of the disk.
The physical location code consists of the location code of the parent adapter,
followed by the ww_name and the LUN ID (obtained from the lun_id attribute of the
disk).
Example reports:
# odmget -q "name=hdisk1" CuDv
CuDv:
name = "hdisk1"
status = 1
chgstatus = 2
ddins = "scsidisk"
location = "31-T1-01"
parent = "fscsi0"
connwhere = "W_0"
PdDvLn = "disk/fcp/mpioosdisk"
# lscfg -l hdisk1
hdisk1 U8233.E8B.100603P-V16-C31-T1-W500507680140581E-L1000000000000 MPIO IBM 2145 FC Disk
CuDvDr:
resource = "devno"
value1 = "36"
value2 = "0"
value3 = "hdisk3"
# ls -l /dev/hdisk[03]
brw------- 1 root system 17, 0 Oct 08 06:17 /dev/hdisk0
brw------- 1 root system 36, 0 Oct 08 09:19 /dev/hdisk3
Notes:
Special files
Applications or system programs use the special files to access a certain device. For
example, the visual shows special files used to access hdisk0 (/dev/hdisk0) and
hdisk3 (/dev/hdisk3).
Notes:
VGID
One of the most important pieces of information about a volume group is the VGID. As
shown on the visual, this information is stored in CuAt.
CuAt:
name = "rootvg"
attribute = "timestamp"
value = "470a1bc9243ed693"
type = "R"
generic = "DU"
rep = "s"
nls_index = 0
CuAt:
name = "rootvg"
attribute = "pv"
value = "00c35ba07b2e24f00000000000000000"
type = "R"
generic = ""
rep = "sl"
nls_index = 0
Notes:
Length of PVID
Remember that the PVID is a 32-digit hexadecimal field in which the last 16 digits are set to
zeros.
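When repairing ODM entries by hand, a quick format check can catch a truncated PVID before it is written back. The following is a minimal sketch, assuming a POSIX shell; is_odm_pvid is a hypothetical helper, and the sample value is the first half of the hdisk1 PVID shown earlier in this unit.

```sh
# is_odm_pvid VALUE: succeed if VALUE is exactly 32 hexadecimal digits
# and the last 16 of them are zeros.
is_odm_pvid() {
    case $1 in
        *[!0-9a-fA-F]*) return 1 ;;   # reject any non-hexadecimal character
    esac
    [ ${#1} -eq 32 ] || return 1      # must be 32 digits long
    # Strip the first 16 characters and compare the remainder with 16 zeros.
    [ "${1#????????????????}" = "$(printf '%016d' 0)" ]
}

disk_part="00f606036452e56a"          # first 16 digits, from the hdisk1 example
zeros=$(printf '%016d' 0)             # the 16 trailing zeros
if is_odm_pvid "${disk_part}${zeros}"; then
    echo "PVID has the expected ODM format"
fi
```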
Notes:
# ls -l /dev/hd2
brw------- 1 root system 10,5 08 Jan 06:56 /dev/hd2
Notes:
CuDvDr logical volume objects
Each logical volume has an object in CuDvDr that is used to create the special file entry
for that logical volume in /dev. As an example, the sample output on the visual shows
the CuDvDr object for hd2 and the corresponding /dev/hd2 (major number 10, minor
number 5) special file entry in the /dev directory.
Notes:
Causes of problems
The signal handlers used by high-level LVM commands do not work with a kill -9, a
system shutdown, or a system crash. You might end up in a situation where the VGDA
has been updated, but the change has not been stored in the ODM.
Problems might also occur because of the improper use of low-level commands or
hardware changes that are not followed by correct administrator actions.
Another common problem is ODM corruption caused by performing LVM operations while
the root file system (which contains /etc/objrepos) is full. Always check the root file
system free space before attempting LVM recovery operations.
Instructor notes:
Purpose — Explain how ODM-related problems might come up.
Details — Explain the student material.
Additional information — None
Transition statement — Let’s identify ways that ODM problems can be fixed.
# varyoffvg homevg
Notes:
3. In the last step, you import the volume group by using the importvg command.
Specify the volume group name with the -y option; otherwise, AIX generates a new
volume group name.
You need to specify only one intact physical volume of the volume group that you
import. The importvg command reads the VGDA and LVCB on that disk and
creates completely new ODM objects.
Note that this procedure does not allow the data to be used while the corruption is
being repaired, even if the file systems are mounted and accessible despite the
problem. The logical volumes must be closed before the volume group can be varied off.
If the ODM problem is in the rootvg, try using the rvgrecover procedure:
PV=hdisk0
VG=rootvg
cp /etc/objrepos/CuAt /etc/objrepos/CuAt.$$
cp /etc/objrepos/CuDep /etc/objrepos/CuDep.$$
cp /etc/objrepos/CuDv /etc/objrepos/CuDv.$$
cp /etc/objrepos/CuDvDr /etc/objrepos/CuDvDr.$$
Notes:
Problems in rootvg
For ODM problems in rootvg, finding a solution is more difficult because rootvg cannot
be varied off or exported. However, it may be possible to fix the problem using one of
the techniques described below.
After deleting all ODM objects from rootvg, it imports the rootvg by reading the VGDA
and LVCB from the boot disk. This results in completely new ODM objects that describe
your rootvg.
• synclvodm <vgname>
Synchronizes the VGDA, LVCB, ODM, and special device files
Volume group must be active
First run the redefinevg command if ODM does not have the
minimum required information about the volume group
Notes:
Overview
There are situations where you are unable to run the exportvg or importvg commands
because they depend on finding a minimal level of information in the ODM. Even if
these high-level LVM commands can be run, they require that the volume group be
taken offline, which would be disruptive. In these situations, it is useful to know some
intermediate-level LVM commands. These commands are primarily intended to be used
by high-level LVM commands, but they can be useful in solving tough problems.
be active for the resynchronization to occur. If logical volume names are specified, only
the information related to those logical volumes is updated.
The synclvodm command, by itself, can do a fairly complete job of resynchronizing the
ODM with the LVM data areas on the disk. It will also synchronize the information
between the LVM data areas. As such, it can worsen a situation where only one disk in
the volume group has corrupted data areas. The command can be restricted to
synchronizing only specific logical volumes. Otherwise, it synchronizes all logical
volumes. The synclvodm command depends upon a minimal amount of information in
the ODM; most importantly, the ODM needs to know the volume group name plus the
physical volume and logical volume memberships.
Instructor notes:
Purpose — Explain the use of LVM intermediate level commands.
Details — Note that there is an optional part of the exercise where they explore the use of
these intermediate level commands.
Additional information —
Transition statement — Checkpoint.
Checkpoint
Notes:
Instructor notes:
Purpose — Discuss the checkpoint questions.
Details — A “Checkpoint Solution” is given below:
Checkpoint solutions
Notes:
Instructor notes:
Purpose — Transition to the lab.
Details — Explain the goals of this part of the exercise. If the students executed the first
two parts of the lab after the first two unit topics, then they would now continue with part
three of the exercise.
Additional information — None
Transition statement — Let’s finish with a brief summary of what we have discussed in
this unit.
Unit summary
Notes:
Discussion:
The LVM information is held in a number of different places on the disk, including the
ODM and the VGDA.
ODM-related problems can be solved by:
• exportvg and importvg (non-rootvg volume groups)
• rvgrecover (rootvg)
• LVM intermediate commands
• Manually fixing using ODM commands.
Instructor notes:
Purpose — Summarize key points from the unit.
Details — Present the highlights from the unit.
Before continuing to the next unit, stop and ask the students if they have any additional
questions.
Additional information — None
Transition statement — Let’s continue with the next unit.
Estimated time
00:25 Topic 1
00:50 Topic 2
References
Online AIX Version 7.1 Command Reference volumes 1-6
Online AIX Version 7.1 Operating system and device
management
Note: References listed as “online” above are available through the
IBM Systems Information Center at the following address:
http://publib.boulder.ibm.com/eserver
GG24-4484 AIX Storage Management (Redbook)
© Copyright IBM Corp. 2009, 2012 Unit 8. Disk management procedures 8-1
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Instructor Guide
Unit objectives
Notes:
Instructor notes:
Purpose — Present the objectives of this unit.
Details —
Additional information —
Transition statement — Let's start with a discussion of mirroring and quorum issues.
Mirroring
Notes:
Role of VGSA
The information about the mirrored partitions is stored in the VGSA, which is contained
on each disk. In the example shown on the visual, we see that logical partition 5 points
to physical partition 5 on hdisk0, physical partition 8 on hdisk1, and physical partition 9
on hdisk2.
Stale partitions
[Visual: a mirrored logical volume with copies on hdisk0 and hdisk1.]
Notes:
Instructor notes:
Purpose — Explain what stale partitions are.
Details — Explain using the information in the student notes. The prerequisite course
discusses stale partitions as a stage in the creation of mirroring. Remind them that this can
also happen as a result of disk failure. Once a disk is recovered, the syncvg command will
need to be run to resynchronize the copies.
Additional information — Explain that using varyonvg is better than using syncvg
directly. The varyonvg command works if the volume group is already varied on and if the
volume group is the rootvg.
Transition statement — Let’s see how mirrored logical volumes can be created.
Mirroring rootvg
[Visual: rootvg on hdisk0 and hdisk1.]
Notes:
Check that the disk is actually bootable, since it will hold the alternate boot logical
volume.
# bootinfo -B hdisk1
Any returned value other than 1 indicates that the disk is not bootable.
b. If not already part of the rootvg, add the new disk to the volume group (for example,
hdisk1):
# extendvg [ -f ] rootvg hdisk1
c. Use the mirrorvg command to mirror all of the logical volumes in the rootvg to the
new disk. By default, the mirrorvg command disables quorum and mirrors the
existing logical volumes in the specified volume group. Changes to the volume
group quorum attribute are effective immediately, without having to vary off and then
vary on the volume group. By default, mirrorvg also synchronizes the copies, although
you can suppress synchronization by using the -s flag. It is recommended that you
use the exact mapping option (-m) to ensure that the mirror copy of the boot logical
volume (hd5) is allocated contiguous physical partitions. To mirror rootvg, use the
command:
# mirrorvg -m rootvg hdisk1
Restrictions:
• You cannot use the mirrorvg command on a snapshot volume group
• You cannot use the mirrorvg command on a volume group that has an active
firmware assisted dump logical volume
• You cannot use the mirrorvg command if ALL of the following conditions exist:
- The target system is a logical partition (LPAR)
- A copy of the boot logical volume (by default, hd5) resides on the failed
physical volume
- The replacement physical volume's adapter was dynamically configured into
the LPAR since the last cold boot
An alternative to running mirrorvg is to separately execute the component tasks:
• If you use one mirror disk, be sure that a quorum is not required for vary on:
# chvg -Qn rootvg
• Add the mirrors for all rootvg logical volumes:
# mklvcopy hd1 2 hdisk1
# mklvcopy hd2 2 hdisk1
# mklvcopy hd3 2 hdisk1
# mklvcopy hd4 2 hdisk1
# mklvcopy hd5 2 hdisk1
# mklvcopy hd6 2 hdisk1
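Because the mklvcopy commands above follow a single pattern, they can be generated in a loop. The sketch below is a dry run under stated assumptions: print_mklvcopy_cmds is a hypothetical helper, and the logical volume list is only the one shown above, so check lsvg -l rootvg on the real system before relying on it.

```sh
# print_mklvcopy_cmds TARGET_DISK LV...
# Prints one mklvcopy command per logical volume, requesting 2 copies
# with the new copy placed on TARGET_DISK. Nothing is executed.
print_mklvcopy_cmds() {
    target=$1
    shift
    for lv in "$@"; do
        echo "mklvcopy $lv 2 $target"
    done
}

print_mklvcopy_cmds hdisk1 hd1 hd2 hd3 hd4 hd5 hd6
```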
Instructor notes:
Purpose — Review how to mirror rootvg.
Details — Again, this is review from the prerequisite course. Use the information in the
student material to guide your presentation.
Additional information — Mirroring of the paging space and dump logical volumes:
When mirroring rootvg, hd6 should be mirrored because the paging space availability is
critical to keeping the system online. hd6 serves both as paging space and as the default
dump device. In AIX V 4.3.3 and subsequent releases, there is no problem with mirroring
dump devices.
In releases prior to 4.3.3, dump devices did not work correctly if mirrored. On these older
releases, a separate dump device should be created and not mirrored.
Before 4.3.3, if the dump device was mirrored, when the dump occurred, the data would be
written to one copy of the mirror. Even though only one copy was updated, no partitions
would be marked stale. When the machine rebooted, the system would attempt to move the
dump data from hd6 and write it to /var/adm/ras (by default). Since LVM would think the
mirror was in sync, it would read the data from all copies of hd6, causing the dump to
become corrupted. In AIX 4.3.3, an intermediate command (readlvcopy) was provided
that made it possible to read only from the primary copy even though the policy was
parallel. Dump processing (snap reading of the dump logical volume) was recoded to use
readlvcopy. At that point, mirroring of a dump logical volume could be supported.
But there was another problem. Sometimes with a mirrored dump logical volume, the dump
would not be reported. This was fixed in AIX 5.2 TL08 (or later) and in AIX 5.3 TL04 (or
later).
Thus a mirrored dump logical volume is currently supported and the mirrorvg command
automatically mirrors the paging space, even when it is also acting as the dump logical
volume.
On the other hand, mirroring the dump logical volume is not recommended, due to the
resulting performance impact when creating the dump and some surmountable but
irritating complications in reading the dump. Because of this recommendation, the
mirrorvg command will not mirror the dump logical volume if there is a separate logical
volume for the dump (not using the paging space).
In order to protect against the scenario of the disk holding the dump logical volume being
unavailable (when not mirrored) at the time of the dump, the recommendation is that you
should define a secondary dump device on a different disk than the primary dump device.
It is good to have a separate logical volume for the dump, instead of using the paging
space logical volume. By having a separate dump logical volume, it separates the dump
logical volume issues from the paging space issues. For example: it is definitely desirable
to mirror the paging space logical volume, while it is recommended that you do not mirror
the dump logical volume.
Regarding mirroring during mksysb backups and restores, the official advisory
recommendation is that the mirroring of the rootvg be broken. Be certain that the
remaining disk is the same disk as the last disk used to boot (bootinfo -b).
Transition statement — Let’s show another way to mirror the rootvg.
VGDA count
Notes:
[Visual: volume group datavg on hdisk1 and hdisk2. VG not active: # varyonvg datavg FAILS. VG active: closed during operation, no more access to logical volumes, LVM_SA_QUORCLOSE in error log.]
Notes:
Introduction
What happens if quorum checking is enabled for a volume group and a quorum is not
available?
Consider the following example (illustrated on the visual and discussed in the following
paragraphs): In a two-disk volume group datavg, the disk hdisk1 is not available due to
a hardware defect. hdisk1 is the disk that contains the two VGDAs; that means the
volume group does not have a quorum of VGDAs.
Instructor notes:
Purpose — Describe the quorum mechanism.
Details — Describe what happens when the quorum is not available. Make sure they
understand the difference between quorum checking of an active volume group and the
quorum mechanisms involved with trying to vary on an inactive volume group.
Additional information — Some of this discussion applies to rootvg. However, there are
some differences, as we will see later.
Transition statement — Let’s describe how to set up non-quorum volume groups.
Notes:
datavg: hdisk1 ("removed"), hdisk2
# varyonvg -f datavg
Failure accessing hdisk1. Set PV STATE to removed.
Volume group datavg is varied on.
Notes:
Quorum checking on
With quorum checking on, you always need > 50% of the VGDAs available (except to
vary on rootvg).
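The "more than 50%" rule can be illustrated with a small arithmetic check. This is a sketch only: has_quorum is a hypothetical helper, and the counts assume the two-disk datavg example, where one disk holds two VGDAs and the other holds one.

```sh
# has_quorum AVAILABLE TOTAL: succeed when more than 50% of the VGDAs
# are available (strictly more, so exactly half is not a quorum).
has_quorum() {
    [ $(( $1 * 2 )) -gt "$2" ]
}

# Two-disk volume group with 3 VGDAs in total (assumed 2 + 1 split).
if has_quorum 1 3; then
    echo "1 of 3 VGDAs: quorum"
else
    echo "1 of 3 VGDAs: no quorum"
fi
if has_quorum 2 3; then
    echo "2 of 3 VGDAs: quorum"
else
    echo "2 of 3 VGDAs: no quorum"
fi
```

Losing the disk with two VGDAs leaves one of three, which is why the vary on fails unless it is forced.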
Instructor notes:
Purpose — Describe forced vary on of a volume group.
Details — Use the student notes to guide your explanation.
Additional information —
Transition statement — Let’s discuss what’s meant by physical volume state.
[Visual: physical volume states and transitions. Labels: missing; removed; varyonvg -f VGName; hardware repair; hardware repair followed by varyonvg VGName; chpv -v a hdiskX.]
Notes:
Introduction
This page introduces physical volume states (not device states). Physical volume states
can be displayed with lsvg -p VGName.
Active state
If a disk can be accessed during a varyonvg, it gets a physical volume state of active.
Missing state
If a disk cannot be accessed during a varyonvg, but quorum is available, the failing
disk gets a physical volume state of missing. If the disk can be repaired, for example,
after a power failure, you just have to issue a varyonvg VGName to bring the disk into the
active state again. Any stale partitions will be synchronized.
Removed state
If a disk cannot be accessed during a varyonvg and the quorum of disks is not
available, you can issue the command, varyonvg -f VGName, and force the vary on of
the volume group.
The failing disk gets a physical volume state of removed, and it will not be used for
quorum checks any longer.
[Figure: disk replacement flowchart]
Disk mirrored? Yes: Procedure 1. No: continue.
Disk still working? Yes: Procedure 2. No: continue.
Volume group lost? No: Procedure 3. Yes: Procedure 4 (rootvg) or Procedure 5 (not rootvg).
Notes:
Flowchart
Before starting the disk replacement, always follow the flowchart that is shown in the
visual. This will help you whenever you have to replace a disk.
1. If the disk that must be replaced is completely mirrored onto another disk, follow
procedure 1.
2. If a disk is not mirrored, but still works, follow procedure 2.
3. If you are absolutely sure that a disk failed and you are not able to repair the
disk, do the following:
- If the volume group can be varied on (normal or forced), use procedure 3
- If the volume group is totally lost after the disk failure, that means the volume
group could not be varied on (either normal or forced)
• If the volume group is rootvg, follow procedure 4
• If the volume group is not rootvg, follow procedure 5
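The flowchart can be expressed as a small decision function. This is an illustrative sketch only: the name choose_procedure and the yes/no argument convention are assumptions, not part of AIX.

```shell
# choose_procedure MIRRORED WORKING VG_LOST IS_ROOTVG
# Hypothetical helper: each argument is "yes" or "no". Prints the number
# of the disk replacement procedure that the flowchart selects.
choose_procedure() {
    if [ "$1" = "yes" ]; then
        echo 1          # disk is completely mirrored
    elif [ "$2" = "yes" ]; then
        echo 2          # not mirrored, but still working
    elif [ "$3" = "no" ]; then
        echo 3          # disk failed, volume group can still be varied on
    elif [ "$4" = "yes" ]; then
        echo 4          # volume group lost, and it is rootvg
    else
        echo 5          # volume group lost, not rootvg
    fi
}

# Example: unmirrored failed disk, datavg cannot be varied on at all.
choose_procedure no no yes no
```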
Instructor notes:
Purpose — Provide considerations before a disk replacement.
Details — Explain as described in the student material.
Additional information — This flowchart is a method to offer disk replacement procedures
for many types of disk failures. It is not guaranteed that 100% of all disk failures are
covered.
A good way to distinguish between the various procedures is to focus on where we recover
the data from:
1. Procedure 1 - We synchronize from a remaining good mirror copy
2. Procedure 2 - We migrate the data off the suspect disk to the new disk before removing
the suspect disk
3. Procedure 3 - We recover the data from the file system backups (or logical volume
backup provided by the using application)
4. Procedure 4 - We recover using the mksysb backup of the rootvg
5. Procedure 5 - We recover using the savevg backup for the non-rootvg
Transition statement — Let’s start with procedure 1
Notes:
Disk state
This procedure requires that the disk state of the failed disk be either missing or
removed. Refer to Physical Volume States (covered earlier in this unit) for more
information on disk states. Use the command, lspv hdiskX, to check the state of your
physical volume. If the disk is still in the active state, you cannot remove any copies or
logical volumes from the failing disk. In this case, one way to bring the disk into a
removed or missing state is to run the reducevg -d command or to do a varyoffvg
and a varyonvg on the volume group by rebooting the system.
Alternative approaches
The two main alternatives for this procedure are to use the replacepv command or to
not use that command. The replacepv command greatly simplifies the procedure.
The restrictions are:
- The volume group cannot be rootvg.
- The snapshot volume group mechanism must not be in use.
- The replacement physical volume must be at least as large as the failed physical
volume.
- Both physical volumes must be on the system at the same time. In other words, you
cannot remove the failed disk and then place the new disk in the same position.
3. Run replacepv:
# replacepv hdiskX hdiskY
Notes:
The replacepv command greatly simplifies the procedure.
1) Provide a replacement disk. It might be an unused disk that is already known
to AIX. Otherwise, you need to provide a new disk. There are many ways to
provide a disk that is new to AIX:
• Directly allocate a PCI storage adapter to the LPAR. If the adapter does
not already have an available disk under it, one will need to be provided
through a hot add (if a local disk) or by zoning a LUN (if it’s a Fibre
Channel adapter).
• Use PowerVM to provision a virtual SCSI disk.
2) Discover the new disk by executing the cfgmgr command.
3) Execute the replacepv to allocate physical partitions on the replacement disk
for the problem disk. Effectively the new disk replaces the failing disk in the
mirroring configuration. In the example, hdiskX is the failing disk.
4) Remove the failing disk.
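Steps 2 through 4 can be sketched as a short command sequence. The helper names run and replace_mirrored_disk are hypothetical, and reading step 4 as an rmdev of the device definition (before the disk is physically pulled) is an assumption. The commands are printed rather than executed unless DRY_RUN=0 is set.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# replace_mirrored_disk FAILING_DISK NEW_DISK
# Hypothetical helper for procedure 1 with replacepv.
replace_mirrored_disk() {
    run cfgmgr                  # step 2: discover the replacement disk
    run replacepv "$1" "$2"     # step 3: new disk takes over from the failing disk
    run rmdev -l "$1" -d        # step 4: remove the failing disk definition
}

replace_mirrored_disk hdiskX hdiskY
```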
Notes:
The goal of each disk replacement is to remove all logical volumes from a disk.
1. Start removing all logical volume copies from the disk. Use either the SMIT
fastpath smit unmirrorvg or the unmirrorvg command as shown in the visual.
This will unmirror each logical volume that is mirrored on the disk.
If you have additional unmirrored logical volumes on the disk, you have to either
move them to another disk (migratepv), or remove them if the disk cannot be
accessed (rmlv).
2. If the disk is completely empty, remove the disk from the volume group. Use
SMIT fastpath smit reducevg or the reducevg command.
3. After the disk has been removed from the volume group, you can remove it from
the ODM. Use the rmdev command as shown in the visual.
4. Use a hot-swap procedure to replace the failed or failing disk. (In older
machines, disk replacement would effectively require the system to be shut down
for the procedure.) Execute cfgmgr to discover and configure the new disk.
5. Add the new disk to the volume group. Use either the SMIT fastpath
smit extendvg or the extendvg command.
6. Finally, create new copies for each logical volume on the new disk. Use either
the SMIT fastpath smit mirrorvg or the mirrorvg command. If synchronization
was suppressed during mirroring, then remember to eventually synchronize the
volume group (or each logical volume), using the syncvg command.
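The six steps above can be sketched as one sequence. This is an outline under stated assumptions, not the exact content of the visual: the helper names are hypothetical, the volume group is assumed to hold only mirrored logical volumes on the failing disk, and the new disk is passed in by name because it may or may not be configured under the old hdisk number. Commands are printed unless DRY_RUN=0 is set.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# replace_mirrored_disk_manual VG FAILING_DISK NEW_DISK
# Hypothetical helper for procedure 1 without replacepv.
replace_mirrored_disk_manual() {
    run unmirrorvg "$1" "$2"    # 1. remove the mirror copies from the failing disk
    run reducevg "$1" "$2"      # 2. take the empty disk out of the volume group
    run rmdev -l "$2" -d        # 3. delete the disk definition from the ODM
    run cfgmgr                  # 4. after the hot swap, configure the new disk
    run extendvg "$1" "$3"      # 5. add the new disk to the volume group
    run mirrorvg "$1" "$3"      # 6. recreate the mirror copies on the new disk
}

replace_mirrored_disk_manual datavg hdiskX hdiskY
```

If synchronization was suppressed in step 6, a later syncvg -v on the volume group is still required.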
Instructor notes:
Purpose — Explain Procedure 1, without use of the replacepv command.
Details —
Additional information — When you read the student notes you might think that removing
a logical volume from a disk that fails is not possible. The important thing is: it is possible,
but it requires the disk to be either in a missing or removed state. If the disk is active, the
LVM does not allow you to unmount a file system or remove a logical volume from the
failing disk.
Now the problem is: how do you bring a disk into the missing or removed state? The
answer is that you have to do a reducevg -d or to force a new varyonvg, either in a normal
or a forced mode. Because you cannot do a varyoffvg when file systems are mounted
(and you cannot unmount them from the failing disk), the only way to recover from this bad
situation is to reboot your system. This might cause other problems if the failing disk is in
rootvg and the quorum has not been disabled in a two-disk volume group.
Transition statement — Let’s describe additional considerations for when the volume
group is the rootvg.
Instructor notes:
Purpose — Explain special considerations for rootvg situations.
Details —
Additional information —
Transition statement — Let’s describe procedure 2.
3. Before executing the next step, it is necessary to distinguish between the rootvg
and a non-rootvg volume group.
- If the disk that is replaced is in rootvg, execute the steps that are shown on
the next visual Procedure 2: Special Steps for rootvg.
- If the disk that is replaced is not in the rootvg, use the migratepv command:
# migratepv hdisk_old hdisk_new
This command moves all logical volumes from one disk to another. You can
do this during normal system activity. The command migratepv requires that
the disks are in the same volume group.
4. If the old disk has been completely migrated, remove it from the volume group.
Use either the SMIT fastpath smit reducevg or the reducevg command.
5. If you need to remove the disk from the system, remove it from the ODM using
the rmdev command as shown. Finally, remove the physical disk from the
system.
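For a volume group other than rootvg, procedure 2 reduces to the sequence sketched below. The first two commands (discovering the new disk and adding it to the volume group) are not described in the notes above and are assumed from the requirement that both disks be in the same volume group. The helper names are hypothetical, and commands are printed unless DRY_RUN=0 is set.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# migrate_data_disk VG OLD_DISK NEW_DISK
# Hypothetical helper for procedure 2 on a non-rootvg volume group.
migrate_data_disk() {
    run cfgmgr                  # discover the newly connected disk
    run extendvg "$1" "$3"      # migratepv needs both disks in the same VG
    run migratepv "$2" "$3"     # step 3: move all logical volumes
    run reducevg "$1" "$2"      # step 4: remove the emptied disk from the VG
    run rmdev -l "$2" -d        # step 5: remove the disk from the ODM
}

migrate_data_disk datavg hdisk_old hdisk_new
```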
[Figure: Procedure 2: Special steps for rootvg (disks hdiskX and hdiskY in rootvg)]
1. Connect new disk to system
2. Add new disk to volume group
3. Disk contains hd5?
# migratepv -l hd5 hdiskX hdiskY
# bosboot -ad /dev/hdiskY
# chpv -c hdiskX
# bootlist -m normal hdiskY
Migrate old disk to new disk:
# migratepv hdiskX hdiskY
4. Remove old disk from volume group
Notes:
- If the disk contains the boot logical volume, migrate the logical volume to the
new disk and update the boot logical volume on the new disk. To avoid a
potential boot from the old disk, clear the old boot record by using the
chpv -c command. Then, change your bootlist:
# migratepv -l hd5 hdiskX hdiskY
# bosboot -ad /dev/hdiskY
# chpv -c hdiskX
# bootlist -m normal hdiskY
- If the disk contains the primary dump device, you must deactivate the dump
before migrating the corresponding logical volume:
# sysdumpdev -p /dev/sysdumpnull
- Migrate the complete old disk to the new one:
# migratepv hdiskX hdiskY
- If the primary dump device has been deactivated, you have to activate it
again:
# sysdumpdev -p /dev/hdX
4. After the disk has been migrated, remove it from the rootvg volume group.
# reducevg rootvg hdiskX
5. If the disk must be removed from the system, remove it from the ODM (use the
rmdev command), shut down your AIX, and remove the disk from the system
afterwards.
# rmdev -l hdiskX -d
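The ordering of the rootvg steps matters, so it can help to see them as one conditional sequence. The sketch below only reorders commands already shown in the notes. The helper names and the two extra arguments (whether hd5 is on the old disk, and the primary dump logical volume, if any) are assumptions made for illustration. Commands are printed unless DRY_RUN=0 is set.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# migrate_rootvg_disk OLD_DISK NEW_DISK HAS_BLV DUMP_LV
# HAS_BLV is "yes" when hd5 is on OLD_DISK. DUMP_LV is the primary dump
# device on OLD_DISK (for example /dev/hdX), or empty if there is none.
migrate_rootvg_disk() {
    if [ "$3" = "yes" ]; then
        run migratepv -l hd5 "$1" "$2"      # move the boot logical volume first
        run bosboot -ad "/dev/$2"           # rebuild the boot image on the new disk
        run chpv -c "$1"                    # clear the old boot record
        run bootlist -m normal "$2"         # boot from the new disk from now on
    fi
    if [ -n "$4" ]; then
        run sysdumpdev -p /dev/sysdumpnull  # deactivate the primary dump device
    fi
    run migratepv "$1" "$2"                 # migrate everything that is left
    if [ -n "$4" ]; then
        run sysdumpdev -p "$4"              # reactivate the primary dump device
    fi
    run reducevg rootvg "$1"                # remove the old disk from rootvg
}

migrate_rootvg_disk hdiskX hdiskY yes /dev/hdX
```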
Instructor notes:
Purpose — Describe the special considerations for rootvg.
Details — Describe as provided in the student material.
Additional information —
Transition statement — Let’s describe procedure 3.
Procedure steps
If the failing disk is in a missing or removed state, start the procedure:
1. Identify all logical volumes and file systems on the failing disk. Use commands
like lspv, lslv or lsfs to provide this information. These commands will work on
a failing disk.
2. If you have mounted file systems on logical volumes on the failing disk, you must
unmount them. Use the umount command.
3. Remove all file systems from the failing disk using smit rmfs or the rmfs
command. If you remove a file system, the corresponding logical volume and
stanza in /etc/filesystems are removed as well.
4. Remove the remaining logical volumes (those not associated with a file system)
from the failing disk using smit rmlv or the rmlv command.
5. Remove the disk from the volume group, using the reducevg command or the
SMIT fastpath smit reducevg.
6. Remove the disk from the ODM and from the system using the rmdev command.
7. Add the new disk to the system and extend your volume group. Use the SMIT
fastpath smit extendvg or the extendvg command.
8. Recreate all logical volumes and file systems that have been removed due to the
disk failure. Use smit mklv, smit crfs or the commands directly.
9. Due to the total disk failure, you lost all data on the disk. This data has to be
restored, either by the restore command or any other tool you use to restore
data (for example, Tivoli Storage Manager) from a previous backup.
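Steps 2 through 7 can be outlined as follows for the simplest case. This sketch assumes exactly one file system and one raw logical volume on the failing disk, which is an illustrative simplification. The helper names and arguments are hypothetical, and steps 8 and 9 (recreating and restoring) are site-specific and left out. Commands are printed unless DRY_RUN=0 is set.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# replace_failed_disk VG FAILED_DISK MOUNT_POINT RAW_LV NEW_DISK
# Hypothetical helper for procedure 3 with one file system and one
# logical volume that has no file system on it.
replace_failed_disk() {
    run umount "$3"             # 2. unmount the file system
    run rmfs "$3"               # 3. remove file system, its LV, and its stanza
    run rmlv -f "$4"            # 4. remove the remaining logical volume
    run reducevg "$1" "$2"      # 5. remove the disk from the volume group
    run rmdev -l "$2" -d        # 6. remove the disk from the ODM
    run extendvg "$1" "$5"      # 7. add the new disk to the volume group
}

replace_failed_disk datavg hdiskX /data rawlv01 hdiskY
```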
[Figure: Procedure 4: total rootvg failure. Labels: rootvg; hdiskX; hdiskY; Contains OS logical volumes; datavg; hdiskZ; mksysb]
3. Restore from a mksysb image
4. Import each volume group into the new ODM (importvg) if needed
Notes:
Procedure steps
Follow these steps:
1. Replace the bad disk
2. Boot your system in maintenance mode
3. Restore your system from a mksysb
If any rootvg file systems were not mounted when the mksysb was made, those
file systems are not included on the backup image. You will need to create and
restore those as a separate step.
4. If your mksysb does not contain user volume group definitions (for example, you
created a volume group after saving your rootvg), you have to import the user
volume group after restoring the mksysb. For example:
# importvg -y datavg hdisk9
Only one disk from the volume group (in our example hdisk9), needs to be
selected.
Export and import of volume groups is discussed in more detail in the next topic.
Instructor notes:
Purpose — Explain how to recover a total rootvg failure.
Details — Describe as explained in the student notes.
Additional information —
Transition statement — Let’s describe procedure 5.
datavg
1. Export the volume group from the system:
# exportvg vg_name
Notes:
Procedure steps
Follow these steps:
1. To fix this problem, export the volume group from the system. Use the command
exportvg as shown. During the export of the volume group, all ODM objects that
are related to the volume group will be deleted.
2. Check your /etc/filesystems. There should be no references to logical volumes
or file systems from the exported volume group.
3. Remove the bad disk from the ODM (use rmdev as shown). Shut down your
system and remove the physical disk from the system.
4. Connect the new drive and boot the system. The cfgmgr will configure the new
disk.
5. If you have a volume group backup available (created by the savevg command),
you can restore the complete volume group with the restvg command (or the
SMIT fastpath smit restvg). All logical volumes and file systems are recovered.
If you have more than one disk that should be used during restvg, you must
specify these disks:
# restvg -f /dev/rmt0 hdiskY hdiskZ
The savevg and restvg commands will be discussed in a future chapter.
6. If you have no volume group backup available, you have to recreate everything
that was part of the volume group.
Recreate the volume group (mkvg or smit mkvg), all logical volumes (mklv or
smit mklv) and all file systems (crfs or smit crfs).
7. Finally, restore the lost data from backups, for example with the restore
command or any other tool you use to restore data in your environment.
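When a savevg backup exists, the command portion of procedure 5 is short. The sketch below shows only the commands named in the steps above. The helper names and argument order are hypothetical, and the shutdown, disk swap, and reboot between the second and third command are manual steps that the sketch cannot perform. Commands are printed unless DRY_RUN=0 is set.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# restore_lost_vg VG BAD_DISK BACKUP_DEVICE TARGET_DISK...
# Hypothetical helper for procedure 5. The body runs in a subshell so
# that its variables stay local.
restore_lost_vg() (
    vg=$1
    bad=$2
    dev=$3
    shift 3
    run exportvg "$vg"          # 1. delete the ODM objects of the volume group
    run rmdev -l "$bad" -d      # 3. remove the bad disk from the ODM
    # Steps 3 and 4: shut down, replace the disk, boot (cfgmgr runs at boot).
    run restvg -f "$dev" "$@"   # 5. restore the volume group onto the target disks
)

restore_lost_vg datavg hdiskX /dev/rmt0 hdiskY hdiskZ
```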
# lsvg -p datavg
unable to find device id ...734... in device configuration database
[Figure: ODM failure decision. ODM problem in rootvg? Yes: rvgrecover. No: Export and import volume group.]
Notes:
ODM failure
After an incorrect disk replacement, you might detect ODM failures. For example, when
issuing the command lsvg -p datavg, a typical error message could be:
unable to find device id 00837734 in device configuration database
In this case, a device could not be found in the ODM.
Instructor notes:
Purpose — Provide how to fix ODM failures (this is a kind of a review page).
Details —
Additional information —
Transition statement — Let us look at an example of a procedural error that would cause
an ODM problem.
Notes:
The problem
A frequent error occurs when the administrator removes a disk from the ODM (by
executing rmdev) and physically removes the disk from the system, without first
executing the reducevg command to remove volume group references to that disk (in
the VGDA and in the ODM).
The VGDA stores information about all physical volumes of the volume group. ODM
disk references include the physical volume attributes for the volume group.
Note: Throughout this discussion the physical volume ID (PVID) is abbreviated in the
visuals for simplicity. The physical volume ID is actually 32 characters.
The result of this mistake is that the volume group cannot be varied online. Attempts to
use reducevg after the fact fail, because the command requires that the volume group be
active.
Instructor notes:
Purpose — Introduce the VGDA corruption if a disk is removed from the ODM but not from
the volume group.
Details — Describe as explained in the student notes.
Additional information — It is not possible to remove a disk from the ODM as long as it
has open logical volumes. If any process is using a logical volume from a disk, you cannot
remove the disk with rmdev.
Transition statement — Let’s describe the fix for this error.
Notes:
The fix
Before fixing the problem, be sure you have correctly recorded the PVID for the
removed disk. The previous lsvg listing of physical volumes for datavg would have
provided that. A previously executed lspv would also have provided the PVID.
This problem can be fixed by executing the reducevg command, but the volume group
needs to be active, and varyonvg will not work while the volume group has a PVID
value that cannot be resolved to a disk.
You could use odmdelete to remove the bad PVID attribute object, but this is not as
simple as it sounds and a mistake could make matters worse. An easier way to clean up
the bad ODM reference is exporting the volume group and then importing the volume
group using the VGDA on the remaining disk.
Once the volume group is active, we can then use reducevg to properly remove the
bad PVID reference from the VGDA. Instead of specifying the disk name, the PVID of
the removed disk is specified. If you did not earlier record the PVID, then you will need
to obtain it from the VGDA itself.
To obtain the PVID of the removed disk from the VGDA run:
# lqueryvg -p hdisk4 -At (Use any disk from the volume group)
You need to compare this with the lsvg -p datavg output to identify which PVID is for
the missing disk.
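The fix has two parts: finding the PVID that no longer resolves to a disk, and then the export, import, and reducevg sequence. The sketch below assumes the PVIDs have already been written to two plain files, one PVID per line (one list taken from the VGDA, one list of PVIDs that still belong to configured disks). How those lists are extracted from command output is left out. All helper names are hypothetical, and commands are printed unless DRY_RUN=0 is set.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# unresolved_pvids VGDA_LIST KNOWN_LIST
# Hypothetical helper: prints every PVID that appears in VGDA_LIST but
# not in KNOWN_LIST (whole-line, fixed-string comparison).
unresolved_pvids() {
    grep -v -x -F -f "$2" "$1"
}

# repair_vg_reference VG GOOD_DISK BAD_PVID
# Hypothetical helper: rebuild the ODM from the VGDA on a remaining disk,
# then remove the stale PVID from the volume group.
repair_vg_reference() {
    run exportvg "$1"
    run importvg -y "$1" "$2"
    run reducevg "$1" "$3"
}
```

A typical use would pass the single PVID printed by unresolved_pvids as the third argument of repair_vg_reference.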
Checkpoint
IBM Power Systems
Notes:
Checkpoint solutions
IBM Power Systems
2. This volume group consists of two disks that are completely mirrored.
Because of the disk failure you are not able to vary on datavg. How do
you recover from this situation?
The answer is forced varyon: varyonvg -f datavg. Use procedure
1 for mirrored disks.
3. After disk replacement, you find that a disk has been removed from the
system but not from the volume group. How do you fix this?
The answer is to repair the ODM, for example through exportvg and
importvg. Execute reducevg using the PVID instead of the disk name.
Additional information —
Transition statement — Now, let’s do an exercise.
Unit summary
IBM Power Systems
Notes:
Discussion:
Different procedures are available that can be used to fix disk problems under any
circumstance:
Procedure 1: Mirrored disk
Procedure 2: Disk still working (rootvg specials)
Procedure 3: Total disk failure
Procedure 4: Total rootvg failure
Procedure 5: Total non-rootvg failure
exportvg and importvg can be used to easily transfer volume groups between
systems.
Estimated time
00:25 Topic 1
00:24 Topic 2
Reference
Online AIX Version 7.1 Command Reference volumes 1-6
Online AIX Version 7.1 Operating system and device
management
Online AIX Version 7.1 Installation and migration
Note: References listed as “online” above are available through the
IBM Systems Information Center at the following address:
http://publib.boulder.ibm.com/eserver
© Copyright IBM Corp. 2009, 2012 Unit 9. Install and cloning techniques 9-1
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Unit objectives
IBM Power Systems
Notes:
Instructor notes:
Purpose — List this unit’s objectives.
Details —
Additional information — In the previous unit, students learned when volume group
backups must be used after a disk failure. This unit will explain how to back up rootvg and
non-rootvg volume groups.
Transition statement — Let’s start with the mksysb command.
Topic 1 objectives
IBM Power Systems
Notes:
# smit alt_install
Notes:
Filesets
An alternate disk installation uses the following filesets:
- bos.alt_disk_install.boot_images must be installed for alternate disk mksysb
installations
- bos.alt_disk_install.rte must be installed for rootvg cloning and alternate disk
mksysb installations
Instructor notes:
Purpose — Introduce alternate disk installation.
Details — Alternate disk installation has been available since AIX V4.3.
Additional information —
alt_disk_install command arguments    New commands
-C args disk                          alt_disk_copy args -d disks
-d mksysb args disks                  alt_disk_mksysb -m mksysb args -d disks
-W args disk                          alt_rootvg_op -W args -d disk
-S args                               alt_rootvg_op -S args
-P2 args disks                        alt_rootvg_op -C args -d disks
-X args                               alt_rootvg_op -X args
-v args disk                          alt_rootvg_op -v args -d disk
-q args disk                          alt_rootvg_op -q args -d disk
[Figure: alternate mksysb installation. hdisk0: rootvg (AIX 5L V5.3); hdisk1: AIX 6.1]
Notes:
Introduction
An alternate mksysb installation involves installing a mksysb image that has already
been created from another system onto an alternate disk of the target system. The
mksysb image must have been created on a system running AIX V4.3 or subsequent
versions of the operating system.
Example
In the example, an AIX V6.1 mksysb tape image is installed on an alternate disk, hdisk1
by executing the following command:
# alt_disk_mksysb -m /dev/rmt0 -d hdisk1
The system now contains two rootvgs on different disks. In the example, one rootvg
is at AIX 5L V5.3 (hdisk0) and the other is at AIX 6.1 (hdisk1).
alt_disk_mksysb options
The alt_disk_mksysb command has the following options:
-m device
-d target-disks
-B : do not change the bootlist
-i image.data
-s script
-R resolv.conf
-p platform
-L mksysb_level
-n : remain a NIM client
-P phase
-c console
-r : reboot after install
-k : keep mksysb device customization
-y : import non-rootvg volume groups
# smit alt_mksysb
Install mksysb on an Alternate Disk
[Entry Fields]
Notes:
[Figure: rootvg cloning. hdisk0: rootvg (AIX 7.1 TL01); clone on hdisk1: rootvg (AIX 7.1 TL02)]
Notes:
Example
In the example, alt_disk_copy -b update_all -l /dev/cd0 -d hdisk1, rootvg
which resides on hdisk0, is cloned to the alternate disk hdisk1. Additionally, a new
maintenance level will be applied to the cloned version of AIX.
# smit alt_clone
Clone the rootvg to an Alternate Disk
Notes:
Original hdisk0
rootvg (AIX 7.1 TL01)
Notes:
Instructor notes:
Purpose — Describe how to remove an alternate disk installation.
Details —
Additional information —
Transition statement — You may have noted that, up to this point, we only talked about
applying maintenance to an existing version and release of AIX, but not about migrating to
a new version and release. To use the alternate disk capabilities with a migration install,
you need to use NIM. Let’s look at this briefly.
[Figure: NIM alternate disk migration. Labels: NIM server; NIM client lpar1; rootvg (AIX 7.1); Clone; hdisk1]
# nimadm -c lpar1 -s spot1 -l lpp1 -d "hdisk1" -Y
Notes:
What is nimadm?
The nimadm command (Network Install Manager Alternate Disk Migration) is a utility that
allows the system administrator to create a copy of rootvg to a free disk (or disks) and
simultaneously migrate it to a new version or release level of AIX. The nimadm
command uses NIM resources to perform this function.
Advantages of nimadm
There are several advantages to using the nimadm command over a conventional
migration:
- Reduced downtime. The migration is performed while the system is up and
functioning normally. There is no requirement to boot from installation media, and
the majority of processing occurs on the NIM master.
- The nimadm command facilitates quick recovery in the event of migration failure.
Since the nimadm command uses alt_disk_install to create a copy of rootvg, all
changes are performed to the copy (altinst_rootvg). In the event of serious
migration installation failure, the failed migration is cleaned up and there is no need
for the administrator to take further action. In the event of a problem with the new
(migrated) level of AIX, the system can be quickly returned to the pre-migration
operating system by booting from the original disk.
- The nimadm command allows a high degree of flexibility and customization in the
migration process. This is done with the use of optional NIM customization
resources: image_data, bosinst_data, exclude_files, pre-migration script,
installp_bundle, and post-migration script.
Details of using NIM to perform an alternate disk migration are not covered in this
course.
Topic 2 objectives
IBM Power Systems
Notes:
multibos overview
IBM Power Systems
Notes:
Overview
The main purpose of using multibos is to have the type of alternate BOS (base
operating system) capabilities that are available with the alternate disk technology,
without having to use another disk. The operating system filesets do not occupy enough
space to justify allocating another entire disk for that purpose. With multibos, you can
have the two BOS versions on the same disk.
This is accomplished by creating copies of the base operating system logical volumes
that are affected by an OS update (the active BOS), with a different file name path.
Note that these copies are in the one and only rootvg.
Another advantage to multibos is that there is lower overhead to the cloning operation,
since it does not need to clone all the logical volumes in the rootvg.
Once you have created the alternate BOS, changes, such as applying maintenance,
can be made to these copies, without changing the level of code being used in the
active BOS. In addition to applying maintenance, you can access and make
configuration changes to the standby BOS through two techniques: mounting the
standby BOS and starting an interactive shell (chroot) for the standby BOS.
When you would like to test the standby BOS, you simply reboot using the standby copy
of the boot logical volume (BLV). If there is a problem with the changes that were made,
configure the bootlist to use the original BLV and a reboot will return you to the original
version of the BOS.
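Switching between the active and standby BOS comes down to which boot logical volume the bootlist names. The sketch below assumes rootvg is on hdisk0 and that the standby BLV is called bos_hd5 (the usual multibos name for the copy of hd5). The helper names are hypothetical, and the blv= attribute syntax should be checked against the bootlist documentation for your AIX level. Commands are printed unless DRY_RUN=0 is set.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# boot_from_blv DISK BLV
# Hypothetical helper: point the normal-mode bootlist at one boot
# logical volume on the given disk.
boot_from_blv() {
    run bootlist -m normal "$1" "blv=$2"
}

boot_from_blv hdisk0 bos_hd5    # next reboot tests the standby BOS
boot_from_blv hdisk0 hd5        # fall back to the original BOS
```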
Instructor notes:
Purpose — Provide an overview of multibos function and purpose.
Details —
Additional information —
Transition statement — Let’s first look at the file system structure of the alternate BOS,
when created.
[Figure: active and standby BOS logical volumes in rootvg]
Active BOS: / (hd4), BLV (hd5), jfslog (hd8), home (hd1), opt (hd10opt), usr (hd2),
var (hd9var), tmp (hd3)
Standby BOS: bos_inst (bos_hd4, if mounted), BLV (bos_hd5), jfslog (bos_hd8)
Instructor notes:
Purpose — Explain the structure of the standby BOS.
Details —
Additional information —
Transition statement — Next, we will look at how we actually create a standby BOS using
the multibos command.
• multibos -s -X
image.data customization
If you want to change any characteristics of the cloned rootvg logical volumes or file
systems, you can create a copy of the image.data file, edit the copy, and then specify
that the multibos command should use your edited copy (by using the -i flag).
For example, if you wanted the cloned logical volumes to be placed on a disk that was
added to the rootvg, then you would first run the mkszfile command to obtain a
current capture of the characteristics, copy the created /image.data file to a different
name, and edit it to specify that the cloned logical volumes should be placed on the
additional disk. Then, you need to point to that new file by running the command:
# multibos -i <image.data copy> -Xs
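The sequence described above can be sketched as a small script. This is a sketch under stated assumptions, not a tested procedure: the copy name /tmp/image.data.multibos is hypothetical, the edit itself is left to the administrator, and the run wrapper only prints each command unless DRY_RUN=0 is set on an AIX system.

```shell
#!/bin/sh
# Sketch only: prints the commands unless DRY_RUN=0 is set on an AIX system.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1 is a hypothetical path for the edited copy of image.data.
create_standby_with_image_data() {
    custom_copy=$1
    run mkszfile                        # capture current rootvg characteristics in /image.data
    run cp /image.data "$custom_copy"   # work on a copy under a different name
    echo "# edit $custom_copy now (for example, the disk used by the cloned logical volumes)"
    run multibos -i "$custom_copy" -Xs  # create the standby BOS from the edited copy
}

create_standby_with_image_data /tmp/image.data.multibos
```

Reviewing the printed commands first is a cheap way to confirm that the right image.data copy is being passed to multibos.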
Notes:
3) If you specify a fix list, with the -f flag, the fix list is installed using the
instfix utility. The fix list syntax should follow instfix conventions. If you
specify the -p preview flag, then instfix will perform a preview operation.
4) If you specify the update_all function, with the -a flag, it is performed using
the install_all_updates utility. If you specify the -p preview flag, then
install_all_updates performs a preview operation.
Note: It is possible to perform one, two, or all three of the installation options
during a single customization operation.
5) The standby boot image is created and written to the standby BLV using the
AIX bosboot command. You can block this step with the -N flag. You should
only use the -N flag if you are an experienced administrator and have a good
understanding of the AIX boot process.
6) Upon exit, if standby BOS file systems were mounted in step 1, they are
unmounted.
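As an illustration of the update_all path in the steps above, the sketch below runs a preview first and then the real customization. The -a, -p, and -X flags appear in these notes; the -c (customize) and -l (installation source) flags are assumptions taken from the multibos command documentation and should be checked against your AIX level. The directory /mnt/updates is a hypothetical location for the update images.

```shell
#!/bin/sh
# Sketch only: prints the commands unless DRY_RUN=0 is set on an AIX system.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1 is a hypothetical directory (or device) holding the update images.
customize_standby_update_all() {
    source_dir=$1
    run multibos -Xacp -l "$source_dir"   # preview: install_all_updates reports what it would do
    run multibos -Xac -l "$source_dir"    # apply the updates to the standby BOS
}

customize_standby_update_all /mnt/updates
```

Running the preview before the real operation mirrors the -p behavior described in steps 3 and 4.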
Instructor notes:
Purpose — Explain the various standby BOS operations.
Details — Provide a brief description of what each of these options provides and why
students might want to use it. Do not spend too much time here; they will experience
these first-hand in the lab exercises.
Additional information —
Transition statement — Let’s continue with using the standby shell, booting to a particular
boot logical volume, and finally removing a standby BOS.
Alternate boot
The bootlist command supports multiple BLVs. As an example, to boot from disk
hdisk0 and BLV bos_hd5, you would enter the following:
# bootlist -m normal hdisk0 blv=bos_hd5
After the system is rebooted from the standby BOS, the standby BOS logical volumes
are mounted over the usual BOS mount points, such as /, /usr, /var, and so on. The set
of BOS objects, such as the BLV, logical volumes, file systems, and so on that are
currently booted are considered the active BOS, regardless of logical volume names.
The previously active BOS becomes the standby BOS in the existing boot environment.
Some sites have been unable to switch to the alternate BLV. When they try to set the
bootlist to the standby BLV, they receive the following error:
0514-226 bootlist: Invalid attribute value for blv
This is an indication that either the BLV is corrupted or the ODM entry for it is corrupted.
A suggested solution is to rebuild the standby BLV. This requires a special bosboot flag:
# bosboot -sd /dev/ipldevice -M standby -l bos_hd5
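The switch to the standby BLV and the fallback to the original one can be sketched as follows. The disk and BLV names (hdisk0, bos_hd5, hd5) are the ones used in this unit; the helper name is hypothetical, and the run wrapper only prints each command unless DRY_RUN=0 is set on an AIX system.

```shell
#!/bin/sh
# Sketch only: prints the commands unless DRY_RUN=0 is set on an AIX system.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Point the normal-mode boot list at a specific BLV on a disk, then show it.
set_boot_blv() {
    disk=$1
    blv=$2
    run bootlist -m normal "$disk" "blv=$blv"
    run bootlist -m normal -o     # display the resulting normal-mode boot list
}

set_boot_blv hdisk0 bos_hd5   # next reboot uses the standby BOS
set_boot_blv hdisk0 hd5       # fallback: next reboot uses the original BOS
```

Only one of the two calls would be used at a time; both are shown so that the test path and the fallback path can be compared.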
Checkpoint (1 of 2)
Checkpoint solutions (1 of 2)
3. Why should you not use exportvg with an alternate disk volume
group?
The answer is that this removes rootvg-related entries from
/etc/filesystems.
Additional information —
Transition statement —
Checkpoint (2 of 2)
Checkpoint solutions (2 of 2)
Additional information —
Transition statement — Let’s do a lab exercise using the multibos facility.
Exercise: multibos
Unit summary
Notes:
Discussion:
Alternate disk installation techniques are available:
- Installing a mksysb onto an alternate disk
- Cloning the current rootvg onto an alternate disk
Alternate BOS can be created and maintenance applied
Estimated time
00:20 Topic 1
00:20 Topic 2
00:15 Topic 3
Reference
Online AIX Version 7.1 Command Reference volumes 1-6
Online AIX Version 7.1 Operating system and device
management
Online AIX Version 7.1 Installation and migration
Note: References listed as “online” above are available through the
IBM Systems Information Center at the following address:
http://publib.boulder.ibm.com/eserver
© Copyright IBM Corp. 2009, 2012 Unit 10. Advanced backup techniques 10-1
Unit objectives
Instructor notes:
Purpose — List this unit’s objectives.
Details —
Additional information —
Transition statement — Let’s start with a discussion of data consistency issues.
[Figure: timeline of a transaction and a concurrent backup]
Transaction:  X0, Y0
Write X1:     X1, Y0
Backup:       X1, Y0
Write Y1:     X1, Y1
Notes:
Backing up data while a file system is active can lead to data consistency problems.
The backup utility is sequentially copying files while applications may still be updating
those contents. For a collection of related updates, the backup utility may copy one
piece of data after it is updated, but copy the other related data before it is
updated. The result can be a backup with two pieces of data which are not consistent
with one another.
Some applications, especially database engines, record the progress of related updates
in a transaction log. During the application recovery process, that log will identify
transactions where not all related updates were confirmed. The recovery process will
then back out that transaction, backing out any updates that were recorded during the
previous backup.
If an application does not have this type of recovery logic, then use of the inconsistent
backup can result in serious problems. In that situation, we need to have a way to
ensure that the backup has consistency.
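The problem can be reproduced with a toy example. The sketch below uses two ordinary files, x and y (hypothetical stand-ins for two related pieces of application data), and copies them between the two halves of one logical update.

```shell
#!/bin/sh
# Toy reproduction of the timeline: two related values, X and Y, are updated
# as one logical transaction while a file-by-file backup runs in between.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/live" "$work/backup"

echo X0 > "$work/live/x"
echo Y0 > "$work/live/y"

echo X1 > "$work/live/x"                            # first half of the transaction
cp "$work/live/x" "$work/live/y" "$work/backup/"    # the backup runs here
echo Y1 > "$work/live/y"                            # second half of the transaction

live_state="$(cat "$work/live/x"), $(cat "$work/live/y")"
backup_state="$(cat "$work/backup/x"), $(cat "$work/backup/y")"

echo "live:   $live_state"
echo "backup: $backup_state"
```

The live data ends up as X1, Y1, while the backup holds X1, Y0: a combination that never existed as a committed state of the application.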
Instructor notes:
Purpose — Explain the issue of consistency in data backups.
Details —
Additional information —
Transition statement — Let’s look at some solutions for ensuring consistent data
backups.
Notes:
Traditionally, the best practice is to stop the application and unmount the file system,
followed by executing a backup by inode. This ensures that there are no updates
occurring during the backup and that all of the file system's data has been flushed to
disk. If a backup takes a long time, having the application down for that long may be
unacceptable.
Some applications can be quiesced. In this state, either new transactions are not
accepted or they are only processed in user space without writing the updates to the file
system. Either way, the backup of the mounted file system may proceed without any file
system activity from the quiesced application. Again, if the backup takes a long time,
being quiesced for that long may still be unacceptable.
The solution is to use the quiesced state to quickly capture the state of the file system.
The captured state would not be affected by on-going updates to the actual file system.
A method for capturing the file system state may only run for a few seconds. Such a
short time for being in a quiesced state is often acceptable.
Instructor notes:
Purpose — Explain traditional solutions to providing consistent backups.
Details — Sometimes the quiesced state is referred to as the hot-backup mode, online
backup mode, or suspend mode of the application. The students should be encouraged to
work with the trained administrator of the application to obtain the proper state for online
backup.
Additional information —
Transition statement — We need to examine various ways to capture a point in time state
of a file system. The first methods we will look at are methods which require us to mirror the
file systems using LVM mirroring.
Topic 1 objectives
Notes:
Requirements
By splitting a mirror, you can perform a backup of the mirror that is not changing while
the other mirrors remain online.
To do this, it is best to have three copies of your data. You will need to stop one of the
copies but the other two will continue to provide redundancy for the online portion of the
logical volume.
You are also required to mirror the journal log for the file system.
The output from lsvg -l indicates that the logical volume and the log are both mirrored.
# lsvg -l newvg
newvg:
LV NAME      TYPE     LPs   PPs   PVs   LV STATE      MOUNT POINT
loglv00      jfslog   1     3     3     open/syncd    N/A
lv03         jfs      1     3     3     open/syncd    /fs1
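The number of copies of each logical volume is the ratio of PPs to LPs in this output. A minimal sketch of that check follows; the function name copies_per_lv is hypothetical, and the sample text is the output shown above so that the sketch can run without a real volume group.

```shell
#!/bin/sh
# Read "lsvg -l <vg>" output on stdin and print each LV name with its number
# of mirror copies (PPs divided by LPs).
copies_per_lv() {
    # Skip the "vgname:" line and the column header; ignore LVs with 0 LPs.
    awk 'NR > 2 && $3 > 0 { print $1, $4 / $3 }'
}

sample='newvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
loglv00 jfslog 1 3 3 open/syncd N/A
lv03 jfs 1 3 3 open/syncd /fs1'

printf '%s\n' "$sample" | copies_per_lv
```

On an AIX system the same function would be fed directly, for example with lsvg -l newvg | copies_per_lv. Both the file system logical volume and its log should report 3 before one copy is split off.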
[Figure: file system /fs1, its jfslog, and the split-off copy mounted as /backup]
Notes:
Example
# lsvg -l newvg
newvg:
LV NAME      TYPE     LPs   PPs   PVs   LV STATE      MOUNT POINT
loglv00      jfslog   1     3     3     open/syncd    N/A
lv03         jfs      1     3     3     open/stale    /fs1
lv03copy00   jfs      0     0     0     open/syncd    /backup
The /fs1 file system still contains three physical partitions, but the mirror is now stale.
The stale copy is now accessible by the newly created read-only file system /backup.
That file system resides on a newly created logical volume, lv03copy00. This logical
volume is not synchronized and is considered stale. Also, it does not indicate any
logical partitions (since the logical partitions really belong to lv03).
You can look at the content and interact with the /backup file system just like any other
read-only file system.
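The command that produces this state is not shown in these notes. The instructor notes identify chfs as the command that splits the mirror; the attribute form used below (splitcopy= together with copy=) is an assumption taken from the chfs command documentation and should be verified on your AIX level. The helper name is hypothetical, and the run wrapper only prints the command unless DRY_RUN=0 is set on an AIX system.

```shell
#!/bin/sh
# Sketch only: prints the commands unless DRY_RUN=0 is set on an AIX system.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Split mirror copy $3 of JFS file system $1 and mount it read-only at $2.
split_mirror_copy() {
    fs=$1
    backup_mount=$2
    copy=$3
    run chfs -a splitcopy="$backup_mount" -a copy="$copy" "$fs"
}

split_mirror_copy /fs1 /backup 3
```

The application should be quiesced for the short time the split takes, as the instructor notes for this visual point out.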
Instructor notes:
Purpose — To show how to split the mirror.
Details — It is important to emphasize that this split mirror technique can only be used
with JFS file systems and cannot be used with JFS2 file systems.
Emphasize that we still need to stop file system updates during the period when the
splitcopy operation is occurring. Otherwise, the splitcopy could have inconsistent data.
But, the splitcopy happens so quickly that the application only needs to be quiesced for a
very short period of time.
The chfs command is used to split the mirror. It will create a new file system that will
contain the contents of the snapshot. You can view or back up the content of the file
system.
Additional information — This unit assumes that the application uses file systems to hold
the data. While it is not very common, some applications may use raw logical volumes, in
which case the file system level command is not used. Instead there is an LVM command,
chlvcopy, that can be used to create a split mirror copy.
Transition statement — When you have completed these tasks and are ready to
reintegrate the mirror, you just need to use the rmfs command. Let’s see how that works.
[Figure: Copy 1, Copy 2, and Copy 3 of the logical volume with the jfslog; syncvg
resynchronizes the copies]
# unmount /backup
# rmfs /backup
Instructor notes:
Purpose — To explain how to reintegrate the stale copy back into the active logical
volume.
Details —
Additional information —
Transition statement — The second method, based on LVM mirroring, is based on the
creation of a snapshot volume group. Let’s look at how to create a snapshot volume group.
Notes:
How it works
Snapshot support for a mirrored volume group is provided to split a mirrored copy of a
fully mirrored volume group into a snapshot volume group.
It is best practice to ensure that there are no stale copies in the original volume group.
The splitvg command will reject a situation where the only remaining non-stale copy is
on the disk to be split unless you use the force (-f) option.
When the volume group is split, the original volume group will stop using the disks that
are now part of the snapshot volume group.
The splitvg command uses the recreatevg command to implement the split. This is a
very different technique from the JFS split mirror. It creates a new volume group with
new file system and logical volume names.
Instructor notes:
Purpose — To describe the volume group snapshot support.
Details — Discuss the student notes.
Since the splitvg is not based on JFS split mirror, it does not have the same file system
type restriction. It can be used with either JFS or JFS2 file systems.
Also mention some or all of the following restrictions:
• The only allowable chvg options on the snapshot volume group are -a, -R, -S and -u
• The only allowable chvg options on the original volume group are -a, -R, -S, -u and -h
• Partition allocation changes will not be allowed on the snapshot volume group
• A volume group cannot be split if a disk is already missing
• A volume group cannot be split if the last non-stale partition would be on the snapshot
volume group
Additional information —
Transition statement — Once split, the two volume groups are still related to one another.
Let us look at the implications this has for later resynchronization of the copies.
Notes:
Both volume groups will keep track of changes in physical partitions within the volume
group so that when the snapshot volume group is rejoined with the original volume
group, the synchronization only needs to occur on the subset of physical partitions
which were touched during the split period. This is much faster and has less
performance impact than resynchronizing all physical partitions, as is needed with the
JFS split copy function.
Physical partition changes in both volume groups are tracked. Writes to a physical
partition in the original volume group cause the corresponding physical partition in the
snapshot volume group to be marked stale. Writes to a physical partition in the
snapshot volume group cause that physical partition to be marked stale.
To rejoin the volume groups, use the joinvg command. The stale physical partitions are
included in the original mirroring and the stale copies are automatically resynchronized.
The user will see the same data in the rejoined volume group as was in the original
volume group before the rejoin. In other words, the third copy will show the data
changes that occurred in the original volume group during the period it was split off.
Instructor notes:
Purpose — Continue the discussion on how snapshot volume groups work.
Details —
Additional information —
Transition statement — Now, let’s take a look at the commands.
Instructor notes:
Purpose — To describe the volume group snapshot command and options.
Details — Discuss the command syntax.
Specify the -f flag to force the join when disks in the snapshot volume group are not
active. The mirror copy on the inactive disks will be removed from the original volume
group.
Additional information —
Transition statement — Let’s look at an example of working with a snapshot volume
group.
Notes:
The splitvg command creates a separate point-in-time snapshot volume group. The
splitvg command will fail if any of the disks to be split are not active within the
original volume group.
This snapshot volume group can be used to perform backup or other operations. In the
example, one of the renamed file systems is backed up by inode (unmounted). You could
also mount the file system and back up by name instead.
Later, the joinvg command is used to rejoin the snapshot volume group to the original
volume group.
In the event of a system crash or loss of quorum while running this command, the
joinvg command must be run to rejoin the disks back to the original volume group.
You must have root authority to run these commands.
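The split, backup, and rejoin cycle described above can be sketched as follows. All names here are assumptions: snapvg is a hypothetical snapshot volume group name, /dev/fslv03 is a hypothetical name for the renamed logical volume, and /dev/rmt0 is a tape device. The -y and -c flags of splitvg are taken from the splitvg command documentation rather than from these notes, so verify them before use. The run wrapper only prints each command unless DRY_RUN=0 is set on an AIX system.

```shell
#!/bin/sh
# Sketch only: prints the commands unless DRY_RUN=0 is set on an AIX system.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Arguments (all hypothetical values in the call below):
#   $1 original VG, $2 snapshot VG name, $3 mirror copy to split,
#   $4 renamed logical volume in the snapshot VG, $5 backup device
snapshot_vg_backup() {
    vg=$1
    snapvg=$2
    copy=$3
    snap_lv=$4
    device=$5
    run splitvg -y "$snapvg" -c "$copy" "$vg"   # create the snapshot volume group
    run backup -0 -f "$device" "$snap_lv"       # level 0 backup by inode, file system unmounted
    run joinvg "$vg"                            # rejoin; stale partitions are resynchronized
}

snapshot_vg_backup newvg snapvg 3 /dev/fslv03 /dev/rmt0
```

Keeping the three steps in one function makes it harder to forget the joinvg, which the notes above require even after a crash or loss of quorum.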
Instructor notes:
Purpose — Provide an example of using a split volume group for executing a backup.
Details — You may wish to review with the students the strengths and weaknesses of the
LVM mirror based techniques. One of the big weaknesses is the need to use double or
triple the amount of storage that would be used without mirroring. Ask if any of the students
has actually used these in their shop (be careful - they may confuse the next topic with
what we have just covered).
This may be a good time to do Part 1 (snapshot volume groups) of the Lab Exercises for
this unit, instead of doing all of the exercises at the end of this unit. While JFS split mirror
was covered in this topic, Part 2 (JFS split mirror) of the exercise is an optional part.
Additional information —
Transition statement — Let’s next look at a backup technique which does not depend on
using LVM mirroring.
Topic 2 objectives
JFS2 snapshot (1 of 2)
Notes:
JFS2 snapshot
A point-in-time image for a JFS2 file system is called a snapshot. The file system which
is the source of this point-in-time image is referred to as the snapped file system or
snappedFS.
The snapshot view of the data remains static and retains the same security permissions
that the original snappedFS had when the snapshot was made. Also, a JFS2 snapshot
can be created without unmounting or quiescing the file system (though it may be
advisable for some applications to briefly quiesce during the snapshot).
A snapshot can be used to access files or directories as they existed when the snapshot
was taken. It can also be used to create a backup of the file system as of that point
in time.
JFS2 snapshot (2 of 2)
When a write or delete occurs in the snappedFS, the affected blocks are
copied into existing snapshots
Notes:
logical volume from the file system. The external snapshot can be mounted separately
from the file system at its own unique mount point. A given file system can only use
either internal or external snapshots; it cannot mix the different types.
Instructor notes:
Purpose — Continue basic discussion of a JFS2 snapshot.
Details —
Additional information — A JFS2 snapshot is a file system that maps its contents to the
contents of the source snappedfs. If the snappedfs is not modified, the snapshot does not
store any of the files in its own physical partition allocations, and has content which is
identical to the snappedfs.
If the snappedfs is modified, the original values of the affected blocks are saved in the
allocated storage of the snapshot file system. When the snapshot is read, it either
retrieves the data from the snappedfs (if the data has not been modified) or it retrieves
the data from its own disk storage (if the snappedfs data was changed).
So, the snapshot always gives us the state of the data at the time the snapshot was
created, but only uses enough storage to hold the data that has been changed in the
snappedfs. When allocating space for a snapshot logical volume, we can typically allocate
as little as 2-6% of the size of the snappedfs (depending on the volatility of the snappedfs).
Note that when compared to using split mirror copies, the JFS2 snapshot has very little
overhead. We do not have to create a total copy of the existing data when creating the
snapshot (as we do in creating mirror copies) and instead of doing a resync of the data
before the next backup (as we need to do with the split mirror when rejoining), we simply
eliminate the snapshot and create a new one when needed for the next backup.
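The copy-on-write behavior described above can be illustrated with a toy model. In this sketch each "block" is an ordinary file named by its block number; the directory layout and function names are hypothetical and say nothing about the real JFS2 on-disk format. The model covers only overwrites of blocks that existed when the snapshot was taken.

```shell
#!/bin/sh
# Toy model of copy-on-write. Each "block" is a file named by its block
# number; this illustrates the idea only, not the JFS2 on-disk format.
fs=$(mktemp -d)      # stands in for the snappedFS
snap=$(mktemp -d)    # stands in for the snapshot's own storage (empty at creation)
trap 'rm -rf "$fs" "$snap"' EXIT

echo "A" > "$fs/1"
echo "B" > "$fs/2"

# Overwrite an existing block. Its before-image is saved in the snapshot
# first, but only the first time the block changes after the snapshot.
write_block() {
    [ -e "$snap/$1" ] || cp "$fs/$1" "$snap/$1"
    echo "$2" > "$fs/$1"
}

# Read a block through the snapshot: the saved before-image if there is one,
# otherwise the unchanged block in the snappedFS.
read_snapshot() {
    if [ -e "$snap/$1" ]; then
        cat "$snap/$1"
    else
        cat "$fs/$1"
    fi
}

write_block 1 "A2"
write_block 1 "A3"

echo "snappedFS block 1: $(cat "$fs/1")"
echo "snapshot  block 1: $(read_snapshot 1)"
echo "snapshot  block 2: $(read_snapshot 2)"
echo "blocks stored in snapshot: $(ls "$snap" | wc -l | tr -d ' ')"
```

Block 1 was written twice but is stored in the snapshot only once, and block 2 is not stored at all, which is why a snapshot needs only a fraction of the space of the snappedFS.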
Transition statement — Let’s take a closer look at the mechanism behind a JFS2
snapshot.
For an ILO (Instructor-Led Online) class: You should play file AN152U10F15 in the
multimedia library of Elluminate in place of the next two visuals. You can then continue your
lecture normally to reinforce the topics if desired.
- a. To access the multimedia library, click on the CD button along the toolbar in
Elluminate.
- b. Once the multimedia library window is open, select AN152U10F15 and click on
play.
- c. Ask the students to indicate via green checkmark when they have finished the file.
For a FTF (Face to Face) class: You should play file AN152U10F15 on your instructor PC
in place of the next two visuals.
Note that you can also use this activity as a review for the information covered in the
next two visuals as well.
[Figure: the snappedFS and the snapshot, each showing inode1 and inode2]
Instructor notes:
Purpose — Explain how a snapshot accesses the snappedFS data blocks.
Details —
Additional information —
Transition statement — Let’s look at what happens as data blocks in the snappedFS are
modified.
[Figure: the snappedFS and the snapshot, each showing inode1 and inode2]
Instructor notes:
Purpose — Explain how the original version of the data is copied to the snapshot when
modified.
Details —
Additional information —
Transition statement — Let’s look at how we can implement the JFS2 snapshot, starting
with the SMIT facility.
# smit jfs2
. . .
List Snapshots for an Enhanced Journaled File System
Create Snapshot for an Enhanced Journaled File System
Mount Snapshot for an Enhanced Journaled File System
Remove Snapshot for an Enhanced Journaled File System
Unmount Snapshot for an Enhanced Journaled File System
Change Snapshot for an Enhanced Journaled File System
Rollback an Enhanced Journaled File System to a Snapshot
Notes:
The various JFS2 snapshot operations can be executed from SMIT dialog panels. The
SMIT JFS2 menu includes many items which are JFS2 snapshot related.
An example with only the menu items for snapshot is shown in the visual.
Instructor notes:
Purpose — Show how all the JFS2 snapshot functions can be accessed from the SMIT
JFS2 menu.
Details —
Additional information —
Transition statement — Let’s first look at how we create an external snapshot.
For an ILO (Instructor-Led Online) class: You should play file AN152U10F18 in the
multimedia library of Elluminate in place of the next visual. You can then continue your
lecture normally to reinforce the topics if desired.
- a. To access the multimedia library, click on the CD button along the toolbar in
Elluminate.
- b. Once the multimedia library window is open, select AN152U10F18 and click on
play.
- c. Ask the students to indicate via green checkmark when they have finished the file.
For a FTF (Face to Face) class: You should play file AN152U10F18 on your instructor PC
in place of the next visual.
Note that you can also use this activity as a review for the information covered in the
next visual as well.
                                         [Entry Fields]
  File System Name                       /home/myfs
  SIZE of snapshot
    Unit Size                            Megabytes          +
* Number of units                        [500]              #
Creating an internal snapshot for a JFS2 file system that is not mounted
First, it is important to know that you cannot use internal snapshots unless the file
system was enabled to support them at file system creation.
• To enable the file system to support internal snapshots (at creation time only):
# crfs -a isnapshot=yes ....
The mount option, -o snapto=snapshotlv, can be used to create a snapshot for a
JFS2 file system that is not currently mounted:
# mount -o snapto=snapshotLV snappedFS MountPoint
or
# mount -o snapto=snapshotname snappedFS MountPoint
If the snapto value starts with a slash, then it is assumed to be a special device file for
an existing logical volume where the snapshot should be created. If the snapto value
does not start with a slash, then it is assumed to be the name of an internal snapshot to
be created.
For example:
# mount -o snapto=/dev/mysnaplv /dev/fslv00 /home/myfs
This mounts the file system contained on /dev/fslv00 at the mount point
/home/myfs and then creates a snapshot for the /home/myfs file system in
the logical volume /dev/mysnaplv.
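The rule for interpreting the snapto value can be written down as a small function. This is only an illustration of the rule stated above, not a description of how mount implements it; the function name snapto_kind and the internal snapshot name mysnap are hypothetical.

```shell
#!/bin/sh
# Classify a snapto= value the way the text describes: a leading slash means
# a device file for an existing logical volume (external snapshot); anything
# else is taken as the name of an internal snapshot.
snapto_kind() {
    case "$1" in
        /*) echo "external" ;;
        *)  echo "internal" ;;
    esac
}

for value in /dev/mysnaplv mysnap; do
    echo "snapto=$value -> $(snapto_kind "$value") snapshot"
done
```

A check like this can be useful in a wrapper script, because an internal snapshot additionally requires that the file system was created with isnapshot=yes.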
Instructor notes:
Purpose — Describe how to create a JFS2 snapshot
Details —
Additional information —
Transition statement — Let’s take a look at how you create a JFS2 internal snapshot.
Create Snapshot for an Enhanced Journaled File System in File System
                                         [Entry Fields]
  File System Name                       /home/myfs
* Snapshot Name                          [mysnap]
Internal snapshot attribute must be set to yes on creation of the file system:
Instructor notes:
Purpose — Explain how to create a JFS2 internal snapshot.
Details —
Additional information —
Transition statement — Later, we will want to identify if a file system has a snapshot and
obtain information about those snapshots.
Listing snapshots
Snapshots for /home/myfs2
Current  Name      Time
         mysnap    Wed 19 Nov 08:44:33 2008
         mysnap2   Fri 21 Nov 09:33:33 2008
*        mysnap3   Mon 24 Nov 14:03:18 2008
# snapshot -q /home/myfs
Snapshots for /home/myfs
Current  Location      512-blocks  Free    Time
*        /dev/fslv06   262144      261376  Wed May  6 18:15:11 2009
Notes:
The snapshot -q option can be used to display the snapshots related to the specified
file system.
If the file system uses internal snapshots, then the report provides the snapshot names
and creation times. The * indicates the current snapshot.
# snapshot -q /home/myfs2
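In a script, the current internal snapshot can be picked out of this report by looking for the line marked with *. The sketch below is a minimal example of that idea; the function name current_snapshot is hypothetical, and the sample text is the internal snapshot listing from this visual so that the sketch runs without a JFS2 file system.

```shell
#!/bin/sh
# Read "snapshot -q <filesystem>" output on stdin and print the name of the
# snapshot flagged as current (the line whose first field is "*").
current_snapshot() {
    awk '$1 == "*" { print $2 }'
}

sample='Snapshots for /home/myfs2
Current  Name      Time
         mysnap    Wed 19 Nov 08:44:33 2008
         mysnap2   Fri 21 Nov 09:33:33 2008
*        mysnap3   Mon 24 Nov 14:03:18 2008'

printf '%s\n' "$sample" | current_snapshot
```

On an AIX system the function would be fed directly, for example with snapshot -q /home/myfs2 | current_snapshot. For an external snapshot the second field is the logical volume location rather than a name.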
# snapshot -q /home/myfs
Notes:
Rollback
The rollback command is an interface to revert a JFS2 file system to a point-in-time
snapshot. The snappedFS parameter must be unmounted before the rollback
command is run and remains inaccessible for the duration of the command. Any
snapshots that are taken after the specified snapshot (snapshotObject for external or
snapshotName for internal) are removed. The associated logical volumes are also
removed for external snapshots.
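The rollback procedure described above can be sketched as a short command sequence. The function below only prints the commands; its name is hypothetical, and it assumes that an external snapshot is given as a /dev/ logical volume path while anything else is treated as an internal snapshot name.

```shell
# Print (do not run) the steps for reverting a JFS2 file system to a
# snapshot. Pass a snapshot name for an internal snapshot, or the
# snapshot logical volume (for example /dev/fslv06) for an external one.
plan_rollback() {
    fs=$1; target=$2
    # The snapped file system must be unmounted before rollback runs
    echo "umount $fs"
    case $target in
        /dev/*) echo "rollback $fs $target" ;;      # external: snapshotObject
        *)      echo "rollback -n $target $fs" ;;   # internal: snapshotName
    esac
    echo "mount $fs"
}
# Example: plan_rollback /home/myfs mysnap
```

Remember that any snapshots taken after the chosen one are removed by the rollback, so confirm the target with snapshot -q before running the printed commands.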
As with any file copying, be careful about changing the nature of the file (ownership,
permission, sparseness, and so on). Using the backup and restore utilities to
implement a copy of files is often a safer technique.
Instructor notes:
Purpose — Explain how to use a JFS2 snapshot to recover data.
Details —
Additional information —
Transition statement — While using a snapshot directly to recover data is useful, it does
not address a situation in which the disk holding the snappedFS is lost, much less a site
disaster recovery situation. Let’s look at how we can use a snapshot as a stable source for
a backup to media or to a network server.
For example:
# backsnap -m /mntsnapshot -s size=16M -i -f /dev/rmt0 /home/myfs
© Copyright IBM Corporation 2012
Notes:
This will create a 16 MB logical volume and create a snapshot for the /home/myfs file
system on the newly created logical volume. It then mounts the snapshot logical volume
on /mntsnapshot. The remaining arguments are passed to the backup command. In
this case, the files and directories in the snapshot will be backed up by name (-i) to
/dev/rmt0.
For example:
# backsnap -n mysnap -i -f/dev/rmt0 /home/myfs
Notes:
This will create an internal snapshot named mysnap (-n) for the /home/myfs file
system. Because an internal snapshot is stored within the snapped file system itself, no
separate snapshot logical volume is created or mounted, so the -m and -s options do not
apply. The remaining arguments are passed to the backup command. In
this case, the files and directories in the snapshot will be backed up by name (-i) to
/dev/rmt0.
Instructor notes:
Purpose — Explain how to use an internal snapshot with a backup utility.
Details —
Additional information —
Transition statement — If you make a mistake and underestimate how quickly data is
modified or deleted, then you can have space allocation problems related to the JFS2
snapshot allocation. Let’s look at how to monitor and manage that situation.
External snapshot:
The snapshot report identifies the size and amount of free space
If the snapshot needs more space:
# snapshot -o size=+1 snapshotLV
Internal snapshot:
Shares a logical volume with the snappedFS
# df -m snappedFS
If the snappedFS is out of space, try to free up space, possibly by deleting old
snapshots
# snapshot -d -n snapshot_name snappedFS
Notes:
It is useful to be able to identify situations where a snapshot is growing large. If a
snapshot runs out of space, then all snapshots are invalidated and become unusable. When
dealing with internal snapshots, the snapshots can contribute to the entire file system
running out of space.
To monitor an external snapshot, use the query option of the snapshot command. An
alternative would be to mount the snapshot and use the df command, but that is more
complicated.
If an external snapshot needs more room, you can dynamically increase the size of the
snapshot logical volume by using the size option of the snapshot command.
For an internal snapshot, there is no mechanism for identifying the space usage of the
snapshots. Instead, you monitor the size of the snappedFS.
When a file system is running out of space, one way to free space is to delete old
snapshots. Keeping many generations of snapshots can be useful, but it can also be
expensive in terms of space usage.
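Monitoring an external snapshot with the query option could be scripted along the following lines. This is a sketch only: the function name and the default 20 percent threshold are arbitrary assumptions, and the parsing assumes the "Current Location 512-blocks Free Time" layout of the snapshot -q report.

```shell
# Read `snapshot -q` output for an external snapshot on stdin and print
# "<location> <percent-free>" for each snapshot whose free space has
# dropped below the threshold (default 20 percent, an arbitrary choice).
low_snapshots() {
    awk -v min="${1:-20}" '
        { sub(/^\*/, "") }                  # drop the current-snapshot flag
        $1 ~ /^\/dev\// && $2 > 0 {
            pct = int($3 * 100 / $2)        # Free as a share of 512-blocks
            if (pct < min) print $1, pct
        }'
}
# Intended use on AIX: snapshot -q /home/myfs | low_snapshots 20
```

Any location that is printed is a candidate for the size option of the snapshot command shown in the visual.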
Instructor notes:
Purpose — Explain how to manage snapshot space allocation issues.
Details — You may choose to introduce Part 3 (JFS2 snapshots) of the matching exercise
at this point, rather than waiting until the end of the unit.
Additional information —
Transition statement — Let’s review what we have covered.
Topic 3 objectives
IBM Power Systems
Notes:
Notes:
Use of copy services provided by SAN-attached storage subsystems is fairly common.
In this unit, we call these services SAN Copy. These copy services make an exact
point-in-time copy of the contents of a LUN as seen by the storage subsystem
controller. Not only do they provide a point-in-time copy of a LUN, but this activity does
not depend on any host system resources. On the other hand, there are potential
problems that result from only seeing the data as it resides in the storage subsystem.
Normally, when an application writes data, it receives confirmation of the write when
AIX has cached that data in memory. Later, various AIX mechanisms flush that data
to disk storage. At the point in time that a SAN Copy is initiated, the transaction-related
updates may be either in AIX kernel memory or in the storage subsystem. This creates
the possibility that the SAN Copy has inconsistent data, even if the application
was quiesced prior to taking the snapshot.
To avoid this problem, you need to ensure that none of the related data updates are
cached in AIX memory at the time of the SAN Copy. Once again, unmounting the file
system is generally not an acceptable solution given the disruption to the application.
Notes:
AIX provides a JFS2 file system freeze capability. It stops processing new file system
I/O requests and then flushes out all memory cached file system data to the physical
volume.
Once the application is quiesced and the file system frozen, use of a SAN Copy will
capture consistent data.
After the SAN Copy completes, we can then thaw the file system and resume
application processing.
This is only needed when the application allows AIX to cache writes and to decide when
to flush the cached data. There are two situations where the freeze mode is not needed.
- The application processes the file using Direct I/O (DIO). With DIO, writes are
synchronous and go directly to storage without any caching in kernel memory.
Concurrent I/O always uses DIO.
- The application issues the synchronous fsync() system call for its output files,
forcing AIX to flush all cached data for that file and returning to the application when
that is completed.
The chfs freeze attribute requires a value which specifies a timeout period. If the file
system is not explicitly thawed (again using the chfs command) within that timeout
period, the file system will be automatically thawed. This is intended to avoid permanent
file system freezes, and the timeout should be set to a time period which is much longer
than you would expect to need for your SAN Copy.
The reason for issuing the sync command immediately prior to the freeze
request is that, for large amounts of cached data, the sync command is much more
efficient in finding and flushing that data than the freeze function. The freeze
function then only needs to handle data that was cached immediately after that flush,
which should be a small amount of data.
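The sync, freeze, copy, and thaw sequence can be sketched as follows. The function only prints the commands for review. Its name, the default timeout of 300 seconds, and the copy command are illustrative assumptions; the actual SAN Copy trigger depends entirely on your storage subsystem.

```shell
# Print (do not run) a freeze / SAN Copy / thaw sequence for a JFS2
# file system. copy_cmd stands in for whatever your storage subsystem
# uses to trigger the copy; it is a hypothetical placeholder.
plan_frozen_copy() {
    fs=$1; timeout=${2:-300}
    copy_cmd=${3:-"<storage-specific SAN Copy command>"}
    echo "sync"                          # flush the bulk of cached data first
    echo "chfs -a freeze=$timeout $fs"   # auto-thaws after timeout seconds
    echo "$copy_cmd"
    echo "chfs -a freeze=off $fs"        # explicit thaw
}
# Example: plan_frozen_copy /home/myfs 300
```

Quiesce the application before the first step and resume it after the thaw.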
Instructor notes:
Purpose — Explain the use of the JFS2 freeze and thaw capability.
Details — Emphasize that this is only for JFS2 file systems.
Additional information —
Transition statement — While really a storage subsystem function rather than an AIX
function, consistency groups have important implications for AIX file system integrity when
later recovering using SAN Copy data. Let’s look at this.
Consistency groups
IBM Power Systems
Notes:
While the previously discussed techniques can ensure the consistency of a
point-in-time copy of a single LUN, we face new issues when multiple LUNs are
interrelated. Normally, each LUN would be SAN Copied separately, and each copy would
be at a different point in time. Because the copies are taken at different points in time,
related data can be inconsistent between them.
When the storage subsystem defines LUNs as belonging to a common consistency
group, the entire consistency group is copied at the same point-in-time. This ensures
data consistency.
Of special concern is the relationship between a file system and its journal log. If these
are on different LUNs and we do not ensure consistency, then we essentially have
metadata corruption which can make that file system and log combination unusable.
If multiple file systems share the same log and some of those file systems are not
included in the consistency group, we again will have a situation where later access of
the log will be incompatible with the state of those other file systems. Thus, it is
recommended that for file systems which are using SAN Copy, either each file system
has its own external journal log or that they use JFS2 in-line journal logs.
If the LUN is one of many physical volumes in an entire volume group which is being
backed up, it is recommended that all of the LUNs in the volume group be included in
the same consistency group.
Notes:
SAN Copy creates exact duplicates of the physical volumes, rather than a backup
image to be restored. For an AIX system to access the disk, it needs to be discovered
(zoned to that host and detected, by way of cfgmgr, by that host) and then imported into
the ODM.
If it is to act as the rootvg of that system, it must be designated as the boot device
before booting that host.
User volume groups may be accessed to either directly recover contents from the copy,
or to enable a backup utility to create a backup of the copied volume group. In either
case, the PVID on the disk (or disks) should be changed to avoid issues of duplicate
PVIDs.
If accessing the entire volume group from a system which is different from the original
system, use the importvg command on any disk in the consistency group for the
volume group, vary online, run a file system check, and mount the file systems of
interest. To avoid possible future PVID conflicts you should consider changing the PVID
on the disks after importvg is completed. This can be accomplished using the chdev
command as follows:
# chdev -l hdisk# -a pv=clear
# chdev -l hdisk# -a pv=yes
When accessing from the same system (it is assumed that the original volume group
still exists) or accessing a subset of the physical volumes in the volume group, use the
recreatevg command, followed by a file system check and a mount of the file systems
which are of interest. The recreatevg command can selectively
restore only the logical volumes that reside on the specified disks. The recreatevg
command now has the ability to automatically change the PVIDs. If it did not do that,
you would need to first change the PVIDs (using chdev) prior to running recreatevg.
Instructor notes:
Purpose — Discuss how to access a SAN Copy.
Details —
Additional information —
Transition statement — Let’s look at this recreatevg command in more detail.
Notes:
The recreatevg command is specially designed to handle the import of volume group
copies to the same system from which they were copied.
One way in which it differs from just using importvg, is the creation of a new VGID and
new PVIDs. Another major difference is that it allows you to specify prefixes to be used
when creating new file system names and logical volume names, which avoid conflicts
with the original names.
As seen in the visual, the -L option is used to create a prefix to the file system name,
which becomes a common parent directory to all of the file system mount points. The -Y
option is used to create a prefix for the logical volume names.
It is very important that you specify all disks that belong to the volume group, as
arguments to the command, when trying to access the entire volume group.
If you are trying to access a subset of physical volumes in the volume group, you may
force it to create a new volume group that only contains the specified disks and
those logical volumes which are totally contained in those disks.
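A recreatevg invocation using the prefixes discussed above could look like the sketch below. The function only prints the commands. The function name, the volume group name copyvg, the prefixes /copy and cp, and the disk names are illustrative assumptions; the -y flag (which names the new volume group) is not covered in the notes above, so check it against the recreatevg reference before use.

```shell
# Print (do not run) the steps for bringing a SAN Copy of a volume group
# online on the same host. All names passed in are illustrative.
plan_recreatevg() {
    vg=$1; fsprefix=$2; lvprefix=$3; shift 3
    # List every disk of the copied volume group as an argument
    echo "recreatevg -y $vg -L $fsprefix -Y $lvprefix $*"
    # Review the new logical volume names and mount points
    echo "lsvg -l $vg"
}
# Example: plan_recreatevg copyvg /copy cp hdisk4 hdisk5
```

After reviewing the lsvg output, run a file system check on the file systems of interest and then mount them.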
Instructor notes:
Purpose — Explain detail in the use of the recreatevg command.
Details — There is no exercise for this topic.
Additional information —
Transition statement — Let us review some of what we have covered with some
checkpoint questions.
Checkpoint
IBM Power Systems
2. True or false: The creation of a JFS split copy marks all of the
split mirror copies as stale.
3. True or false: After the creation of a JFS split mirror copy, the
administrator needs to mount the new file system in order to
access the split copy.
Notes:
Instructor notes:
Purpose — Review and test the students' understanding of this section.
Details — A suggested approach is to give the students about five minutes to answer the
questions on this page. Then, go over the questions and answers with the class.
Checkpoint solutions
IBM Power Systems
1. True or false: The creation of a snapshot volume group marks all copies in the
snapshot as stale.
The answer is false.
2. True or false: The creation of a JFS split copy marks all of the split mirror copies
as stale.
The answer is true.
3. True or false: After the creation of a JFS split mirror copy, the administrator needs
to mount the new file system in order to access the split copy.
The answer is false.
4. To access a SAN Copy of an active volume group on the source system, use the
command:
a. joinvg
b. importvg
c. recreatevg
The answer is recreatevg.
Additional information —
Transition statement —
Notes:
Instructor notes:
Purpose — Introduce the lab exercise.
Details — If you had the students do the exercises immediately after each matching topic,
then there is no lab exercise at this point.
Additional information —
Transition statement —
Unit summary
IBM Power Systems
Notes:
Instructor notes:
Purpose — Summarize the unit.
Details — Present the highlights from the unit.
Before continuing to the next unit, stop and ask the students whether they have any
additional questions.
Additional information —
Transition statement — Let’s continue with the next unit.
Estimated time
00:25
References
Online AIX Version 7.1 Understanding the Diagnostic
Subsystem for AIX
Note: References listed as “online” above are available at the
following address:
http://publib.boulder.ibm.com/eserver/
Unit objectives
IBM Power Systems
Notes:
Diagnostics
[Visual: sources of diagnostics: bos.diag, CD-ROM, NIM Master]
Notes:
Introduction
The lifetime of hardware is limited. Broken hardware leads to hardware errors in the
error log, to systems that will not boot, or to very strange system behavior.
The diagnostic package helps you to analyze your system and discover hardware that
is broken. Additionally, the diagnostic package provides information to service
representatives that allows fast error analysis.
Instructor notes:
Purpose — Give reasons when diagnostics are used. Describe the different sources for
diagnostics.
Details —
Additional information —
Transition statement — Let’s discuss where to use diagnostics.
[Visual: a Virtual I/O Server owns the physical adapter and the physical storage (hdisk). It
serves two client partitions through VSCSI server virtual adapters, using the VSCSI protocol
across the Hypervisor.]
Notes:
Diagnostics are done on physical devices. It is fairly common to have totally virtualized
logical partitions which only see virtual devices: virtual Ethernet, virtual SCSI, virtual
Fibre Channel. The diag utilities will not diagnose virtual devices.
In a virtualized environment, the physical devices are allocated to the virtual I/O servers
(VIOS). If a client LPAR is having problems accessing a device, the administrator needs
to identify the VIOS providing access and run the diagnostics at the VIOS.
The VIOS command line interface (CLI) equivalent of the AIX diag command is the
diagmenu command. The alternative is to create a root AIX subshell with the
oem_setup_env command and run the AIX command in that shell.
Instructor notes:
Purpose — Explain where to run diagnostics in a virtualized environment.
Details — Do not get into details of how virtualization of devices works or how to
implement them. The focus here is on where the physical devices are actually managed.
The bold circle in the visual is around the physical storage adapter to emphasize where the
diagnostic needs to be executed.
Generally, many of the techniques discussed in the course are applicable to problems with
a VIOS since it is essentially using AIX under the covers. If a VIOS is having problems
booting and one suspects a device problem, it can be booted to a standalone diagnostic
mode, just like an AIX logical partition.
Additional information —
Transition statement — Let’s discuss how to use diagnostics.
diag
Notes:
Instructor notes:
Purpose — Introduce the diag command.
Details —
Additional information — When the diagnostic tool runs, it automatically tries to diagnose
hardware errors it finds in the error log. The information generated by the diag command is
put back into the error log entry, so that it is easy to make the connection between the error
event and, for example the FRU number required to repair failing hardware.
Transition statement — Let’s show how to work with the diag menus.
# diag
FUNCTION SELECTION 801002
Notes:
- If the selected task does not support resource selection, then the task is invoked.
If the Resource Selection menu is selected, then the following happens:
- The Diagnostic Controller displays a list of resources available on the system.
- After a resource has been selected, a Task Selection menu will appear containing
the commonly supported tasks for each selected resource. After selection of a task,
the task is invoked.
# diag
FUNCTION SELECTION 801002
Notes:
this selection after you have repaired a device, unless you remove the error log
entries of the broken device.
Instructor notes:
Purpose — Explain how to work with the diag command.
Details —
Additional information — The diagnostics version number appears on the first
diagnostics screen. The version of diagnostics may become an issue in rare cases.
Normally, diagnostic versions are backwards-compatible. However, diagnostic support for
older hardware may have been dropped from the CD for a particular version of diagnostics.
In this situation, students should contact support for more information.
If Problem Determination is chosen, then the Diagnostic Controller automatically scans the
error log for any PERMANENT HARDWARE errors that have been logged within the last 7
days to determine if any devices should be automatically tested. A problem report may be
generated. It then walks the configuration database to determine which resources in the
current configuration can be tested. This information is presented in the Resource
Selection Menu.
If Advanced Diagnostics Routines is chosen, and the system is in Online Service mode
of operation, the Diagnostic Controller will display the Test Method menu to determine if
the tests should be repeated. It initializes the input parameters to the Diagnostic
Application (DA), which are contained in the Test Mode Input object class and then runs the
Diagnostic Application (DA) of the resource to be tested.
Once the DA completes, the Diagnostic Controller then:
- Performs isolation process
- Presents conclusions to the screen
If no trouble is found, diagnostics exits with a return value of 0. Otherwise, a value of 1 is
returned if the hardware tested bad.
Transition statement — Let’s show how to select hardware devices to test.
Notes:
Instructor notes:
Purpose — Explain how to select hardware devices.
Details —
Additional information —
Transition statement — What happens if a device is busy?
Notes:
Instructor notes:
Purpose — Explain what happens if a device is busy.
Details —
Additional information —
Transition statement — Let’s describe the different diagnostic modes.
Diagnostic modes (1 of 3)
IBM Power Systems
Concurrent mode:
# diag
Execute diag during normal
system operation
Limited testing of components
Notes:
Diagnostic modes
Three different diagnostic modes are available:
- Concurrent mode
- Maintenance (single-user) mode
- Service (standalone) mode (covered on the next visual).
Concurrent mode
Concurrent mode provides a way to run online diagnostics on some of the system
resources while the system is running normal system activity. Certain devices can be
tested, for example, a tape device that is currently not in use, but the number of
resources that can be tested is very limited. Devices that are in use cannot be tested.
Diagnostic modes (2 of 3)
IBM Power Systems
Notes:
Standalone mode
But what do you do if your system does not boot or if you have to test a system without
AIX installed on the system? In this case, you must use the standalone mode.
Standalone mode offers the greatest flexibility. You can test systems that do not boot or
that have no operating system installed (the latter requires a diagnostic CD-ROM).
environment, the firmware should shut down the partition after AIX reaches a halt
state.
3. Boot your AIX system. If in manufacturing default configuration, you could power
on the server from the operator panel. If in a partitioned system, you would use
the HMC to start the LPAR.
4. If starting a partition with the HMC, you would specify a boot mode of Diagnostic
with Default Bootlist. If using the manufacturing default configuration with an
attached console, see the paragraph on using the console keyboard to control
the boot mode. Either method boots the machine in service mode.
5. If the CD drive has a diagnostic CD-ROM mounted, this will start the diagnostic
program that is on that disk. If there is nothing in the CD drive, then it will boot off
the hard drive, executing the diagnostic program on that hard drive.
6. At this point, you can invoke one of the diagnostic routines.
Instructor notes:
Purpose — Describe how to start up the standalone mode.
Details —
Additional information — Standalone mode allows the greatest number of devices to be
tested. However, it does not have the ability to examine entries in the system error log.
Transition statement — Let us also see how you can boot to diagnostics mode using a
NIM server as the provider of the diagnostics routine.
Diagnostic modes (3 of 3)
IBM Power Systems
HMC
Boot LPAR to SMS and configure for network boot
Network boot your LPAR
Notes:
Instructor notes:
Purpose — Explain how to use NIM to boot to diagnostics.
Details —
Additional information —
Transition statement — Let’s look at additional tasks diag provides.
# diag
FUNCTION SELECTION 801002
Notes:
Additional tasks
The diag command offers a wide range of additional hardware-related tasks.
All these tasks can be found after starting the diag main menu and selecting Task
Selection.
The tasks that are offered are hardware (or resource) related. For example, if your
system has a service processor, you will find service processor maintenance tasks,
which you do not find on machines without a service processor. On some systems, you
find tasks to maintain RAID and SSA storage systems.
Diagnostic log
IBM Power Systems
# /usr/lpp/diagnostics/bin/diagrpt -r
ID    DATE/TIME            T  RESOURCE_NAME  DESCRIPTION
DC00  Mon Oct 08 16:13:06  I  diag           Diagnostic Session was started
DAE0  Mon Oct 08 16:10:38  N  hdisk2         The device could not be tested
DC00  Mon Oct 08 16:10:13  I  diag           Diagnostic Session was started
DA00  Mon Oct 08 16:05:11  N  sysplanar0     No Trouble Found
DA00  Mon Oct 08 16:05:05  N  sisscsia0      No Trouble Found
DC00  Mon Oct 08 16:04:46  I  diag           Diagnostic Session was started
# /usr/lpp/diagnostics/bin/diagrpt -a
IDENTIFIER: DC00
Date/Time: Mon Oct 08 16:13:06
Sequence Number: 15
Event type: Informational Message
Resource Name: diag
Diag Session: 327726
Description: Diagnostic Session was started.
----------------------------------------------------------------------------
IDENTIFIER: DAE0
Date/Time: Mon Oct 08 16:10:38
Sequence Number: 14
Event type: Error Condition
Resource Name: hdisk2
Resource Description: 16 Bit LVD SCSI Disk Drive
Location: U7311.D20.107F67B-P1-C04-T2-L8-L0
Notes:
Diagnostic log
When diagnostics are run in online or single user mode, the information is stored into a
diagnostic log. The binary file is called /var/adm/ras/diag_log. The command,
/usr/lpp/diagnostics/bin/diagrpt, is used to read the content of this file.
Report fields
The ID column identifies the event that was logged. In the example in the visual, DC00,
DAE0, and DA00 are shown. DC00 indicates that a diagnostics session was started, and DA00
indicates No Trouble Found (NTF).
The T column indicates the type of entry in the log. I is for informational messages. N is
for No Trouble Found. S shows the Service Request Number (SRN) for the error that
was found. E is for an Error Condition.
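As a sketch of how the short report might be filtered by entry type, the function below reads diagrpt -r output and prints the ID and resource name of each entry whose T column matches a given letter. The function name is hypothetical, and the parsing assumes the column layout shown in the visual, where the date and time occupy fields 2 through 5.

```shell
# Read `diagrpt -r` output on stdin and print "ID RESOURCE_NAME" for
# every entry whose T (type) column matches the requested letter.
# The header line has no sixth field, so it never matches.
diag_entries() {
    awk -v t="$1" '$6 == t { print $1, $7 }'
}
# Intended use on AIX:
#   /usr/lpp/diagnostics/bin/diagrpt -r | diag_entries N
```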
Checkpoint
IBM Power Systems
Notes:
Checkpoint solutions
IBM Power Systems
Additional information —
Transition statement — Now, let’s do an exercise.
Exercise: Diagnostics
IBM Power Systems
Notes:
Unit summary
IBM Power Systems
Notes:
Estimated time
01:25
00:30 Wrap up / Evaluations
References
Online AIX Version 7.1 Command Reference volumes 1-6
Online AIX Version 7.1 Kernel Extensions and Device Support
Programming Concepts (Chapter 16. Debug Facilities)
Online AIX Version 7.1 Operating system and device management
(section on System Startup)
Note: References listed as “online” above are available at the
following address:
http://publib.boulder.ibm.com/infocenter/systems
© Copyright IBM Corp. 2009, 2012 Unit 12. The AIX system dump facility 12-1
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Instructor Guide
Unit objectives
IBM Power Systems
Notes:
System dumps
IBM Power Systems
Notes:
Types of dumps
IBM Power Systems
Traditional:
AIX generates dump prior to halt
Firmware assisted (fw-assist):
POWER firmware generates dump in parallel with AIX halt process
Defaults to same scope of memory as traditional
Can request a full system dump
Live dump facility:
Selective dump of registered components without need for a system
restart
Can be initiated by software or by operator
Controlled by livedumpstart and dumpctrl
Written to a file system rather than a dump device
Notes:
Overview
In addition to the traditional dump function, AIX 6 introduced two new types of dumps.
Traditional dumps
Traditionally, AIX alone handled system dump generation and the only way to get a
dump was to halt the system either due to a crash or through operator request. In a
logical partition it will only dump the memory that is allocated to that partition.
In its default mode, it will capture the same scope of memory as the traditional dump,
but it can be configured for a full memory dump.
If, for some reason (such as memory restrictions), a configured or requested firmware
assisted dump is not possible, then the traditional dump facility will be invoked.
More details on the configuration and initiation of firmware assisted dumps will be
covered later in the context of the sysdumpdev and sysdumpstart commands.
Instructor notes:
Purpose — Explain the different types of dumps
Details — This is only an overview of the dump types. Do not go into much detail here.
There are two main reasons for introducing these dump types. First, they will likely hear
them referred to and this will help clarify what these are about. Second, they will see
references to the firmware assisted dumps when we look at the smit panels and line
commands for dump management, later in the unit.
Note that there is a later visual that discusses firmware assisted dumps in more detail.
Additional information —
Transition statement — Let’s look at ways a system dump might be created.
[Visual: ways a system dump can be created]
- At unexpected system halt
- Using keyboard or reset button
- Using command
- Using remote reboot facility
- Using HMC (reset - dump)
- Using SMIT
Notes:
- For logical partitions running AIX, the HMC can issue a restart with dump request
which is the functional equivalent of the previously described reset button triggered
dump.
- The superuser can issue a command directly, or through SMIT, to invoke a system
dump.
- The remote reboot facility can also be used to create a system dump.
[Figure: 888 code flowchart. After 888, reset to see 102 (software) or 103 (hardware). For 102: reset for crash code, then reset for dump code. For 103: reset once for FRU, reset twice for SRN yyy-zzz, reset eight times for location code; optional codes for hardware failure.]
Notes:
On systems with no HMC and a three-digit or a four-digit operator panel, you may need
to press the system’s reset button to view the additional digits after the 888. Once the
series cycles back to showing 888, the sequence is over.
102 code
A 102 code indicates that the AIX kernel crashed and a system dump has occurred.
You may need to press the reset button to obtain the crash code
and then the dump code.
103 code
A 103 code usually indicates a hardware error. In an HMC managed LPAR
environment, hardware errors are reported through the service focal point of the HMC;
thus, you should not expect to see an 888-103 sequence in an LPAR reference code
field on the HMC. Working with the HMC facilities is covered in the LPAR training.
If you do have an 888-103 sequence, pressing the reset button twice will get a Service
Request Number, which may be used by IBM support to analyze the problem.
In case of a hardware failure, additional resets would retrieve the sequence number of
the Field Replaceable Unit (FRU) and a location code. The location code identifies the
physical location of a device.
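The reading order described above can be sketched in a few lines of Python. This is only an illustration of how the recorded codes might be sorted out afterwards; the function name and the sample values 300 and 0c0 are hypothetical and are not taken from the course material.

```python
def parse_888_sequence(codes):
    """Interpret the codes written down after a flashing 888.

    codes: the display values in the order they appeared, starting with "888".
    """
    if not codes or codes[0] != "888":
        raise ValueError("sequence must start with 888")
    if len(codes) < 2:
        raise ValueError("no type code recorded after 888")
    kind = codes[1]
    if kind == "102":
        # Software crash: the next two values are the crash code and the dump code.
        return {
            "type": "software",
            "crash_code": codes[2] if len(codes) > 2 else None,
            "dump_code": codes[3] if len(codes) > 3 else None,
        }
    if kind == "103":
        # Hardware error: the remaining values (SRN, FRU, location) go to IBM support.
        return {"type": "hardware", "details": list(codes[2:])}
    return {"type": "unknown", "details": list(codes[2:])}


# Hypothetical sequence recorded by an operator
print(parse_888_sequence(["888", "102", "300", "0c0"]))
```

In practice all recorded values would simply be passed on to IBM support; the sketch only shows which position in the sequence holds which piece of information.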
Instructor notes:
Purpose — Introduce what an 888 display code means.
Details — Describe what students have to do when an 888 display occurs. Emphasize
that, in an HMC managed LPAR environment, they should only see the 888-102 sequence.
The focus here is on crashes which result in dumps (the left side of the diagram).
Additional information —
Transition statement — Whether an unintended system crash or an administrator
requested dump, where is the dump stored and how do we access it?
[Figure: the dump is written to the primary dump device (/dev/hd6, hd6). Next boot: Copy dump into ...]
Notes:
Instructor notes:
Purpose — Describe what happens if a dump occurs.
Details — Base your presentation on the material in the student notes.
Additional information — None
Transition statement — Let’s find out where all this information is written and how you
can customize this.
Notes:
Examples on visual
The examples on the visual illustrate use of several of the sysdumpdev flags discussed
in the preceding material.
Starting with AIX 7.1, firmware assisted dump (FWAD) is now the
default system dump type if:
Platform is POWER6 and above
Memory is at least 1.5 GB
Dump logical volume is in rootvg and not the hd6 logical volume
User can request a full memory dump in case the default (selective
memory) dump is not providing enough data:
# sysdumpdev -f require
Diskless thin servers can dump to remote iSCSI disks using FWAD
© Copyright IBM Corporation 2012
Notes:
With POWER6 and later processor based systems, system dumps can be assisted by
firmware. Firmware-assisted system dumps are different from traditional system dumps
that are generated before a system partition is reinitialized because they take place
when the partition is restarting.
In fw-assist mode, in order to improve fault tolerance and performance, disk writing
operations are done as much as possible during the AIX Boot phase in parallel with the
AIX initialization.
A firmware-assisted system dump takes place under the following conditions:
- The firmware assisted dump is supported only on POWER6-based servers and
later
- The memory size at system startup is equal to or greater than 4 GB
- The system has not been configured to do a traditional system dump
- Physical partition size of 16 MB
The system administrator can reconfigure to use traditional dump instead:
# sysdumpdev -t traditional
The system administrator can request a full memory dump in case the default (selective
memory) dump is not providing enough data:
# sysdumpdev -f require
Diskless thin servers can dump to remote iSCSI disks using FWAD.
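As a rough illustration, the conditions listed on the visual for the AIX 7.1 default can be expressed as a single check. This is a sketch only: the function and parameter names are hypothetical, and the 1.5 GB threshold is the figure from the visual, exposed as a parameter so that another value can be substituted.

```python
def fwad_is_default(power_generation, memory_gb, dump_lv, dump_lv_in_rootvg,
                    min_memory_gb=1.5):
    """Return True if firmware assisted dump would be the default dump type.

    power_generation: 6 for POWER6, 7 for POWER7, and so on.
    dump_lv: name of the dump logical volume, for example "lg_dumplv".
    """
    return (
        power_generation >= 6             # platform is POWER6 and above
        and memory_gb >= min_memory_gb    # enough memory
        and dump_lv_in_rootvg             # dump logical volume is in rootvg
        and dump_lv != "hd6"              # and is not the paging space hd6
    )


print(fwad_is_default(7, 8, "lg_dumplv", True))
print(fwad_is_default(7, 8, "hd6", True))
```

On a real system the authoritative answer comes from sysdumpdev, not from a check like this one.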
Instructor notes:
Purpose — Point out the AIX 7.1 changes for firmware assisted dumps.
Details —
Additional information — The main steps of a firmware-assisted system dump are (from
the IBM AIX Version 6.1 Differences Guide):
1. When all conditions for a firmware-assisted dump are validated (at system initialization),
AIX reserves a dedicated memory scratch area.
2. This predefined scratch area is not released unless the system administrator explicitly
configures a legacy dump configuration.
3. The predefined scratch area size is relative to the memory size and ensures AIX will be
able to reboot while the firmware-assisted dump is in progress. Note: As dump data is
written at the next restart of the system, the AIX dump tables that are used to refer to the
data cannot be preserved.
4. System administrators must be aware that this dedicated scratch area is not adjusted
when a memory DR operation modifies the memory size. A verification can be run with
the sysdumpdev command by system administrators in order to be notified if the
firmware-assisted system dump is still supported.
5. AIX determines the memory blocks that contain dump data and notifies the dedicated
hypervisor to start a firmware-assisted dump with this information.
6. The hypervisor logically powers the partition off, but preserves partition memory
contents.
7. The hypervisor copies just enough memory to the predefined scratch area so that the
boot process can start without overwriting any dump data.
8. The AIX boot loader reads this dedicated area and copies it onto disk using dedicated
open firmware methods. The hypervisor has no authority and is unable by design to
write onto disk for security reasons.
9. AIX starts to boot and in parallel copies preserved memory blocks. The preserved
memory blocks are blocks that contain dump data not already copied by the AIX boot
loader. As with the traditional dump, a firmware-assisted dump uses only the first copy
of rootvg as the dump device; it does not support disk mirroring. [course developer’s
note: this last statement is in contradiction to other research on the disk mirroring
support]
10. The dump is complete when all dump data is copied onto disk. The preserved memory
then returns to AIX usage.
11. AIX waits until all the preserved memory is returned to its partition usage in order to
launch any user applications.
Transition statement — Let’s look at situations where a dedicated logical volume will be
used as the dump device.
[Table: system memory size and dedicated dump device size. 48 GB and up: 4 GB]
Notes:
Instructor notes:
Purpose — Explain that a dedicated dump device is created for systems with 4 GB or more
of main memory.
Details — Point out that the size of the dedicated dump device depends on the amount of
physical memory on this system and mention the default name of the dedicated dump
device.
Additional information —
Transition statement — You can specify the name and size of the dedicated dump device
instead of using the defaults we have just discussed.
/bosinst.data
...
control_flow:
CONSOLE = /dev/vty0
...
large_dumplv:
DUMPDEVICE = /dev/lg_dumplv
SIZEGB = 1
Notes:
If the stanza is not present, the dedicated dump device is created when required, using
the default values previously discussed.
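To check what a bosinst.data file requests before an installation, the stanza can be read with a few lines of Python. This is a minimal sketch that assumes the simple "name:" and "KEY = value" layout shown on the visual; the function name and the sample text are illustrative only.

```python
def parse_stanzas(text):
    """Parse stanza-style text into {stanza_name: {key: value}}."""
    stanzas = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line == "...":
            continue  # skip blanks, comments, and elided parts
        if line.endswith(":") and "=" not in line:
            current = stanzas.setdefault(line[:-1], {})
        elif "=" in line and current is not None:
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip()
    return stanzas


SAMPLE = """\
control_flow:
    CONSOLE = /dev/vty0
large_dumplv:
    DUMPDEVICE = /dev/lg_dumplv
    SIZEGB = 1
"""

dump_cfg = parse_stanzas(SAMPLE).get("large_dumplv")
if dump_cfg is None:
    print("no large_dumplv stanza: defaults apply")
else:
    print(dump_cfg["DUMPDEVICE"], dump_cfg["SIZEGB"], "GB")
```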
Notes:
The last method for using a dedicated dump logical volume is to manually configure it
after system installation. The procedure shown in the visual is fairly straightforward. The
main concern is the usual one of needing the allocation to be large enough to handle the
dump.
A common question concerns the use of the mirrorvg command. If you are at a
currently supported release of AIX (with current maintenance) dumping to an LVM
mirrored logical volume is supported, but the dump processing takes much longer when
using LVM mirroring. It is recommended that you do not mirror the dump logical volume.
The mirrorvg command will not mirror a dump logical volume in the rootvg unless it is
the paging space.
# sysdumpdev -e
0453-041 estimated dump size in bytes: 10485760
Notes:
Instructor notes:
Purpose — Show how to estimate the disk space needed for a system dump.
Details — The command sysdumpdev -e will estimate the dump size. It is just an estimate.
To be safe, the disk space should be larger than the estimate. Also, if the system has
dumped in the past, looking at the size of the past dump can provide more guidance on
sizing the dump device. This can be seen using the command sysdumpdev -L (mentioned
earlier in the unit).
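The sizing advice above can be turned into a small check. The sketch below parses the estimate line in the format shown on the visual and compares it with the size of the dump device. The 25% safety margin is an assumption chosen for illustration, not an IBM recommendation, and the function names are hypothetical.

```python
import re


def parse_estimate(output):
    """Extract the byte count from 'sysdumpdev -e' style output."""
    match = re.search(r"estimated dump size in bytes:\s*(\d+)", output)
    if match is None:
        raise ValueError("no dump size estimate found")
    return int(match.group(1))


def device_is_large_enough(estimate_bytes, device_bytes, margin=1.25):
    """True if the device holds the estimate plus an assumed safety margin."""
    return device_bytes >= estimate_bytes * margin


estimate = parse_estimate("0453-041 estimated dump size in bytes: 10485760")
print(estimate)
print(device_is_large_enough(estimate, 16 * 1024 * 1024))
```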
In AIX V4.3.2, the ability to compress the dump was introduced. Turning on dump
compression will reduce the space needed significantly. Dump compression is on by
default starting in AIX 5L V5.3. Dumps are always compressed in AIX 6.1 and later.
You should mention a few other points about dump devices:
• If a paging device (like hd6) is used for dumps, it must be part of rootvg.
• The primary dump device must always be in the rootvg.
• The secondary dump device may be outside rootvg as long as it is not a paging device.
• Prior to 4.3.3, dump devices should not be mirrored. The dump information was written
to only one mirror and the mirror was not marked stale. When rebooting, the dump was
copied to the dump file by reading from both copies of the mirror, even though only one
copy had the correct information. This created a corrupted dump file. In 4.3.3, this was
corrected by allowing the dump to be read only from the good
copy.
• AIX at V5.3 and later allows a DVD device to be used as a primary or secondary dump
device.
Additional information —
Transition statement — Let’s look at a new feature in AIX 5L that checks dump space
sizes.
Notes:
Notes:
The -p flag of sysdumpstart is used to specify a dump to the primary dump device.
The -s flag of sysdumpstart is used to specify a dump to the secondary dump device.
The -t flag of sysdumpstart is used to change the default type from fw_assist to
traditional.
The -f flag of sysdumpstart is used to change the scope of the dump (interacts with
the configuration set up with sysdumpdev):
- disallow - Do not allow a full memory dump
- require - Require a full memory dump
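The flag combinations above can be assembled programmatically, for example in an operations script that logs the exact command before running it. The sketch below only builds the argument list and does not execute anything. It assumes that the -t flag takes the value traditional, by analogy with the sysdumpdev example earlier in the unit; the function name is hypothetical.

```python
def build_sysdumpstart_args(device="primary", traditional=False, full_memory=None):
    """Build an argument list for sysdumpstart from the flags in the notes."""
    if device not in ("primary", "secondary"):
        raise ValueError("device must be 'primary' or 'secondary'")
    args = ["sysdumpstart", "-p" if device == "primary" else "-s"]
    if traditional:
        # Assumption: same value as used with 'sysdumpdev -t traditional'
        args += ["-t", "traditional"]
    if full_memory is not None:
        if full_memory not in ("disallow", "require"):
            raise ValueError("full_memory must be 'disallow' or 'require'")
        args += ["-f", full_memory]
    return args


print(" ".join(build_sysdumpstart_args("primary", full_memory="require")))
```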
[Figure: dump requested through serial port S1]
login: #dump#>1
Add a TTY
...
REMOTE Reboot ENABLE: dump
REMOTE Reboot STRING: #dump#
...
Notes:
reboot_enable
The value of this attribute (referred to as REMOTE Reboot ENABLE in SMIT) indicates
whether this port is enabled to reboot the machine on receipt of the remote
reboot_string, and if so, whether to take a system dump prior to rebooting:
- no - Indicates remote reboot is disabled
- reboot - Indicates remote reboot is enabled
- dump - Indicates remote reboot is enabled, and, prior to rebooting, a system dump
will be taken on the primary dump device
reboot_string
This attribute (referred to as REMOTE Reboot STRING in SMIT) specifies the remote
reboot_string that the serial port will scan for when the remote reboot feature is
enabled. When the remote reboot feature is enabled, and the reboot_string is
received on the port, a '>' character is transmitted, and the system is ready to reboot. If
a '1' character is received, the system is rebooted (and a system dump may be started,
depending on the value of the reboot_enable attribute); any character other than '1'
aborts the reboot process. The reboot_string has a maximum length of 16 characters
and must not contain a space, colon, equal sign, null, new line, or Ctrl-\ character.
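The rules for reboot_string and the confirmation step can be modeled as follows. This is a sketch of the behavior described in the notes, not of the actual tty driver. The function names are hypothetical, and rejecting an empty string is an added assumption.

```python
# Characters the notes forbid: space, colon, equal sign, null, new line, Ctrl-\ (0x1c)
FORBIDDEN = set(" :=\0\n\x1c")


def is_valid_reboot_string(value):
    """Check length and forbidden characters (non-empty is an assumption)."""
    return 0 < len(value) <= 16 and not (set(value) & FORBIDDEN)


def remote_reboot_action(reboot_enable, reboot_string, received, confirmation):
    """Return what happens for a given port setting and received input."""
    if reboot_enable == "no" or received != reboot_string:
        return "ignore"
    # At this point the port would transmit '>' and wait for one character.
    if confirmation != "1":
        return "abort"
    return "dump then reboot" if reboot_enable == "dump" else "reboot"


print(is_valid_reboot_string("#dump#"))
print(remote_reboot_action("dump", "#dump#", "#dump#", "1"))
```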
Instructor notes:
Purpose — Explain how to start a dump from a TTY.
Details — Base your explanation on the material in the student notes.
Additional information — As mentioned in the student notes, the values for REMOTE
Reboot ENABLE are:
no Remote reboot is disabled
reboot Remote reboot is enabled
dump Remote reboot is enabled and a dump will occur prior to reboot
There is a good discussion of the remote reboot facility (starting on page 24) in the AIX 5L
Version 5.3 System Management Guide: Operating System and Devices.
Transition statement — Let’s look at the dump interface of SMIT.
# smit dump
System Dump
Move cursor to desired item and press Enter
Show Current Dump Devices
Show Information About the Previous System Dump
Show Estimated Dump Size
Change the Type of Dump
Change the Full Memory Dump Mode
Change the Primary Dump Device
Change the Secondary Dump Device
Change the Directory to which Dump is Copied on Boot
Start a Dump to the Primary Dump Device
Start a Traditional System Dump to the Secondary Dump Device
Copy a System Dump from a Dump Device to a File
Always ALLOW System Dump
Check Dump Resources Utility
Change/Show Global System Dump Properties
Change/Show Dump Attributes for a Component
Change Dump Attributes for multiple Components
Notes:
The menu items that show or change the dump information use the sysdumpdev
command.
Notes:
If using an HMC to manage the LPAR, you may use the HMC GUI interface (or the
chsysstate command) to trigger a dump of the operating system.
In the GUI interface you would select the LPAR and then from the tasks menu:
Operations > Restart. The resulting window is shown in the visual. Selecting the Dump
option sends the partition the equivalent of a reset, which initiates
a dump.
Notes:
System-initiated dumps
If a system dump is initiated through a kernel panic, the progress code 0c9 will be
displayed while the dump is in progress, and then either a flashing 888 or a steady 0c0.
For LPARs supported by an HMC, this will display in the reference code column. For a
server with no HMC and only one OS, it will display in the LED on the system’s operator
panel.
All of the LED codes following an 888 (remember: you must use the Reset button)
should be recorded and passed to IBM.
User-initiated dumps
For user-initiated system dumps to the primary dump device, the progress codes should
indicate 0c2 for a short period, followed by 0c0 upon completion.
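For quick reference, the progress codes mentioned in these notes can be kept in a small lookup table. This is only a sketch covering the codes named above; it is not a complete list of AIX dump progress codes, and the function name is hypothetical.

```python
DUMP_PROGRESS = {
    "0c9": "system-initiated dump in progress",
    "0c2": "user-initiated dump in progress",
    "0c0": "dump completed",
    "888": "flashing 888: press Reset and record the codes that follow",
}


def describe_progress(code):
    """Return the meaning of a progress code, ignoring letter case."""
    return DUMP_PROGRESS.get(code.lower(), "not described in these notes")


for shown in ("0c9", "0C0", "0c4"):
    print(shown, "-", describe_progress(shown))
```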
Instructor notes:
Purpose — List the different dump progress codes that will be seen under different
circumstances.
Details — Go through the list and highlight first the codes that will be seen if a
system-initiated dump occurs and then if a user-initiated dump occurs. Mention that dumps
can require a significant amount of time; on the course development system with 1.5 GB of
memory, it took 25 minutes to generate the dump.
Refer to the student notes for a detailed description of the commonly seen codes.
Additional information — While the dump is occurring, the 0c2 or 0c9 code is displayed.
How long the dump takes to complete is dependent on how large the dump is. Small
dumps should take less than 30 seconds; large dumps may take several minutes.
On machines with two line front panel displays (LEDs), the second line will display the
number of bytes written so far to the dump device. This provides an indication to you that
the dump is still proceeding well, and it also gives you an idea of how much more data has
to be written (if you have a record of a past sysdumpdev -e).
Transition statement — Having caused a dump, the next issue you have to consider is
how you are going to retrieve the dump from your system.
[Figure: retrieving the dump at boot. Is hd6 being used as the dump logical volume? If no: use savecore. If yes: rc.boot 2 checks whether there is sufficient space in /var to copy the dump to. If yes: dump copied to /var/adm/ras. If no and forced copy flag = TRUE: display the copy dump to tape menu.]
Notes:
The system dump is 583973 bytes and will be copied from /dev/hd6
to media inserted into the device from the list below.
Please make sure that you have sufficient blank, formatted media
before proceeding.
88 Help?
99 Exit
The copy dump menu will only be displayed if the sysdumpdev attribute of forced copy
flag has a value of TRUE.
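The decisions made at boot can be summarized as a small function. This sketch reflects one reading of the flowchart on the visual; the function name, parameter names, and returned strings are hypothetical, and the final case (no space and forced copy flag not TRUE) is labeled simply as the dump not being copied.

```python
def dump_copy_action(dump_on_hd6, var_free_bytes, dump_bytes, forced_copy):
    """Decide what happens to a dump at the next boot."""
    if not dump_on_hd6:
        # Dump sits on a dedicated dump device: copy it later with savecore.
        return "use savecore"
    if var_free_bytes >= dump_bytes:
        return "copy to /var/adm/ras"
    if forced_copy:
        return "display copy dump menu"
    return "dump not copied"


# The 583973-byte dump from the menu example, with too little space in /var
print(dump_copy_action(True, 100000, 583973, forced_copy=True))
```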
# smit chgsys
Change/Show Characteristics of Operating System
...
Notes:
Instructor notes:
Purpose — Describe how to set up an automatic reboot after a crash.
Details — Base your explanation on the material in the student notes.
Additional information — None
Transition statement — Let’s discuss the snap command.
Notes:
- Specifying the name of a file that contains the list of scripts (one per line) that snap
should call. The syntax file:<name of file containing list of scripts> is used in this
case.
Instructor notes:
Purpose — Explain how the system dump should be prepared before it is sent to the IBM
Support Center.
Details — Provide the students with as much of the following information as you think is
appropriate:
The information gathered with the snap command can be used to identify and resolve
system problems. You must have root authority to execute this command.
If you use the -a flag, then you need approximately 8 MB of temporary disk space to collect
all the system information, including the contents of the error log (covered in a previous
unit).
The -g flag gathers the following information:
• Error report
• Copy of the customized ODM
• Trace file
• User environment
• Amount of physical memory and paging space
• Device and attribute information
• Security user information
The output from the -g flag is written to /tmp/ibmsupt/general/general.snapfile. However,
you can specify another directory using the -d flag.
The execution of snap appends information to the previously created files. Use the -r flag
to remove previously gathered and saved information.
Before you send your media to the support center, ensure you call them and obtain a
Problem Management Record (PMR) number, which will be used to track the status of your
problem. Ensure you label the media with this number, and also the other pieces of
information listed, to help the support team act quickly on your problem.
There is not much left for you to do after this, apart from waiting for a response from the
Support Center. However, you may want to have a look at your dump to try and analyze it
yourself. The tool that is used by the support center to analyze your dump is called kdb
(crash prior to AIX 5L V5.1), which is also available on the system; however, the output
from the command is very user unfriendly. Most people do not bother with this.
See the student notes for the AIX 5L V5.3 enhancements.
Additional information — In AIX 5L, the pax command was enhanced to allow archiving
of large files, such as dumps. The tar command, which was used prior to AIX 5L, does not
support files larger than 2 GB. If the file to be archived is larger than 2 GB, the only thing
available is pax.
Transition statement — Let's take a brief look at kdb to see how it can be used.
/unix (Kernel)
/var/adm/ras/vmcore.x (Dump file)
# uncompress /var/adm/ras/vmcore.x.Z
OR
# dmpuncompress /var/adm/ras/vmcore.x.BZ
# kdb /var/adm/ras/vmcore.x /unix
> status
> stat
(further sub-commands for analyzing)
> quit
Notes:
Useful subcommands
Examining a system dump requires an in-depth knowledge of the AIX kernel. However,
there are two subcommands that might be useful to you:
- The subcommand status displays the processes/threads that were active on the
CPUs when the crash occurred
- The subcommand stat shows the machine status when the dump occurred
To exit the kdb debug program, type quit at the > prompt.
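The preparation steps on the visual depend on how the dump file was compressed. The sketch below picks the decompression command from the file suffix and lists the commands in order. It only builds the command lists and does not run them; the function name and the dump number 0 are illustrative.

```python
def kdb_preparation(dump_path, kernel="/unix"):
    """Return the commands (as argument lists) needed to open a dump in kdb."""
    if dump_path.endswith(".BZ"):
        # Dumps with a .BZ suffix are expanded with dmpuncompress
        return [["dmpuncompress", dump_path], ["kdb", dump_path[:-3], kernel]]
    if dump_path.endswith(".Z"):
        # Dumps with a .Z suffix are expanded with uncompress
        return [["uncompress", dump_path], ["kdb", dump_path[:-2], kernel]]
    return [["kdb", dump_path, kernel]]


for command in kdb_preparation("/var/adm/ras/vmcore.0.BZ"):
    print(" ".join(command))
```

Remember that the kernel file passed to kdb must be the /unix of the system that produced the dump.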
Instructor notes:
Purpose — Introduce the kdb command.
Details — Cover the information in the student notes.
You might also want to make some of the points mentioned below:
kdb is an interactive utility for examining an operating system image, a core image, or the
running kernel. It also interprets and formats control structures in the system and provides
certain miscellaneous functions useful for examining a dump.
In order to analyze the dump, you must execute the kdb command against /unix, and it
must be the /unix of the system that had the problem. To make any change to code, you
must have the source AIX code, which is not held by customers - so there is not much more
that you can do. Generally speaking, it is best left for the IBM Support Center to handle the
dump.
The last thing you want to do is send a dump to the IBM Support Center and find out that
they cannot do anything about it because it is a partial dump. Get it right from the start.
Additional information — Prior to AIX 5L V5.1, the crash command was used instead of
the kdb command.
Transition statement — We have reached a checkpoint.
Checkpoint
IBM Power Systems
3. If the copy directory is too small, will the dump, which is copied
during the reboot of the system, be lost?
Notes:
Instructor notes:
Purpose — Present the checkpoint questions.
Details — A “Checkpoint Solution” is given below:
Checkpoint solutions
IBM Power Systems
1. If your system has less than 4 GB of main memory, what is the default
primary dump device? Where do you find the dump file after reboot?
The answer is the default primary dump device is /dev/hd6. The default
dump file is /var/adm/ras/vmcore.x, where x indicates the number of the
dump.
3. If the copy directory is too small, will the dump, which is copied during the
reboot of the system, be lost?
The answer is if the forced copy flag is set to TRUE, a special menu is
shown during reboot. From this menu, you can copy the system dump to
portable media.
4. Which command should you execute to collect system data before sending
a dump to IBM?
The answer is snap.
Additional information — Here are a couple of points you might want to make when
going over the answers to the checkpoint:
• If there is 4 GB or more of memory, then a dedicated dump logical volume is created.
• Dump compression can be turned off with the -c flag of sysdumpdev.
Transition statement — Let’s switch over to the lab.
Notes:
Instructor notes:
Purpose — Transition to the exercise for this unit.
Details —
Additional information —
Transition statement — Let’s recall some of the key points from this unit.
Unit summary
IBM Power Systems
Notes:
Discussion
When a dump occurs, kernel and system data are copied to the primary dump device.
By default, the system has a primary dump device (/dev/hd6) and a secondary device
(/dev/sysdumpnull).
During reboot, the dump is copied to the copy directory (/var/adm/ras).
A system dump should be retrieved from the system using the snap command.
The Support Center uses the kdb debugger to examine the dump.
Instructor notes:
Purpose — Summarize the unit.
Details — Present the highlights from the unit.
Before continuing, stop and ask the students if they have any additional
questions.
Additional information — You might want to note that, if the system has 4 GB or more of
main memory, then a dedicated dump logical volume is created. So, the default primary
dump device actually depends on the amount of physical memory installed in the system.
Transition statement — This brings us to the end of this course. Thank you.
Checkpoint solutions (1 of 2)
IBM Power Systems
1. True or false: You must have AIX loaded on your system to use
the System Management Services programs.
The answer is false: SMS is part of the built-in firmware.
Checkpoint solutions (2 of 2)
IBM Power Systems
4. What command is used to build a new boot image and write it to the
boot logical volume?
The answer is bosboot -ad /dev/hdiskx.
6. True or false: During the AIX boot process, the AIX kernel is loaded
from the root file system.
The answer is false: the AIX kernel is loaded from hd5.
Solutions for Figure 6-10, "Let’s review: rc.boot (1 of 3)," on page 6-29
[Figure: rc.boot 1 flow with numbered blanks (1) to (5); solution labels: restbase, cfgmgr -f, bootinfo -b, ODM files in RAM file system]
Solutions for Figure 6-11, "Let’s review: rc.boot (2 of 3)," on page 6-31
[Figure: rc.boot 2 flow; solution labels: (1) Activate rootvg, (4) Turn on paging, (5) Merge RAM /dev files, (6) Copy RAM ODM files]
Solutions for Figure 6-12, "Let’s review: rc.boot (3 of 3)," on page 6-33
[Figure: rc.boot 3 flow; solution labels: /etc/inittab, /sbin/rc.boot3, syncvg rootvg &, cfgmgr -p2, cfgmgr -p3, savebase, syncd 60, errdemon, rm /etc/nologin, chgstatus=3 in CuDv ?]
Solutions for Figure 6-18, "Let's review: /etc/inittab file," on page 6-53
Checkpoint solutions (1 of 2)
IBM Power Systems
Checkpoint solutions (2 of 2)
IBM Power Systems
5. What is the likely cause if your system stops booting with LED 553?
The answer is there is a problem with processing /etc/inittab.
Checkpoint solutions
IBM Power Systems
2. This volume group consists of two disks that are completely mirrored.
Because of the disk failure you are not able to vary on datavg. How do
you recover from this situation?
The answer is forced varyon: varyonvg -f datavg. Use procedure
1 for mirrored disks.
3. After disk replacement, you find that a disk has been removed from the
system but not from the volume group. How do you fix this?
The answer is repair the ODM, for example through exportvg and
importvg. Execute reducevg using the PVID instead of disk name.
Checkpoint solutions (1 of 2)
IBM Power Systems
3. Why should you not use exportvg with an alternate disk volume
group?
The answer is this will remove rootvg related entries from
/etc/filesystems.
Checkpoint solutions (2 of 2)
IBM Power Systems
Checkpoint solutions
IBM Power Systems
1. True or false: The creation of a snapshot volume group marks all copies in the
snapshot as stale.
The answer is false.
2. True or false: The creation of a JFS split copy marks all of the split mirror copies
as stale.
The answer is true.
3. True or false: After the creation of a JFS split mirror copy, the administrator needs
to mount the new file system in order to access the split copy.
The answer is false.
4. To access a SAN Copy of an active volume group on the source system, use the
command:
a. joinvg
b. importvg
c. recreatevg
The answer is recreatevg.
© Copyright IBM Corporation 2012
Checkpoint solutions
IBM Power Systems
1. If your system has less than 4 GB of main memory, what is the default
primary dump device? Where do you find the dump file after reboot?
The answer is the default primary dump device is /dev/hd6. The default
dump file is /var/adm/ras/vmcore.x, where x indicates the number of the
dump.
3. If the copy directory is too small, will the dump, which is copied during the
reboot of the system, be lost?
The answer is if the force copy flag is set to TRUE, a special menu is
shown during reboot. From this menu, you can copy the system dump to
portable media.
4. Which command should you execute to collect system data before sending
a dump to IBM?
The answer is snap.
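The dump-related answers above could be checked from the command line roughly as follows. This is a dry-run sketch: the run helper and dump_checklist function are hypothetical and only print the commands, and the flags shown (sysdumpdev -l, sysdumpdev -L, snap -ac) are typical usage that should be confirmed against the man pages of the installed AIX level.

```sh
#!/bin/sh
# Dry-run sketch: run() only prints the command it is given.
# On a real AIX system, change the body of run() to "$@" to execute.
run() { echo "+ $*"; }

dump_checklist() {
    run sysdumpdev -l     # show the configured primary and secondary dump devices
    run sysdumpdev -L     # show details of the most recent dump
    run snap -ac          # gather system data and build a compressed archive
}

checklist=$(dump_checklist)
echo "$checklist"
```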
Directories
mkdir Make directory
cd Change the directory. The default is $HOME directory.
rmdir Remove a directory (beware of files starting with “.”).
rm Remove file; -r option removes directory and all files and
subdirectories recursively.
pwd Print working directory: shows name of current directory
ls List files
-a (all)
-l (long)
-d (directory information)
-r (reverse alphabetic)
-t (time changed)
-C (multi-column format)
-R (recursively)
-F (places / after each directory name & * after each exec file)
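A short sketch of the directory commands above, including the rmdir caveat about files starting with ".". The scratch path under /tmp and the variable names are assumptions made for illustration.

```sh
#!/bin/sh
# Work in a throwaway directory so nothing real is touched.
work=${TMPDIR:-/tmp}/dir_demo.$$      # hypothetical scratch directory
mkdir "$work"
cd "$work" || exit 1

mkdir demo
touch demo/.hidden            # a "dot" file that plain ls does not show

plain=$(ls demo)              # empty: .hidden is not listed
all=$(ls -a demo)             # lists ., .. and .hidden

# rmdir refuses to remove a directory that still holds the hidden file
if rmdir demo 2>/dev/null; then
    rmdir_result=removed
else
    rmdir_result=refused
fi

rm -r demo                    # rm -r removes the directory and its contents
cd / && rmdir "$work"
echo "rmdir on a directory with a dot file: $rmdir_result"
```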
Files - Basic
cat List file contents (concatenate). This can create a new file with
redirection, for example, cat > newfile. Use <Ctrl>d to end
input.
tty Displays the device that is currently active. Very useful for
X Windows where there are several pts devices that can be
created. It is nice to know which one you have active. who am i
will do the same.
Files - Advanced
awk Programmable text editor / report writer
banner Display banner (can redirect to another terminal nn with
> /dev/ttynn)
cal Calendar (cal month year)
cut Cut out specific fields from each line of a file.
diff Differences between two files
find Find files anywhere on disks. Specify location by path (will
search all subdirectories under specified directory).
• -name fl (file names matching fl criteria)
• -user ul (files owned by user ul)
• -size +n (or -n) (files larger (or smaller) than n blocks)
• -mtime +x (-x) (files modified more (less) than x days ago)
• -perm num (files whose access permissions match num)
• -exec (execute a command with results of find command)
• -ok (execute a command interactively with results of find
command)
• -o (logical or)
• -print (display results. Usually included.)
find syntax: find path expression action
For example:
• find / -name "*.txt" -print
• find / -name "*.txt" -exec ls -l {} \;
(Executes ls -l, substituting each name found for {}.
The ; marks the end of the command to be executed, and the \ keeps
the shell from interpreting ; as its own command terminator.)
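The find options above can be tried safely in a scratch directory. This is a minimal sketch; the directory and file names are made up for illustration.

```sh
#!/bin/sh
work=${TMPDIR:-/tmp}/find_demo.$$     # hypothetical scratch directory
mkdir -p "$work/sub"
touch "$work/a.txt" "$work/sub/b.txt" "$work/c.log"

# -name with a quoted pattern so the shell does not expand *.txt itself
found=$(find "$work" -name "*.txt" -print | sort)

# -exec runs ls -l once per match; {} is replaced by the file name and
# \; marks the end of the command
find "$work" -name "*.txt" -exec ls -l {} \;

# -o is a logical or; the parentheses are escaped for the same reason as ;
count=$(find "$work" \( -name "*.txt" -o -name "*.log" \) -print | wc -l)

rm -r "$work"
echo "$found"
```

Searching a small directory instead of / keeps the example fast and avoids permission errors.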
grep Search for pattern, for example, grep pattern files.
pattern can include regular expressions.
• -c (count lines with matches, but do not list them)
• -l (list names of files with matches, but do not list the lines)
• -n (list line numbers with lines)
• -v (list lines that do not contain the pattern)
Expression metacharacters:
• [ ] matches any one character inside.
• - inside [ ] matches a range of characters.
• ^ matches BOL when ^ begins the pattern.
• $ matches EOL when $ ends the pattern.
• . matches any single character. (same as ? in shell)
• * matches 0 or more occurrences of the preceding
character. (Note: ".*" is the same as "*" in the shell).
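The grep options and expression metacharacters above can be checked against a small sample file. This is a minimal sketch; the file name and its contents are assumptions for illustration.

```sh
#!/bin/sh
f=${TMPDIR:-/tmp}/grep_demo.$$        # hypothetical scratch file
printf 'alpha\nbeta\ngamma\nalphabet\n' > "$f"

matches=$(grep -c 'alpha' "$f")       # count of lines containing alpha
files=$(grep -l 'alpha' "$f")         # name of each file with a match
numbered=$(grep -n '^beta$' "$f")     # ^ and $ anchor the whole line
others=$(grep -v 'alpha' "$f")        # lines that do not match
range=$(grep -c '^[a-b]' "$f")        # [a-b] matches one character in the range
anyrun=$(grep -c 'a.*a' "$f")         # .* matches any run of characters

rm "$f"
echo "$matches matching lines; line found as $numbered"
```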
sed Stream (text) editor, used with editing flat files
sort Sort and merge files
-r (reverse order); -u (keep only unique lines)
Editors
ed Line editor
vi Screen editor
INed LPP editor
emacs Screen editor +
Metacharacters
* Any number of characters (0 or more)
? Any single character
[abc] Any single character from the list
[a-c] Any single character from the range
! Not any of the following characters (for example, [!abc])
; Command terminator used to string commands on a single line
& Runs the preceding command in background mode
# Comment character
\ Removes special meaning (no interpretation) of the following
character
' Removes special meaning (no interpretation) of characters in
quotes
" Interprets only $, backquote, and \ characters between the
quotes
` Used to set a variable to the result of a command,
for example, now=`date` sets the value of now to the current
result of the date command
$ Preceding variable name indicates the value of the variable
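The quoting and pattern metacharacters above behave as in the following sketch. The variable names and strings are made up for illustration.

```sh
#!/bin/sh
d="day"

single='value of d is $d'      # single quotes: $d is not interpreted
double="value of d is $d"      # double quotes: $d is replaced by its value
escaped="cost is \$5"          # backslash removes the special meaning of $
now=`date`                     # backquotes: now holds the output of date

# [!abc] matches any single character that is not a, b, or c
case x in
    [!abc]) negated=yes ;;
    *)      negated=no ;;
esac

# ; strings commands on one line, # starts a comment
echo "$single"; echo "$double"; echo "$escaped"
```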
fsck Checks for file system consistency, and allows interactive repair
of file systems
fuser Lists the process numbers of local processes that use the files
specified
lsattr Lists the attributes of the devices known to the system
lscfg Gives detailed information about the AIX system hardware
configuration
lsdev Lists the devices known to the system
lsfs Displays characteristics of the specified file system such as
mount points, permissions, and file system size
lslv Shows you information about a logical volume
lspv Shows you information about a physical volume in a volume
group
lsvg Shows you information about the volume groups in your system
lvmstat Controls LVM statistics gathering
migratepv Used to move physical partitions from one physical volume to
another
migratelp Used to move logical partitions to other physical disks
mkdev Configures a device
mkfs Makes a new file system on the specified device
mklv Creates a logical volume
mkvg Creates a volume group
mount Instructs the operating system to make the specified file system
available for use from the specified point
quotaon Starts the disk quota monitor
rmdev Removes a device
rmlv Removes logical volumes from a volume group
rmlvcopy Removes copies from a logical volume
umount Unmounts a file system from its mount point
uncompress Restores files compressed by the compress command to their
original size
unmount Exactly the same function as the umount command
varyoffvg Deactivates a volume group so that it cannot be accessed
varyonvg Activates a volume group so that it can be accessed
Variables
= Set a variable (for example, d="day" sets the value of d to
"day"), can also set the variable to the results of a command by
the ` character, for example, now=`date` sets the value of
now to the current result of the date command.
HOME Home directory
PATH Path to be checked
SHELL Shell to be used
TERM Terminal being used
PS1 Primary prompt characters, usually $ or #
PS2 Secondary prompt characters, usually >
$? Return code of the last command executed
set Displays current local variable settings
export Exports variables so that they are inherited by child processes
env Displays inherited variables
echo Echo a message (for example, echo HI or echo $d),
can turn off carriage returns with \c at the end of the message,
can print a blank line with \n at the end of the message.
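The difference between a local and an exported variable, and the use of $?, can be seen in a short sketch. The child shell started with sh -c stands in for any child process; the variable names are assumptions.

```sh
#!/bin/sh
d="day"                                 # local variable, not yet exported
before=$(sh -c 'echo "${d:-unset}"')    # child shell does not see d

export d
after=$(sh -c 'echo "${d:-unset}"')     # child shell now inherits d

# $? holds the return code of the last command; here the child exits with 3
sh -c 'exit 3' || rc=$?

echo "before export: $before, after export: $after, rc: $rc"
```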
Transmitting
mail Send and receive mail. With userID sends mail to userID.
Without userID, displays your mail. When processing your mail,
at the ? prompt for each mail item, you can:
• d - delete
• s - append
• q - quit
• enter - skip
• m - forward
mailx Upgrade of mail
uucp Copy file to other UNIX systems (UNIX to UNIX copy)
System administration
df Display file system usage
installp Install program
kill (pid) Kill the process with the given process ID (PID) (find it using ps);
kill -9 PID forces the process to terminate
mount Associate logical volume to a directory;
for example, mount device directory
ps -ef Shows process status (ps -ef)
umount Disassociate file system from directory
smit System management interface tool
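The ps and kill entries above can be tried on a harmless background process. This is a minimal sketch; the sleep command stands in for any long-running job.

```sh
#!/bin/sh
sleep 60 &                    # & runs the command in the background
pid=$!                        # $! is the process ID of the last background job

ps -p "$pid" || true          # show only that process; ignore a missing ps
kill "$pid"                   # send the default signal (TERM)

# wait returns a nonzero code for a job that was ended by a signal
wait "$pid" 2>/dev/null || status=$?

# kill -0 sends no signal; it only tests whether the process still exists
if kill -0 "$pid" 2>/dev/null; then
    alive=yes
else
    alive=no
fi
echo "process $pid still running: $alive"
```

Trying the default signal first gives the process a chance to clean up; kill -9 is best kept for processes that do not respond.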
Miscellaneous
banner Displays banner
date Displays current date and time
newgrp Change active groups
nice Assigns lower priority to following command (for example,
nice ps -f)
passwd Modifies current password
sleep n Sleep for n seconds
stty Show or set terminal settings
touch Create a zero-length file
xinit Initiate X-Windows
wall Sends message to all logged in users
who List users currently logged in (who am i identifies this user)
man,info Displays manual pages
System files
/etc/group List of groups
/etc/motd Message of the day, displayed at login
/etc/passwd List of users and signon information. Password shown as !,
can prevent password checking by editing to remove !
/etc/profile System wide user profile executed at login, can override
variables by resetting in the user's .profile file
/etc/security Directory not accessible to normal users
/etc/security/environ User environment settings
/etc/security/group Group attributes
/etc/security/limits User limits
/etc/security/login.cfg Login settings
/etc/security/passwd User passwords
/etc/security/user User attributes, password restrictions
Variables
var=string Set variable to equal string. (NO SPACES). Spaces must be
enclosed by double quotes. Special characters in string must
be enclosed by single quotes to prevent substitution. Piping (|),
redirection (<, >, >>), and & symbols are not interpreted.
$var Gives value of var in a compound
echo Displays value of var, for example, echo $var
HOME = Home directory of user
MAIL = Mail file name
PS1 = Primary prompt characters, usually "$" or "#"
PS2 = Secondary prompt characters, usually ">"
PATH = Search path
TERM = Terminal type being used
export Exports variables to the environment
env Displays environment variables settings
Commands
# Comment designator
&& Logical-and. Run command following && only if command
preceding && succeeds (return code = 0)
|| Logical-or. Run command following || only if command
preceding || fails (return code <> 0)
exit n Used to pass return code n from shell script, passed as
variable $? to parent shell
expr Arithmetic expressions
Syntax: "expr expression1 operator expression2"
operators: + - \* (multiply) / (divide) % (remainder)
for loop for n (or: for variable in $*)
do
command
done
if-then-else if test expression
then command
elif test expression
then command
else command
fi
read Read from standard input
shift Shifts arguments 1-9 one position to the left and decrements
number of arguments
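The commands in this list can be combined in a short script. This is a minimal sketch; set -- is used to simulate command-line arguments, and the variable names are assumptions.

```sh
#!/bin/sh
# Positional parameters stand in for command-line arguments here.
set -- 4 5 6

total=0
for n in "$@"                 # loop over the arguments
do
    total=`expr $total + $n`  # expr needs spaces around the operator
done

product=`expr 4 \* 5`         # \* keeps the shell from expanding *

first=$1
shift                         # $2 becomes $1, and $# drops by one
after_shift="$1 of $#"

if test "$total" -gt 20
then size=large
elif test "$total" -gt 10
then size=medium
else size=small
fi

# && runs the second command only on success, || only on failure
true  && and_result=ran
false || or_result=ran

# exit n in a child shell is seen as $? by the parent
sh -c 'exit 7' || code=$?

echo "total=$total product=$product size=$size code=$code"
```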
Miscellaneous
sh Execute shell script in the sh shell
-x (execute step-by-step, used for debugging shell scripts)
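The effect of sh -x can be seen by tracing a two-line script. This is a minimal sketch; the script path and contents are made up, and the exact trace format varies slightly between shells.

```sh
#!/bin/sh
script=${TMPDIR:-/tmp}/trace_demo.$$    # hypothetical scratch script
cat > "$script" <<'EOF'
greeting=hello
echo "$greeting world"
EOF

# -x prints each command to standard error before running it
trace=$(sh -x "$script" 2>&1)
echo "$trace"

rm "$script"
```

Each traced command is normally prefixed with +, which makes it easy to tell trace lines from the script's own output.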
vi Editor
Entering vi
vi file Edits the file named file
vi file file2 Edit files consecutively (through :n)
.exrc File that contains the vi profile
wm=nn Sets wrap margin to nn.
Can enter a file other than at first line by adding + (last line),
+n (line n), or +/pattern (first occurrence of pattern).
vi -r Lists saved files
vi -r file Recover file named file from crash
:n Next file in stack
:set all Show all options
Units of measure
h, l Character left, character right
k or <Ctrl>p Move cursor to character above cursor
j or <Ctrl>n Move cursor to character below cursor
w, b Word right, word left
^, $ Beginning, end of current line
<CR> or + Beginning of next line
- Beginning of previous line
G Last line of buffer
Cursor movements
Can precede cursor movement commands (including cursor arrow) with number of times to
repeat, for example, 9--> moves right nine characters.
0 Move to first character in line
$ Move to last character in line
^ Move to first nonblank character in line
Adding text
a Add text after the cursor (end with <esc>)
A Add text at end of current line (end with <esc>)
i Add text before the cursor (end with <esc>)
I Add text before first nonblank character in current line
o Add line following current line
O Add line before current line
<esc> Return to command mode
Deleting text
<Ctrl>w Undo entry of current word
@ Kill the insert on this line
x Delete current character
dw Delete to end of current word (observe punctuation)
dW Delete to end of current word (ignore punctuation)
dd Delete current line
D Erase to end of line (same as d$)
d) Delete current sentence
d} Delete current paragraph
dG Delete current line through end of buffer
d^ Delete to the beginning of line
u Undo last change command
U Restore current line to original state before modification
Replacing text
ra Replace current character with a
R Replace all characters overtyped until <esc> is entered
s Delete current character and append text until <esc>
:s/s1/s2 Replace s1 with s2 (in the same line only)
S Delete all characters in the line and append text
cc Replace all characters in the line (same as S)
Moving text
p Paste last text deleted after cursor (xp will transpose 2
characters)
P Paste last text deleted before cursor
nyx Yank n text objects of type x (w, b = words, ) = sentences, } =
paragraphs, $ = end-of-line); nY with no x yanks n lines. Can
then paste them with p command. Yank does not delete the
original.
"ayy Can use named registers for moving, copying, cut/paste with
"ayy for register a (use registers a-z), can then paste them with
"ap command.
Miscellaneous
. Repeat last command
J Join current line with next line
0c0 - 0cc
0c0 A user-requested dump completed successfully.
0c1 An I/O error occurred during the dump.
0c2 A user-requested dump is in progress. Wait at least one minute for the
dump to complete.
0c4 The dump ran out of space. Partial dump is available.
0c5 The dump failed due to an internal failure. A partial dump may exist.
0c7 Progress indicator. Remote dump is in progress.
0c8 The dump device is disabled. No dump device configured.
0c9 A system-initiated dump has started. Wait at least one minute for the
dump to complete.
0cc (AIX 4.2.1 and later) An error occurred writing to the primary dump
device. It switched over to the secondary.
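The dump status codes above lend themselves to a simple lookup. The following is a sketch only: dump_code_meaning is a hypothetical helper, not an AIX command, and its messages are shortened from the table.

```sh
#!/bin/sh
# Map a dump status code from the table above to its meaning.
# dump_code_meaning is a hypothetical helper name.
dump_code_meaning() {
    case "$1" in
        0c0) echo "A user-requested dump completed successfully." ;;
        0c1) echo "An I/O error occurred during the dump." ;;
        0c2) echo "A user-requested dump is in progress." ;;
        0c4) echo "The dump ran out of space. Partial dump is available." ;;
        0c5) echo "The dump failed due to an internal failure." ;;
        0c7) echo "Remote dump is in progress." ;;
        0c8) echo "The dump device is disabled. No dump device configured." ;;
        0c9) echo "A system-initiated dump has started." ;;
        0cc) echo "Error writing to the primary dump device; switched to the secondary." ;;
        *)   echo "Unknown code: $1"; return 1 ;;
    esac
}

dump_code_meaning 0c4
```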
100 - 195
100 Progress indicator. BIST completed successfully.
101 Progress indicator. Initial BIST started following system reset.
102 Progress indicator. BIST started following power on reset.
103 BIST could not determine the system model number.
104 BIST could not find the common on-chip processor bus address.
105 BIST could not read from the on-chip sequencer EPROM.
106 BIST detected a module failure.
111 On-chip sequencer stopped. BIST detected a module error.
112 Checkstop occurred during BIST and checkstop results could not be
logged out.
113 The BIST checkstop count equals 3, which means three unsuccessful
system restarts. System halts.
120 Progress indicator. BIST started CRC check on the EPROM.
121 BIST detected a bad CRC on the on-chip sequencer EPROM.
© Copyright IBM Corp. 2009, 2012 Appendix C. AIX dump code and progress codes C-1
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
187 BIST was unable to identify the chip release level in the checkstop
logout data.
195 Progress indicator. The BIST checkstop logout completed.
c00 - c99
c00 AIX Install/Maintenance loaded successfully.
c01 Insert the AIX Install/Maintenance diskette.
c02 Diskettes inserted out of sequence.
c03 Wrong diskette inserted.
c04 Irrecoverable error occurred.